Tooling for expressive TTS

Over the last several years, I’ve been working on a number of ways to make computer-generated speech more expressive.
Not just a bit, but radically more so.

A Christmas Carol - audiobook excerpt

Several years ago, before the Cambrian explosion of neural speech models, I spent a lot of time working on parametric synthesis. The idea was that one could animate speech in the same way that Pixar animators manipulate wireframe models to create the illusion of life in CG characters. This work resulted in a couple of patents, though that approach has since been outpaced by rapid developments in machine learning and neural speech synthesis.

Once spline control of parameters became obsolete, I switched my focus to developing better tools for managing long scripts and multiple takes across a large cast of characters. The most recent incarnation of this is tied to the open-source TTS package “Chatterbox Turbo” from Resemble.ai.
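Here is a minimal sketch of the bookkeeping such a tool involves. It assumes that each script line belongs to one character and can accumulate several rendered takes, one of which is eventually chosen. The classes `Take`, `ScriptLine` and `Script`, along with their fields, are hypothetical. The actual synthesis call to Chatterbox Turbo is deliberately left out rather than guessed at.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Take:
    """One rendered attempt at a line, e.g. a path to a generated audio file."""
    audio_path: str
    notes: str = ""


@dataclass
class ScriptLine:
    """A single line of narration or dialogue, spoken by one character."""
    character: str
    text: str
    takes: list[Take] = field(default_factory=list)
    selected: Optional[int] = None  # index of the preferred take, if chosen

    def add_take(self, take: Take) -> int:
        """Append a new take and return its index."""
        self.takes.append(take)
        return len(self.takes) - 1

    def select(self, index: int) -> None:
        """Mark one existing take as the keeper for this line."""
        if not 0 <= index < len(self.takes):
            raise IndexError(f"no take {index} for line {self.text!r}")
        self.selected = index


@dataclass
class Script:
    lines: list[ScriptLine] = field(default_factory=list)

    def lines_for(self, character: str) -> list[ScriptLine]:
        """All lines belonging to one character, in script order."""
        return [line for line in self.lines if line.character == character]

    def missing_selections(self) -> list[ScriptLine]:
        """Lines that still need a take chosen before final assembly."""
        return [line for line in self.lines if line.selected is None]


script = Script([
    ScriptLine("Narrator", "Marley was dead: to begin with."),
    ScriptLine("Scrooge", "Bah! Humbug!"),
])
scrooge_line = script.lines_for("Scrooge")[0]
scrooge_line.select(scrooge_line.add_take(Take("scrooge_0001_take1.wav")))
print([line.character for line in script.missing_selections()])  # ['Narrator']
```

Keeping the "selected take" on each line, instead of in a separate table, makes it easy to list what still needs attention before rendering the full excerpt.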

As a proof of concept, I wanted to use these tools to create a long-form excerpt from “A Christmas Carol”. I designed a pleasing narrator voice and an appropriately grumpy one for Scrooge.

NYC Ferry – Google Custom Action

A little while ago, I hacked together a custom action for the Google Assistant that takes the publicly available timetables for the NYC Ferry and answers questions about its routes and schedules.

I’ve never released it publicly. So far, I’ve kept it for personal use, since it’s often faster than pulling out my phone, loading the NYC Ferry app and navigating to the right sub-page for schedules. Voice search can be powerful for the right kinds of tasks: being able to simply ask “When’s the next boat from Dumbo to South Williamsburg?” flattens a lot of otherwise tedious menu navigation.
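Answering “When’s the next boat…?” mostly comes down to finding the first scheduled departure after the current time. The sketch below shows this step and assumes the timetables have already been parsed into sorted departure times for each origin/destination pair. The `TIMETABLE` entries are placeholder times, not the real NYC Ferry schedule. The `next_departure` helper is hypothetical, and the Google Assistant plumbing is omitted entirely.

```python
from bisect import bisect_right
from datetime import time
from typing import Optional

# Placeholder data: (origin, destination) -> departure times, sorted ascending.
# A real action would build this from the published NYC Ferry timetables.
TIMETABLE: dict[tuple[str, str], list[time]] = {
    ("Dumbo", "South Williamsburg"): [time(7, 5), time(7, 35), time(8, 5), time(8, 35)],
}


def next_departure(origin: str, destination: str, now: time) -> Optional[time]:
    """Return the first departure strictly after `now`, or None if none remain."""
    departures = TIMETABLE.get((origin, destination), [])
    # bisect_right skips any departure equal to `now` (that boat is leaving).
    i = bisect_right(departures, now)
    return departures[i] if i < len(departures) else None


print(next_departure("Dumbo", "South Williamsburg", time(7, 10)))  # 07:35:00
```

Using a binary search over a pre-sorted list keeps each lookup cheap, which matters little for one route but helps if every stop pair on every route is precomputed.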

Drum Toy

Drum Toy uses an admittedly weird drum machine architecture and concept. You create basic loops using Drum Toy’s Every and Offset knobs, and then feather in the secret ingredient: Probability!

In a way, Drum Toy ‘thinks’ the way drummers do. Find the pocket, lay down the main groove and sprinkle lightly with a few tasty fills or variations here and there to keep things fresh. Drum Toy is set up to force certain beats while leaving others to just the right amount of chance. This simple mechanism yields a surprising array of personality.
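To make the idea concrete, here is a minimal sketch of one drum voice. The semantics are assumed and do not come from Drum Toy's code. Under these assumptions, a step is "forced" when `(step - offset) % every == 0`, and every other step fires with the given probability. The function name and parameters are hypothetical.

```python
import random


def drum_toy_pattern(steps: int, every: int, offset: int, probability: float,
                     rng: random.Random) -> list[bool]:
    """Return one bar of hits for a single drum voice.

    Assumed semantics (hypothetical):
    - steps where (step - offset) % every == 0 always fire (the groove);
    - all other steps fire with the given probability (the fills).
    """
    if every < 1:
        raise ValueError("every must be >= 1")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    hits = []
    for step in range(steps):
        on_grid = (step - offset) % every == 0
        hits.append(on_grid or rng.random() < probability)
    return hits


rng = random.Random(42)
# Kick on every quarter note of a 16-step bar, with a 10% chance of extra hits.
kick = drum_toy_pattern(16, every=4, offset=0, probability=0.1, rng=rng)
# Snare on beats 2 and 4: every 8 steps, shifted by 4, no randomness.
snare = drum_toy_pattern(16, every=8, offset=4, probability=0.0, rng=rng)
print("".join("x" if hit else "." for hit in snare))  # ....x.......x...
```

Setting `probability` to 0 gives a fully deterministic loop. Raising it slowly feathers in variations without ever removing the forced backbeat.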

Update: Feb 2018
I’ve recently created a really useful Logic/MainStage MIDI FX Scripter hack that combines my favorite features from Drum Toy and MidiBot!

Ambient Sound Toy

Here’s an incarnation of a graphical interface for creating and controlling probability-based ambient soundscapes.

WHAT IS THIS?
This is a set of eight rack units, each containing a preset bank with a handful of sounds. Pressing the play button triggers a sound once, as a “single shot”. When the “Loop” mode button is selected, the sound loops indefinitely.

RANDOM TRIGGERS
On the left-hand side of each unit is a “random trigger generator”. Clicking the button marked RND engages the random trigger for that channel. The frequency knob controls the rate at which new random numbers are generated: every time the LED turns on or off, a random number between 0 and 100 is generated.

If the random number is less than the “Prob” setting, the sound plays. In other words, the “Prob” knob sets the threshold below which a random number triggers the sound, so higher values mean more frequent playback.
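The trigger rule can be sketched in a few lines, assuming one draw per LED toggle ("tick") and a uniform random number in the range 0 to 100. The function `random_trigger_ticks` is hypothetical and is not the Ambient Sound Toy's own code.

```python
import random


def random_trigger_ticks(prob: float, ticks: int, rng: random.Random) -> list[int]:
    """Simulate one channel's random trigger generator.

    On every tick (each LED toggle), draw a number in [0, 100); the sound
    fires when that number is below the "Prob" setting. Returns the indices
    of the ticks on which the sound fired.
    """
    fired = []
    for tick in range(ticks):
        if rng.random() * 100 < prob:
            fired.append(tick)
    return fired


rng = random.Random(7)
hits = random_trigger_ticks(prob=25, ticks=1000, rng=rng)
print(len(hits))  # about a quarter of the 1000 ticks, on average
```

Because the draw falls in the range 0 (inclusive) to 100 (exclusive), a Prob of 0 never fires and a Prob of 100 fires on every tick.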

It’s best to audition each sound first by playing it once with the play button, because some sounds are quite long. Long sounds work best with either a very slow trigger rate or a low probability.
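The numbers below show why long sounds need gentle settings. Each toggle fires with probability Prob/100, so the average number of toggles between triggers is 100/Prob. Dividing that by the toggle rate gives the average time between triggers in seconds. The sketch assumes the frequency knob can be expressed in toggles per second. That unit and both helper names are assumptions made for illustration.

```python
def mean_seconds_between_triggers(toggles_per_second: float, prob: float) -> float:
    """Average gap between triggers for one channel.

    The number of toggles until a trigger is geometric with mean 100 / prob;
    dividing by the toggle rate converts that count to seconds.
    """
    if toggles_per_second <= 0 or not 0 < prob <= 100:
        raise ValueError("need a positive rate and 0 < prob <= 100")
    return 100.0 / (prob * toggles_per_second)


def max_prob_for_gap(toggles_per_second: float, target_gap_s: float) -> float:
    """Largest Prob setting whose average gap is at least target_gap_s."""
    return min(100.0, 100.0 / (toggles_per_second * target_gap_s))


# A 30-second sound at 2 toggles/s with Prob = 5 fires every 10 s on average,
# so new triggers typically arrive before it has finished playing.
print(mean_seconds_between_triggers(2, 5))  # 10.0
# Slowing to 0.5 toggles/s and aiming for a 60 s average gap:
print(max_prob_for_gap(0.5, 60))  # 3.33... (100 / 30)
```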

MASTER: NEW, LOAD & SAVE
I’ve added the ability to save configurations for the full rack of eight units. If you tweak an existing patch and click “Save”, it will be updated. Tweaking an existing patch and typing in a new name is equivalent to “Save As”.
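Here is a rough sketch of the New / Save / Save As behavior, assuming patches are stored in a single JSON file keyed by patch name. The `PatchStore` class and the per-unit fields (`preset`, `loop`, `rnd`, `rate`, `prob`) are hypothetical stand-ins for whatever the real rack stores.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical per-unit settings; the rack always has eight of these.
DEFAULT_UNIT = {"preset": 0, "loop": False, "rnd": False, "rate": 1.0, "prob": 0}


class PatchStore:
    """Named full-rack configurations, persisted as one JSON file."""

    def __init__(self, path: Path):
        self.path = path
        self.patches: dict[str, list[dict]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def new(self) -> list[dict]:
        """A fresh rack of eight units with default settings."""
        return [dict(DEFAULT_UNIT) for _ in range(8)]

    def save(self, name: str, rack: list[dict]) -> None:
        """An existing name is overwritten ("Save"); a new name adds a patch ("Save As")."""
        if len(rack) != 8:
            raise ValueError("a patch must describe all eight rack units")
        self.patches[name] = [dict(unit) for unit in rack]
        self.path.write_text(json.dumps(self.patches, indent=2))

    def load(self, name: str) -> list[dict]:
        """Return a copy so edits don't change the stored patch until saved."""
        return [dict(unit) for unit in self.patches[name]]


with tempfile.TemporaryDirectory() as tmp:
    store = PatchStore(Path(tmp) / "patches.json")
    rack = store.new()
    rack[0]["rnd"] = True
    store.save("Rainforest", rack)        # first save
    rack[0]["prob"] = 30
    store.save("Rainforest", rack)        # same name: updated in place
    store.save("Rainforest Night", rack)  # new name: acts like "Save As"
    reloaded = PatchStore(Path(tmp) / "patches.json")
    names = sorted(reloaded.patches)
    first_prob = reloaded.load("Rainforest")[0]["prob"]

print(names)  # ['Rainforest', 'Rainforest Night']
```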