Over the last several years I’ve been working on a number of ways to make computer-generated speech more expressive. Not just a bit, but radically more so.
A Christmas Carol - audiobook excerpt
Several years ago, before the Cambrian explosion of neural speech models, I spent a lot of time working on parametric synthesis. The thought was that one could animate speech in the same way that Pixar animators manipulate wireframe models to create the illusion of life in CG characters. This work resulted in a couple of patents, which have since been outpaced by the rapid development of machine learning and neural speech synthesis.
Once spline control of parameters became obsolete, I switched my focus to developing better tools for managing large scripts and multiple takes for a large cast of characters. The most recent incarnation of this is tied to the open source TTS package “Chatterbox Turbo” from Resemble.ai.
As a proof of concept, I wanted to use these tools to create a long form excerpt from “A Christmas Carol”. I designed a pleasing narrator voice, and an appropriately grumpy one for Scrooge.
Recently I’ve become fascinated with a game called Werewolf, which is based on a similar social deduction game, “Mafia”. Players have limited information and have to talk amongst themselves to try to figure out who did what to whom, and who knew what when, before time runs out.
This is all an outgrowth of my decades-long obsession with speech-driven entertainment. Modern players are increasingly drawn to games that talk back. With the advent of real-time text-to-speech rendering and emotionally adaptive voices, a new class of voice-forward games can deliver immersion and intimacy unmatched by traditional click-through dialog trees. This framework delivers up-to-the-minute world-aware, character-centric dialog, allowing each NPC to feel alive, distinct, and self-aware.
Social Deduction Games are interesting. A number of studies have been released that look at Werewolf and games like it as a measure of how effectively LLMs can keep track of imperfect information and reason out solutions amongst themselves. Games like Werewolf, Mafia, and Spyfall depend on the core tension between what players know and what they pretend they know. Modeling this requires support for misinformation, asymmetric perception, and role-aware deception. This enables each character to operate within their own worldview – capable of lying convincingly or expressing genuine uncertainty – all while maintaining logical consistency with prior actions. They’re not perfect, but the debate can take surprising turns on occasion.
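One way to picture the split between what a character knows and what they say is to keep private beliefs and public claims in separate places. The Python below is only a minimal sketch under that assumption, not the framework itself; the `Character` class, its fields, and the role strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """Hypothetical per-character worldview for a social deduction game."""
    name: str
    role: str  # true role, hidden from the other players
    beliefs: dict = field(default_factory=dict)  # subject -> role this character privately believes
    claims: list = field(default_factory=list)   # (subject, role) pairs said out loud

    def observe(self, subject, role):
        """Privately learn (or mislearn) something about another player."""
        self.beliefs[subject] = role

    def claim(self, subject, role):
        """Say something publicly; it does not have to match private beliefs."""
        self.claims.append((subject, role))

    def is_lying_about(self, subject):
        """True if the latest public claim about `subject` contradicts a private belief."""
        for claimed_subject, claimed_role in reversed(self.claims):
            if claimed_subject == subject:
                believed = self.beliefs.get(subject)
                return believed is not None and believed != claimed_role
        return False  # no claim made yet, so nothing to contradict
```

A character with no private belief about someone is treated here as uncertain rather than lying, which is one simple way to keep genuine uncertainty distinct from deception.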
“The Harbor House Affair” – a 1930s whodunit
The beauty of the simple Werewolf mechanic is that you can map any number of settings and casts of characters onto the framework.
1. Arctic Research Outpost (Cold War, 1962)
Setting: An isolated American research station on the Greenland ice sheet. Long nights. Radio static. Everyone’s trapped.
Dr. Elaine Mercer – Station director. Calm, controlled, visibly exhausted. Career scientist whose last posting ended abruptly after an “equipment failure.”
Walter Briggs – Mechanical engineer. Gruff, practical, distrustful of theory. Knows the station’s systems better than anyone.
Anne Carlisle – Young meteorologist. Nervous, defensive, eager to please. Keeps meticulous logs, sometimes too meticulous.
Father Thomas Keene – Chaplain and morale officer. Warm, articulate, oddly perceptive. Claims to be here for “spiritual support,” but asks very pointed questions.
Killer reskin: Saboteur / infiltrator
Seer reskin: Analyst / profiler / confessor
Swapper reskin: Technician rerouting access, logs, or credentials
2. Luxury Transatlantic Liner (1935)
Setting: Mid-Atlantic, three days from New York. First-class glamour above, locked doors and rumors below.
Margaret Ashcroft – Wealthy widow traveling with too much luggage and no clear destination. Socially adept, emotionally opaque.
Henry Bell – Ship’s purser. Impeccably polite, knows everyone’s business. Keeps records others assume are private.
Lucien Moreau – European art dealer. Smooth, charming, evasive about his past clients.
Clara Finch – Young governess escorting a child who never seems to appear. Easily flustered, deeply observant.
Samuel Reed – Retired naval officer. Straightforward, rigid, uncomfortable with ambiguity. Watches routines closely.
Killer reskin: Assassin / blackmailer
Seer reskin: Information broker / reader of people
Swapper reskin: Thief switching cabins, papers, or identities
3. Desert Cult Compound (1978)
Setting: A remote spiritual commune in the California desert. Solar panels, white robes, whispered doubts.
Evelyn Cross – Charismatic leader. Soft-spoken, reassuring, never raises her voice. Knows everyone’s weaknesses.
Jonah Pike – Head of security. Protective, suspicious of outsiders, deeply loyal to Evelyn.
Maribel Santos – Recent arrival. Idealistic, conflicted, struggling with doubts she tries to hide.
Caleb Wright – Accountant and logistics manager. Quiet, precise, uncomfortable with improvisation.
Ruth Holloway – Elder member. Maternal, observant, remembers “how things used to be” before the group grew.
Killer reskin: Enforcer / true believer
Seer reskin: Confessor / spiritual guide
Swapper reskin: Manipulator of assignments, rituals, or blame
4. Deep-Space Salvage Ship (Late 22nd Century)
Setting: A long-range salvage vessel towing a derelict alien craft. Corporate contracts, minimal oversight.
Captain Rhea Solano – Pragmatic, mission-focused, under pressure from corporate HQ. Hides how much she knows.
Ishaan Patel – Xenotech specialist. Brilliant, distracted, emotionally detached. Obsessed with the alien hull.
Mara Vance – Security officer. Cynical, alert, assumes the worst of everyone—including herself.
Leo Kincaid – Systems operator. Friendly, talkative, eager to help. Knows the ship’s internals intimately.
Dr. Yelena Orlov – Medical officer. Calm, incisive, unsettlingly perceptive. Notices changes others miss.
Killer reskin: Infected crew / alien agent
Seer reskin: Diagnostic expert
Swapper reskin: Systems tech rerouting permissions or sensor data
5. Small-Town Political Campaign (Modern Day)
Setting: Final week of a heated mayoral race in a dying Midwestern town.
Diane Keller – Candidate. Polished, relentlessly composed. Keeps her real thoughts well hidden.
Mark Feldman – Campaign manager. Tense, sleep-deprived, constantly calculating risk.
Rachel Nguyen – Volunteer coordinator. Earnest, observant, hears things people assume don’t matter.
Tom Wilkes – Local radio host. Folksy, probing, knows the town’s secrets.
Eddie Morales – Data analyst. Quiet, precise, socially awkward. Knows far more than he lets on.
Killer reskin: Saboteur / opposition plant
Seer reskin: Journalist / data analyst
Swapper reskin: Operative manipulating information flow
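The reskins above all map the same three base roles onto setting-specific labels, which lends itself to a plain lookup table. Here is a minimal Python sketch of that idea using two of the settings listed; the names `BASE_ROLES`, `RESKINS`, and `reskin` are assumptions made up for this example, not part of any existing tool.

```python
# Base roles named in the list above.
BASE_ROLES = ("killer", "seer", "swapper")

# Setting-specific labels copied from two of the settings described above.
RESKINS = {
    "Arctic Research Outpost": {
        "killer": "Saboteur / infiltrator",
        "seer": "Analyst / profiler / confessor",
        "swapper": "Technician rerouting access, logs, or credentials",
    },
    "Luxury Transatlantic Liner": {
        "killer": "Assassin / blackmailer",
        "seer": "Information broker / reader of people",
        "swapper": "Thief switching cabins, papers, or identities",
    },
}


def reskin(setting, base_role):
    """Look up the in-fiction label for a base role in a given setting."""
    if base_role not in BASE_ROLES:
        raise ValueError(f"unknown base role: {base_role}")
    return RESKINS[setting][base_role]
```

Adding a new setting then means adding one more dictionary entry, with the game logic itself left untouched.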
A little while ago, I hacked together a custom action for the Google Assistant that would take the publicly available time tables for the NYC Ferry and answer questions about the routes and schedules.
I’ve never released it publicly. So far, I’ve just kept it for personal use as it’s often faster than pulling out my phone, loading the NYC Ferry app and navigating to the right sub-page for schedules. Voice search can be powerful for the right kinds of tasks. If you can simply ask “When’s the next boat from Dumbo to South Williamsburg?” that can flatten a lot of otherwise tedious menus.
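The core of a "next boat" question is a lookup against a timetable. The Python below is a minimal sketch of that lookup only, not the Google Assistant action itself. The departure times are invented placeholders, and `TIMETABLE` and `next_departure` are hypothetical names; real times would come from the published NYC Ferry timetables.

```python
from datetime import time

# Placeholder departure times, invented for illustration only.
TIMETABLE = {
    ("Dumbo", "South Williamsburg"): [time(8, 15), time(8, 45), time(9, 20)],
}


def next_departure(origin, destination, now):
    """Return the first departure at or after `now`, or None if none remain today."""
    departures = TIMETABLE.get((origin, destination), [])
    upcoming = [t for t in sorted(departures) if t >= now]
    return upcoming[0] if upcoming else None
```

With the placeholder data above, asking at 8:30 for Dumbo to South Williamsburg returns the 8:45 departure, and asking after the last boat returns `None`.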
In 2011, I was fortunate enough to lead the design team on this Nuance / Intel partnership to develop the first intelligent assistant for ultrabooks. The result was Dragon Desktop Assistant. Here’s a pretty good demo of some of its functionality.
A closer look at the audio animation engine developed for interactive fiction.
Ever wonder what happened to those text adventure games? You know, the ones like “Zork” and “The Hitchhiker’s Guide to the Galaxy”. A paragraph of text would come up on screen. You type in what you want to do, and then you get to read what happens next. Imagine a mobile device with speech in, and multiple channels of audio out. A hands-free, eyes-free audiobook that you can push around with speech.
What follows is a short video of the actual working prototype I coded for iOS. The “story” is nothing special; it’s only intended to illustrate the capabilities of the platform, namely the delivery of interactive fiction through complex audio and narration with speech input. Everything you hear in the demo is made up of hundreds of individual sounds being mixed on-the-fly based on what the player/reader does within the story.
I led the design team for Ford Sync 2. Click the image to see a video of some prototype testing I was doing in Montreal in March 2010.
Drum Toy uses an admittedly weird drum machine architecture and concept. You create basic loops using Drum Toy’s Every and Offset knobs, and then feather in the secret ingredient: Probability!
In a way, Drum Toy ‘thinks’ the way drummers do. Find the pocket, lay down the main groove and sprinkle lightly with a few tasty fills or variations here and there to keep things fresh. Drum Toy is set up to force certain beats while leaving others to just the right amount of chance. This simple mechanism yields a surprising array of personality.
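To make the idea concrete, here is a minimal Python sketch of one possible reading of those controls. It is not Drum Toy's actual code. It assumes Every sets how often a beat is forced, Offset shifts where the forced beats start, and Probability is the chance that any other step fires; `drum_pattern` and `render` are hypothetical names.

```python
import random


def drum_pattern(steps, every, offset, probability, rng=None):
    """Return one boolean per step, True where the drum hits.

    Assumed semantics (not taken from Drum Toy itself):
    - steps where (step - offset) is a multiple of `every` always hit
    - every other step hits with the given probability (0.0 to 1.0)
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    rng = rng or random.Random()
    pattern = []
    for step in range(steps):
        forced = (step - offset) % every == 0
        pattern.append(forced or rng.random() < probability)
    return pattern


def render(pattern):
    """Draw a pattern as text: 'x' for a hit, '.' for a rest."""
    return "".join("x" if hit else "." for hit in pattern)
```

With probability at zero, `render(drum_pattern(16, every=4, offset=0, probability=0.0))` gives the plain groove `x...x...x...x...`. Raising the probability a little sprinkles extra hits between the forced beats, differently on each pass.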
Update: Feb 2018
I’ve recently created a Logic/MainStage MidiFX Scripter hack that is really useful as it combines my favorite features from DrumToy & MidiBot!
Test harness of the MB1 used for coding and testing the Arduino software.
I’ve created a number of versions of “The Rotator” over the years. This video details one of the first versions that involved creating custom hardware based on the Arduino micro-controller and several parts scavenged from various bits and pieces I had lying around.
Then, a couple of years ago while in Japan, I made a field trip to “Electric City” in Akihabara, where I was able to buy a bunch of electronics parts that allowed me to start work on yet another version.
Rev 1 of the MB1 rotator board I designed. (shoutout to Dennis Alichwer from Neve for all his help and advice.)
I wanted to start again with improved hardware and a better layout. I was really excited to find what appeared to be Prophet-V-style gray buttons and some decent push-button rotary encoders. Within a few weeks I was able to get a working breadboard up and running (top left) where I could test the code and work on the software.
Enter the amazing Dennis Alichwer who totally crushed it on laying out a custom circuit board based on my design for the test harness.
MB1 prototype in a makeshift enclosure
The result was the “MB1” (middle left), which stands for “Midi Box 1” or possibly “Mike Brecker 1”. We’re not sure.
Anyway, I have a couple of boxes in use. One is running the latest rotator software, and the other is running an alpha version of a new toy I’m working on called MidiBot.
Update: Feb 2018
I’ve recently gotten so many requests from people wanting to buy rotators that it just didn’t seem practical to build them myself. So I spent the last month or two working on a port to a MidiFX plugin. I’ve just released a VST and an AU version of the plugin for use in Logic Pro X and MainStage as well as other DAW environments. I’m quite pleased with the UI, and it’s proven to be even more flexible than the hardware version in that it can support many more voices and rotations.
Here’s a brief example of what it can do harmonically to some simple triads!
As a kid, I learned to play folk guitar during the early sixties, and I’d experimented with a number of “open tunings”. Though I knew a couple of standard tunings used for bottleneck blues and such, I didn’t take much notice until I started listening to Joni Mitchell.
Here’s an incarnation of a graphic interface for creating and controlling probability-based ambient audio soundscapes.
WHAT IS THIS?
This is a set of 8 rack units. Each one contains a preset bank with a handful of sounds. You can trigger them as a “single shot”, by pressing the play button. A sound will loop indefinitely when the “Loop” mode button is selected.
RANDOM TRIGGERS
On the left-hand side of each unit is a “random trigger generator”. Clicking the button marked RND engages the random trigger for that channel. The frequency knob controls the “rate” at which new random numbers are generated. That is, every time the LED goes on or off, a random number between 0 and 100 is generated.
If the random number is less than the value of “Amount”, the sound will play. The “Prob” knob adjusts this “amount of probability”, that is, the threshold below which the random number will trigger the sound.
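The trigger rule boils down to one comparison per tick. Here is a minimal Python sketch of it, assuming whole-number rolls from 0 to 100 inclusive; the function names `should_trigger` and `simulate` are hypothetical, and the real interface may draw its numbers differently.

```python
import random


def should_trigger(threshold, rng=None):
    """Roll a number from 0 to 100 and fire if it is below the threshold."""
    rng = rng or random.Random()
    roll = rng.randint(0, 100)  # assumption: integer rolls, both ends included
    return roll < threshold


def simulate(ticks, threshold, seed=None):
    """Count how many of `ticks` random rolls would trigger the sound."""
    rng = random.Random(seed)
    return sum(should_trigger(threshold, rng) for _ in range(ticks))
```

A threshold of 0 never fires, and a low threshold combined with a slow rate keeps long sounds from piling up on top of each other.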
It’s best to test each sound by just playing it once with the play button. Some sounds are quite long. These long sounds are best triggered with either a very slow rate of chance or with a low probability of occurring.
MASTER: NEW, LOAD & SAVE
I’ve added the ability to save configurations for the full rack of eight units. Tweak an existing patch and click “Save”, and it will be updated. Tweaking an existing patch and typing in a new name is equivalent to “Save As”.