All posts by Robby

Tooling for expressive TTS

Over the last several years I’ve been working on a number of ways to make computer-generated speech more expressive.
Not just a bit, but radically more so.

A Christmas Carol - audiobook excerpt

Several years ago, before the Cambrian explosion of neural speech models, I spent a lot of time working on parametric synthesis. The thought was that one could animate speech in the same way that Pixar animators manipulate wireframe models to create the illusion of life in CG characters. This work resulted in a couple of patents, which have since been outpaced by the rapid development in machine learning and neural speech synthesis.

Once spline control of parameters became obsolete, I switched my focus to developing better tools for managing large scripts and multiple takes of a large cast of characters. The most recent incarnation of this is tied to the open source TTS package “Chatterbox Turbo” from Resemble.ai.

As a proof of concept, I wanted to use these tools to create a long form excerpt from “A Christmas Carol”. I designed a pleasing narrator voice, and an appropriately grumpy one for Scrooge.

ChatterWolf

Recently I’ve gotten fascinated with a game called Werewolf – which is based on a similar social deduction game, “Mafia”. Players have limited information and have to talk amongst themselves to try and figure out who did what to whom, and who knew what when, before time runs out.

This is all an outgrowth of my decades-long obsession with speech driven entertainment. Modern players are increasingly drawn to games that talk back. With the advent of real-time text-to-speech rendering and emotionally adaptive voices, a new class of voice-forward games can deliver immersion and intimacy unmatched by traditional click-through dialog trees. This framework delivers up-to-the-minute world-aware, character-centric dialog, allowing each NPC to feel alive, distinct, and self-aware.

Social Deduction Games are interesting. A number of studies have been released that look at Werewolf and games like it as a measure of how effectively LLMs can keep track of imperfect information and reason out solutions amongst themselves. Games like Werewolf, Mafia, and Spyfall depend on the core tension between what players know and what they pretend they know. Modeling this requires support for misinformation, asymmetric perception, and role-aware deception. This enables each character to operate within their own worldview – capable of lying convincingly or expressing genuine uncertainty – all while maintaining logical consistency with prior actions. They’re not perfect, but the debate can take surprising turns on occasion.
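To make the split between what a character knows and what they claim a little more concrete, here is a minimal sketch. It is not the framework's actual code: the `Character` class, its fields, and the example facts are all hypothetical, chosen only to show private knowledge kept apart from public statements, plus a check that new claims stay consistent with earlier ones.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """Hypothetical per-character state: private knowledge vs. public claims."""
    name: str
    role: str
    private_facts: dict = field(default_factory=dict)  # what they actually know
    public_claims: dict = field(default_factory=dict)  # what they have said aloud

    def claim(self, topic, value):
        """Record a public statement, refusing one that contradicts an earlier claim."""
        previous = self.public_claims.get(topic)
        if previous is not None and previous != value:
            raise ValueError(f"{self.name} already claimed {previous!r} about {topic!r}")
        self.public_claims[topic] = value

    def is_lying_about(self, topic):
        """True when a public claim differs from what the character privately knows."""
        return (
            topic in self.private_facts
            and topic in self.public_claims
            and self.private_facts[topic] != self.public_claims[topic]
        )


# Illustrative data only; these facts are invented for the example.
yuri = Character("Yuri Volkov", "killer",
                 private_facts={"location_last_night": "generator room"})
yuri.claim("location_last_night", "bunk room")
print(yuri.is_lying_about("location_last_night"))
```

The point of keeping two separate stores is that a lie can stay stable across the whole debate: the character is free to contradict the truth, but not themselves.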

“The Harbor House Affair” – a 1930s whodunit

The beauty of the simple Werewolf mechanic is that you can map any number of settings and casts onto the framework.


1. Arctic Research Outpost (Cold War, 1962)
Setting: An isolated American research station on the Greenland ice sheet. Long nights. Radio static. Everyone’s trapped.
 
Dr. Elaine Mercer – Station director. Calm, controlled, visibly exhausted. Career scientist whose last posting ended abruptly after an “equipment failure.”
Walter Briggs – Mechanical engineer. Gruff, practical, distrustful of theory. Knows the station’s systems better than anyone.
Yuri Volkov – Soviet “exchange” scientist. Polite, reserved, observant. His credentials check out… mostly.
Anne Carlisle – Young meteorologist. Nervous, defensive, eager to please. Keeps meticulous logs, sometimes too meticulous.
Father Thomas Keene – Chaplain and morale officer. Warm, articulate, oddly perceptive. Claims to be here for “spiritual support,” but asks very pointed questions.
 
Killer reskin: Saboteur / infiltrator
Seer reskin: Analyst / profiler / confessor
Swapper reskin: Technician rerouting access, logs, or credentials
 

2. Luxury Transatlantic Liner (1935)
Setting: Mid-Atlantic, three days from New York. First-class glamour above, locked doors and rumors below.
 
Margaret Ashcroft – Wealthy widow traveling with too much luggage and no clear destination. Socially adept, emotionally opaque.
Henry Bell – Ship’s purser. Impeccably polite, knows everyone’s business. Keeps records others assume are private.
Lucien Moreau – European art dealer. Smooth, charming, evasive about his past clients.
Clara Finch – Young governess escorting a child who never seems to appear. Easily flustered, deeply observant.
Samuel Reed – Retired naval officer. Straightforward, rigid, uncomfortable with ambiguity. Watches routines closely.
 
Killer reskin: Assassin / blackmailer
Seer reskin: Information broker / reader of people
Swapper reskin: Thief switching cabins, papers, or identities
 

3. Desert Cult Compound (1978)
Setting: A remote spiritual commune in the California desert. Solar panels, white robes, whispered doubts.
 
Evelyn Cross – Charismatic leader. Soft-spoken, reassuring, never raises her voice. Knows everyone’s weaknesses.
Jonah Pike – Head of security. Protective, suspicious of outsiders, deeply loyal to Evelyn.
Maribel Santos – Recent arrival. Idealistic, conflicted, struggling with doubts she tries to hide.
Caleb Wright – Accountant and logistics manager. Quiet, precise, uncomfortable with improvisation.
Ruth Holloway – Elder member. Maternal, observant, remembers “how things used to be” before the group grew.
 
Killer reskin: Enforcer / true believer
Seer reskin: Confessor / spiritual guide
Swapper reskin: Manipulator of assignments, rituals, or blame
 

4. Deep-Space Salvage Ship (Late 22nd Century)
Setting: A long-range salvage vessel towing a derelict alien craft. Corporate contracts, minimal oversight.
 
Captain Rhea Solano – Pragmatic, mission-focused, under pressure from corporate HQ. Hides how much she knows.
Ishaan Patel – Xenotech specialist. Brilliant, distracted, emotionally detached. Obsessed with the alien hull.
Mara Vance – Security officer. Cynical, alert, assumes the worst of everyone—including herself.
Leo Kincaid – Systems operator. Friendly, talkative, eager to help. Knows the ship’s internals intimately.
Dr. Yelena Orlov – Medical officer. Calm, incisive, unsettlingly perceptive. Notices changes others miss.
 
Killer reskin: Infected crew / alien agent
Seer reskin: Diagnostic expert
Swapper reskin: Systems tech rerouting permissions or sensor data
 

5. Small-Town Political Campaign (Modern Day)
Setting: Final week of a heated mayoral race in a dying Midwestern town.
 
Diane Keller – Candidate. Polished, relentlessly composed. Keeps her real thoughts well hidden.
Mark Feldman – Campaign manager. Tense, sleep-deprived, constantly calculating risk.
Rachel Nguyen – Volunteer coordinator. Earnest, observant, hears things people assume don’t matter.
Tom Wilkes – Local radio host. Folksy, probing, knows the town’s secrets.
Eddie Morales – Data analyst. Quiet, precise, socially awkward. Knows far more than he lets on.
 
Killer reskin: Saboteur / opposition plant
Seer reskin: Journalist / data analyst
Swapper reskin: Operative manipulating information flow
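The reskin lists above all follow the same pattern: three core roles, each given an in-world label per setting. A rough sketch of that mapping might look like the following. The `Setting` class, `CORE_ROLES`, and `label_for` are hypothetical names for illustration; the labels themselves are copied from two of the settings above.

```python
from dataclasses import dataclass

# The three core mechanics that every setting reskins.
CORE_ROLES = ("killer", "seer", "swapper")


@dataclass
class Setting:
    """Hypothetical container pairing a setting with its role labels."""
    name: str
    reskins: dict  # core role -> in-world label


ARCTIC = Setting(
    "Arctic Research Outpost (Cold War, 1962)",
    {
        "killer": "Saboteur / infiltrator",
        "seer": "Analyst / profiler / confessor",
        "swapper": "Technician rerouting access, logs, or credentials",
    },
)

LINER = Setting(
    "Luxury Transatlantic Liner (1935)",
    {
        "killer": "Assassin / blackmailer",
        "seer": "Information broker / reader of people",
        "swapper": "Thief switching cabins, papers, or identities",
    },
)


def label_for(setting, core_role):
    """Look up the in-world label a setting uses for a core role."""
    if core_role not in CORE_ROLES:
        raise ValueError(f"unknown core role: {core_role!r}")
    return setting.reskins[core_role]


for setting in (ARCTIC, LINER):
    print(setting.name, "->", label_for(setting, "seer"))
```

Because the game logic only ever talks about the core roles, swapping the setting changes the flavor text without touching the rules.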

Discography

For a time, I made a living playing keyboard and programming synthesizers on a lot of the records made in NY, LA, and London. I’d forgotten many of them, but thanks to the inter-tubes they’ve been hoovered up into a nice discography. Here’s a listing of some you’ll remember – others, not so much.

Exploring the Fifths Patch

I love alternate tunings. As a kid, I learned to play folk guitar, and although I’d experimented a little with a couple of the more common “open tunings” used for bottleneck blues and such, I didn’t really get into them deeply until I started listening to Joni Mitchell. She kind of blew my mind.

Much later, as I got more serious about keyboards and started learning to program synthesizers, I thought: “What if you could do the same thing on keyboard?” Turns out that open tunings on a keyboard are a thing. And it’s easy if you have more than one oscillator.

It’s a simple but deceptively powerful harmonic trick: the “fifths patch”. These are patches where two oscillators are tuned in fifths – or a fourth below, which is effectively the same thing. This creates an automatic shadow or parallel harmony that’s surprisingly interesting. For musicians, especially those dabbling in synthesis and MIDI, understanding this can open up a world of harmonic possibilities.

There have been numerous recordings written with fifths patches, which exhibit interesting harmonies and showcase the technique’s compositional power. Two striking examples of fifths patches in the wild are Bruce Woolley’s beautiful opening chords on Grace Jones’s “Slave to the Rhythm” and the complex harmonies in Weather Report’s “River People” from the album “Mr. Gone”. I’ve used them myself on a number of records – most notably the gated chords that provide the rhythmic hook on Shannon’s “Let The Music Play”.

These fifths patches, where oscillators are tuned in perfect intervals, not only create magical harmonies, but also offer a playground for musical exploration.

Play a single note on a keyboard set with a fifths patch, and you get a parallel fifth. Nothing clashes, but there’s nothing particularly interesting yet. But, as I discovered in my years as a session musician, the magic happens when you start exploring other intervals.

For example, if I play C and E, I hear C and E plus their “synthetic harmony” at G and B – a beautiful major seventh chord. This logic applies to all two-note intervals except the tritone and the minor second.
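If you want to check the arithmetic, a few lines of Python will do it. This is just a sketch that assumes the second oscillator sits exactly seven semitones (a perfect fifth) above the first; `fifths_patch` is a made-up helper name, and octaves are ignored so only note names come out.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FIFTH = 7  # semitones in a perfect fifth


def fifths_patch(played):
    """Return the note names heard when each played note is doubled a fifth above."""
    heard = []
    for name in played:
        root = NOTE_NAMES.index(name)
        # Each key sounds its own pitch plus the shadow a fifth higher.
        for pitch_class in (root, (root + FIFTH) % 12):
            if NOTE_NAMES[pitch_class] not in heard:
                heard.append(NOTE_NAMES[pitch_class])
    return heard


print(fifths_patch(["C", "E"]))  # ['C', 'G', 'E', 'B']
```

Playing C and E yields C, G, E, and B, which are the four notes of a C major seventh chord.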

Triad Clusters

But why stop there? Moving to three-note chords, you’ll find yourself crafting lovely six-note clusters. As long as these chords avoid the tricky tritone and minor second intervals, the harmonies you’ll get are nothing short of beautiful. Over time, I’ve learned to appreciate how these patches add inner harmonies and upper extensions to simple voicings.

Experimentation is key

Take, for instance, a simple G triad played over C. This major seventh sonority undergoes a kind of Lydian transformation as the B (the major seventh) is shadowed by an F# (the fifth above B), resulting in a C6 add9 +11. The simple rule of thumb is: all intervals work EXCEPT the tritone and the minor 2nd. It’s these extended harmonies that make fifths patches such a wonderful tool for exploration and composition.
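Here is one way to sketch that rule of thumb in code, using MIDI note numbers (with 60 as middle C, so 48 is the C below it). It rests on an assumption of mine: a played pair counts as risky when the upward distance between the two notes, reduced by octaves, is 1 or 6 semitones, so a major seventh like C up to B is not flagged. The function names are hypothetical.

```python
from itertools import combinations

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
FIFTH = 7       # semitones in a perfect fifth
AVOID = {1, 6}  # minor second and tritone, in semitones


def risky_pairs(midi_notes):
    """Return played pairs whose octave-reduced upward distance is a minor 2nd or tritone."""
    notes = sorted(midi_notes)
    return [(low, high) for low, high in combinations(notes, 2)
            if (high - low) % 12 in AVOID]


def heard_pitch_classes(midi_notes):
    """Return the distinct note names sounding once every note is shadowed a fifth above."""
    pitch_classes = set()
    for note in midi_notes:
        pitch_classes.add(note % 12)
        pitch_classes.add((note + FIFTH) % 12)
    return [NOTE_NAMES[pc] for pc in sorted(pitch_classes)]


g_over_c = [48, 55, 59, 62]  # C, G, B, D: a G triad over a C bass note
print(risky_pairs(g_over_c))          # []
print(heard_pitch_classes(g_over_c))  # ['C', 'D', 'F#', 'G', 'A', 'B']
```

The voicing passes the check, and the six resulting notes include the A, D, and F# that give the chord its 6, 9, and +11 color on top of the C, G, and B already under your fingers.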

Open voicings and drop 2s work nicely too since they’re built on consonant intervals. It takes a bit of practice to get used to what works and what doesn’t. The main thing is to stick to simple triadic voicings and avoid those tritones. Have fun.

MidiBot, improbable music

MidiBot – a MIDI pulse probability generator

Brooklyn – sometime in 2018…
My latest weekend project is a radical upgrade for one of my old iPhone apps. MidiBot is a MidiFX plugin coded using the JUCE framework.

What Is It?
MidiBot is a poly-rhythmic pulse generator that probabilistically adds rhythmic and harmonic material as MIDI triggered sequences. It’s basically DrumToy with a ton of extra functionality thrown in.
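The core idea of a probabilistic pulse is easy to sketch, though what follows is only a toy illustration and not MidiBot's actual algorithm. Each layer is a list of per-step trigger probabilities, and layers of different lengths cycle against each other to produce a polyrhythm. The `generate` function, the layer names, and the probability values are all assumptions made up for the example.

```python
import random


def generate(layers, total_steps, seed=None):
    """Decide, step by step, whether each layer fires, using its per-step probability."""
    rng = random.Random(seed)  # seeded so a groove can be reproduced
    fired = {name: [] for name in layers}
    for step in range(total_steps):
        for name, probabilities in layers.items():
            # Layers shorter than total_steps wrap around, creating the polyrhythm.
            chance = probabilities[step % len(probabilities)]
            fired[name].append(rng.random() < chance)
    return fired


# Hypothetical layers: a 4-step pattern against a 3-step pattern.
layers = {
    "kick": [1.0, 0.2, 0.5, 0.2],
    "clave": [1.0, 0.0, 0.6],
}

for name, hits in generate(layers, total_steps=12, seed=7).items():
    print(f"{name:>5} ", "".join("x" if hit else "." for hit in hits))
```

A probability of 1.0 always fires and 0.0 never does, so the downbeats stay anchored while the in-between steps change from pass to pass.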

What Does It Do?
I tend to use MidiBot as a super funky bass player, deep-pocketed percussionist, and ultra-cool practice metronome. But truth be told, I mostly made it ’cause I don’t get to jam with real people much any more. So I coded a workaround.

Midi Rotator 1.3


Available Now: MidiRotator 1.3
Price: $5.00 USD


Play demo: “Boure”

MidiRotator 1.3 is a plugin available for both Mac and PC.

I’ve recently gotten so many requests from people wanting to buy rotators, it just didn’t seem practical to build them myself. So I spent the last month or two working on a port to a MidiFX plugin. I’ve just released a VST and an AU version of the plugin for use in Logic Pro X and MainStage as well as other DAW environments. I’m quite pleased with the UI and it’s proven to be even more flexible than the hardware version.

Here’s a brief example of what it can do harmonically to some simple triads!

This latest version fixes several annoying bugs that plagued the previous free version. I’ve also added some automation capabilities. Lots more features and updates are in the works, so stay tuned, and thanks for your generous support!

Read more about how the MidiRotator came to be.

NYC Ferry – Google Custom Action

A little while ago, I hacked together a custom action for the Google Assistant that would take the publicly available timetables for the NYC Ferry and answer questions about the routes and schedules.

I’ve never released it publicly. So far, I’ve just kept it for personal use as it’s often faster than pulling out my phone, loading the NYC Ferry app and navigating to the right sub-page for schedules. Voice search can be powerful for the right kinds of tasks. If you can simply ask “When’s the next boat from Dumbo to South Williamsburg?” that can flatten a lot of otherwise tedious menus.
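The lookup behind a question like that is small. Below is a rough sketch of the "next boat" step only, not the actual action. The `TIMETABLE` structure and `next_departure` function are hypothetical, and the departure times are placeholders, not the real NYC Ferry schedule.

```python
from bisect import bisect_left
from datetime import time

# Placeholder departures for illustration only; NOT the real NYC Ferry schedule.
TIMETABLE = {
    ("Dumbo", "South Williamsburg"): [time(7, 0), time(7, 45), time(8, 30)],
}


def next_departure(origin, destination, now):
    """Return the first departure at or after `now`, or None if there isn't one."""
    departures = TIMETABLE.get((origin, destination), [])
    # The list is sorted, so a binary search finds the first time >= now.
    index = bisect_left(departures, now)
    return departures[index] if index < len(departures) else None


print(next_departure("Dumbo", "South Williamsburg", time(7, 10)))
```

The voice layer's job is then just to pull the two stop names out of the question and read the answer back.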

Interactive Audiobook

A closer look at the audio animation engine developed for interactive fiction.
Ever wonder what happened to those text adventure games? You know, the ones like “Zork” and “The Hitchhiker’s Guide to the Galaxy”. A paragraph of text would come up on screen. You type in what you want to do, and then you get to read what happens next. Imagine a mobile device with speech in, and multiple channels of audio out. A hands-free, eyes-free audiobook that you can push around with speech.

What follows is a short video of the actual working prototype I coded for iOS. The “story” is nothing special; it’s only intended to illustrate the capabilities of the platform – namely the delivery of interactive fiction through complex audio and narration with speech input. Everything you hear in the demo is made up of hundreds of individual sounds being mixed on-the-fly based on what the player/reader does within the story.