Saturday, September 12, 2020

Speak to Me! - Speech Synthesis with Early Home Video Games

Of the three components of video game audio (sound effects, music and speech), each arrived in video games in roughly that order.  The earliest video games generated simple tones and noise to produce basic sound effects.  Music chips were well developed by the late 1970s, bringing a slightly more sophisticated method of sound generation to video game players.  Speech, which requires more complex sounds to be intelligible, tended to come to home consoles and computers in the form of specialized speech chips.  In this article we will trace some of the lineages of speech in early video games.

Any home console or home computer with a sound chip, a combined sound and graphics chip, or even a simple beeper could be made to produce intelligible speech.  The disadvantages of using this hardware as a crude DAC were many: speech was often accompanied by a high-pitched whine or came out muffled and hard to understand, and constantly sending data to the sound registers took up most of the CPU's time.  Software programs like S.A.M. (Software Automatic Mouth) could provide a text-to-speech function.  Some speech synthesis hardware was never supported by games, and for the sake of brevity and interest this blog entry will not cover it.  This blog entry focuses on non-edutainment uses of speech synthesis hardware in home console and computer games.

Add-on Console Expansions - The Intellivoice & Odyssey²

The add-on modules for the Intellivision and the Odyssey² sought to address these deficiencies with dedicated expansion devices containing their own speech hardware.  In certain ways, these modules are similar.  Both modules plug into the cartridge port and pass cartridge signals through to the system.  Both devices use a General Instrument SP0256 speech chip, but that is where the similarities end.

The Intellivision Intellivoice Voice Synthesis Module was introduced in 1982 and used the General Instrument SP0256-12.  Inside the chip is 2KiB of ROM holding voice samples of words and phrases, including "Mattel Electronics Presents".  Five games supported the Intellivoice: Bomb Squad, B-17 Bomber, Space Spartans, Tron: Solar Sailer and Intellivision World Series Baseball.  All but Intellivision World Series Baseball require the Intellivoice to work; World Series Baseball instead requires the Entertainment Computer System and can optionally use the Intellivoice.  The volume wheel on the Intellivoice unit controls the volume of the speech, which is sent to the main console and mixed with the output of the console's 3-voice sound chip, the AY-3-8914.

The Intellivoice also contains a buffer and interface chip, the General Instrument SPB-640, which handles communication between the cartridge, the system and the speech chip.  In practice it let the console write data to the speech chip to tell it which sample to play, but it could also pass new sample data from a game cartridge to the speech chip, allowing the speech chip to produce new sounds stored in a cartridge ROM.  The SPB-640 converted this data from parallel to serial before feeding it into the SP0256.  The heart of the SP0256 is its Vocal Tract Model, a digital filter whose Linear Predictive Coding-derived coefficients, together with pitch, repeat and amplitude registers, shape its output into distinct sounds.  The output is a pulse width modulated signal, which is filtered outside of the chip.

The Voice for the Odyssey², also introduced in 1982, uses the SP0256-19 chip, but Magnavox's approach differed from Mattel's in many ways.  First, while The Voice fits over the Odyssey²'s cartridge slot, its audio does not mix with audio from the main unit; its volume slider controls a speaker inside The Voice itself.  The unit has a cutout for the Odyssey²'s power switch, which it would otherwise cover.  Second, no Odyssey² game requires The Voice to work.  The official list of games which work with The Voice is Sid the Spellbinder, Nimble Numbers Ned, Type & Tell, Smithereens, K.C.'s Krazy Chase, P.T. Barnum's Acrobats, Attack of the Timelord, Turtles and Killer Bees.  As several of these titles suggest, The Voice was also aimed at educational software.

Inside The Voice, the SP0256-19 was accompanied by a 16KiB ROM chip, the SPR128-003.  The SP0256-19's internal 2KiB ROM held pointers to samples in the external 16KiB ROM.  This 16KiB ROM contained the 64 allophones also found in the SP0256-AL2, as well as custom words and phrases from Magnavox, and even sound effects and musical tones.  All The Voice games used the samples in this chip and did not provide their own, with the exception of Sid the Spellbinder, which contained another 16KiB speech ROM, the SPR128-004.
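
To illustrate the kind of indirection described above, here is a minimal Python sketch in which a small table of pointers locates speech data stored in a larger external ROM. The (start, length) pointer format, the function name `fetch_speech_data` and the toy ROM contents are all hypothetical; this is not the documented layout of the SP0256-19's pointers.

```python
# Hypothetical model: a small internal ROM holds (start, length) pointers and
# the bulk of the speech data lives in a larger external ROM. This is not the
# real SP0256-19 pointer format.

def fetch_speech_data(pointer_table, external_rom, index):
    """Follow pointer `index` and return the bytes it refers to."""
    start, length = pointer_table[index]
    if start < 0 or length < 0 or start + length > len(external_rom):
        raise ValueError(f"entry {index} points outside the external ROM")
    return bytes(external_rom[start:start + length])


# Usage: a toy 16 KiB "external ROM" and a three-entry pointer table.
external_rom = bytearray(16 * 1024)
external_rom[100:104] = b"\x01\x02\x03\x04"
pointer_table = [(0, 2), (100, 4), (16380, 4)]
sample = fetch_speech_data(pointer_table, external_rom, 1)
```

In this model the caller only supplies an entry number, so the bulky data can live on a separate, larger chip while the small table stays inside the speech chip.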

Speech Add-on Expansion Cards : The IBM PCjr. Speech Attachment and IBM PS/2 Speech Adapter

Texas Instruments produced a line of speech synthesizer chips beginning with the TMS5100, used in the Speak & Spell.  Later chips in this line were used in several pinball and arcade games.  The TMS5220 and TMS5220C were used in many arcade games made by Atari, including its Star Wars trilogy of early to mid-80s games, Gauntlet, Paperboy and others.  Unlike the General Instrument chips, which have a built-in ROM, the Texas Instruments TMS5220 relies on external ROMs to supply voice data.  The IBM PCjr. Speech Attachment from 1984 includes the TMS5220C, as does the IBM PS/2 Speech Adapter released in 1987.

Like the other speech chips mentioned in this article, the TMS series is based on Linear Predictive Coding (LPC).  LPC is an audio signal processing method well suited to forming intelligible speech with a minimal amount of non-volatile memory and fairly low sample rates.  In this model, speech starts as vibrations (the buzz) from the glottis, which resonate through the throat and mouth cavities (the formants), while the tongue, teeth and lips add hisses and pops (the residue).  The results are the particular frequencies and pitches which we recognize as phonemes, the building blocks of words.  LPC uses a digital filter to recreate the formants that shape the buzz and residue, so instead of storing the waveform itself, only the filter coefficients, pitch and loudness need to be stored for each short stretch of speech.  This significantly decreases the amount of memory needed to simulate the vocal tract.
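
The source-filter idea behind LPC can be sketched in a few lines of Python. This is a minimal illustration, not any chip's actual implementation: it uses a direct-form all-pole filter with floating-point coefficients, whereas the chips themselves use other filter structures (the TMS5220, for example, uses a lattice filter driven by reflection coefficients) with small quantized coefficient tables. The function names, the single formant frequencies and the 100 Hz pitch are illustrative choices.

```python
import math
import random


def excitation(n_samples, pitch_period=None, seed=0):
    """The source: an impulse train ("buzz") for voiced sounds, or white
    noise (hiss) for unvoiced sounds when pitch_period is None."""
    if pitch_period:
        return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def lpc_synthesize(coeffs, source, gain=1.0):
    """All-pole synthesis filter:
    y[n] = gain * e[n] + sum over k of coeffs[k-1] * y[n-k]
    The coefficients play the role of the formants shaping the source."""
    history = [0.0] * len(coeffs)  # history[0] is y[n-1], history[1] is y[n-2], ...
    out = []
    for e in source:
        y = gain * e + sum(a * h for a, h in zip(coeffs, history))
        out.append(y)
        if history:
            history = [y] + history[:-1]
    return out


def two_pole_resonator(freq_hz, radius, sample_rate=8000):
    """Coefficients for one formant peak at freq_hz; radius < 1 keeps it stable."""
    theta = 2 * math.pi * freq_hz / sample_rate
    return [2 * radius * math.cos(theta), -radius * radius]


# Usage: 25 ms at 8 kHz (200 samples) of a voiced sound with a 100 Hz pitch
# (period of 80 samples) and a single formant at 500 Hz, plus an unvoiced hiss.
voiced = lpc_synthesize(two_pole_resonator(500, 0.95), excitation(200, pitch_period=80))
unvoiced = lpc_synthesize(two_pole_resonator(2500, 0.9), excitation(200))
```

Only the handful of coefficients, the pitch period and the gain would need to be stored per stretch of speech; the samples themselves are computed on the fly.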

The IBM PCjr. Speech Attachment was the first speech solution for the PC platform to gain any kind of acceptance.  It had a built-in vocabulary of 196 words.  The Attachment came in the form of one of the PCjr.'s unique sidecars, but the sidecar connector was essentially a version of the standard PC expansion bus with a few added and missing signals.  IBM included a 32KiB ROM chip on the Attachment which held BIOS routines for the Speech Attachment along with the vocabulary words.  The TMS5220 itself can address 256KiB of voice data.  The Speech Attachment output to the PCjr.'s main audio outputs, but its speech could not be heard together with the PCjr.'s 4-voice TI SN76496 sound chip, because the PCjr.'s audio multiplexer selects only one sound source at a time.
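
Some rough arithmetic shows why LPC suits a 32KiB ROM. The sketch below assumes an LPC data rate of about 1,200 bits per second, a ballpark figure rather than a measured rate for the Speech Attachment, and compares it with uncompressed 8-bit PCM at 8 kHz. It also ignores that the same ROM holds the BIOS routines, so the real speech capacity is smaller.

```python
def seconds_of_speech(rom_bytes, bits_per_second):
    """Seconds of audio that fit in rom_bytes at a given data rate."""
    return rom_bytes * 8 / bits_per_second


ROM_BYTES = 32 * 1024   # the Attachment's 32 KiB ROM (shared with BIOS code in reality)
LPC_RATE = 1_200        # assumed ballpark LPC data rate, bits per second
PCM_RATE = 8_000 * 8    # 8-bit PCM at 8 kHz: 64,000 bits per second

lpc_seconds = seconds_of_speech(ROM_BYTES, LPC_RATE)   # about 218 seconds
pcm_seconds = seconds_of_speech(ROM_BYTES, PCM_RATE)   # about 4.1 seconds
```

At the assumed rate, LPC fits roughly fifty times as much speech as raw PCM into the same space.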

A more interesting feature of the Speech Attachment is support for a second kind of audio encoding, which IBM called "Continuously Variable Slope Delta Modulation" (CVSD), based on the Motorola MC3418 chip.  This feature is used for the playback and recording functions of the Speech Attachment, which has a microphone connector for this very purpose.  Delta modulation, as its name suggests, records only whether the signal rose or fell since the previous sample (one bit per sample) instead of recording the amplitude of each sample, and for simpler sounds this yields good economies.  The "continuously variable slope" refers to the size of each up or down step, which grows when the signal changes quickly and shrinks when it is quiet.  Even so, to accurately reproduce more complex sound, the signal has to be sampled more and more frequently, raising the bit rate.
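
Here is a minimal Python sketch of delta modulation with an adaptive step size. It is a simplification, not the MC3418's actual circuit: the step limits, the three-bit run rule and the growth and decay factors are hypothetical values chosen for readability. The decoder replays the same step updates as the encoder, so it reconstructs the encoder's running estimate from the bits alone.

```python
# Hypothetical tuning values chosen for illustration, not taken from the MC3418.
MIN_STEP, MAX_STEP = 0.01, 0.5
RUN = 3                  # identical bits in a row that signal "slope overload"
GROW, DECAY = 1.5, 0.9   # step multipliers


def _adapt_step(bits, step):
    """Enlarge the step after RUN identical bits; otherwise let it shrink."""
    recent = bits[-RUN:]
    if len(recent) == RUN and len(set(recent)) == 1:
        return min(step * GROW, MAX_STEP)
    return max(step * DECAY, MIN_STEP)


def cvsd_encode(samples):
    """Emit one bit per sample: 1 if the input is at or above the running estimate."""
    bits, estimate, step = [], 0.0, MIN_STEP
    for x in samples:
        bit = 1 if x >= estimate else 0
        bits.append(bit)
        step = _adapt_step(bits, step)
        estimate += step if bit else -step
    return bits


def cvsd_decode(bits):
    """Rebuild the waveform by replaying the encoder's estimate updates."""
    history, out, estimate, step = [], [], 0.0, MIN_STEP
    for bit in bits:
        history.append(bit)
        step = _adapt_step(history, step)
        estimate += step if bit else -step
        out.append(estimate)
    return out
```

Fed silence, this encoder settles into an alternating 1, 0, 1, 0 pattern, while a sudden jump in level produces a run of identical bits that makes the step size grow until the estimate catches up.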

Programs can use both LPC and CVSD to output new speech, which on the PCjr. could come on cartridges, floppy disks or cassette.  An 8255 PPI chip controlled access to the LPC and CVSD sections.  Some third-party programs use the CVSD function much like a Sound Blaster, letting you record microphone input to disk or play back text-to-speech.

The first two games released for the Speech Attachment were Bouncy Bee Learns Letters and Bouncy Bee Learns Words.  These games loaded additional speech from the floppy disk and had the advantage that the same person who voiced the Speech Attachment's built-in vocabulary also voiced the additional words.  These educational titles only work correctly on a stock PCjr. without a memory manager loaded.  If a memory manager is loaded or the CPU is upgraded, the words play back too quickly.

The IBM PCjr. was no great success, but IBM revisited the TMS5220 a few years later, in 1987, with the IBM PS/2 Speech Adapter.  Although the PS/2 line was known for its Micro Channel architecture, the Speech Adapter was designed for the standard XT/AT ISA expansion bus.  It used the same hardware as the PCjr. Speech Attachment.  The card came with a breakout box which connects via a custom mini-DIN connector; the box has two headphone output jacks, a microphone input and an external speaker.  If you do not have that box, you will hear nothing from the card without a custom solution.  The PS/2 Speech Adapter was compatible with software previously released for the PCjr., although, as mentioned above, speed issues may have prevented true compatibility.  Third-party sound cards could be made compatible with the IBM speech hardware through a TSR which simulated the BIOS routines most software used to communicate with it.  A list of games which support an IBM Speech Attachment can be found here.

Expansion Attachment : The Texas Instruments TI-99/4A Speech Synthesizer

The Texas Instruments TI-99/4A was a home computer which mainly used cartridges for its software because it had very little general-purpose main memory (only 256 bytes of CPU RAM).  An expansion port on the side of the machine let many expansion modules be plugged in, daisy-chain fashion.  One of these expansions is the Speech Synthesizer module.  It contains a TMS5220 chip and 32KiB of external ROM, and it sends its audio into the main computer unit to be mixed with the 4-voice TMS9919 sound chip (an early version of the TI SN76496).  Games which supported the Speech Synthesizer include Alpiner, Buck Rogers: Planet of Zoom, M*A*S*H, Microsurgeon, Moon Mine, Parsec, Pulsar, The Secret Agent and Star Trek: Strategic Operations Simulator.

Speech Add-on Expansion Cards : The Mockingboard Speech Cards for the Apple II

The Apple II was blessed with seven more-or-less general purpose expansion slots, and several speech solutions were developed for them.  The only ones which gained some gaming traction were the Mockingboard sound cards.  The Mockingboard Speech I and Sound/Speech I cards from 1982 use a Votrax SC-01-A speech chip, while the Mockingboard A and C had sockets for the Votrax SC-02/SSI-263P, which was sold as a speech upgrade called the "Mockingboard B".  Votrax SC-01 speech chips could be found in the Wizard of Wor, Gorf, Reactor and Q*Bert arcade machines.  The SC-01 and SC-01-A are fully compatible replacements for each other, but the SC-01-A boasts better sound quality due to some tweaked phonemes.

Several games support the speech capabilities of the Mockingboard: A2Bejwld, Berzap!, Bouncing Kamungas, Crimewave, Crypt of Medea, Rescue Raiders, Spy Strikes Back, Thunder Bombs, Willy Byte and Zoo Master.  The SC-02 uses a wider package than the SC-01, so the two chips are not drop-in replacements for each other.  Access to the speech chip and the AY-3-8910 or AY-3-8913 sound chips on a Mockingboard is handled by a MOS 6522 VIA chip.

The SC-01 contains 64 different phonemes, selected by writing a 6-bit value to the chip, and four pitches or inflections, selected by two more bits.  You cannot add additional sounds to the SC-01, and compared to other solutions it is fairly basic in its ability to reproduce sound.  The SC-02 allows much greater control over the speech elements: its registers let the programmer control the rate, duration, articulation, amplitude and filter frequency in addition to phoneme selection and inflection.  It is unknown whether any Mockingboard games supported the advanced features of the SC-02.
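
Since a phoneme code needs 6 bits and an inflection needs 2, one phoneme with its inflection fits in a single byte write. The SC-01 itself has separate phoneme and inflection pins, so the byte layout depends on how a card wires them; the sketch below assumes phoneme on bits 0-5 and inflection on bits 6-7, and the function names are hypothetical.

```python
def sc01_command(phoneme, inflection=0):
    """Pack a 6-bit SC-01 phoneme code and a 2-bit inflection into one byte.
    Assumed wiring: phoneme on bits 0-5, inflection on bits 6-7."""
    if not 0 <= phoneme <= 0x3F:
        raise ValueError("phoneme codes are 6 bits (0-63)")
    if not 0 <= inflection <= 3:
        raise ValueError("inflection is 2 bits (0-3)")
    return (inflection << 6) | phoneme


def sc01_fields(command):
    """Split a command byte back into (phoneme, inflection)."""
    return command & 0x3F, (command >> 6) & 0x03


# Usage: phoneme code 0x15 at inflection level 1.
cmd = sc01_command(0x15, inflection=1)   # 0x55
```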

Cartridge Adapter : The Commodore 64 Magic Voice Module

The Magic Voice plugged into the C64's cartridge slot and had a passthrough slot for another cartridge.  BASIC, software loaded from disk or tape, and cartridges could all access its speech synthesis.  Inside the Magic Voice unit were a 16KiB EPROM containing the data for 235 prerecorded words, a gate array to turn parallel data into serial data, a MOS 6525A Tri-Port Interface I/O chip and the Toshiba T6721A speech chip.  Like the TMS5220, the T6721A takes its voice data from an external ROM (it can address up to 1MB of sample data) and runs at an 8kHz sample rate, which limits the reproduced audio to frequencies of about 4kHz and below.

The Magic Voice has a pair of RCA jacks, which would be used with a cable to mix the sound from the C64's 6581 or 8580 SID chip with the speech produced by the module.  Software that supports the Magic Voice includes Wizard of Wor, Gorf, Counting Bee, A Bee C's, The Spelling Bee, The Magic Garden Talking Book Series and Magic Desk 1+.  Comparing the arcade and C64 versions of Wizard of Wor and Gorf, the C64 speech is less affected but more intelligible.

Cartridge Plug-in Module : The Tandy Color Computer Speech/Sound Cartridge

All Tandy Color Computers came with a cartridge port which could be used as a more general expansion bus.  The Speech/Sound Cartridge plugged into the cartridge slot but, unlike the other cartridge-slot devices mentioned here, had no passthrough, so only BASIC, cassette and disk software could use it.  The sound was provided by a 3-voice AY-3-8913 sound chip and the speech by an SP0256-AL2, while a PIC 7040 microcontroller provided communication ports, interrupts and 2KiB of RAM.  The microcontroller has 4KiB of embedded ROM.  The audio from the Speech/Sound Cartridge was passed through the cartridge port and mixed with the output of the Color Computer's 6-bit DAC.

The microcontroller was driven by the main system's clock and was designed for the 0.89MHz frequency of the Color Computer and Color Computer 2.  The Color Computer 3 doubles this frequency to 1.79MHz, so a modification is required to get the Speech/Sound Cartridge working properly in the Color Computer 3.  Without the modification, the microcontroller runs at twice its intended speed (a 100% overclock) and does not work properly.  A list of games which support the Speech/Sound Cartridge can be found here.

Expansion Attachment : The Amstrad CPC Speech Synthesizers

Most Amstrad CPC computers had a card-edge expansion connector, and Amstrad made the SSA-1 Speech Synthesizer Attachment to plug into it.  The SSA-1 had an SP0256-AL2 speech chip, whose internal ROM contains the 59 allophones used to build speech.  No use of external ROM is known.  A jack would take the sound output from the internal AY-3-8912 sound chip, mix it with the SP0256-AL2's speech and then output that to speakers.  The Amstrad device has a volume wheel to control the volume of the speech chip.  Supported games include 3D Boxing, 3D Stunt Rider, Alex Higgins World Pool, Alex Higgins World Snooker, Darkwurlde, Glenn Hoddle Soccer, Roland in Space and Tubaruba.

The Dk'tronics Speech Synthesizer is a similar device which uses the SP0256-AL2 but does not appear to be directly compatible with Amstrad's device.  It supports the following games: Alex Higgins World Pool, Jump Jet and Roland in Space.

Expansion Attachment : The ZX Spectrum Currah Microspeech

Released by Currah Computer Components, the Currah Microspeech was another user of the SP0256-AL2.  It also has a 2KiB external ROM and a ULA, but this ROM does not appear to hold speech samples; instead it supports the Microspeech's other functions, like setting up a 256-byte FIFO in RAM.  The Microspeech plugged into the ZX Spectrum's expansion port but did not provide a passthrough connector for other peripherals like joystick adapters.  This could be overcome with a multi-slot adapter, which Currah just so happened to sell as well.  Sound came out of a separate output jack, or the ZX Spectrum's RF output could be passed through the Microspeech so that the speech was mixed into the TV signal.  A small trimmer screw adjusts the audio.
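
A 256-byte FIFO is convenient on an 8-bit machine because an 8-bit index wraps from 255 back to 0 on its own. Here is a minimal Python sketch of such a ring buffer, holding, say, queued allophone codes; how the Microspeech actually organizes its buffer is not known here, so the class name and its behavior are illustrative assumptions.

```python
class ByteFifo:
    """A 256-entry ring buffer of bytes. Indices are kept modulo 256,
    mirroring the way an 8-bit counter wraps from 255 back to 0."""

    SIZE = 256

    def __init__(self):
        self._buf = bytearray(self.SIZE)
        self._head = 0   # next slot to read
        self._tail = 0   # next slot to write
        self._count = 0  # tells "full" from "empty" when head == tail

    def put(self, value):
        """Queue one byte; return False if the buffer is full."""
        if self._count == self.SIZE:
            return False
        self._buf[self._tail] = value & 0xFF
        self._tail = (self._tail + 1) & 0xFF
        self._count += 1
        return True

    def get(self):
        """Dequeue the oldest byte, or return None if the buffer is empty."""
        if self._count == 0:
            return None
        value = self._buf[self._head]
        self._head = (self._head + 1) & 0xFF
        self._count -= 1
        return value

    def __len__(self):
        return self._count
```

A producer (the program queuing speech) and a consumer (the routine feeding the speech chip) can then work at their own pace, with the producer only waiting when all 256 slots are taken.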

A fair number of games support this device; a list of them can be found here.  Dk'tronics bought out Currah and continued to produce the Microspeech.  Some non-speech games are incompatible with the Microspeech due to the way it uses memory.

Serial-based Peripheral : The Votrax Type 'N Talk and the Commodore VIC-20

The Votrax Type 'N Talk was an external speech box containing an SC-01 chip.  It was compatible with any RS-232 interface, so it was not system-specific.  The Scott Adams Adventures for the VIC-20 could communicate with it to produce speech.  These text-based adventure games came on VIC-20 cartridges, and five were released: Adventure Land, Pirate's Cove, Mission Impossible Adventure, Voodoo Castle and The Count.  If a Votrax Type 'N Talk was connected to the VIC-20's RS-232 interface, the unit would speak all the words one of these games output to the screen.

To connect the VIC-20 to a Votrax Type 'N Talk, you will need an RS-232 adapter such as the VIC-1011A, which plugs into the VIC-20's user port, and a custom null-modem-like cable to interface with the Votrax unit.  Instructions are here.
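
What travels down the cable to a serial speech box like the Type 'N Talk is ordinary asynchronous serial text. The sketch below shows the framing, assuming the common 8N1 format (one start bit, eight data bits, no parity, one stop bit); the actual word format and baud rate depend on how the Votrax unit and the RS-232 adapter are configured, and the function names are hypothetical. These are logic levels; an adapter like the VIC-1011A converts them to RS-232 voltages.

```python
def frame_8n1(byte):
    """Logic levels for one 8N1 character: a start bit (0), eight data bits
    sent least-significant bit first, and a stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]


def frame_text(text):
    """Frame every ASCII character of a line of game output."""
    return [frame_8n1(b) for b in text.encode("ascii")]


def seconds_to_send(text, baud):
    """Transmission time: each 8N1 character occupies 10 bit times."""
    return len(text.encode("ascii")) * 10 / baud


# Usage: the letter "A" (0x41) as it appears on the line; the idle level is 1.
frame_a = frame_8n1(0x41)   # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

At 300 baud, for example, a 30-character line of adventure text takes a full second to send, which is still faster than the speech box can say it.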

Speech Chips in Cartridges : Famicom Third Parties

When Nintendo first began permitting third parties to develop games for its first programmable home console, the Family Computer, it offered these first licensees more generous terms than later ones.  Among the things these publishers could do that later publishers could not was manufacture their own cartridges outside of Nintendo's factories and control.  The companies that availed themselves of this privilege were Konami, Namco, Jaleco, Bandai, Sunsoft, Irem and Taito.  The ability to make their own cartridges inspired them to come up with their own ways to address more memory and add new sound hardware to their games.

Jaleco was making a baseball game called Moero Pro Yakyuu!! and wanted voice samples in the game for umpire calls, sound effects, crowd yells and other iconic baseball sounds.  It used a chip in the cartridge called the NEC µPD7756C.  This chip included 32KiB of built-in ROM which could hold sixteen samples of ADPCM-compressed speech.  When the game wanted to play a sample, it would tell this chip, and the chip would play the sample back without interrupting gameplay.

Jaleco used this speech chip, or in one case a cut-down 12KiB version called the NEC µPD7755C, in six other games: Moe Pro! '90: Kandou-hen, Moe Pro!: Saikyou-hen, Moero!! Pro Tennis, Moero!! Pro Yakyuu '88: Kettei Ban, Shin Moero!! Pro Yakyuu and Terao no Dosukoi Oozumou (the µPD7755C game).  When Jaleco decided to port Moero Pro Yakyuu!! to the NES as Bases Loaded, it had a problem.  Nintendo controlled the manufacturing of all cartridges outside of Japan and would have charged Jaleco a lot of money to implement its speech chip design in cartridges Nintendo would build.  So Jaleco simplified its games and cartridge production by eliminating the speech chip and using larger PRG-ROMs to hold digital samples in its overseas versions.  The samples would play back through the NES's delta modulation channel (DMC), which could also play 7-bit PCM audio when the CPU wrote each sample directly to the channel's output level.

One drawback of this method is that fewer digitized samples could be used, because the NES's CPU spends a lot of processing time feeding data into the PCM audio channel.  So while you hear a realistic-sounding crack when the bat hits the ball in Moero Pro Yakyuu!!, in Bases Loaded you hear a typical NES sound effect.  Whenever umpires call strike, ball, safe or out in Bases Loaded, all animation stops until the sample has finished playing.  Because the speech chip lifts the burden of generating speech from the CPU in Moero Pro Yakyuu!!, animation can continue while speech samples play.  You could do worse than watch my comparison video.
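
Some rough numbers show why CPU-fed PCM freezes the game. The sketch assumes an 8 kHz sample rate and a half-second umpire call, illustrative figures rather than values measured from Bases Loaded; the NTSC NES CPU clock of about 1.79 MHz and frame rate of about 60.1 Hz are the real console's. If the CPU must write every sample to the DMC output level register ($4011) itself, it gets only about 224 cycles between writes, which makes interleaving game logic and animation very difficult.

```python
NTSC_CPU_HZ = 1_789_773    # NTSC NES CPU clock (21.477272 MHz / 12)
NTSC_FRAME_HZ = 60.0988    # NTSC NES frame rate


def to_dac_level(sample_u8):
    """Convert an unsigned 8-bit PCM sample (0-255) to the 7-bit level (0-127)
    accepted by the DMC output level register at $4011."""
    return (sample_u8 & 0xFF) >> 1


def cpu_cycles_per_sample(rate_hz, cpu_hz=NTSC_CPU_HZ):
    """CPU cycles between consecutive sample writes at the given sample rate."""
    return cpu_hz / rate_hz


def frames_stalled(n_samples, rate_hz, frame_hz=NTSC_FRAME_HZ):
    """Video frames that pass while a CPU-fed sample plays from start to end."""
    return n_samples / rate_hz * frame_hz


# Usage, with an assumed 8 kHz rate and an assumed 0.5 s (4,000-sample) clip.
budget = cpu_cycles_per_sample(8_000)     # about 224 cycles per sample
frozen = frames_stalled(4_000, 8_000)     # about 30 frames
```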

Bandai also used a speech chip, the Mitsubishi M50805, in its Family Trainer 3 - Aerobics Studio cartridge in Japan, and a larger ROM in its overseas counterpart, Dance Aerobics.  This speech chip holds only eight voice samples of barely intelligible speech, stored in just 960 bytes of ROM on the chip.

Internal Upgrade : The Acorn Computer Speech Upgrade Kit for the BBC Micro

In some ways, the BBC Micro was the United Kingdom's Apple II.  It was mainly encountered in schools, and it encouraged more hardware tinkering than most other home computers of the day.  Acorn Computers, which made the BBC Micro line, sold a speech upgrade kit containing a TMS5220 chip, a TMS6100 16KiB phrase ROM and some passive components.  The TMS6100 is a custom serial mask ROM chip which came with phrases spoken by BBC News anchor Kenneth Kendall.  Games could also drive the TMS5220 directly with speech data from another source.  The chips fit into a pair of sockets on the computer's main PCB, although some revisions of the main PCB or the keyboard needed traces cut and wires soldered to get the kit working.  The unofficial homebrew port of Astro Blaster to the BBC Micro supports the Speech Upgrade.  There are also modern kits which let you program your own samples into a TMS6100 workalike if you do not care for the "dulcet tones" of Ken Kendall.

Better Late than Never : The AtariVox+ Joystick-Port Based Speech Adapter

While no speech chip was ever used with the Atari 2600 during its long market life (1977-1992), the homebrew community found a way around its weak ability to produce intelligible speech.  The AtariVox is an external adapter which plugs into a controller port on the Atari 2600 or 7800, and it can also be used with the Vectrex.  While the Atari joystick ports are generally considered inputs, their four direction lines are wired to the 6532 RIOT chip, which can be programmed to drive those lines as TTL outputs.  The AtariVox provides two features: speech and music synthesis from its SpeakJet chip, and 32KiB of non-volatile EEPROM memory for saving homebrew games.  The sound is output to a mini-jack on the back of the unit.

The Atari 2600 can produce speech without the assistance of a speech chip or other specialized hardware.  Normally, feeding the TIA with graphics data is the CPU's primary job during the active screen display.  However, generating any kind of speech takes up all of the CPU's time, so in games like Quadrun the speech plays over a black screen.  With the AtariVox+, speech can be played back without interrupting the game.  Watch my video here for some examples of AtariVox+ speech during gameplay.

The SpeakJet is a self-contained chip with 72 allophones, 43 sound effects and 12 DTMF touch tones.  It can also function as a 5-channel synthesizer for music and has a 64-byte input buffer.  The SpeakJet was sold as a generic chip, but it is in fact an 18F1320 PIC microcontroller whose 8KiB embedded ROM programs it to function as a SpeakJet.  It contains 256 bytes of user-programmable EEPROM which can be used for custom words and phrases, and the AtariVox has some custom phrases pre-programmed into it.  Even though the SpeakJet was released two decades after the other chips featured here, it does not sound like natural human speech.  At its core it is still based on LPC, with the quality drawbacks which come with that speech encoding method.  A list of games which support the device can be found here.

Final Words

The Famicom was the last hurrah for the dedicated speech chip.  Later consoles had enough processing power and memory to handle speech without dedicated hardware.  8-bit handheld consoles did not use speech chips because of the extra power draw.  16/32-bit home computers either had sound hardware like the Commodore Amiga's Paula, which plays four channels of 8-bit digitized audio using DMA, or were powerful enough to feed DACs with PCM-encoded speech samples.  The PC-compatible platform had the Sound Blaster, which from its 1989 introduction could reproduce speech through its "Digital Sound Processor", which in practice functioned as a DAC.  Limited floppy disk and ROM cartridge space restricted the amount of speech heard in home video games until the CD-ROM allowed for vast amounts of it.


  1. You neglected to mention the software-based speech synthesis available for several platforms. Software Automatic Mouth for several 8-bit platforms, as well as the SoftVoice technology found on the early Mac and Amiga.

    1. I tried to stay away from pure software methods and simple hardware DACs (like the SAM in the Apple II) to focus on speech synthesizer chips and devices here. A blog entry devoted to software methods would be interminably long!