The Famicom's Internal Audio
First, we must consider the features of the internal audio. Inside the CPU chip of the NES and Famicom (the 2A03, or the 2A07 in PAL consoles), there is an Audio Processing Unit (APU). The APU can produce five channels of sound. The first two channels are pulse wave generators with variable duty cycles (three aurally distinct choices) and an 11-bit frequency divider. The variable duty cycle makes these channels more flexible than the ordinary square wave channels of the TI SN76489 used in the Sega Master System. The three aurally distinct duty cycles are 50%, 25% and 12.5%: over one period, the wave is high for one half, one quarter or one eighth of the time. (A fourth setting, 75%, is an inverted 25% wave and sounds the same.)
They are frequently used as the main instruments in APU music, but they sound somewhat hollow. The two channels are functionally almost identical.
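To make the duty cycles concrete, here is a minimal Python sketch of one period of each 8-step duty sequence as the NESdev wiki documents them, together with the commonly cited pitch formula f = f_CPU / (16 * (t + 1)) for an 11-bit timer value t. The NTSC CPU clock of roughly 1.789773 MHz and every name in the code are assumptions for illustration, not part of any real API.

```python
# Hypothetical sketch: APU pulse duty sequences and pitch (NTSC clock assumed).
NTSC_CPU_HZ = 1_789_773  # approximate NTSC 2A03 CPU clock (assumption)

# One period of each 8-step duty sequence (1 = high, 0 = low).
DUTY_SEQUENCES = {
    0: [0, 1, 0, 0, 0, 0, 0, 0],  # 12.5%
    1: [0, 1, 1, 0, 0, 0, 0, 0],  # 25%
    2: [0, 1, 1, 1, 1, 0, 0, 0],  # 50%
    3: [1, 0, 0, 1, 1, 1, 1, 1],  # 75%: an inverted 25% wave, sounds the same
}


def duty_fraction(duty: int) -> float:
    """Fraction of one period during which the output is high."""
    seq = DUTY_SEQUENCES[duty]
    return sum(seq) / len(seq)


def render_ascii(duty: int) -> str:
    """Draw one period as text: '#' for high, '_' for low."""
    return "".join("#" if bit else "_" for bit in DUTY_SEQUENCES[duty])


def pulse_frequency(timer: int, cpu_hz: float = NTSC_CPU_HZ) -> float:
    """Pitch produced by an 11-bit timer value (0-2047)."""
    if not 0 <= timer <= 0x7FF:
        raise ValueError("timer must fit in 11 bits")
    return cpu_hz / (16 * (timer + 1))


for d in range(4):
    print(f"{duty_fraction(d):6.1%}  {render_ascii(d)}")
print(round(pulse_frequency(253), 1))  # roughly 440 Hz (A4)
```

The ASCII rendering stands in for a waveform diagram: the 12.5% setting prints as `_#______`, while the 75% setting is visibly the 25% pattern turned upside down.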
The third channel is a triangle wave generator with a 32-step waveform (16 steps up, 16 steps down) and an 11-bit frequency divider. It is typically used for bass lines and usually sounds rather mellow. The fourth channel is a noise generator with sixteen frequency selections and short and long sequences. The long sequence sounds much like white noise and is often used for sound effects or percussion. The short sequence was not available until 2A03 revision E CPUs, which were used after Nintendo had been manufacturing Famicoms for about a year.
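The triangle's 32-step shape and the noise channel's two sequences can be sketched in a few lines. This Python sketch assumes the behavior described on the NESdev wiki: the triangle steps from 15 down to 0 and back up, its pitch is f_CPU / (32 * (t + 1)), and the noise channel is a 15-bit shift register whose feedback XORs bit 0 with bit 1 (long sequence) or with bit 6 (short sequence). The function names are hypothetical.

```python
# Hypothetical sketch: APU triangle sequence and noise shift register.
NTSC_CPU_HZ = 1_789_773  # approximate NTSC CPU clock (assumption)

# 32 steps: 15 down to 0 (16 steps), then 0 up to 15 (16 steps).
TRIANGLE_SEQUENCE = list(range(15, -1, -1)) + list(range(16))


def triangle_frequency(timer: int, cpu_hz: float = NTSC_CPU_HZ) -> float:
    """Pitch for an 11-bit timer value: one octave below a pulse channel
    given the same timer value."""
    return cpu_hz / (32 * (timer + 1))


def noise_step(shift: int, short_mode: bool) -> int:
    """Clock the 15-bit shift register once."""
    tap = 6 if short_mode else 1
    feedback = (shift & 1) ^ ((shift >> tap) & 1)
    return (shift >> 1) | (feedback << 14)


def noise_period(short_mode: bool, seed: int = 1) -> int:
    """Number of clocks before the register returns to its starting state."""
    state = noise_step(seed, short_mode)
    steps = 1
    while state != seed:
        state = noise_step(state, short_mode)
        steps += 1
    return steps


print(noise_period(short_mode=False))  # 32767
print(noise_period(short_mode=True))
```

Counting clocks until the register repeats shows why the two modes sound so different: the long sequence runs 32767 steps before repeating, while the short one repeats after 93 steps (or 31, depending on the register's state, according to the NESdev wiki), which makes it sound far more tonal.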
The pulse channels can apply sweeps to their frequency. Both the pulse and noise channels have a 4-bit volume control, which can either be held constant or driven by a simple decaying envelope that can optionally loop; these are not full ADSR envelopes. The fifth channel, the delta modulation channel (DMC), plays back 1-bit delta-encoded samples in which each bit nudges a 7-bit output level up or down, but the CPU can also write 7-bit values to that output level directly to play raw PCM. The 2A03 outputs the pulse channels on one pin and the triangle, noise and DMC channels on another; the two outputs are mixed on the console's PCB and then sent to the cartridge connector.
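The following Python sketch illustrates two details from the paragraph above under stated assumptions. First, how 1-bit deltas move the DMC's 7-bit output level (by 2 per bit, least significant bit first, with steps outside 0-127 skipped, as the NESdev wiki describes). Second, why the two output pins matter, using the nonlinear mixing approximation published on the NESdev wiki. It is not a cycle-accurate model, and the function names are illustrative.

```python
# Hypothetical sketch: DMC delta decoding and the APU's nonlinear mixer.


def dpcm_decode(data: bytes, start_level: int = 64) -> list:
    """Turn 1-bit deltas into 7-bit output levels (0-127).

    Bits are read least significant first. A 1 raises the level by 2 and a 0
    lowers it by 2; a step that would leave the 0-127 range is skipped.
    """
    level = start_level
    levels = []
    for byte in data:
        for bit in range(8):
            if (byte >> bit) & 1:
                if level <= 125:
                    level += 2
            elif level >= 2:
                level -= 2
            levels.append(level)
    return levels


def mix(pulse1: int, pulse2: int, triangle: int, noise: int, dmc: int) -> float:
    """Combine channel levels the way the two output pins are mixed.

    pulse1, pulse2, triangle and noise are 0-15 and dmc is 0-127; the result
    is roughly 0.0-1.0. Constants follow the NESdev wiki's approximation.
    """
    pulse_sum = pulse1 + pulse2
    pulse_out = 0.0 if pulse_sum == 0 else 95.88 / (8128 / pulse_sum + 100)
    tnd_sum = triangle / 8227 + noise / 12241 + dmc / 22638
    tnd_out = 0.0 if tnd_sum == 0 else 159.79 / (1 / tnd_sum + 100)
    return pulse_out + tnd_out


print(dpcm_decode(bytes([0b00001111])))  # rises for 4 bits, then falls
print(round(mix(15, 0, 0, 0, 0), 3), round(mix(15, 15, 0, 0, 0), 3))
```

Because the mixer is nonlinear, two pulse channels at full volume reach only about 0.26 on this scale, noticeably less than twice the roughly 0.15 of a single channel.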
While the NES has the same internal audio as the Famicom, it does not send its audio to the cartridge connector. Instead, the NES sends its audio directly to the RF & AV unit. There is a pin on the expansion connector on the NES front loader by which a cartridge can send its audio to be mixed with the audio from the NES's APU. However, this pin must be bridged with a resistor to an unused pin on the cartridge connector to mix the audio. Nintendo never released a peripheral that used the NES expansion port, so NES games never used the expansion audio functionality.
Cartridge Games with Expansion Audio
Bandai M50805 - 1 game
Bandai appears to have been the first company to put audio-generating hardware inside a cartridge. In this case, Bandai used a crude speech synthesis chip, the Mitsubishi M50805. It used this chip only once, for Family Trainer 3: Aerobics Studio (February 1987), one of the ten games it released for the Family Trainer Mat, the Japanese version of the NES Power Pad. This chip can store eight speech samples within 960 bytes of ROM space. The samples use a female voice and are "one", "two", "three", "four", "hi (or hai)", "good", "next" and "hello, let's go". With only a few simple words to be spoken, Bandai could get away with using such a simple chip.
When the game was released for the NES, Bandai had to use a larger ROM with DPCM samples for the internal APU. The sounds are more muffled than in the Japanese version. However, the instructor in the game talks constantly, so the less harsh sound of the NES version may be preferable. In any event, unless you have a Family Trainer mat, you will not really be able to play the Japanese game.
Here are recordings of the samples from the Japanese version, which can be used in some emulators :
Jaleco D7756/55 - 7 games
Jaleco apparently had a similar idea to Bandai, but was more ambitious in both the chips it used and the number of games that supported its sample-playback chips. It used the NEC D7756C in six of its games; this chip could hold and play back 32KB of embedded samples. The first game to use it was Moero!! Pro Yakyuu in June 1987. One game, Terao no Dosukoi Oozumou (a sumo wrestling game), used the D7755C chip, which could only support 12KB of samples.
Five of these games were baseball games, and in them umpires and announcers speak words like "strike", "ball", "foul", "safe", "out", "you're out" and "home run". They also use samples for sound effects such as the bat or racket hitting the ball and the crowd cheering. Both the Mitsubishi and NEC chips are implemented simply to play a sample in response to a single command; there is no adjustment for the pitch, speed or volume of the voices.
Five of these seven games were released in the US. Given that baseball and tennis are popular on both sides of the Pacific, this was a natural choice.
Moero!! Pro Yakyuu - Bases Loaded
Moero!! Pro Tennis - Racket Attack
Moero!! Pro Yakyuu '88: Kettei Ban - Bases Loaded II
Terao no Dosukoi Oozumou
Shin Moero!! Pro Yakyuu
Moe Pro! '90: Kandou-hen - Bases Loaded 3
Moe Pro!: Saikyou-hen - Bases Loaded 4
Like Bandai, when Jaleco ported these games it could not use speech chips inside its cartridges. Instead it used the APU's DPCM channel to play samples. The Jaleco speech chips' samples sound clearer and less muffled than the NES's DPCM playback. In the NES versions, sound effects typically used the noise or pulse wave channels instead of samples. The PRG-ROM is usually larger in the NES cartridges to make room for the samples.
I made a video showing gameplay of both Moero!! Pro Yakyuu and Bases Loaded here : https://www.youtube.com/watch?v=JQAy6_ZLE3c&t=270s
You can find samples for several of the games here for emulator purposes : http://tsk-tsk.net/net/adpcm%20samples/
Namcot-163 - 9 games
Namco was decidedly more ambitious than Bandai or Jaleco. It was the first company to implement a chip designed for enhanced music playback. Its solution is the Namcot-163 chip, an eight-voice wavetable playback chip first used in 1988. This chip allocates 128 bytes of RAM to the sample playback system. Each wavetable sample can be up to 64 nibbles (4-bit values) long. The functionality is similar to the FDS's sound channel and the TurboGrafx-16/PC Engine's APU, but less sophisticated.
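Because the Namcot-163 stores its waveforms as 4-bit values packed two per byte in its 128 bytes of RAM, reading a waveform means unpacking nibbles. This Python sketch assumes the low nibble of each byte is played first, as the NESdev wiki documents; the function name and the wraparound at the end of RAM are illustrative choices, not the chip's documented interface.

```python
# Hypothetical sketch: reading a Namcot-163 waveform from nibble-packed RAM.


def unpack_wave(ram: bytes, start_nibble: int, length: int) -> list:
    """Return `length` 4-bit samples beginning at nibble address `start_nibble`.

    Assumes two samples per byte with the low nibble first; addresses wrap
    around the end of RAM.
    """
    total_nibbles = len(ram) * 2
    samples = []
    for i in range(length):
        addr = (start_nibble + i) % total_nibbles
        byte = ram[addr // 2]
        samples.append(byte & 0x0F if addr % 2 == 0 else byte >> 4)
    return samples


ram = bytearray(128)  # stands in for the chip's internal RAM
ram[0:4] = bytes([0x10, 0x32, 0x54, 0x76])
print(unpack_wave(ram, 0, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under this packing, a 64-nibble waveform occupies 32 of the 128 bytes.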
All but two of the games that used this chip only used four voices. This is because the chip outputs each voice in turn (time-multiplexing them) instead of mixing them together. With fewer active channels, it cycles through the channels faster, which allows cleaner tones. Only two games, King of Kings and Erika to Satoru no Yumebouken, used eight channels. Using seven or eight channels can result in very audible high-frequency noise, especially through a separate audio output (compared to the RF modulator). This is because, with that many channels enabled, the chip cannot cycle through all of them at a rate above the upper limit of most people's hearing, so a high-pitched whine of around 15 kHz or 17 kHz can sometimes be heard.
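The whine frequencies quoted above follow from simple arithmetic. This Python sketch assumes, per the NESdev wiki, that the chip services one channel every 15 CPU cycles, and takes the NTSC CPU clock as roughly 1.789773 MHz; the function name is hypothetical.

```python
# Hypothetical sketch: how fast the Namcot-163 cycles through its channels.
NTSC_CPU_HZ = 1_789_773  # approximate NTSC CPU clock (assumption)
CYCLES_PER_CHANNEL = 15  # one channel serviced every 15 CPU cycles (assumption)


def multiplex_rate(active_channels: int, cpu_hz: float = NTSC_CPU_HZ) -> float:
    """Frequency (Hz) at which the chip completes a pass over all channels."""
    if not 1 <= active_channels <= 8:
        raise ValueError("the Namcot-163 has 1 to 8 channels")
    return cpu_hz / (CYCLES_PER_CHANNEL * active_channels)


for n in (4, 7, 8):
    print(n, round(multiplex_rate(n)))
```

With four channels the chip cycles at about 29.8 kHz, comfortably above hearing; with seven or eight it drops to about 17.0 kHz and 14.9 kHz, which lines up with the whine described above.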
Namco also used the 163 in other games as a cheap way to obtain an extra 128 bytes of RAM rather than as a sound device. Those cartridges contain a battery. In the list below, Megami Tensei II and King of Kings have batteries, but they also have a separate S-RAM chip.
Megami Tensei II: Digital Devil Story*
Namco Classic 2@
King of Kings*
Erika to Satoru no Yumebouken
Of the games on this list, I would consider only Rolling Thunder, Final Lap and possibly Mappy Kids to be sufficiently English-language friendly that you could enjoy the game without needing a translation patch. While Mappy Kids, Megami Tensei II: Digital Devil Story and King of Kings have full translation patches (*) and Namco Classic 2 has a partial translation patch, patching these games on real hardware is often impossible because Namco used epoxy-covered bonded ROMs instead of plastic-cased ROMs most of the time. Namco was really cheap.
Konami VRC-VI - 3 games
Konami's first venture into expansion sound came with its VRC-VI chip. This chip adds three extra channels of audio: two pulse wave channels with eight duty cycle settings and 12-bit frequency control, and a sawtooth channel, also with 12-bit frequency control. Sawtooth waves have a bright, brassy sound.
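A small Python sketch of the two VRC6 details above, assuming the register behavior documented on the NESdev wiki: the 3-bit duty field selects (D + 1)/16 of the period, from 1/16 to 8/16, and the sawtooth adds a 6-bit rate to an 8-bit accumulator six times per period before resetting, outputting the accumulator's top five bits. This is an illustration rather than an emulator, and the names are hypothetical.

```python
# Hypothetical sketch: VRC6 pulse duty settings and one sawtooth period.


def vrc6_pulse_duty(d: int) -> float:
    """Duty fraction for the 3-bit duty field: (d + 1) / 16."""
    if not 0 <= d <= 7:
        raise ValueError("duty field is 3 bits")
    return (d + 1) / 16


def vrc6_saw_levels(rate: int) -> list:
    """Output levels over one 7-step sawtooth period.

    The 6-bit rate is added to an 8-bit accumulator six times, then the
    accumulator resets; the output is its top five bits.
    """
    if not 0 <= rate <= 63:
        raise ValueError("rate is 6 bits")
    acc = 0
    levels = [acc >> 3]
    for _ in range(6):
        acc = (acc + rate) & 0xFF  # 8-bit accumulator wraps on overflow
        levels.append(acc >> 3)
    return levels


print([vrc6_pulse_duty(d) for d in range(8)])
print(vrc6_saw_levels(42))  # [0, 5, 10, 15, 21, 26, 31]
```

With a rate of 42, the seven output levels climb evenly from 0 to 31; larger rates overflow the 8-bit accumulator and distort the wave.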
Three games use this chip, the most prominent being the famous Akumajou Densetsu from 1989. Akumajou Densetsu is the Japanese version of Castlevania 3 and has better music thanks to the VRC6 chip. Madara and Esper Dream 2 use the chip along with extra battery-backed memory. Unfortunately, the board used by Madara and Esper Dream 2 swaps two of the chip's address lines (A0 and A1) compared to the board used by Akumajou Densetsu. This can be overcome if you wish to use one of those two less well-known games to make a functional Akumajou Densetsu reproduction. Translation patches exist for all three games, but Akumajou Densetsu does not really need one to be playable.
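The address difference amounts to exchanging CPU address lines A0 and A1 on the way to the chip. This Python sketch uses a hypothetical helper, not taken from any repro guide, to show which register the chip on the other board would see when Akumajou Densetsu writes to a given address.

```python
# Hypothetical sketch: exchanging address lines A0 and A1.


def swap_a0_a1(addr: int) -> int:
    """Return addr with bits 0 and 1 exchanged; other bits are unchanged."""
    a0 = addr & 1
    a1 = (addr >> 1) & 1
    return (addr & ~0b11) | (a0 << 1) | a1


for reg in (0x9000, 0x9001, 0x9002, 0x9003):
    print(f"${reg:04X} -> ${swap_a0_a1(reg):04X}")
```

Addresses ending in 0 or 3 are unaffected, while those ending in 1 and 2 trade places.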
Nintendo MMC5 - 4 games
After seeing its licensees produce cartridges with built-in audio, Nintendo decided to get in on the act when it designed the MMC5, the most advanced memory controller chip found in a licensed Famicom cartridge. This chip adds many features: the ability to address up to 1MB of PRG-ROM and 1MB of CHR-ROM, a hardware multiplier, and 1KB of embedded RAM that can serve as a third nametable or let every background tile choose its own palette (instead of each 2x2 group of tiles sharing one). It also has a vertical split-screen mode, but the only game confirmed to use it is Uchuu Keibitai SDF.
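The MMC5's hardware multiplier is simple to model: the game writes two 8-bit factors to $5205 and $5206 and reads the 16-bit unsigned product back from the same addresses, low byte at $5205 and high byte at $5206 (per the NESdev wiki). The Python class below is a hypothetical sketch of that register interface; it starts both factors at 0 for simplicity rather than modeling the chip's power-on state.

```python
# Hypothetical sketch of the MMC5 multiplier's register interface.


class Mmc5Multiplier:
    def __init__(self) -> None:
        self.factor_a = 0  # starting values chosen for simplicity (assumption)
        self.factor_b = 0

    def write(self, addr: int, value: int) -> None:
        if addr == 0x5205:
            self.factor_a = value & 0xFF
        elif addr == 0x5206:
            self.factor_b = value & 0xFF

    def read(self, addr: int) -> int:
        product = self.factor_a * self.factor_b  # unsigned 8x8 -> 16-bit
        if addr == 0x5205:
            return product & 0xFF  # low byte
        if addr == 0x5206:
            return product >> 8  # high byte
        raise ValueError(f"${addr:04X} is not a multiplier register")


mul = Mmc5Multiplier()
mul.write(0x5205, 200)
mul.write(0x5206, 100)
print(mul.read(0x5206) << 8 | mul.read(0x5205))  # 20000
```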
The MMC5 adds three extra channels of audio. The first two are pulse channels that operate almost identically to the APU's pulse channels but lack a sweep unit. The third is a PCM channel that plays raw 8-bit samples, compared to the raw 7-bit samples of the APU.
Koei used the MMC5 the most often but never used the extra sound channels. Nintendo and HAL mainly used the chip's audio channels for sound effects and PCM drums. Enix used the channels for musical accompaniment in Just Breed to much more impressive effect. Uchuu Keibitai SDF from 1990 is English-friendly and Just Breed has a full translation. Metal Slader Glory is the largest licensed Famicom game at 1MB, but it is a graphics-based PC-style adventure game without a translation patch available.
Uchuu Keibitai SDF
Shin 4-Nin Uchi Mahjong
Metal Slader Glory
Konami VRC-VII - 1 game
Konami's second expansion was even more impressive than its first. The VRC-VII chip contains a 6-channel FM synthesis core based on the YM-2413. The YM-2413 is a 2-operator FM synthesis chip used in the FM Sound Unit for the Sega Mark III and built into the Japanese Master System. Both the YM-2413 and the VRC7 have fifteen preset instruments plus one user-defined instrument, but their presets are not the same. The YM-2413 also has 9 channels, 3 of which can be devoted to rhythm sounds. The YM-2413 is a cost-reduced version of the YM-3812 FM synthesis chip used in the AdLib and Sound Blaster cards.
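For readers unfamiliar with 2-operator FM, the core idea is that one sine oscillator (the modulator) bends the phase of a second one (the carrier). This Python sketch shows only that idea; it deliberately omits the envelopes, modulator feedback and lookup tables that the YM-2413 and VRC7 actually use, and its parameter names are illustrative.

```python
import math


def fm_two_operator(freq_hz: float, ratio: float, mod_index: float,
                    sample_rate: int = 44100, n: int = 256) -> list:
    """Carrier sine whose phase is offset by a modulator sine.

    ratio sets the modulator frequency relative to the carrier; mod_index
    sets how far the modulator pushes the carrier's phase. Simplified:
    no envelopes, feedback or lookup tables, unlike real OPLL-style hardware.
    """
    samples = []
    for i in range(n):
        t = i / sample_rate
        modulator = math.sin(2 * math.pi * freq_hz * ratio * t)
        samples.append(math.sin(2 * math.pi * freq_hz * t + mod_index * modulator))
    return samples


bright = fm_two_operator(440.0, ratio=2.0, mod_index=3.0)
plain = fm_two_operator(440.0, ratio=2.0, mod_index=0.0)  # a pure sine
```

Raising `mod_index` adds brighter harmonics, while a `mod_index` of 0 leaves a plain sine wave.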
Lagrange Point from 1991 is the only game that uses VRC7 audio, but Tiny Toon Adventures 2: Montana Land e Youkoso also contains the chip. However, Lagrange Point uses a large cartridge shell, has battery-backed memory with a regulator chip, uses CHR-RAM and has an unusual amplifier/mixer chip on board, so making a Lagrange Point reproduction from a Tiny Toon Adventures 2 cartridge is not feasible. There is a translation patch for the game, though it is a rather old-school RPG. The music is excellent, so it is worth trying out.
Sunsoft 5B - 1 game
Sunsoft was rather late to join its fellow Famicom cartridge makers with a game that supported expansion audio. The Sunsoft 5B chip uses a YM-2149 sound core. Only one game used the audio functionality: the famous Gimmick! Released in 1992, in the twilight years of the Famicom, Gimmick! was not a great seller, which makes it by far the most expensive cartridge with expansion audio.
While Gimmick! does not use the noise or envelope features of the YM-2149 core, they are present on the chip. Gimmick! uses only the three square wave channels, each with a 12-bit frequency divider. Unlike the pulse wave generators of the APU, VRC6 and MMC5, the square wave generators of the YM-2149 cannot vary their duty cycle; they are on and off for equal intervals.
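The 12-bit tone period of an AY-3-8910-family core maps to pitch with the standard formula f = f_clock / (16 * period), and its square wave always has a 50% duty cycle. This Python sketch takes the clock as a parameter rather than assuming the exact clock the Sunsoft 5B feeds its core; the names and the placeholder clock are hypothetical.

```python
# Hypothetical sketch: AY-3-8910-family tone pitch and fixed 50% square wave.


def ay_tone_frequency(clock_hz: float, period: int) -> float:
    """Square-wave pitch for a 12-bit tone period (1-4095)."""
    if not 1 <= period <= 0xFFF:
        raise ValueError("tone period must be 1-4095")
    return clock_hz / (16 * period)


def square_wave(n_steps: int, half_period: int) -> list:
    """Equal high and low intervals: the duty cycle cannot be changed."""
    return [1 if (i // half_period) % 2 == 0 else 0 for i in range(n_steps)]


clock = 1_000_000  # placeholder clock, not the 5B's actual input clock
print(ay_tone_frequency(clock, 1), ay_tone_frequency(clock, 0xFFF))
print(square_wave(8, 2))  # [1, 1, 0, 0, 1, 1, 0, 0]
```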
The Sunsoft 5B is the audio-enhanced version of the Sunsoft 5A and FME-7 chips found in other games. To build a Japanese Gimmick! reproduction with working expansion audio from one of those games, you can add an AY-3-8910, AY-3-8912, AY-3-8913 or YM-2149 to the board. Some Famicom games, such as Gremlins 2, occasionally contain the Sunsoft 5B chip instead of the 5A or FME-7. The instructions can be found here: https://jensma.de/nesrepro/gimmick/
While there is a translation patch for Gimmick!, the translation is trivial and not needed to enjoy the game. The game is one of the best Famicom games not to see a wide release in the West. While there was an official PAL release in Scandinavia and an NTSC prototype ROM exists, neither has the 5B music channels, so the APU has to take up the slack. In the Famicom version, the 5B channels were used for the star sound effect and some of the music; in the other versions, the APU has to generate the star sound effect, and some elements of the music had to be altered or eliminated.
So far, I have discussed only cartridge games that contained extra audio hardware. There were also several peripherals that could generate audio in one form or another and mix it with the Famicom's internal audio.
Famicom Disk System
Of the 206 licensed games released for the FDS, at least 71 support the FDS audio channel. The launch title The Legend of Zelda (1986) was the first game to support expansion audio, and it was the only launch title that had not previously been released on cartridge.
The FDS adds only one extra channel, via the 2C33 chip found in the FDS RAM Adapter. This channel plays a custom waveform and can modulate it. The waveform is 64 steps long, with 6-bit resolution per step. The results can sound surprisingly realistic; Vs. Excitebike has good motorcycle engine sounds. The modulation unit can alter the pitch of the sound, and envelopes are available for both the volume and the modulation depth (gain). Modulation is put to good use for the door opening and closing in Metroid.
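The FDS channel's 64-step, 6-bit waveform can be pictured as a small wavetable read by a phase accumulator. This Python sketch builds such a table from any function and plays it back at a chosen step size; it ignores the FDS's actual accumulator width, modulation unit and envelopes, and all names are illustrative.

```python
import math


def make_wavetable(func) -> list:
    """Sample func over one period into 64 steps of 6-bit values (0-63).

    func takes a phase in [0, 1) and returns a value in [-1, 1].
    """
    table = []
    for i in range(64):
        x = func(i / 64)
        table.append(max(0, min(63, round((x + 1) / 2 * 63))))
    return table


def play(table: list, step: float, n: int) -> list:
    """Read the table with a phase accumulator; larger steps give higher pitch."""
    phase = 0.0
    out = []
    for _ in range(n):
        out.append(table[int(phase) % 64])
        phase = (phase + step) % 64
    return out


sine = make_wavetable(lambda p: math.sin(2 * math.pi * p))
print(play(sine, 16, 4))  # one sample per quarter period
```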
At first, Nintendo seemed almost to have a monopoly on this form of expansion audio. Eventually companies like Konami and HAL Laboratory would also use the channel memorably, and Jaleco, Sunsoft and Square were also frequent users. Some companies such as Bandai and Taito rarely used it, and others like Irem and Capcom did not use it at all in their games, perhaps reflecting their general lack of enthusiasm for developing games for the FDS. Four unlicensed games also supported it; no unlicensed game, then or since, has supported expansion audio except via the FDS.
Bandai Karaoke Studio
The Karaoke Studio was not particularly impressive technically; its functionality is little more advanced than a typical discrete-logic mapper. However, it includes a microphone attached to the unit that plugs into the Famicom's cartridge slot.
This microphone is a standard condenser microphone with two buttons. When a user speaks into the microphone, that sound will be mixed with the APU audio and sent to the TV. In short, this is the expansion audio that the player makes for himself. The game can read the state of the buttons and a 1-bit ADC stream from the microphone from the memory area typically reserved for cartridge RAM. The game relies on singing challenges, with the player needing to hit the right notes of pop songs at the right time, not unlike Dance Dance Revolution and karaoke video games today.
This is similar to how the microphone built into the Famicom's second controller is mixed with the APU audio and sent to the TV. That microphone was intended to be cheap and small, so it is an electret condenser microphone. Its input, read through the Controller Port 1 register, acts as a crude ADC. Some games like The Legend of Zelda and Palutena no Kagami (Kid Icarus) support using the microphone to obtain special advantages in the game. In Zelda, shouting or blowing into the microphone kills any Pols Voice in the room. In Palutena no Kagami, you can blow into the microphone while pressing the A button on Controller II to try to persuade the shopkeeper to lower his prices. Cartridge games like Takeshi's Challenge also used the microphone for a karaoke mini-game.
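Both microphones described above deliver little more than a 1-bit signal. How Karaoke Studio actually judges whether a note is on pitch is not described here, but as a hypothetical illustration, a 1-bit stream is enough to estimate pitch by counting 0-to-1 transitions over a known sampling rate. The function below is an assumption-laden sketch, not the game's method, and its name is hypothetical.

```python
import math


def estimate_pitch(bits: list, sample_rate_hz: float) -> float:
    """Hypothetical: estimate frequency by counting 0 -> 1 transitions."""
    if len(bits) < 2:
        return 0.0
    rising = sum(1 for prev, cur in zip(bits, bits[1:]) if prev == 0 and cur == 1)
    return rising * sample_rate_hz / len(bits)


# Synthetic test signal: a 440 Hz tone reduced to 1 bit (above/below zero).
rate = 8000
stream = [1 if math.sin(2 * math.pi * 440 * i / rate) >= 0 else 0
          for i in range(rate)]
print(round(estimate_pitch(stream, rate)))
```

Counting transitions throws away loudness and timbre, which is acceptable for a rough pitch check but is also why a 1-bit input is described as a crude ADC.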
One final distinguishing feature of the Karaoke Studio is its support for expansion game cartridges. The Karaoke Studio had a special slot which allowed a smaller cartridge with a single ROM chip inside it to expand the game. Two of these expansion cartridges were released. The only other game which supported expansion cartridges was Nantettatte!! Baseball, which allowed you to change player stats based on the new baseball season. Expansion cartridges would be unknown in the west.
Fukutake Publishing's StudyBox
This peripheral runs software stored on cassette tapes. Depending on the version of the device, it contains either a built-in cassette deck or a cable to attach a separate proprietary tape player. The ROM inside the device loads data from the cassette tape, plays the analog audio on the tape and sends that audio to the Famicom's output. Each tape has two tracks: a digital track with program data and an audio track for playing back audio samples. There were many cassette tapes, and the software was all educational in nature.
Loading times are fairly reasonable: assuming that a load fills the 32KB of PRG and 8KB of CHR address space, it takes approximately 40 seconds. Here is a representative video of the device in action with an English lesson cassette tape :