Sierra Adventure Game Interpreter specifications: SOUND resources

9. SOUND resources

Written by Lance Ewing, with additions/modifications by Claudio Matsuoka, Paul Lunga and Ian Schmidt (Last updated: 22 May 1999).

Most people who think of AGI games remember that they played their music and sounds over the PC speaker. What they may not know is that all sounds in the MS-DOS, Tandy and Macintosh versions are composed of four parts: one is the melody, two are accompaniment, and the final one is noise. The IBM PC can only play one note at a time, so all AGI games for the PC play the melody by itself. The Apple IIgs version has much more sophisticated sound: 16-channel wavetable-based MIDI songs for the soundtracks, and digitally sampled PCM sound effects.

According to Donald B. Trivette, author of ``The Official Book of King's Quest'', a year before the IBM PCjr was announced IBM asked Sierra to create a game that would show off the new computer's color graphics capabilities. IBM supplied the company with a prototype Junior, and Roberta set to work designing a new type of adventure game. The game produced was called King's Quest. This is important because the IBM PCjr had a different method of sound generation than the IBM compatibles of today. The sound data was stored to make it easy to send to the Junior's sound generators. This format appears to have remained right through the AGI games up until 1989--90, when SCI took over, even though the PCjr had long since been surpassed by the 286 and 386.

9.2 Sound in the IBM PCjr

The best known source of sound in the Junior is the TI SN76496A sound generator chip. This source has four separate sound voices. Three of these are tone generators and the fourth is a noise source. All four voices have an independent volume control, providing an evenly graduated set of 15 volume levels, plus a zero volume (off). Each of the three pure voices has an independently selected frequency. The noise voice has three preselected frequencies and a fourth option, which borrows the frequency of the third pure voice. The data stored in the AGI games is designed to be sent to these four voices.

The tone generation

A tone is produced on a voice by passing the sound chip a 3-bit register address and then a 10-bit frequency divisor. The register address specifies which voice the tone will be produced on. This is done through port 192 on the IBM PCjr by sending it 2 bytes in the following format:


First Byte
7  6  5  4  3  2  1  0

1  .  .  .  .  .  .  .      Identifies first byte (command byte)
.  R0 R1 R2 .  .  .  .      Register number in TI chip (0, 2, 4).
.  .  .  .  F6 F7 F8 F9     4 of 10-bits in frequency count.

Second Byte
7  6  5  4  3  2  1  0

0  .  .  .  .  .  .  .      Identifies second byte (completing byte)
.  X  .  .  .  .  .  .      Unused, ignored.
.  .  F0 F1 F2 F3 F4 F5     6 of 10-bits in frequency count.

Register Addresses:
R0      R1      R2

 0       0       0          Holds voice 1 frequency number.
 0       1       0          Holds voice 2 frequency number.
 1       0       0          Holds voice 3 frequency number.

The actual frequency produced is the 10-bit frequency divisor given by F0 to F9 divided into 1/32 of the system clock frequency (3.579 MHz) which turns out to be 111,860 Hz. Keeping all this in mind, the following is the formula for calculating the frequency:


f = 111860 / (((Byte2 & 0x3F) << 4) + (Byte1 & 0x0F))

Note: The order of the bytes is reversed for AGI sound data.
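The byte layout and formula above can be sketched in Python as follows. This is only an illustration: the function names `tone_bytes` and `tone_frequency` are made up for this example, and the code assumes (as the formula implies) that F0 is the most significant bit of the 10-bit divisor.

```python
PCJR_TONE_CLOCK = 111860  # Hz, 1/32 of the 3.579 MHz system clock


def tone_bytes(voice, divisor):
    """Build the (first, second) bytes for tone voice 0..2 (hypothetical helper)."""
    if voice not in (0, 1, 2):
        raise ValueError("tone voices are numbered 0, 1 and 2 here")
    if not 1 <= divisor <= 0x3FF:
        raise ValueError("divisor must fit in 10 bits and be non-zero")
    register = voice * 2                                # registers 0, 2, 4
    first = 0x80 | (register << 4) | (divisor & 0x0F)   # low 4 bits (F6..F9)
    second = (divisor >> 4) & 0x3F                      # high 6 bits (F0..F5)
    return first, second


def tone_frequency(first, second):
    """Apply f = 111860 / divisor to a pair of chip-order bytes."""
    divisor = ((second & 0x3F) << 4) + (first & 0x0F)
    if divisor == 0:
        raise ValueError("divisor of zero is not handled in this sketch")
    return PCJR_TONE_CLOCK / divisor
```

For example, `tone_bytes(0, 352)` gives the pair `0x80, 0x16`, which corresponds to a tone of roughly 318 Hz on voice 1.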

Attenuation

Each voice in the TI sound chip has an independent sound-level control, which is calculated in terms of decibels of attenuation, or softening. There are four bits used to control the volume. These bits, labeled A0 through A3, can be set independently or added together to produce sixteen volume levels as shown below.


A0 A1 A2 A3        Value        Attenuation (decibels)

 .  .  .  1          1                    2
 .  .  1  .          2                    4
 .  1  .  .          4                    8
 1  .  .  .          8                   16
 1  1  1  1                           Volume off

When a bit is set on, the sound is attenuated (reduced) by a specific amount: either 2, 4, 8, or 16 decibels. When all four bits are set on, the sound is turned completely off. When all four bits are off, the sound is at its fullest volume.

The attenuation is set by sending a byte of the following format to the TI sound chip:


7  6  5  4  3  2  1  0

1  .  .  .  .  .  .  .      Identifies first byte (command byte)
.  R0 R1 R2 .  .  .  .      Register number in TI chip (1, 3, 5, or 7).
.  .  .  .  A0 A1 A2 A3     4 attenuation bits

   Register Addresses:
R0      R1      R2

 0       0       1          Holds voice 1 attenuation.
 0       1       1          Holds voice 2 attenuation.
 1       0       1          Holds voice 3 attenuation.
 1       1       1          Holds noise voice attenuation.
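As a rough illustration of the attenuation byte described above, the sketch below builds and decodes such a byte. The helper names are hypothetical. Since the four bits contribute 2, 4, 8 and 16 dB and add together, the attenuation in decibels is taken to be twice the 4-bit value, with 15 meaning "volume off".

```python
def attenuation_byte(register, level):
    """Build an attenuation command byte (register 1, 3, 5 or 7; level 0..15)."""
    if register not in (1, 3, 5, 7):
        raise ValueError("attenuation registers are 1, 3, 5 and 7")
    if not 0 <= level <= 15:
        raise ValueError("attenuation level is a 4-bit value")
    return 0x80 | (register << 4) | level


def attenuation_db(byte):
    """Return the attenuation in dB, or None when the voice is switched off."""
    level = byte & 0x0F
    if level == 0x0F:
        return None          # all four bits set: volume off
    return 2 * level         # bits are worth 2, 4, 8 and 16 dB
```

With these helpers, `attenuation_byte(1, 0)` yields `0x90` (voice 1 at full volume).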

The noise generator

There are two modes for the noise operation, besides the four frequency selections. One, called periodic noise, produces a steady sound; the other, called white noise, produces a hissing sound. These two modes are controlled by a bit known as the FB bit. When FB is 0, the periodic noise is generated; when FB is 1, the white noise is produced.

Two bits, known as NF0 and NF1, control the frequency at which the noise generator works. Three of the four possible combinations of NF0 and NF1 set an independent noise frequency based on the timer. The fourth combination borrows the frequency from the third of the three pure voices made by the tone generators.


 NF0  NF1       Noise Frequency

  0    0         1,193,180 / 512 = 2330
  0    1         1,193,180 / 1024 = 1165
  1    0         1,193,180 / 2048 = 583
  1    1         Borrowed from voice 3 (third tone generator)

The noise frequency is set by sending a byte of the following format to the TI sound chip:


7  6  5  4  3  2  1  0

1  .  .  .  .  .  .  .      Identifies first byte (command byte)
.  1  1  0  .  .  .  .      Register number in TI chip (6)
.  .  .  .  X  .  .  .      Unused, ignored; can be set to 0 or 1
.  .  .  .  .  FB .  .      1 for white noise, 0 for periodic
.  .  .  .  .  . NF0 NF1    2 noise frequency control bits
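A minimal sketch of decoding the noise control byte follows. The function name and the returned dictionary keys are invented for illustration; the frequencies simply reuse the figures from the table above.

```python
NOISE_DIVIDERS = {0: 512, 1: 1024, 2: 2048}  # indexed by the NF0 NF1 value


def decode_noise_byte(byte):
    """Decode a noise control byte (register 6) into its fields."""
    if (byte & 0x80) == 0 or ((byte >> 4) & 0x07) != 6:
        raise ValueError("not a noise control byte")
    nf = byte & 0x03                      # NF0 is bit 1, NF1 is bit 0
    if nf == 3:
        frequency = None                  # borrowed from the third tone voice
    else:
        frequency = 1193180 / NOISE_DIVIDERS[nf]
    return {
        "white": bool(byte & 0x04),       # FB bit: 1 = white, 0 = periodic
        "nf": nf,
        "frequency": frequency,
    }
```

For instance, the byte `0xE4` decodes to white noise with NF0 NF1 = 0 0.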

9.3 Sound in the Apple IIgs

The Apple IIgs uses the Ensoniq 5503 DOC (Digital Oscillator Chip) to produce its sound. The 5503 has 32 oscillators and is capable of playing wavetable based music using digital sound samples stored in its own dedicated RAM (much like the Gravis Ultrasound card for the IBM PC). Thanks to the 5503, AGI games for the IIgs have a much richer sound than the PC, Mac or Tandy versions.

What follows is an excerpt from the Apple II Sound and Music FAQ, version 1.7.

Technical Specs for the GS Ensoniq chip

Written by Ian Schmidt (Last updated: 3 November 1997).

The 5503 Ensoniq Digital Oscillator Chip (DOC) contains 32 fundamental sound-generator units, known as ``oscillators''. Each oscillator is capable of either making an independent tone by itself, or of being paired up cooperatively with its neighbor in a pairing known as a ``generator''. The generator arrangement is used by most programs, for it allows more flexibility and a thicker, lusher sound.

The DOC plays 8-bit waveforms, with the centerline at 0x80 (128 decimal). This format is known as ``8-bit unsigned''. 0x00 (0 decimal too) is reserved for `stop'. If a sample value of 0 is encountered by a DOC oscillator, the oscillator will immediately halt and not produce any more sound. The DOC additionally has an 8-bit volume register for each oscillator, with a linear slope. The dynamic range of the DOC (the `space' between the softest and loudest sounds it can produce) is approximately 42 dB, or about on par with an average cassette tape.

Each oscillator has its own 16-bit frequency register, ranging from 0 to 65535. In a normal DOC configuration, each step of the frequency register increases the play rate by 51 Hz, and computing the maximum theoretical play rate is left as an exercise for the student.

When oscillators are paired to create generators, there are 4 possible modes:

Free-run: the oscillator simply plays the waveform and stops. No interaction with its 'twin' occurs.
Swap: Only one oscillator of the pair is active at a time. When one stops, the other immediately starts.
Loop: The oscillator simply plays the waveform and if it hits the end without encountering a zero, it starts over at the beginning.
Sync/AM: This actually has 2 possible effects: either one oscillator of the pair modulates the volume of the other with the waveform it's playing, or both oscillators sync up perfectly, causing a louder and more 'solid' sound.

Oscillators play waves stored in up to 128k of DRAM. This DRAM is not directly visible from the GS's 65816 CPU, but can be accessed (slowly) via services supplied by the Sound GLU chip. Note that no widely manufactured IIgs motherboard supported the full 128k of DRAM that the DOC can see. Conversely, no synthesizer Ensoniq made using the DOC had anything less than the full 128k.

The output of an oscillator can be directed to any one of 16 possible channels. Apple only makes 8 channels available via the 3 bits on the sound expansion connector, and all current stereo cards limit this to 1 bit, or two channels. However, the ``Bernie II The Rescue'' IIgs emulator for the Power Mac expands this support to 4 discrete output channels, two of which are encoded to the rear channel for Dolby Pro-Logic compatible output. No IIgs software that I'm aware of supports more than 2 channels, however.

9.4 Sound in other platforms

According to Paul Lunga, sound in the Macintosh and Tandy versions of the AGI games is pretty much the same as in the PCjr version (three sound channels plus noise). AIFF files of AGI music on these platforms are available at http://agi.helllabs.org/sound.

9.5 SOUND resource format (PCjr version)

We now know enough about the PCjr's TI sound chip to discuss the AGI sound format. The sound is stored as four separate units of data, one for each voice. Each sound file stored in the VOL files has an 8-byte header which contains offsets into the file. The format is as follows:


Byte  Meaning
----- -----------------------------------------------------------
 0-1  Offset of first voice data
 2-3  Offset of second voice data
 4-5  Offset of third voice data
 6-7  Offset of noise voice data
----- -----------------------------------------------------------
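Reading the header can be sketched as below. The function name is hypothetical, and the offsets are read as little-endian 16-bit words, as stated in the summary later in this section.

```python
import struct


def read_voice_offsets(data):
    """Return the four voice offsets stored at the start of a SOUND resource."""
    if len(data) < 8:
        raise ValueError("a SOUND resource needs at least an 8-byte header")
    # Four unsigned 16-bit little-endian values: voices 1-3, then noise.
    return struct.unpack_from("<4H", data, 0)
```

A resource whose first eight bytes are `08 00 12 00 1C 00 26 00` would thus give offsets 8, 18, 28 and 38.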

The data starting at each voice offset is stored as 5-byte notes which give the frequency and duration of a note played on that voice. The 5 bytes have the following meanings:


Byte  Meaning
----- -----------------------------------------------------------
 0-1  Duration (16-bit word)
 2-3  Frequency divisor of the format described in the PCjr
      section above except the two bytes are around the other way
  4   Attenuation of the note in the format described above in
      the PCjr section
----- -----------------------------------------------------------

Note that the last three bytes were around the other way in version 1 of the AGI interpreter. The above order is opposite from the order that would be output to the TI sound chip.

Each voice's data section in the SOUND resource file is usually terminated by two consecutive 0xFF codes. Another way of checking for the end is to see if it has reached the start of the next voice section, or in the case of the noise voice, the end of the SOUND data.
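The note layout and the two end-of-voice checks described above could be handled along these lines. This is a sketch rather than a reference decoder: the function names and the tuple layout are choices made for this example only.

```python
import struct


def read_voice_notes(data, start, end=None):
    """Collect 5-byte notes from `start` until FF FF or `end` is reached."""
    if end is None:
        end = len(data)
    notes = []
    pos = start
    while pos + 2 <= end:
        if data[pos] == 0xFF and data[pos + 1] == 0xFF:
            break                                   # usual terminator
        if pos + 5 > end:
            break                                   # ran into the next section
        # duration (16-bit LE), high divisor bits, command byte, attenuation
        notes.append(struct.unpack_from("<HBBB", data, pos))
        pos += 5
    return notes


def note_frequency(note):
    """Frequency in Hz of a tone note, or None if the divisor is zero."""
    _duration, freq_high, freq_cmd, _atten = note
    divisor = ((freq_high & 0x3F) << 4) + (freq_cmd & 0x0F)
    return 111860 / divisor if divisor else None
```

The `end` argument would normally be the offset of the next voice, so that a missing terminator does not cause the reader to run into another voice's data.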

Summary

The header consists of four two-byte offsets, one for each voice. The format is little-endian. Each offset points to the note data for the relevant voice. The note data for a voice consists entirely of five-byte note entries of the following format:

First and second byte: Note duration

Third byte

In the case of a tone voice,

7  6  5  4  3  2  1  0

0  .  .  .  .  .  .  .      Always 0.
.  X  .  .  .  .  .  .      Unused, ignored.
.  .  F0 F1 F2 F3 F4 F5     6 of 10-bits in frequency count.

In the case of the noise voice, this byte is equal to zero.

Fourth byte

In the case of a tone voice,

7  6  5  4  3  2  1  0

1  .  .  .  .  .  .  .      Always 1.
.  R0 R1 R2 .  .  .  .      Register number in TI chip (0, 2, 4).
.  .  .  .  F6 F7 F8 F9     4 of 10 bits in frequency count.

F = frequency = 111860 / (((Byte3 & 0x3f) << 4) + (Byte4 & 0x0f))
R = register address

In the case of the noise voice,

7  6  5  4  3  2  1  0

1  .  .  .  .  .  .  .      Always 1.
.  1  1  0  .  .  .  .      Register number in TI chip (6)
.  .  .  .  X  .  .  .      Unused, ignored; can be set to 0 or 1
.  .  .  .  .  FB .  .      1 for white noise, 0 for periodic
.  .  .  .  .  . NF0 NF1    2 noise frequency control bits

NF0  NF1       Noise Frequency

 0    0         1,193,180 / 512 = 2330
 0    1         1,193,180 / 1024 = 1165
 1    0         1,193,180 / 2048 = 583
 1    1         Borrowed from voice 3 (third tone generator)

Fifth byte

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Identifies first byte (command byte)
   .  R0 R1 R2 .  .  .  .      Register number in TI chip (1, 3, 5, or 7).
   .  .  .  .  A0 A1 A2 A3     4 attenuation bits


   A0 A1 A2 A3        Value        Attenuation (decibels)

    .  .  .  1          1                    2
    .  .  1  .          2                    4
    .  1  .  .          4                    8
    1  .  .  .          8                   16
    1  1  1  1                           Volume off


 Register Addresses:

   R0 R1 R2        Parameter

    0  0  0        Voice 1 frequency control number (10 bits)
    0  0  1        Voice 1 attenuation (4 bits)
    0  1  0        Voice 2 frequency control number (10 bits)
    0  1  1        Voice 2 attenuation (4 bits)
    1  0  0        Voice 3 frequency control number (10 bits)
    1  0  1        Voice 3 attenuation (4 bits)
    1  1  0        Noise voice control (4 bits; 3 used)
    1  1  1        Noise voice attenuation (4 bits)

The note data for one voice is terminated by two consecutive 0xFF values.

AGI v1.12 sound format

The sound format used in version 1.12 of the AGI interpreter was quite different from the format described above for AGIv2 and AGIv3. It still uses the PCjr format for the note data but it does not store the duration as a separate field. The best way to describe it is by an example:


90 80 16 B0 A0 15 D0 C0 0E FF E4 00 80 17 A0 16 C0 11 00
80 16 B1 A0 14 C0 12 00 80 16 B2 A0 16 C0 13 00 ...

The first thing to point out is that the PCjr note data is in the opposite order to AGIv2. Secondly, all four parts are included together rather than in separate sections. Taking the above example, let's look at the first note and show the equivalent AGIv2 notation.


90 80 16 --> 03 00 16 80 90

Now, the duration isn't immediately obvious, but we will come to that in a short while. The following three groups of three bytes give the first note for the second part, the third part, and the noise part (at least as far as this example is concerned).


B0 A0 15 --> 03 00 15 A0 B0
D0 C0 0E --> 03 00 0E C0 D0
FF E4 00 --> 33 00 00 E4 FF

The data that follows these four initial notes consists of the changes to the note values at each step of duration 3. For example,


80 17 --> 03 00 17 80 90

Note that 0x90 doesn't need to be stored because that byte has retained its value. Every 0x00 byte that is encountered is the end of one set of note changes. Each set of note changes is the equivalent of a duration of 3 in the AGIv2 format. Continuing with our example,


A0 16 --> 03 00 16 A0 B0
C0 11 --> 03 00 11 C0 D0

The example now encounters a 0x00 byte which means that the noise voice isn't changed at this point. In fact, from the AGIv2 equivalent note above, you will see that the noise note will not change until 51 (or 0x33) sets of note changes have been processed.


80 16    --> 03 00 16 80 90
B1 A0 14 --> 03 00 14 A0 B1
C0 12    --> 03 00 12 C0 D0

How exactly the AGI v1.12 interpreter knows which voice is having its notes changed, and which bytes of the note are being changed, is not yet certain. On some occasions a set of changes will contain only one byte, which corresponds to one of the bytes that make up one voice's note value, but how it knows which one is a mystery to me.

On other occasions, there could be a whole chain of 0x00 bytes, which means that during that whole time none of the voices change their note values.

9.6 SOUND resource format (IIgs version)

There are two types of SOUND resources in the IIgs AGI games: PCM samples (used for sound effects) and MIDI sequences. The first two bytes can tell what type of resource we have:


Byte  Meaning
----- -----------------------------------------------------------
 0-1  Resource type (01 00 = sample, 02 00 = MIDI)
----- -----------------------------------------------------------

Sampled sounds

Sampled sounds (resource type 01) are stored in 8 bit, unsigned format after a 54 byte header described below.


Byte  Meaning
----- -----------------------------------------------------------
 0-1  Resource type (01 00)
 2-7  ???
 8-9  Sample size
10-53 ???
----- -----------------------------------------------------------
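Telling the two resource types apart and pulling out the sample data might look like the sketch below. The function names are invented here. It also assumes that the sample size at bytes 8-9 is a little-endian 16-bit byte count of the data following the header, which the table above does not spell out.

```python
import struct

IIGS_SAMPLE = 1          # resource type bytes 01 00
IIGS_MIDI = 2            # resource type bytes 02 00
SAMPLE_HEADER_SIZE = 54


def iigs_resource_type(data):
    """Return the 16-bit resource type stored in the first two bytes."""
    return struct.unpack_from("<H", data, 0)[0]


def read_sample(data):
    """Return the 8-bit unsigned sample bytes of a sampled-sound resource."""
    if iigs_resource_type(data) != IIGS_SAMPLE:
        raise ValueError("not a sampled-sound resource")
    # Assumption: bytes 8-9 hold the sample length as a little-endian word.
    size = struct.unpack_from("<H", data, 8)[0]
    return data[SAMPLE_HEADER_SIZE:SAMPLE_HEADER_SIZE + size]
```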

MIDI sequences

Written by Ian Schmidt (Last updated: 3 April 1999).

MIDI songs have a stream of MIDI data following the resource type. The following dump shows the MIDI data in the King's Quest I opening theme resource:


02 00           Type: MIDI sequence

00 c0 28        Set patch 0x28 in channel 0
00 c1 28        Set patch 0x28 in channel 1
00 c2 29        Set patch 0x29 in channel 2
00 c3 16        Set patch 0x16 in channel 3
00 c4 01        Set patch 0x01 in channel 4

00 b0 07 7f     Set channel volumes (MIDI controller 07)
00 b2 07 7f     ...
00 b4 07 6a
01 b3 07 6e
18 b1 07 7b

4d 90 43 38     Play note 0x43 in channel 0 with velocity 0x38
0a 80 43 40     Release note 0x43 in channel 0 with velocity 0x40
0c 90 43 45     ...
0b 80 43 40
01 91 3c 35
00 92 30 40
00 91 43 39
00 40 37
02 90 48 40
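The dump above suggests a simple event layout, which the following sketch tries to walk through. Several things here are assumptions read off the dump rather than documented facts: that each event starts with a single delta-time byte, that a data byte appearing where a status byte is expected (as in the `00 40 37` line) reuses the previous status, and that only program-change style messages carry a single data byte. How the stream ends is not covered.

```python
def parse_iigs_midi(data):
    """Split a MIDI-type resource into (delta, status, data_bytes) tuples."""
    events = []
    pos = 2                      # skip the 02 00 resource type
    status = None
    while pos + 1 < len(data):
        delta = data[pos]
        pos += 1
        if data[pos] & 0x80:     # a new status byte
            status = data[pos]
            pos += 1
        elif status is None:
            raise ValueError("data byte found before any status byte")
        # Assumption: 0xCn and 0xDn messages have one data byte, others two.
        count = 1 if (status & 0xF0) in (0xC0, 0xD0) else 2
        if pos + count > len(data):
            break                # truncated event, stop quietly
        events.append((delta, status, tuple(data[pos:pos + count])))
        pos += count
    return events
```

Applied to the start of the dump, the first event comes out as `(0, 0xC0, (0x28,))`, i.e. "set patch 0x28 in channel 0" with no delay.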

The patch number is mapped to a sound sample stored in the sierrastandard file in a more complicated way. The IIgs interpreter uses the patch number as a lookup into a list of instrument definitions, which are stored in a format used by an API called the ``Note Synthesizer''.

For example, here's instrument 0 from Police Quest, as dumped out of the pq.sys16 file:


INST #000:
Envelope:
[seg 0]: BP 7f  Inc f00
[seg 1]: BP 78  Inc a
[seg 2]: BP 78  Inc 0
[seg 3]: BP 0  Inc 514
[seg 4]: BP 0  Inc 0
[seg 5]: BP 0  Inc 0
[seg 6]: BP 0  Inc 0
[seg 7]: BP 0  Inc 0
rel seg: 3, pri inc: 32, bend range: 2, vib dep: 75, vib spd: 50
A wave count: 1, B wave count: 1
[A 1 of 1] top: 7f, wave address: 50, size: 12 mode: 00, relPitch: 00fe
[B 1 of 1] top: 7f, wave address: 50, size: 12 mode: 00, relPitch: 00fe

Now, that's a bit scary looking, but all the important information is there :-)

Basically the Note Synth API groups 2 5503 voices together to make 1 voice, and this has all the data you need to control it. Let me rephrase some info from my dusty old ``IIgs Toolbox Reference, Volume 3''.

The envelope is first: for each segment there's a breakpoint (target volume, which is on a logarithmic scale in 6 decibel units) and an Increment (an 8.8 fixed point number telling how much to adjust the volume on each tick). For instance, if the first segment had BP = 1 and Inc = 0x0001, it would take 256 ticks for the volume to reach 1. The rel seg tells which segment of the envelope is the final one. Ticks in the Note Synth default to 100 Hz, although the AGI interpreter may well have used a different value -- I'll have to check.

Bend range is the number of semitones the instrument will be bent by if a pitch wheel message at maximum deflection in either direction is encountered. For this instrument, that's 2 semitones in each direction.

Vib Dep and Vib Spd aren't being dumped properly yet (minor bug in my utility), but they specify the depth and speed of an optional vibrato effect.

A wave count and B wave count tell how many wavelists there are for each 5503 voice. If there's more than 1 wavelist you compare the note being started with the top value in each wavelist and if top is greater than or equal to the note you're starting you use that wavelist. In this case there's only 1 possible wavelist so top is naturally 0x7f, the highest possible numbered MIDI note.
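The wavelist selection rule just described can be written as a short loop. The sketch below assumes the wavelists are kept in their stored order as `(top, entry)` pairs; that representation and the function name are inventions for this example.

```python
def pick_wavelist(wavelists, note):
    """Return the first entry whose top key is >= the MIDI note being started."""
    for top, entry in wavelists:
        if top >= note:
            return entry
    return None      # no wavelist covers this note
```

With a single wavelist whose top is 0x7f, every MIDI note from 0 to 127 selects that wavelist.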

Once you've picked a wavelist using the note, the other information is all there. Wave address is the offset in 256-byte pages into the 64k sierrastandard image (ie, (Wave Address)<<8 gives a true offset). In this case the wave starts 0x5000 bytes into the image. For size you mask off all but the lowest 3 bits and it gives you the basic wave size as follows:

%000 = 1 page (256 bytes)
%001 = 2 pages (512 bytes)
%010 = 4 pages (1k bytes)
%011 = 8 pages (2k bytes)
%100 = 16 pages (4k bytes)
%101 = 32 pages (8k bytes)
%110 = 64 pages (16k bytes)
%111 = 128 pages (32k bytes)

Note that if a zero is encountered in the wave before that size you still stop at the zero.
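As a worked example of the address and size rules, the sketch below locates a wave inside the sierrastandard image and cuts it short at the first zero byte. The function names are hypothetical, and `image` is assumed to be the raw contents of the sierrastandard file.

```python
def wave_location(wave_address, wave_size):
    """Return (byte offset, length in bytes) for a wavelist entry."""
    offset = wave_address << 8               # address is in 256-byte pages
    length = 256 << (wave_size & 0x07)       # %000 = 1 page ... %111 = 128 pages
    return offset, length


def extract_wave(image, wave_address, wave_size):
    """Return the wave bytes, stopping early at a zero sample if present."""
    offset, length = wave_location(wave_address, wave_size)
    wave = image[offset:offset + length]
    zero = wave.find(0)
    return wave if zero < 0 else wave[:zero]
```

For the instrument dumped above (wave address 0x50, size 0x12), this gives an offset of 0x5000 and a basic wave size of 4 pages (1k bytes).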

Mode is the 5503 oscillator mode for the voice, held in bits 1 and 2. The lowest bit (bit 0) is a ``halt'' flag. Mode 0 is looping, 1 is one-shot (play once), 2 is sync/AM (which nobody uses, but I'll try to explain it if it turns out to be used), and 3 is swap: oscillator 0 plays once, generates an IRQ, and oscillator 1 starts automatically. If oscillator 1 is also in swap mode, it will play once, generate an IRQ, and auto-start oscillator 0 again. If oscillator 1 is in loop mode it will just loop continuously; this setup is often used to have a sampled ``attack'' on an instrument followed by a loop.

The top 4 bits of the mode is the stereo channel where even numbers are right and odd numbers are left, I believe (no real harm in reversing them).

The full bitmap for the control register is:


% cccc 0mmh

Where:

c - stereo output channel. Odd values mean left, even mean right.
m - mode, as described previously.
h - halt bit. 0 to start the oscillator, 1 to halt it. This is handled specially in swap mode.
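The bitmap above can be unpacked as in the following sketch. The function name and dictionary keys are made up for illustration, and the left/right assignment follows the (admittedly tentative) statement that odd channels are left and even channels are right.

```python
MODE_NAMES = {0: "loop", 1: "one-shot", 2: "sync/AM", 3: "swap"}


def decode_control(value):
    """Split a control byte of the form %cccc0mmh into its fields."""
    channel = (value >> 4) & 0x0F
    mode = (value >> 1) & 0x03
    return {
        "channel": channel,
        "side": "left" if channel & 1 else "right",
        "mode": MODE_NAMES[mode],
        "halted": bool(value & 0x01),
    }
```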

relPitch is the fine-tune value in 8.8 fixed point, given in semitones.

So you'd then start 2 voices using the appropriate A and B wavelist entries and handle them accordingly.

The full structures look like this: (there is no structure padding on the IIgs - all bytes are jammed right together!)

In Police Quest these structures start at offset 0x8469 in PQ.SYS16. (The same offset is used for KQ1 --CM) I can't find any of my other old disks with AGI games to locate their offsets. PQ only defines 28 instruments.


ENVELOPE:

Byte  Meaning
----- -----------------------------------------------------------
  0   Breakpoint for this segment
 1-2  Increment for this segment
----- -----------------------------------------------------------

WAVELIST:

Byte  Meaning
----- -----------------------------------------------------------
  0   Top key
  1   Wave address
  2   Wave size
  3   Mode / stereo position
 4-5  relPitch
----- -----------------------------------------------------------
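The ENVELOPE and WAVELIST records above are small enough to unpack directly. The sketch below assumes that the two-byte fields (increment and relPitch) are stored little-endian, which the tables do not state; the record and function names are likewise only for illustration.

```python
import struct
from collections import namedtuple

EnvelopeSegment = namedtuple("EnvelopeSegment", "breakpoint increment")
WaveEntry = namedtuple(
    "WaveEntry", "top_key wave_address wave_size mode rel_pitch"
)


def read_envelope_segment(data, offset=0):
    """Unpack one 3-byte envelope segment (no padding between fields)."""
    return EnvelopeSegment(*struct.unpack_from("<BH", data, offset))


def read_wave_entry(data, offset=0):
    """Unpack one 6-byte wavelist entry (no padding between fields)."""
    return WaveEntry(*struct.unpack_from("<BBBBH", data, offset))
```

Under that byte-order assumption, the bytes `7F 00 0F` would decode to the first segment of the dump shown earlier (BP 7f, Inc f00).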

INSTRUMENT:

Byte  Meaning
----- -----------------------------------------------------------
0-23  8 envelope segments
 24   Release segment
 25   Priority increment (you can ignore this)
 26   Bend range
 27   Vibrato depth
 28   Vibrato speed
 29   "spare" (unused)
 30   A wave count (number of A oscillator wavelists)
 31   B wave count (number of B oscillator wavelists)
32-?  (A wave count number of wavelists)
  ?   (B wave count number of wavelists)
----- -----------------------------------------------------------

9.7 Playing the sounds on a sound card

Written by Lance Ewing (Last updated: 18 August 1997)

Writing a program to play the tunes will require four pointers that keep track of where in each voice segment the program currently is, since all four voices are played simultaneously. The first voice is the melody and is the voice that is played on the PC speaker in today's PC compatibles, the other voices being ignored. I'd imagine that other platforms such as the Amiga and Macintosh would probably play all three voices.

A program would start by reading each of the four offsets in the header. It would then go through a loop which begins by reading the first note of each voice section. The durations are then monitored and when each note finishes, another note is read. Note that the notes for each voice will usually finish at different times. The program finishes when all of the voice sections have been entirely played. This will usually occur for each voice at the same time, though I don't think that is guaranteed.
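The loop described above might be organised as in this sketch. It is not taken from any of the sample programs: the function name, the `on_note` callback and the idea of stepping one duration unit at a time are all choices made for this illustration, and each voice is assumed to have been read already into a list of `(duration, note)` pairs.

```python
def play_voices(voices, on_note):
    """Step through all voices in parallel; return the total length in ticks."""
    positions = [0] * len(voices)   # one "pointer" per voice
    remaining = [0] * len(voices)   # ticks left on each voice's current note
    tick = 0
    while True:
        active = False
        for v, notes in enumerate(voices):
            # Fetch the next note whenever the current one has run out.
            while remaining[v] == 0 and positions[v] < len(notes):
                duration, note = notes[positions[v]]
                positions[v] += 1
                remaining[v] = duration
                on_note(tick, v, note)
            if remaining[v] > 0:
                remaining[v] -= 1
                active = True
        if not active:
            break                   # every voice section has been played
        tick += 1
    return tick
```

A real player would have `on_note` program the sound hardware and would wait one duration unit between iterations instead of running through them immediately.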

Then of course you could always convert the AGI SOUND to a MIDI file and play that which will sound a hundred times better :)

Calculating frequencies when playing notes on a sound card

My program reads in the duration as a 16 bit word. It then loads the two following bytes and calculates the frequency as follows:


f = 111860 / (((Byte2 & 0x3F) << 4) + (Byte3 & 0x0F))

The 111860 comes from the PCjr discussion above. Note that the bytes are in the opposite order from that mentioned in the PCjr information.

Remember also that the SOUND format includes volume information for each voice. The exact conversion from the decibel values to the volume control on today's sound cards is uncertain at this stage.
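One possible starting point, offered only as a sketch, is to treat the attenuation as a plain amplitude ratio using the usual relation gain = 10^(-dB/20). How a particular sound card maps such a gain onto its own volume registers is a separate question that this example does not answer; the function name is hypothetical.

```python
def attenuation_to_gain(level):
    """Convert a 4-bit attenuation value into a linear amplitude factor."""
    if not 0 <= level <= 15:
        raise ValueError("attenuation level is a 4-bit value")
    if level == 15:
        return 0.0                       # all bits set: volume off
    decibels = 2 * level                 # bits are worth 2, 4, 8 and 16 dB
    return 10 ** (-decibels / 20)
```

Under this relation, full volume (level 0) maps to a gain of 1.0 and an attenuation of 20 dB (level 10) to a gain of about 0.1.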

9.8 Sample code

The following examples are available in the distribution package:

adlib.c by Kevin A. Lee: low level adlib routines
adlib.h by Kevin A. Lee: header for adlib.c
oldplay.c by Lance Ewing: old program to play AGI sounds
play.c by Jens Christian Restemeier: new program for playing AGI sounds (plays as a MIDI file)
agiplay.c by Claudio Matsuoka: program to play PCjr AGI sound resources in Linux using software mixing and /dev/dsp
