Monday, February 15, 2021

The .woz Format - Accurate Preservation of Apple II Floppy Disks

The need for proper preservation of floppy-based software cannot be overstated.  Floppy disks were not designed to store data for forty plus years, but for the oldest home computer systems like the Apple II, most of the software is at least thirty years old.  But it is not preservation merely to dump a copy of a game which was pirated in the day.  Those games usually have "cracktros" which do not represent the developer's intended presentation of the game, may have cut out elements of the original game to save space or may include corrupt data in them.  Ideally one should have a proper image of original disks with all data preserved.  Of course, from almost the earliest days of the Apple II's Disk II drive, copy protection schemes were implemented on commercial software to prevent casual disk copying.  True preservation requires preserving them as well, and that requires emulation to become more accurate than it needed to be for just sector-based .dsk images.  In this blog article I will describe in as much detail as I can how the Disk II Floppy Drive works, how it is different from floppy drives for other systems, how data is stored on disk, the benefits of the .woz format and how .woz images are made.

Features of the Disk II Floppy Drive

Almost every disk drive ever made has at least two motors, one to spin the disk, the other to position the read/write head.  The Apple II's Disk II drive is no exception to this general principle, but the manner in which it controls the read/write head motor is very unusual for a consumer-based drive.  Most home computers issue a command to the drive interface for the head to change to the next track, the previous track or back to track #0.  Their disk drives and controllers have sufficient capabilities to be able to position the head over the surface of a disk so that track 1 did not overlap track 2 and so on.

But the Apple II's disk drive came from an earlier time when drive electronics were not so sophisticated.  Steve Wozniak based the Disk II on the earliest 5.25" floppy drive, the Shugart Associates SA400, taking the drive mechanics but ditching the sophisticated controller board and using his own simpler and cheaper board.  The Disk II uses a four-phase stepper motor, and toggling a phase turns the motor a one-quarter turn in one direction or the other.  To move a standard 5.25" track width, one had to toggle all four phases in software in a particular order and with careful timing.  The Apple II's Disk II Controller card provided softswitches to manipulate the phases.  The SA400 and its later IBM-derived drives like the Tandon TM100-2 just have one signal line for stepping and one line for the direction and let the drive's controller handle the phases.

Of course, if a programmer did not want to interact with the disk drive at such a low level, the Disk Operating System's (DOS) Read-Write-Track-Sector (RWTS) routines provided a higher level abstraction to disk drive operations.  The Disk II, unlike an IBM PC drive, cannot tell the system that it has reached track 0, so when power is applied to the Drive, it is sent more than enough phase changes to ensure it reaches track 0.  The characteristic stutter when a Disk II drive is powered on is due to the drive head being knocked against the forward end of the rails several times to ensure the head is positioned over track 0 on the outer edge of the disk.  The Commodore 1541 was also notorious for using this method to find track 0.

In addition, many floppy drives came with a sector index hole sensor.  The sector index hole was a hole punched through the jacket and the surface of the floppy disk which could be viewed by an optical sensor in the drive.  Essentially, when the hole let light from an LED pass through to a phototransistor, that would tell the computer that the disk is at a physically known location.  Computers that could use the index hole would write sector 0 aligned to that hole.  While convenient, this was not strictly necessary as software could determine the location of sector 0 by reading sector index data stored on a disk in between sector data areas.  Steve Wozniak removed the sector index hole sensor to save cost on the Disk II drive, but neither Atari nor Commodore 8-bit computers have a sector index hole sensor either.

When a consumer-based floppy drive without an index hole sensor writes a track, sector 0 on each track will almost certainly be offset from sector 0 on the previous track by some random amount.  A commercial-based floppy duplicator is not so limited; it can be programmed to synchronize tracks and sectors in a predictable manner.  Many copy protected games use synchronization tricks that could only work reliably on disks duplicated on professional-grade equipment.  But this is hardly unique to the Apple II; some Commodore 64 games used "half-tracks" which wrote data to a track half as wide as a standard track.  What appears to be unique to Apple II copy protection is the "quarter-track" method.  This works because, as I explained previously, the Apple II can position its disk drive head in quarter track increments and decrements.

Data is recorded on a floppy disk surface as flux transitions.  A flux transition represents a change in the polarity on the surface of a disk.  As the disk consists of magnetic particles, the write head can polarize an area of particles to North or to South.  The disk controller does not know whether a particular set of particles is polarized in one direction or another when reading the disk; all it can register is when the polarity changes.  And to keep the data stream in sync and to prevent the drift that occurs after a lengthy sequence of no transitions, all floppy disks with usable data on them require frequent transitions.

Structure of the Disk II Floppy Disk

This is only a broad overview of the subject.  For all the details given in this section and the next section, read the seminal works on the subject, "Beneath Apple DOS" and "Beneath Apple ProDOS" by Don Worth and Pieter Lechner.  Jim Sather's "Understanding the Apple II" and "Understanding the Apple IIe" are also helpful.

Converting those transitions into usable data is done, in the Apple II, by a combination of hardware and software.  The principal hardware is a chip inside the Disk II drive called the MC3470, which is essentially an ADC specifically designed for floppy drives that converts the analogish nature of the drive's reading mechanism into digital data which a computer can understand.  Then the Disk II Controller card implements, via firmware and PROM-based logic, a scheme which decodes the flux transitions into nibblized (4-bit, 5-bit or 6-bit) bit cells.  The scheme Wozniak came up with is a particular implementation of Group Code Recording (GCR).  The GCR scheme was originally defined so that all encoded bytes must start with a 1 bit as the most significant bit and can have no more than one 0 bit in between 1 bits.  Even if the area is not one with valid data, extra clocking bits can be inserted into the bitstream to keep the clocking of the bits (at 4µs per bit cell) reliable.
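To make the idea of "transitions in, bits out" concrete, here is a minimal sketch in Python.  It assumes an idealized 4µs bit cell and a list of transition timestamps in microseconds; the function name and the rounding approach are my own illustration, not how the Disk II hardware or any emulator actually does it.

```python
def flux_to_bits(transition_times_us, bit_cell_us=4.0):
    """Turn flux transition timestamps (microseconds) into a bit string.

    Each transition is read as a 1 bit.  Every additional bit cell that
    passes without a transition contributes a 0 bit in front of it.
    """
    bits = ["1"]  # the first transition itself is a 1
    previous = transition_times_us[0]
    for t in transition_times_us[1:]:
        # Round to the nearest whole number of bit cells to absorb jitter.
        cells = max(1, round((t - previous) / bit_cell_us))
        bits.append("0" * (cells - 1) + "1")
        previous = t
    return "".join(bits)


# Five transitions spaced 4, 8, 8 and 8 microseconds apart spell out D5.
print(flux_to_bits([0, 4, 12, 20, 28]))  # 11010101
```

Real disks spin at slightly varying speeds, so an actual decoder has to track the bit cell length continuously rather than rely on a fixed 4µs as this sketch does.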

His GCR scheme as actually implemented in released hardware and software originally encoded 5 bits of useful data out of every two nibbles/8 bits read from the disk (5,3).  Later he extended the scheme to encode 6 bits of useful data (6,2) by permitting up to two consecutive 0 bits in a byte.  This enhancement required new firmware for the Disk II Controller card.  The original 5,3 scheme was supported in DOS 3.1-3.2.1 (1978-79).  The 6,2 scheme is supported in DOS 3.3 and ProDOS (1980-2018).  The Wikipedia article on Group Code Recording describes exactly how bytes are encoded in the GCR schemes used by Apple, Commodore and others.  However, for purposes of this article, know that the 5,3 GCR scheme can have 32 distinct byte values and the 6,2 GCR scheme can have 64 distinct byte values which meet the "rules" of Wozniak's GCR implementations.
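The 32 and 64 counts can be checked by brute force.  The sketch below enumerates every byte with the high bit set and applies the rules as they are usually stated in "Beneath Apple DOS": for 5,3 no two consecutive 0 bits, with D5 and AA held back as reserved markers; for 6,2 at most one pair of consecutive 0 bits plus at least one pair of adjacent 1 bits not counting bit 7.  That last adjacency rule is not spelled out in this article, so treat it as my assumption, and the helper names are purely illustrative.

```python
def zero_pairs(byte):
    """Count positions where two neighbouring bits are both 0."""
    bits = f"{byte:08b}"
    return sum(1 for i in range(7) if bits[i] == "0" and bits[i + 1] == "0")


def has_adjacent_ones(byte):
    """True if bits 6..0 contain at least one '11' pair (bit 7 is ignored)."""
    return "11" in f"{byte:08b}"[1:]


RESERVED = {0xD5, 0xAA}  # kept out of the data tables for use in markers

# 5,3: high bit set, never two 0 bits in a row, reserved bytes removed.
valid_53 = [b for b in range(0x80, 0x100)
            if zero_pairs(b) == 0 and b not in RESERVED]

# 6,2: high bit set, at most one 00 pair, at least one 11 pair below bit 7.
# D5 and AA fail the adjacency rule, so they drop out on their own.
valid_62 = [b for b in range(0x80, 0x100)
            if zero_pairs(b) <= 1 and has_adjacent_ones(b)]

print(len(valid_53), len(valid_62))  # 32 64
```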

Disks formatted under the 5,3 scheme under DOS 3.1-3.2.1 support 13 sectors per track.  Disks formatted under the 6,2 scheme used for DOS 3.3 and ProDOS support 16 sectors per track.  The number of sectors made available under DOS 3.1-3.2.1 is 455 and the number for DOS 3.3 and ProDOS is 560.  Given 256-byte sectors, this gives a disk size of 116,480 bytes for 13-sector disks and 143,360 bytes for 16-sector disks.
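The arithmetic behind those figures is simple enough to check in a few lines of Python, assuming the standard 35 tracks per side.  The function name is just for illustration.

```python
TRACKS = 35          # standard Disk II track count per side
SECTOR_BYTES = 256   # bytes of user data per sector


def capacity(sectors_per_track):
    """Return (total sectors, total bytes) for one disk side."""
    sectors = TRACKS * sectors_per_track
    return sectors, sectors * SECTOR_BYTES


print(capacity(13))  # (455, 116480)
print(capacity(16))  # (560, 143360)
```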

The structure of each sector consists of an Address Field followed by a Data Field with a variable number of Gap bytes before and after each Field.

An Address Field consists of a Prologue, Volume, Track, Sector IDs, a Checksum and an Epilogue.  The Prologue for a 6,2 GCR disk is always a three byte sequence D5 AA 96.  The Epilogue is usually DE AA EB.  Thus an Address field looks like this :

D5 AA 96   XX YY   XX YY   XX YY    XX YY     DE AA EB

Prologue   Volume  Track   Sector   Checksum  Epilogue

ProDOS does not use Volume numbers, but they are essential for DOS 3.x disks.  The Volume, Track, Sector and Checksum bytes are encoded with 4,4 GCR encoding, which gives one useful byte for every two bytes in the field.  GCR 4,4 encoded bytes only have 16 distinct byte values which meet Wozniak's rules that only one 0 bit may fall between 1 bits and all bytes must start with a 1 bit.
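Here is a short Python sketch of how an Address Field can be put together with 4,4 encoding.  It relies on two details not spelled out above, so treat them as my assumptions: the first byte of each pair carries the odd-numbered bits and the second the even-numbered bits, and the checksum is the exclusive-OR of the volume, track and sector values.  The function names are only illustrative.

```python
def encode_44(value):
    """Split one byte into two disk bytes: odd bits first, then even bits.

    OR-ing with AA forces a 1 into every other bit position, so the
    result always starts with a 1 and never has two 0 bits in a row.
    """
    return ((value >> 1) | 0xAA, value | 0xAA)


def decode_44(first, second):
    """Rebuild the original byte from a 4,4 encoded pair."""
    return ((first << 1) | 0x01) & second & 0xFF


def address_field(volume, track, sector):
    """Build a 14-byte 16-sector Address Field (assumed XOR checksum)."""
    checksum = volume ^ track ^ sector
    body = []
    for value in (volume, track, sector, checksum):
        body.extend(encode_44(value))
    return bytes([0xD5, 0xAA, 0x96] + body + [0xDE, 0xAA, 0xEB])


print(address_field(254, 17, 0).hex(" ").upper())
# D5 AA 96 FF FE AA BB AA AA FF EF DE AA EB
```

Because only four of the eight bits in each encoded byte are free to vary, only 16 different byte values can ever appear in these pairs.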

A Data Field contains a Prologue (always D5 AA AD), 342 bytes of data, a checksum and an Epilogue (usually DE AA EB).  Thus it looks like this :

D5 AA AD   XX . . . . . . YY   XX        DE AA EB

Prologue   342 bytes of Data   Checksum  Epilogue

342 bytes of data encoded in the GCR 6,2 format gives 256 bytes of actual data.  In other words, data is read off a disk in 6-bit bytes and converted by DOS or ProDOS into 8-bit bytes.  Note that bytes D5 and AA are always reserved by DOS and should never be found within the 342 bytes of Data within a Data Field.  They are how the Disk II, which has no sector index sensor, finds sectors on a track.  When an Epilogue is written, the result of the final byte is usually EB but it may be something slightly different because the byte is incompletely written.
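Since the prologues are the only landmarks on a track, finding sectors boils down to scanning for them.  The following Python sketch does that on a buffer of already byte-aligned disk bytes, roughly what a .nib style image holds.  It is an illustration only: it ignores wrap-around at the end of the track, it assumes the checksum is the exclusive-OR of volume, track and sector, and the function names are mine.

```python
ADDRESS_PROLOGUE = bytes([0xD5, 0xAA, 0x96])
DATA_PROLOGUE = bytes([0xD5, 0xAA, 0xAD])


def decode_44(first, second):
    """Rebuild one byte from a 4,4 encoded pair (odd bits, then even bits)."""
    return ((first << 1) | 0x01) & second & 0xFF


def find_sectors(track_bytes):
    """Yield (volume, track, sector, data_offset) for each address field.

    data_offset is the index of the next D5 AA AD, or -1 if none follows.
    """
    pos = track_bytes.find(ADDRESS_PROLOGUE)
    while pos != -1:
        field = track_bytes[pos + 3:pos + 11]
        if len(field) < 8:
            break  # address field runs off the end of the buffer
        volume, track, sector, checksum = (
            decode_44(field[i], field[i + 1]) for i in range(0, 8, 2)
        )
        if checksum == volume ^ track ^ sector:  # assumed checksum rule
            data_at = track_bytes.find(DATA_PROLOGUE, pos + 11)
            yield volume, track, sector, data_at
        pos = track_bytes.find(ADDRESS_PROLOGUE, pos + 3)
```

A copy protected disk can break every assumption here (altered prologues, bad checksums on purpose, data hidden in the gaps), which is exactly why sector-level scanning is not enough for preservation.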

The Gap before sector 0 is a little longer than other gaps, usually around 40-95 bytes of FF.  The Gap between an Address and a Data Field is much shorter, about 5-10 bytes of FFs.  The Gap between the Data Field of a previous sector and the Address Field of the next sector is a little longer, about 14-24 bytes of FFs. The Gap bytes give the computer time to process the Address and Data fields and account for the differences in writing speeds between drives.  To ensure self-synchronization, at least the first four FFs are written as 10-bit/40-cycle FFs before every address and data field.  Other bytes require 32 cycles to read or write, except for the EB of the address field epilogue, which is not completely written.

In the earlier GCR 5,3 encoding of DOS 3.1 and 3.2/3.2.1, 410 bytes of data were required in a Data Field to represent 256 bytes of actual data.  In this scheme 5-bit encoded bytes are converted into 8-bit usable bytes.  The Address Field Prologue for these disks is D5 AA B5, so DOS 3.3 will not find the Address Fields for DOS 3.2 disks and vice versa.  Seven 36-cycle FFs are used at the beginning of every address and data field as self-sync bytes.  With this scheme, the EB at the end of the epilogue of the data field is incompletely written.  Otherwise, DOS 3.1-3.2.1 and DOS 3.3 use identical markers and encode the address field in GCR 4,4.
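The 342 and 410 figures fall out of a simple calculation: 256 bytes are 2048 bits, and each disk byte carries only 6, 5 or 4 useful bits depending on the scheme.  This small Python check covers the sizes only; the actual bit shuffling DOS performs is more involved, and the function name is my own.

```python
def disk_bytes_needed(data_bytes, useful_bits_per_disk_byte):
    """Disk bytes required to carry data_bytes, rounded up."""
    total_bits = data_bytes * 8
    return -(-total_bits // useful_bits_per_disk_byte)  # ceiling division


print(disk_bytes_needed(256, 6))  # 342 (6,2 data field)
print(disk_bytes_needed(256, 5))  # 410 (5,3 data field)
print(disk_bytes_needed(1, 4))    # 2   (4,4 address field values)
```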

13-sector GCR 5,3 disks were the only disks being authored in 1978 and 1979, and even when DOS 3.3 and 16-sector GCR 6,2 came out in 1980 not everybody transitioned to the newer format overnight.  You will see many disks from 1980 and even into 1981 either authored in GCR 5,3 or carrying dual 13/16 sector bootstraps to accommodate either kind of Disk II Controller Card.  Later software may use 13 sector tracks for copy protection.

The File Systems of DOS and ProDOS

The Disk II Drive can reliably read 35 tracks per side.  Disks containing more tracks (up to 40-41) may be supported with third party drives, but as they were non-standard, commercial software stuck with 35 tracks (and occasionally a 36th track for copy protection, which most drives could manage).

DOS and ProDOS do not read sectors in sequential order on the track.  The 6502 in the Apple II cannot decode data instantly, so sectors are interleaved around the track to make disk reading more efficient.

Assuming sectors are numbered from 0-15 and laid out sequentially on a track, DOS 3.3 reads them from the track like this :

0, 7, 14, 6, 13, 5, 12, 4, 11, 3, 10, 2, 9, 1, 8, 15

As you can see, the idea is to have almost a full track's rotation between consecutively numbered sectors.
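To see the spacing in numbers, the Python sketch below takes the list above and measures how far the disk has to turn between sectors.  I am reading the list as "position N on the track holds DOS sector LAYOUT[N]", which is my interpretation; the variable and function names are illustrative.

```python
# Position on the track -> DOS sector number found there (from the list above).
LAYOUT = [0, 7, 14, 6, 13, 5, 12, 4, 11, 3, 10, 2, 9, 1, 8, 15]

# Invert it: DOS sector number -> position on the track.
position_of = [LAYOUT.index(sector) for sector in range(16)]


def gap(from_sector, to_sector):
    """Sector positions the disk must rotate to get from one to the other."""
    return (position_of[to_sector] - position_of[from_sector]) % 16


print(position_of)  # [0, 13, 11, 9, 7, 5, 3, 1, 14, 12, 10, 8, 6, 4, 2, 15]
print(gap(1, 2))    # 14 positions going up, nearly a full turn
print(gap(2, 1))    # 2 positions going down
```

Going up through the sector numbers costs 13 or 14 positions each time, while going down costs only 2, so the direction in which a program walks the sectors matters a great deal for speed.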

Apple PASCAL and CP/M read their disks differently, but they are rarely used these days.  

ProDOS does not deal with 256-byte "sectors", it deals with 512-byte "blocks."  Every 5.25" disk consists of 280 Blocks in ProDOS.  Every block identifies two sectors on a track.  ProDOS interleaves blocks on a track so that there is a one sector gap between the two halves of a block.  The result is improved reading performance.  So for Track 0 you get the following :

Block #    Physical Sectors Read/Written

Block 0 = Sectors 0 & 2

Block 1 = Sectors 4 & 6

Block 2 = Sectors 8 & 10

Block 3 = Sectors 12 & 14

Block 4 = Sectors 1 & 3

Block 5 = Sectors 5 & 7

Block 6 = Sectors 9 & 11

Block 7 = Sectors 13 & 15
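The table follows a regular pattern that can be written as a small formula.  This Python sketch assumes each of the 16 sectors on a track is used exactly once (so Block 6 lands on Sectors 9 & 11 and Block 7 on Sectors 13 & 15) and that blocks simply continue eight per track across the disk.  The function name is illustrative.

```python
def block_to_sectors(block):
    """Map a ProDOS block number (0-279) to (track, (sector, sector))."""
    track, index = divmod(block, 8)            # eight blocks per track
    first = (index % 4) * 4 + index // 4       # 0, 4, 8, 12, then 1, 5, 9, 13
    return track, (first, first + 2)           # one-sector gap between halves


for block in range(8):
    print(block, block_to_sectors(block))
```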

In a floppy formatted with DOS 3.x's INIT command, tracks 0-2 and 17 are taken up by DOS for the bootstrap, implementing the commands and RWTS and organizing and locating files.  Track 17, Sector 0 is used by the Volume Table of Contents (VTOC).  The VTOC gives basic parameters: it identifies the OS used, whether the disk is 13 or 16 sectors, how many tracks the disk has, where the catalog sectors begin and a bit map of free sectors in each track.  The catalog sectors occupy the remaining sectors of track 17.  Each catalog sector identifies 7 files on the disk for a total of 105 files.  Each file entry in the catalog sector gives the file's name (up to 30 characters), its type (usually T, B, A, I), the track and sector where the first track/sector list begins and the number of sectors it occupies.  As track 17 is at the middle of the disk, you can save time moving the head whenever you need to access the VTOC and catalog sectors.  The track/sector list identifies each track and sector the file uses; files do not have to be stored in sequential sectors.
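For the curious, here is a rough Python sketch of reading the file entries out of one 256-byte catalog sector from a sector image.  The byte offsets (link to the next catalog sector at bytes 1 and 2, seven 35-byte entries starting at byte 0x0B, names stored with the high bit set) are as I recall them from "Beneath Apple DOS" and are not given in this article, so verify them against the book before relying on this.  The function name is my own.

```python
FILE_TYPES = {0x00: "T", 0x01: "I", 0x02: "A", 0x04: "B"}
ENTRY_START = 0x0B   # assumed offset of the first file entry
ENTRY_SIZE = 35      # assumed size of each file entry


def parse_catalog_sector(sector):
    """Return (next_track, next_sector, entries) for one catalog sector."""
    if len(sector) != 256:
        raise ValueError("a catalog sector is 256 bytes")
    entries = []
    for slot in range(7):  # 7 entries per sector, 15 sectors, 105 files
        start = ENTRY_START + slot * ENTRY_SIZE
        e = sector[start:start + ENTRY_SIZE]
        if e[0] in (0x00, 0xFF):  # assumed markers: never used / deleted
            continue
        entries.append({
            "tslist": (e[0], e[1]),                # first track/sector list
            "locked": bool(e[2] & 0x80),
            "type": FILE_TYPES.get(e[2] & 0x7F, "?"),
            "name": bytes(b & 0x7F for b in e[3:33]).decode("ascii").rstrip(),
            "sectors": e[33] | (e[34] << 8),       # length in sectors
        })
    return sector[1], sector[2], entries
```

The seven 35-byte entries plus the 11 bytes in front of them add up to exactly 256, which is a useful sanity check on the layout.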

In ProDOS, which was designed to be a universal storage format, everything is considered in terms of blocks.  It stores its boot loader in Blocks 0 & 1, the Volume Directory in Blocks 2-5 and the Volume Bit Map in Block 6.  ProDOS takes up almost all of track 0 on a disk for the OS.  The Volume Bit Map assigns one bit for each block to determine whether it is free or in use, and with 4096 bits in a block, it has ample room to indicate the status of 280 blocks on a 5.25" disk.  If a hard drive is being used, the Volume Bit Map will take up more than one block.
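The size of the Volume Bit Map follows directly from one bit per block.  A quick Python check, with a function name of my own choosing:

```python
BLOCK_BYTES = 512
BITS_PER_BLOCK = BLOCK_BYTES * 8  # 4096 blocks tracked per bit map block


def bitmap_blocks(volume_blocks):
    """Blocks needed for a bit map with one bit per block on the volume."""
    return -(-volume_blocks // BITS_PER_BLOCK)  # ceiling division


print(bitmap_blocks(280))    # 1, a 5.25" disk
print(bitmap_blocks(65535))  # 16, a volume of roughly 32MiB
```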

Of the Volume Directory blocks, Block 2 can store the information for 12 files and Blocks 3-5 can store 13 files each for a total of 51 defined files on a disk.  The Volume Directory Header which starts Block 2 identifies the Volume Name, when the Volume was created, the version of ProDOS used, the File Count, and the total blocks available on the Volume.  Each file and subdirectory has a File Descriptive Entry which gives the file name, file type, its creation date and time, the first block used and the number of blocks used.  ProDOS has different handling for "seedling" (512 bytes or less), "sapling" (more than 512 bytes but less than or equal to 128KiB) and "tree" (more than 128KiB up to 16MiB) file sizes.  This way ProDOS can handle larger files with an index block which identifies the blocks the file uses, wherever they happen to be free on the disk.  When a file is too large to be represented in a single index block, a master index block is created which points to the index blocks, which in turn identify the blocks the file uses.
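The three storage types and their overhead can be sketched in a few lines of Python.  This assumes each index block holds 256 block pointers (which is where the 128KiB boundary comes from, 256 x 512 bytes) and ignores sparse files, where ProDOS can skip allocating empty blocks.  The function names are illustrative.

```python
BLOCK = 512
POINTERS_PER_INDEX = 256  # assumed number of pointers in one index block


def storage_type(size_bytes):
    """Classify a file by the size limits given above."""
    if size_bytes <= BLOCK:
        return "seedling"
    if size_bytes <= 128 * 1024:
        return "sapling"
    if size_bytes <= 16 * 1024 * 1024:
        return "tree"
    raise ValueError("too large for a ProDOS file")


def blocks_used(size_bytes):
    """Data blocks plus index blocks for a fully allocated file."""
    data = max(1, -(-size_bytes // BLOCK))
    kind = storage_type(size_bytes)
    if kind == "seedling":
        return data                              # the single data block
    if kind == "sapling":
        return data + 1                          # plus one index block
    index = -(-data // POINTERS_PER_INDEX)
    return data + index + 1                      # plus the master index


print(storage_type(1000), blocks_used(1000))  # sapling 3
```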

ProDOS introduced subdirectories to the Apple II storage world, and while they are not the most common thing you will find on 5.25" disks, on 3.5" disks and hard drives they are much more common.  Subdirectories, which can have multiple levels of subdirectories, permit the user to have more than 51 files on a disk.  A subdirectory header is similar to the Volume Directory header, but has a different identifier and has fields which point to the parent directory.  Subdirectories reference File Descriptive Entries.

.a2r and .woz Formats

.a2r is the result when a floppy disk is read at the flux transition level.  As the number of flux transitions is larger than the number of bits of useful data stored on a disk, the size of the image is much larger than a pure sector dump.  .a2r can hold data from multiple reads of a track, which is very useful considering the age of these disks and drives and also for certain copy protection schemes which may have written a track faster or slower than the standard 300rpm of a Disk II drive and rely on that speed difference.  An .a2r file for a standard 140KiB disk is usually around 21MiB in size.

.a2r files reflect what was read from a floppy disk and they are unsuitable for real-time emulation.  In essence they are "too raw" to be usable, just as KryoFlux .raw stream files are.  The .woz format is one method to turn the raw data into something an emulator can use.  .woz files represent the nibblized bit stream of a floppy disk; in other words, the .woz file reflects what was written to the disk.

.woz was not the first disk image format to support bitcell data as opposed to byte size sector data (.dsk, .do, .po images).  The mainstay for 2000-2012 was the .nib format.  The .nib format held all the bitcell data from a disk and could defeat less sophisticated protection schemes which looked for data outside the sector data, like the Disk's Volume ID (it need not always be 254).  But it held no track synchronization or bitcell timing information and there was no info or metadata to tell an emulator how the disk worked, so even .nib images would not defeat sophisticated protection routines.  

.edd was a precursor of sorts to .woz.  It started life as the Essential Data Duplicator (EDD), a hardware copying product that functioned like an Apple II version of the PC's Option Board.  A good overview of the types of copy protection Apple II software used can be found in Appendix D of the Essential Data Duplicator 4 manual.  The EDD was very powerful for its time and was adopted by the Apple II community in 2012.  When 4am started cracking back in 2014, he used EDD images which had been created with the "i'm FEDD up" program.  .a2r files are created using the Applesauce hardware, which supports a sync sensor upgrade to a Disk II drive.  The sync sensor upgrade allows for much more accurate imaging than the EDD could provide.  The Applesauce hardware is a box with a Teensy 3.5 inside it and some support circuitry to interface between a Disk II drive and a modern PC or Mac via a USB cable.

When the .a2r and .woz formats were finalized in 2018, they were a culmination of forty years of handling disks.  These files use a "chunk format" with INFO and META blocks which provide necessary information about how the image works and useful information about who made the program and the dump.  The .a2r format is now in version 2.0.1 and can support not only 140KiB 5.25" disks but the various 3.5" formats used by the //gs, the //c+ and early Macintoshes.  The .woz format is now in version 2.0 and has better support for 13-sector images than .woz 1.0 did.
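To give a feel for the "chunk format", here is a small Python sketch that walks the chunks of a .woz file held in memory.  The header layout it assumes (a 4-byte "WOZ1" or "WOZ2" signature, the bytes FF 0A 0D 0A, a 4-byte CRC32, then chunks each made of a 4-character ID and a 4-byte little-endian size) is my understanding of the published WOZ reference and should be checked against it.  The function name is illustrative and the CRC is not verified here.

```python
import struct


def woz_chunks(data):
    """Yield (chunk_id, payload) pairs from the bytes of a .woz file."""
    if data[:4] not in (b"WOZ1", b"WOZ2") or data[4:8] != b"\xff\n\r\n":
        raise ValueError("not a WOZ image")
    pos = 12  # 8 signature bytes followed by a 4-byte CRC32 (not checked)
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id.decode("ascii"), data[pos + 8:pos + 8 + size]
        pos += 8 + size
```

A typical use would be to read the whole file with open(path, "rb").read() and print the ID and length of each chunk to see whether INFO and META are present.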

How to Find and Make .woz Files

The A Woz A Day archive on Internet Archive is an admirable project intending to preserve as many original images as possible.  4am also includes .woz or .edd files of his cracks in his 4am Collection in the "extras" zip.  However, one person (4am) can only do so much and I can trust 4am not to place bad .woz images in his A Woz A Day archive, but I accord that trust status to few others.  This archive, which I believe was converted by Jason Scott, may fill many holes from A Woz A Day, but without the .a2r or .edd files the soundness of the .woz cannot be verified.  There are many, many important games and versions of games still needing to be converted to .woz, but nothing good in life comes easy.  

Finding a .woz image file does not mean you actually have a working disk or even a properly preserved disk.  The disk may have been damaged but the .woz metadata may not tell you that.  The disk may have been imaged from a backup copy or a cracked copy instead of an original disk.

In order to make a .woz  file, you must start with an .a2r file or an .edd file.  There are over a hundred archives with .a2r files on the Internet Archive, so you likely do not lack for raw material for all but the oldest and most obscure software titles.  There are two ways to turn an .a2r file into a .woz file.  The first method is to use the Python 3 script passport.py, written in part by 4am.  The passport.py program can also convert .edd files into WOZ files.  Use the command passport.py convert "file name" to convert a disk image to WOZ.  If passport.py recognizes the file system and the protection, then it should convert, but if it does not it will not convert.  This method probably only has a 50% success rate.

Another method to create .woz files, and this method is far more reliable, is to use Applesauce's software.  Applesauce's main function is to handle imaging, but it has a batch .a2r to .woz file converter utility which can convert one or many .a2rs by dragging and dropping the files onto the utility's window.  Applesauce will convert each file and log any oddities or errors it encountered in the conversion process.  Disks without errors tend to be converted in a minute or two; disks with errors or unusual formatting can take much longer, and Applesauce will eventually give up on a track or a sector if too many errors are encountered.

However, before you run out and start a project to convert every .a2r on the Internet, know that passport.py and Applesauce both have rather significant technical hurdles to clear before you can run them.  Passport requires installing Python 3 and some facility with a command line.  Applesauce's software is only available for macOS, so if you are running Windows or Linux you must run macOS within a Virtual Machine.  I managed it, just barely, and probably more by luck than anything else by following this guide very carefully.  Of course if you run macOS on Apple hardware, then you are good to go.

When I had a working installation I had to give the Virtual Machine exclusive control over a USB stick for storage that both it and my Windows OS could access.  Essentially the Virtual Machine would lock out Windows when it needed to use the USB stick and when done I would unlock the stick and control over it would return to Windows.  I am sure there are more convenient methods but this one worked for me.  Additionally, in order to drag windows and files within the macOS desktop, I had to give VirtualBox exclusive control of a USB mouse.  And on my system the OS runs very sluggishly, but it gets the job done.

Before you start installing Python 3, you should be aware that passport.py has a module dependency.  Passport.py relies on Wozardry.py, a script also authored in part by 4am.  Wozardry relies on a Python 3 module called "bitarray", and installing it was even more difficult for me than setting up the macOS Virtual Machine.  Of course if you are an old hand with Python or are running Linux, installing a module is probably as easy for you as passing wind.

I cannot urge people enough that when you convert .a2r files, convert them in small batches and pay attention to the Applesauce log.  Then test those images in an emulator like AppleWin or microM8 or Virtual II (on a Mac).  Also, if you have a Floppy Emu or a wDrive, try the image in them (anti-m comes in handy for games which only run with II or II+ ROMs, require 13-sector firmware or will not work with a 65C02 CPU).  Make sure your emulator supports writing to .woz files.  A good image may work in one emulator but not in another.  Wozardry can be used to disable write protection in a .woz file.

It is quite conceivable you will run into an image which may confound every emulator even though Applesauce reports no errors in the conversion.  It is also possible that the automatic .woz conversion will not produce a working image and the .a2r file will need to be tweaked in order for the file to pass the protection.  Also, not every error message is fatal, which is why I have explained all about the structure of common Apple II Disk Encoding and File Systems above so you may be able to make sense out of the warnings Applesauce gives to determine whether the warning is likely to be an issue or not.

My Humble Efforts

Using Applesauce and passport.py I have made over 100 .woz images since last September.  I have endeavored to fill holes not yet covered by the A Woz A Day archive.  I identify the title I want to preserve, download all the flux images I can find on Internet Archive for it, then run the images through Applesauce or passport.py.  Once the program is done, I test those images with no errors reported.  If there is only one flux image of a game available and some minor errors were reported, then I try the image anyway to see if it is usable.  When all is done I rename the files as best I can to reflect the original nomenclature of the disks ("Side A" "Side 1" "Program" "Boot" etc.) without being overly wordy.  As a friend of mine once said "A filename is not a database."  

I have preserved many games not otherwise available in the .woz format and particular versions not available in .woz (Ultima II On-line Systems' version).  Due to the difficulties I have described above, I understand that these .woz files will be beyond the reach of many who might otherwise enjoy them.  What is the point of preservation if the data is locked away in an unusable format?  I have no desire to be selfish, so I have created an archive on the Internet Archive.  All the fluxes I have personally converted to .woz files are there.  As I convert more .woz files I should add them to the archive if I can get them working on some emulator.

I have concentrated on games which interest me, namely those which came in "big boxes", support the Mockingboard, came in zip-lock bags from individuals and companies which went on to greater success (Richard Garriott, On-line Systems), were historically significant or popular (Oregon Trail, Infocom Text Adventures) or were original to the Apple II platform and were widely ported (Wasteland).  To be clear I do not have the Applesauce hardware or a single original Apple II disk.  I took other people's images and converted them, but the point of my archive is to ensure the .woz files work.  I do not claim to be the only individual who can make working .woz images, so I will not include any .woz files I did not personally create unless they cannot be located elsewhere on the Internet Archive.  

3 comments:

  1. Great writeup as usual, it's a shame Apple iis are not frequent en EU, it's a fantastic system to have in physical ! :D

  2. I had an Apple IIe through high school and college, two disk drives. My parents heard about needing 2 drives, and they're completely non-technical, word of the need must have spread far and fast for them to know. Anyways, they are now waaaay to expensive on eBay. There are an amazing number of modern upgrades like WIFI, but again, all very expensive. I'm looking at other 8-bit computer, to relive someone else's childhood experiences ;) Maybe I'll get a Speccy, pretend I was a Brit kid!

  3. I worked at Dysan where we duplicated floppy disks. I wrote a duplication program that would copy Apple 2s. I also invented a copy protection that inserted two zero bits between the sector header bytes D5 and AA. The format of the entire disk was otherwise normal. A check program would detect the two zero bits returning a pass or fail. It simply did the normal LDA BPL loop until it found the D5 then it would delay a few clocks and do a LDA from the floppy controller. If the two zeros were there, it might read a 2A or 15. If not, it might read an AA or 55. When copied, the disk would not have the two zeros and the check would fail.
    One board-level disk copier required the user to enter hex code to copy a particular protected disk. Most had 5-20 bytes, mine needed about 120 bytes. If you made an error, it would crash. Customers were quite happy with the protection level.
    Only problem was, everybody hated me. The software users because they couldn't backup their floppies and the software vendors because it was never transparent enough and their software couldn't be backed up by the customer yet not pirated...
