Sunday, February 4, 2024

Lessons from Ancient File Systems

In my previous post, I mentioned that I found a number of oddities when digging through the details of various Atari 8-bit file systems.  I read through the specifications I could find online, and ran the actual code in emulators to verify and discover details when the specifications were unclear or incorrect.  There were some surprising finds.

I looked at:

  • Atari DOS 1.0
  • Atari DOS 2.0s
  • Atari DOS 2.0d
  • Atari DOS 2.5
  • MyDOS 4.5
  • Atari DOS 3
  • Atari DOS 4 (prototype, never released)
  • DOS XE
  • LiteDOS
  • SpartaDOS

Atari DOS 1.0 started it all off back in 1979.  It supported single-sided, single-density disks that consisted of 720 sectors holding 128 bytes each.  It had some bugs and implementation limitations, so DOS 2.0s ('s' for single density) soon replaced it, making some key changes.  Soon other DOS versions started appearing, some trying to maintain backwards compatibility, and others trying completely new approaches.  I won't go into the technical details, but I do want to highlight what I think are the most interesting design decisions.

Keep in mind that DOS 1 was designed when they were planning on selling the base Atari 800 with only 8K of RAM.  (The 400 would have been only 4K, but marketed with a cassette drive instead.)  So the design minimized the need for additional sector buffers.  Hence the strategy for files was to use the last few bytes of each sector as metadata, including the link to the next sector in the file.

Atari used 6 bits at the end of each sector to hold a file number.  This was the index of the file in the directory.  (The other two bits and another byte together pointed at the next sector of the file.)  This was presented in documentation as being there to detect file corruption, which seemed like a waste, as that sort of file corruption almost never happened.  But I'm now pretty sure that wasn't the reason the file number was there.  Instead, it was to support "note" and "point" commands.  Atari believed people would use files as a database, and with the sector chaining, it was impossible to jump around in a file without reading it linearly.  So a program could "note" its position in a file (sector and offset) and then return there later with the "point" command.  This is where the file number came in: DOS could verify that the file number stored in the user-supplied sector matched the directory entry used when opening the file.  Without that number, there would be no verification that a "point" would end up in the right file.
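The sector link and the "point" check described above can be sketched in a few lines of Python.  This is only an illustration, not code from any Atari DOS: it assumes the commonly documented DOS 2 layout in which the two link bytes sit just before the final byte of a 128-byte sector, with the file number in the upper 6 bits of the first link byte.  The names decode_link and check_point are made up for this example.

```python
from typing import NamedTuple

SECTOR_SIZE = 128  # single-density sector size


class SectorLink(NamedTuple):
    file_number: int  # index of the owning file in the directory (0-63)
    next_sector: int  # 10-bit number of the next sector in the chain


def decode_link(sector: bytes) -> SectorLink:
    """Split the two link bytes that precede the final byte of a sector.

    Assumed layout: byte 125 holds the 6-bit file number in its upper bits
    and the top 2 bits of the next-sector number in its lower bits; byte 126
    holds the low 8 bits of the next-sector number.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 128-byte sector")
    hi, lo = sector[125], sector[126]
    return SectorLink(file_number=hi >> 2, next_sector=((hi & 0x03) << 8) | lo)


def check_point(sector: bytes, open_file_number: int) -> bool:
    """Return True if a 'point' to this sector lands in the open file."""
    return decode_link(sector).file_number == open_file_number
```

For example, a sector whose link bytes are 0x15 and 0x2C decodes to file number 5 with the next sector at 300, so a "point" into it would be accepted for directory entry 5 and rejected for any other entry.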

In DOS 1, the last byte of each sector was a sector sequence number with the high bit cleared.  This was a nice idea, and would have been useful if someone were writing a tool to undelete files.  If the high bit was set, it was the last sector of the file, and the low bits were the number of data bytes in the sector.  In DOS 2, the last byte was instead always the number of data bytes in the sector.  This would seem to have made the code simpler, but it actually did the opposite, as DOS 2 also included compatibility code for reading DOS 1 files (as did DOS 2.5).  The real reason is that it allowed partially filled sectors in the middle of files, which happened when a file was opened for append, as it would start with an empty write buffer instead of filling it with the data from the last sector.  (DOS 1 didn't have an append option at all.)  I would have thought it would have been simpler to implement append without partial sectors and avoid needing compatibility code, but apparently not.
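The two interpretations of the last byte can be put side by side.  This is a rough sketch of the rules as described above, not the original 6502 logic; the function names are invented, and the figure of 125 data bytes per full sector is an assumption that follows from a 128-byte sector with three metadata bytes at the end.

```python
from typing import Tuple

DATA_BYTES = 125  # assumed: 128-byte sector minus three trailing metadata bytes


def data_length_dos1(last_byte: int) -> Tuple[int, bool]:
    """Return (data bytes in sector, is last sector) under the DOS 1 rule."""
    if last_byte & 0x80:
        # High bit set: final sector, low 7 bits are the byte count.
        return last_byte & 0x7F, True
    # High bit clear: the value is a sequence number and the sector is full.
    return DATA_BYTES, False


def data_length_dos2(last_byte: int) -> int:
    """Under the DOS 2 rule the last byte is always the byte count."""
    return last_byte
```

Under the DOS 1 rule only the final sector can be short, while under the DOS 2 rule any sector can report fewer than 125 bytes, which is what makes append without a read-back possible.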

Several less-popular versions of DOS didn't support zero-length files.  If you opened a new file for writing and closed it without first writing anything, the file was deleted.  That seems crazy today, but at the time, that was not one of the quirks people complained about; mostly those DOS versions were unpopular because of incompatibility and issues like wasted space from internal fragmentation.

Only two versions of DOS supported time stamps on files: DOS XE and SpartaDOS.  Nobody wanted to type in the date and time on every boot, and a battery-backed real-time clock wasn't a standard feature of any of the 8-bit computers that I'm aware of (certainly not the Atari).

Several versions of DOS supported subdirectories.  This was mostly for versions that supported larger disks, like MyDOS and SpartaDOS.

One quirk of Atari DOS (1, 2, and 4) is putting the directory and free sector bitmap table on a track in the middle of the disk.  The theory was that this would minimize seek time between reading the directory and reading a file.  Not a horrible idea, though it was horribly mangled by an off-by-one error because sector numbers on the drive were 1-based instead of 0-based as the original coders believed, so there was a seek between the sector bitmaps and the directory.  They doubled down on this in DOS 4, where the location of the directory changes depending on the disk format, so larger disks have the directory at higher sector numbers.  There was a lot wrong with DOS 4; it probably would have been quite unpopular if it had made it past being a prototype without significant changes.
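The off-by-one is easy to see with a little arithmetic.  The sketch below assumes 18 sectors per track (720 sectors over 40 tracks) and the commonly documented DOS 2 placement of the sector bitmap at sector 360 with the directory starting at sector 361; neither number is spelled out above, so treat both as assumptions.

```python
SECTORS_PER_TRACK = 18  # assumed geometry: 720 sectors over 40 tracks

BITMAP_SECTOR = 360     # assumed location of the free sector bitmap
FIRST_DIR_SECTOR = 361  # assumed first directory sector


def track_of(sector: int, first_sector: int = 1) -> int:
    """Return the zero-based track holding a sector.

    first_sector is the number the drive gives to the first sector on the
    disk: 1 on the real hardware, 0 in the coders' mistaken model.
    """
    return (sector - first_sector) // SECTORS_PER_TRACK


# What the coders believed (0-based numbering): both on track 20.
believed = (track_of(BITMAP_SECTOR, 0), track_of(FIRST_DIR_SECTOR, 0))

# What the drive actually does (1-based numbering): tracks 19 and 20.
actual = (track_of(BITMAP_SECTOR, 1), track_of(FIRST_DIR_SECTOR, 1))
```

With 1-based numbering the bitmap ends up as the last sector of track 19 while the directory starts on track 20, which is the extra seek.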

The biggest problem with the official Atari versions of DOS is that they never anticipated future needs.  I can forgive DOS 1, as it was developed on a short time frame for an 8K computer, and that locked in DOS 2 for compatibility reasons.  But DOS 3 was a disaster.  Atari had a new drive that could format 130K disks as well as the old 90K disks, so they needed a new DOS that would take advantage of the new space.  But they ignored other drives with 256-byte double-density sectors (180K) and the possibility of double-sided disks (360K) that were already on the market for other computers.  Better engineers would have designed something to grow and meet future needs, but instead they just implemented the minimum requirements.  So when Atari was later designing the 1450XLD computer which did have 360K drives, they had to have a new DOS again.  While that was never released, they did develop a DOS for it, which again, only handled the specific formats that that drive supported, with no thoughts for the future.

On the other hand, the most successful DOS versions for the retro computing Atari community now are precisely the ones that could scale to any drive, namely MyDOS and SpartaDOS.  MyDOS retained DOS 2 compatibility, but extended it to support up to 16MB drives and added subdirectories.  SpartaDOS dropped compatibility, but was able to support 512-byte sectors, allowing for 32MB drive support.  (The Atari's I/O protocols limited the system to a 16-bit field for sector numbers.)
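Those size limits fall straight out of the 16-bit sector number.  A quick sanity check of the arithmetic, assuming 256-byte sectors for the 16MB case (the function names are just for illustration):

```python
MAX_SECTORS = 1 << 16  # a 16-bit field can name at most 65,536 sectors


def max_capacity_mib(sector_size: int) -> int:
    """Largest drive, in MiB, addressable with 16-bit sector numbers."""
    return MAX_SECTORS * sector_size // (1024 * 1024)


def floppy_kib(sectors: int, sector_size: int) -> int:
    """Raw capacity of a floppy format in KiB."""
    return sectors * sector_size // 1024


print(max_capacity_mib(256))  # 16
print(max_capacity_mib(512))  # 32
print(floppy_kib(720, 128))   # 90
print(floppy_kib(720, 256))   # 180
```

The real ceiling is a sector or so lower, since sector numbering starts at 1, but the rounded figures match the 16MB and 32MB limits, and the same arithmetic gives the 90K and 180K floppy sizes.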

So the big lesson here is to always plan for the future.  Listen to the requirements for the current product, but then design with the assumption that you'll be asked to expand the requirements in the future.  If you don't, users may be cursing you when the code is released.  But who knows?  If you do it right, people may still be using your code in 40 years.

A User-Space File System for Fun and History

I got my start with computers as a kid with an Atari 800 way back when.  When I'm feeling nostalgic, I still enjoy pulling it out and playing with it.  These days, I'm more likely to pull out an emulator than the real thing, as they're now so precise that it's very difficult to find a difference, and it eliminates all the hassles involved with disks that were slow and may have failed with age.  That means I need tools for manipulating disk images.

So there I was...  I was posting at AtariAge about a tool I wrote to extract the files from a disk image, and another to create an image containing a set of files, and someone remarked that they were surprised that nobody had created a Linux kernel module to mount the disk image files as a file system.  That was a dangerous comment.  Writing a Linux kernel module to support Atari file systems is certainly possible, but would be a lot of work.  But there's another option, which is to write a file system in user space using a package called "FUSE."

I've written a file system using FUSE before, and it's far less complicated than writing one in kernel space.  Besides making debugging vastly simpler, you can also just skip lots of things and the library will take care of them for you.  And with Atari file systems, there are, of course, many modern features that just aren't supported.  And another big advantage of FUSE is that it works on macOS as well as Linux, so it makes the project available to a wider audience.  (It also runs on the Windows Subsystem for Linux.)

But note that I said Atari file systems, not file system.  Depending on how you count, I found at least seven or perhaps ten different file systems.  (Several are variations, so they can share most of the same code.)  This means the first task is to detect which file system a disk image is using, if any (as many programs just read disk sectors directly and didn't use a file system at all).  
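Detection of this kind can be sketched as a list of per-file-system checks tried in order.  The sketch below is not how atrfs does it; it is a minimal illustration in Python with a single, deliberately weak check.  That check assumes a raw image with no header, 128-byte sectors, and the commonly documented DOS 2 convention that the bitmap sector (sector 360) begins with a DOS code byte of 2.  All names here are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

SECTOR_SIZE = 128
Checker = Callable[[bytes], bool]


def sector(image: bytes, number: int) -> bytes:
    """Return the bytes of a 1-based sector from a raw, headerless image."""
    start = (number - 1) * SECTOR_SIZE
    return image[start:start + SECTOR_SIZE]


def looks_like_dos2(image: bytes) -> bool:
    """Weak, assumed check: 720 sectors and a DOS code of 2 in sector 360."""
    if len(image) < 720 * SECTOR_SIZE:
        return False
    return sector(image, 360)[0] == 2


# Order matters: stricter or more specific checks would go first.
CHECKERS: List[Tuple[str, Checker]] = [
    ("DOS 2.0s", looks_like_dos2),
]


def detect(image: bytes) -> Optional[str]:
    """Return the name of the first file system whose check passes."""
    for name, check in CHECKERS:
        if check(image):
            return name
    return None  # no recognizable file system, e.g. a boot-sector game
```

A real check would go much further, for instance walking the directory and confirming that every sector chain carries the right file number.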

So the first step is to have sanity checking routines to determine if a disk image is consistent for each file system.  Then implement the key file system features for each DOS version.  The key functions that had to be implemented are reading a directory, getting file attributes, reading a file, and writing a file.  Most other features are trivial or unsupported for most file systems, but I implemented as much as possible.  For example, files could be locked, which I interpreted as the write-permission bit for the file owner.  Some file systems did support time stamps on files, but for most, I just copied the time stamp on the disk image file.
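The lock-to-permission mapping is small enough to show in full.  This is a sketch using Python's standard stat module, not the code in the project; the base permissions of 0644 and the function name are assumptions made for the example.

```python
import stat


def file_mode(locked: bool) -> int:
    """Build a st_mode value for a regular file on the mounted image."""
    mode = stat.S_IFREG | 0o644  # assumed default: rw-r--r--
    if locked:
        mode &= ~stat.S_IWUSR    # a locked file loses the owner write bit
    return mode
```

Going the other way, a chmod that clears or sets the owner write bit could be translated into locking or unlocking the file.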

But once it was working, it was quite simple to add more features.  So I made files like ".sector361" that would contain the raw bytes of sector 361.  The file doesn't have to exist in the directory, but as long as it's supported by the get attributes, read, and possibly write functions, it will work just fine.  This feature was also very helpful in debugging the file systems, as I could look at a hexdump of raw sectors when something wasn't behaving as expected, or when I had to reverse engineer a file system with incomplete specifications.
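Here is roughly what serving such a hidden file involves, as a minimal Python sketch rather than the project's actual code.  It assumes a raw image with uniform 128-byte sectors and no header, and the function name is made up.  The point is that the read path only has to recognize the name; nothing needs to appear in a directory listing.

```python
import re
from typing import Optional

SECTOR_SIZE = 128  # assumed uniform sector size
_SECTOR_NAME = re.compile(r"^/\.sector(\d+)$")


def read_sector_file(image: bytes, path: str) -> Optional[bytes]:
    """Return the raw sector for paths like '/.sector361', else None."""
    match = _SECTOR_NAME.match(path)
    if match is None:
        return None  # not a magic name; fall through to the real file system
    number = int(match.group(1))
    start = (number - 1) * SECTOR_SIZE  # sectors are numbered from 1
    if number < 1 or start + SECTOR_SIZE > len(image):
        return None  # sector does not exist on this image
    return image[start:start + SECTOR_SIZE]
```

The get-attributes function would apply the same name test and report a regular file of SECTOR_SIZE bytes.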

Of course, every time I finished a file system or new feature, someone on AtariAge would find yet another for me to look at.  That was really half the fun of it.  I discovered a number of file systems used on the Atari that I hadn't encountered before, and there were some weird ones.  The oddities of those file systems probably deserve their own blog post.

But once I had it pretty solid, there was another wrinkle.  There's a special "Atari Partition Table" format for hard disks that's used with some newer add-ons, including flash card readers that people now use with their old computers.  This presented two major problems.  First, the documentation for the APT images, while detailed, had points that were confusing or ambiguous, and included many options that aren't actually used.  This required getting sample images from users, as I didn't use it myself.  Second, this meant having a number of different file systems mounted in subdirectories.  That required refactoring the code, including adding a middle layer to call into.  Sometimes adding an extra layer of indirection lets you do magic.
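The middle layer idea can be illustrated with a toy dispatcher: the top level lists the partitions, and anything below is handed to the file system that owns that partition with the prefix stripped off.  This is a sketch of the general pattern only; the class names and partition names are hypothetical and the real code is organized differently.

```python
from typing import Dict, List


class ToyFS:
    """Stand-in for one per-partition file system (hypothetical)."""

    def __init__(self, names: List[str]):
        self.names = names

    def readdir(self, path: str) -> List[str]:
        if path != "/":
            raise FileNotFoundError(path)
        return list(self.names)


class PartitionMux:
    """Middle layer that routes each call to the right partition."""

    def __init__(self, partitions: Dict[str, ToyFS]):
        self.partitions = partitions

    def readdir(self, path: str) -> List[str]:
        if path == "/":
            return sorted(self.partitions)  # top level: one dir per partition
        name, _, rest = path.lstrip("/").partition("/")
        if name not in self.partitions:
            raise FileNotFoundError(path)
        # Strip the partition prefix and let that file system handle the rest.
        return self.partitions[name].readdir("/" + rest)
```

Every other operation (get attributes, read, write) would be routed the same way, so the individual file systems never need to know they are living inside a partition.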

Unfortunately, this doesn't work for all file systems in an APT partition, as the file system code uses a memory map of the image file, and with some APT options, the disk image is no longer a straight linear stream of bytes.  This has to do with fitting smaller sector sizes into 512-byte sectors when sometimes you care a lot more about efficient code than efficient space.  But since I can export the image of the contents of each partition, even if the bytes have to be scrambled by the read and write functions, that image can then be opened separately by a new instance of my program, creating a new mount point.  A bit awkward, but it works.

So that was a fun project.  While this had no commercial value, it was interesting to look at ancient file systems and see how they did things, as well as to play around with FUSE.

So here is the code if you're interested:

https://github.com/pcrow/atari_8bit_utils/tree/main/atrfs

Now I suppose what's needed next is a wrapper to take the code for a kernel file system, and run it under FUSE.  That would allow you to develop a kernel file system with all the convenience of user-space tools, but you could then build the code as a kernel module once it's working.  That's a project for another day.