Disks: The Alto File System




This document describes the disk formats used in the Alto  File System.
It also describes  a "disk object," a  Bcpl software construct  that is
used to interface low-level  disk drivers with packages  that implement
higher-level objects, such as streams.

The primary focus  of the description will  be for the  "standard" Alto
disks: either (1) up to 2 Diablo Model 31 disk drives or (2) one Diablo
Model 44 disk drive.  The low-level drivers for these disks  are called
"Bfs" (Basic File  System).  With minor modifications,  the description
below  applies to  the Trident  Model T80  and T300  disk  drives, when
formatted  for  Alto  file  system  conventions.   The  differences are
flagged with the string  [Trident].  Low-level drivers for  the Trident
disks are called "Tfs."



1. Distribution


Relocatable binary files for the BFS are kept in  BFSBrs.dm.  The
sources,  command  files, and  test  program (described  later  in this
document)  are  kept  in  BFSSources.dm  Relocatable binary
files  for  the TFS  are  kept  in TFS.dm;  sources  are  kept on
TFSSources.dm.



2. File and Disk Structure


This section describes  the conventions of  the Alto file  system.  The
files AltoFileSys.D and Bfs.D contain Bcpl structure  declarations that
correspond to this description ([Trident]: See also "Tfs.D").

The unit  of transfer between  disk and memory,  and hence that  of the
file system, is  the disk sector.  Each  sector has three fields:  a 2-
word header, an  8-word label, and  a 256-word data  page.  ([Trident]:
The fields are a 2-word  header, a 10-word label, and a  1024-word data
page.)

A sector is identified by a  disk address; there are two kinds  of disk
addresses, real  and virtual.   The hardware  deals in  real addresses,
which have a somewhat arbitrary format.  An unfortunate  consequence is
that the real addresses for all the pages on a disk unit are  sparse in
the set of 16 bit integers.  To correct this defect,  virtual addresses
have been introduced.  They have the property that the pages of  a disk
unit  which  holds  n  pages  have  virtual  addresses  0   ...  (n-1).
Furthermore,  the ordering  of pages  by virtual  address is  such that
successive pages  in the  virtual space are  usually sequential  on the
disk.   As  a result,  assigning  a sequence  of  pages  to consecutive
virtual  addresses will  ensure that  they can  be read  in as  fast as
possible.

                             ------------
                   Copyright Xerox Corporation 1982


Disks & Bfs                 April 10, 1982                            2




2.1. Legal Alto Files

An  Alto  file  is  a  data  structure  that  contains  two   sorts  of
information: some is  mandatory, and is  required for all  legal files;
the  remainder  is  "hints".  Programs  that  operate  on  files should
endeavor to  keep the hints  accurate, but should  never depend  on the
accuracy of a hint.

A legal Alto file  consists of a sequence  of pages held together  by a
doubly-linked list  recorded in the  label fields. Each  label contains
the mandatory information:

   The forward and backward links, recorded as real disk addresses.

   A page  number which  gives the position  of the  page in  the file;
   pages are numbered from 0.

   A count of the number of characters of data in the  page (numchars).
   This may range from  0 (for a completely  empty page) to 512  (for a
   completely  full  page).  ([Trident]:  A  full  page  contains  2048
   characters.)

   A real  file id,  which is  a three-word  unique identifier  for the
   file.   The  user normally  deals  with virtual  file  ids  (see the
   discussion  of  file  pointers,  below),  which   are  automatically
   converted into real file ids when a label is needed.

Three bits in the file id deserve special mention:

   Directory: This bit  is on if the  file is itself a  directory file.
   This information is  used by the disk  Scavenger when trying  to re-
   build a damaged disk data structure.

   Random: This bit is currently unused.

   NoLog:  This bit  is no  longer used,  but many  existing  files are
   likely to have it set.

Leader Page: Page 0 of a file is called the leader page; it contains no
file data, but only a  collection of file properties, all of  which are
hints.   The structure  LD in  AltoFileSys.D declares  the format  of a
leader page, which contains the following standard items:

     The file name, a hint so that the Scavenger can enter this file in
     a directory if it is not already in one.

     The times for creation,  last read and last write,  interpreted as
     follows:

          A  file's  creation  date  is  a  stamp  generated  when  the
          information in the  file is created.   When a file  is copied
          (without modification),  the creation  date should  be copied
          with it.  When a file is modified in any way (either in-place
          or  as  a  result  of  being  overwritten   by  newly-created
          information), a new creation date should be generated.

          A  file's  write  date  is  updated  whenever  that  file  is
          physically written on a given file system.


Disks & Bfs                 April 10, 1982                            3




          A  file's  read  date  is  updated  whenever  that   file  is
          physically read from within a given file system.

     A pointer  to the  directory in which  the file  is thought  to be
     entered (zeroes imply the system directory SysDir).

     A "hint" describing the last page of the file.

     A "consecutive" bit which is a hint that the pages of the file lie
     at consecutive virtual disk addresses.

     The changeSerial  field related to  version numbering:  whenever a
     new version of a file "foo" is made, the changeSerial field of all
     other files "foo" (i.e., older versions) is incremented.   Thus, a
     program that wishes  to be sure that  it is using the  most recent
     version of a  file can verify  that changeSerial=0.  If  a program
     keeps an  FP as  a hint  for a  file, and  is concerned  about the
     relative position of that file in the list of version  numbers, it
     can  also keep  and  verify the  changeSerial entry  of  the file.
     Version numbers have been deimplemented.

These standard  items use up  about 40 words  of the leader  page.  The
remaining space  is available for  storing other information  in blocks
which start with a one  word header containing type and  length fields.
A  zero  terminates the  list.   The structure  FPROP  in AltoFileSys.d
defines the header format.  The  only standard use of this  facility is
to record the logical shape of the disk in the leader page of SysDir.

Data: The first data byte of a file is the first byte of page 1.

In a legal file with n pages, the label field of page i must contain:

   A next link with the real disk address of page (i+1), or 0 if i=n-1.

   A previous link with  the real disk address  of page (i-1), or  0 if
   i=0.

   A page number between 0 and (n-1), inclusive.

   A numchars word  = 512 if  i>FP.serialNumber.word1
   portion of the file  pointer.  This allows the directory  and random
   bits to be set in the file id.

   If useOldFp is  true, then filePtr already  points to a  legal file;
   the purpose of calling CreateDiskFile is to re-write all  the labels
   of  the  existing  file  with the  new  serial  number,  and  to re-
   initialize the leader page.  The data contents of the  original file
   are  lost. Note  that this  process effectively  "deletes"  the file
   described by filePtr when CreateDiskFile is called, and makes  a new
   file; the FP for the new file is returned in filePtr.

   If pageBuf is supplied, it is written on the leader page of  the new
   file  after setting  the  creation date  and directory  FP  hint (if
   supplied).  If pageBuf is omitted, a minimal leader page is created.

DeleteDiskPages(disk, CA, firstDA, filePtr, firstPage, newFp,
    hintLastPage)

Arguments beyond firstPage are optional.  Deletes the pages of  a file,
starting with the page whose number is firstPage and whose disk address
is  firstDA.  CA  is  a page-sized  buffer  which is  clobbered  by the
routine.  hintLastPage is as described under ActOnDiskPages.

If  newFp  is supplied  and  nonzero, it  (rather  than  freePageFp) is
installed as the FP of the file, and the pages are not deallocated.



6. Allocating Disk Space


The  disk class  also contains  routines for  allocating space  and for
converting between  virtual and  real disk  addresses.  In  most cases,
users need not call these routines directly, as the four routines given
above (ActOnDiskPages, WriteDiskPages, DeleteDiskPages, CreateDiskFile)
manage disk addresses and disk space internally.

AssignDiskPage(disk, virtualDA, nil)  returns the virtual  disk address
of the first free page following virtualDA, according to the bit table,


Disks & Bfs                 April 10, 1982                           11




and sets the corresponding bit.   It does not do any checking  that the
page is actually free (but WriteDiskPages does).  If there are  no free
pages it returns -1.  If it is called with three arguments,  it returns
true if (virtualDA+1) is available without assigning it.

If virtualDA is  eofDA, AssignDiskPage makes a  free-choice assignment.
The disk object remembers the virtual DA of the last page  assigned and
uses it as the first page to attempt to assign next time AssignDiskPage
is called with a virtualDA of  eofDA.  This means that you can  force a
file to be created starting at a particular virtual address by means of
the following strategy:

        ReleaseDiskPage(disk, AssignDiskPage(disk, desiredVDA-1))
        CreateDiskFile(disk, ...)  // or whatever (e.g., OpenFile)

ReleaseDiskPage(disk,  virtualDA) marks  the page  as free  in  the bit
table.  It  does not  write anything on  the disk  (but DeleteDiskPages
does).

VirtualDiskDA(disk, lvRealDA) returns the virtual disk address, given a
real disk address  in rv lvRealDA.   (The address, lvRealDA,  is passed
because  a  real  disk  address may  occupy  more  than  1  word.) This
procedure returns eofDA if the real disk address is zero (end-of-file),
and fillInDA if  the real disk address  does not correspond to  a legal
virtual disk address in this file system.

RealDiskDA(disk, virtualDA,  lvRealDA) computes  the real  disk address
and stores it in rv lvRealDA.  The function returns true if the virtual
disk address is legal, i.e. within the bounds of disk addresses for the
given "disk." Otherwise, it returns false.



7. Lower-level Disk Access


The transfer routines described  previously have the property  that all
disk activity occurs  during calls to  the routines; the  routines wait
for  the  requested  disk  transfers  to  complete   before  returning.
Consequently,  disk transfers  cannot conveniently  be  overlapped with
computation, and the number of pages transferred consecutively  at full
disk speed is generally limited by the number of buffers that  a caller
is able to supply in a single call.

It is also possible to use the disk routines at a lower level  in order
to overlap transfers with computation and to transfer pages at the full
speed of the disk (assuming the file is consecutively allocated  on the
disk and the amount of computation per page is kept  relatively small).
The  necessary  generic  disk  operations  and  other  information  are
available to permit callers to operate the low-level disk routines in a
device-independent fashion for most applications.

This level  makes used  of a Command  Block Zone  (CBZ), part  of whose
structure is public  and defined in Disks.d,  and the rest of  which is
private to the implementation.  The  general idea is that a CBZ  is set
up with empty disk command blocks in it.  A free block is obtained from
the CBZ with GetDiskCb and  sent to the disk with  DoDiskCommand.  When
it is sent  to the disk,  it is also put  on the queue  which GetDiskCb
uses,  but GetDiskCb  waits until  the disk  is done  with  the command
before returning it, and also checks for errors.


Disks & Bfs                 April 10, 1982                           12




If you plan to use these routines, read the code for  ActOnDiskPages to
find out  how they are  intended to  be called.  An  example of  use of
these  routines in  a disk-independent  fashion (i.e.,  using  only the
public  definitions in  Disks.d) may  be found  in  the DiskStreamsScan
module of the Operating System.  Only in unusual applications should it
be necessary to make use of the implementation-dependent information in
Bfs.d or Tfs.d.

InitializeDiskCBZ(disk, cbz, firstPage, length, retry, lvErrorRoutine).
CBZ is  the address of  a block of  length words which  can be  used to
store CBs.  It takes at least three CBs to run the disk at  full speed;
the disk object contains the values DSK.lengthCBZ (fixed  overhead) and
DSK.lengthCB (size of each command block) which may be used  to compute
the   required   length   (that  is,   length   should   be   at  least
lengthCBZ+3*lengthCB).  FirstPage is used to initialize the currentPage
field  of the  cbz.  Retry  is a  label used  for an  error  return, as
described below.  lvErrorRoutine is an error routine  for unrecoverable
errors, described below; it  defaults to a routine that  simply invokes
SysErr.  The arguments  after firstPage can  be omitted if  an existing
CBZ is  being reinitialized,  and they will  remain unchanged  from the
previous initialization.

cb  =   GetDiskCb(disk,  cbz,   dontClear[false],  returnIfNoCB[false])
returns the next CB for the CBZ.  If the next CB is empty (i.e., it has
never been  passed to  DoDiskCommand), GetDiskCb  simply zeroes  it and
returns  it.  However,  if the  next CB  is still  on the  disk command
queue, GetDiskCb  waits until  the disk has  finished with  it.  Before
returning  a  CB, GetDiskCb  checks  for errors,  and  handles  them as
described below.  If  there is no  error, GetDiskCb updates  the nextDA
and    currentNumChars    cells     in    the    CBZ,     then    calls
cbz>>CBZ.cleanupRoutine(disk,  cb,  cbz).   Next,  unless  dontClear is
true, the CB  is zeroed. Finally,  the CB is  returned as the  value of
GetDiskCb.  If  returnIfNoCB is true,  GetDiskCb returns zero  if there
are no  CBs in  the CBZ or  the next  CB is still  on the  disk command
queue.

If the next CB has suffered an error, then GetDiskCb instead  takes the
following actions.  First  it increments cbz>>CBZ.errorCount.   If this
number is ge the value disk>>DSK.retryCount, GetDiskCb calls  the error
routine which was passed to InitializeDiskCBZ; the way this is  done is
explained in  the description of  ActOnDiskPages above.  (If  the error
routine  returns,  GetDiskCb  will  proceed  as  if  an   error  hadn't
occurred.) Otherwise, after doing  a restore on the disk  if errorCount
ge  disk>>DSK.retryCount/2,  it reinitializes  the  CBZ  with firstPage
equal to the page with the error, and returns to  cbz>>CBZ.retry (which
was initialized  by InitializeDiskCBZ)  instead of  returning normally.
The idea is that the code following the retry label will retry  all the
incomplete  commands,  starting  with  the  one  whose  page  number is
cbz>>CBZ.currentPage and whose disk address is cbz>>CBZ.errorDA.

DoDiskCommand(disk, cb,  CA, DA,  filePtr, pageNumber,  action, nextCb)
Constructs a  disk command  in cb  with data  address CA,  virtual disk
address DA, serial and version number taken from the virtual file id in
filePtr, page number taken from pageNumber, and disk  command specified
by action.  The nextCb  argument is optional; if supplied  and nonzero,
DoDiskCommand will "chain" the current CB's label address to nextCb, in
such a way that the DL.next word will fall into nextCb>>CB.diskAddress.

DoDiskCommand expects the  cb to be  zeroed, except that  the following


Disks & Bfs                 April 10, 1982                           13




fields  may  be preset;  if  they  are zero  the  indicated  default is
supplied:

       labelAddress      lv cb>>CB.label
       numChars          0


If DA eq fillInDA, the real disk address in the command is not set (the
caller should  have either set  it explicitly or  passed the CB  as the
nextCb  argument  for a  previous  command).  Actions  are  checked for
legality.

The public  cells in  the CBZ  most likely  to be  of interest  are the
following:

   client: information of the  caller's choosing (e.g., a pointer  to a
   related higher-level data structure such as a stream.)

   cleanupRoutine: the cleanup  routine called by  GetDiskCb (defaulted
   to Noop by InitializeDiskCBZ).

   currentPage: set to the firstPage argument of  InitializeDiskCBZ and
   not touched by the  other routines.  (Note, however,  that GetDiskCb
   calls  InitializeDiskCBZ when  a retry  is about  to occur,  so when
   control arrives at the retry  label, currentPage will be set  to the
   page number of the command that suffered the error.)

   errorDA: set by GetDiskCb to the virtual disk address of the command
   that suffered an error.

   nextDA: set  by GetDiskCb to  the virtual disk  address of  the page
   following the one whose CB is being returned.  (This  information is
   obtained from the  next pointer in  the current page's  label.  Note
   that errorDA  and nextDA are  actually the same  cell, but  they are
   used in non-conflicting circumstances.)

   currentNumChars: set by GetDiskCb to the numChars of the  page whose
   CB is being returned.

   head: points to the first CB on GetDiskCb's queue; contains  zero if
   the queue is empty.



8. Error Codes


The  following errors  are generated  by the  BFS.  Similar  errors are
generated by other instances of a disk object.

        1101    unrecoverable disk error
        1102    disk full
        1103    bad disk action
        1104    control block queues fouled up
        1105    attempt to create a file without creation ability
        1106    can't create an essential file during NewDisk
        1107    bit table problem during NewDisk
        1108    attempt to access nonexistant bit table page


Disks & Bfs                 April 10, 1982                           14




9. Implementation -- Bfs


The  implementation expects  a  structure BFSDSK  to be  passed  as the
"disk" argument to the routines.  The initial portion of this structure
is the standard DSK structure followed by a copy of  the DiskDescriptor
header and  finally some  private instance  data for  the disk  in use.
(Note: The Alto operating system maintains a static sysDisk that points
to such a structure for disk drive 0.)

Bfs ("Basic File  System") is the name  for a package of  routines that
implement the  disk class  for the standard  Alto disks  (either Diablo
Model 31 drives  or a single Diablo  Model 44 drive).   The definitions
(in addition to  those in AltoFileSys.D  and Disks.D) are  contained in
Bfs.D.   The code  comes  in two  "levels:"  a "base"  for  reading and
writing  existing  files  (implements  ActOnDiskPages,  RealDiskDA  and
VirtualDiskDA  only);  and  a  "write"  level  for  creating, deleting,
lengthening   and   shortening   files    (implements   WriteDiskPages,
CreateDiskFile, DeleteDiskPages, AssignDiskPage, ReleaseDiskPage).  The
source files  BfsBase.Bcpl, Dvec.Bcpl and  BfsMl.Asm comprise  the base
level;   files   BfsWrite.Bcpl   BfsCreate.bcpl,   BfsClose.bcpl,   and
BfsDDMgr.bcpl implement the write level.

BfsMakeFpFromLabel(fp, la)  constructs a  virtual file  id in  the file
pointer fp from the real file id in the label la.

disk  =  BFSInit(diskZone,  allocate[false],  driveNumber[0], ddMgr[0],
freshDisk[false],  tempZone[diskZone])   returns  a  disk   object  for
driveNumber or zero.   The permanent data  structures for the  disk are
allocated  from  diskZone;  temporary free  storage  needed  during the
initialization  process is  allocated  from tempZone.   If  allocate is
true,  the  machinery for  allocating  and deallocating  disk  space is
enabled.  If it is enabled, a small DDMgr object and a 256  word buffer
will be extracted  from diskZone in order  to buffer the bit  table.  A
single  DDMgr,  created  by calling  'ddMgr  =  CreateDDMgr(zone)', can
manage both disks.  If freshDisk  is true, BFSInit does not  attempt to
open and read the DiskDescriptor file.  This operation is essential for
creating a virgin file system.

success  =   BFSNewDisk(zone,  driveNum[0],   nDisks[number  spinning],
nTracks[physical size], dirLen[3000], nSectors[physical  size]) creates
a virgin Alto  file system on the  specified drive and returns  true if
successful.  The zone must be capable of supplying about 1000  words of
storage.  The logical size of the file system may be different from the
physical size of driveNum: it may span both disks (a  'double-disk file
system'), or it  may occupy fewer  tracks (a model  44 used as  a model
31).  The length in words of SysDir, the master directory, is specified
by dirLen.  Some machines  that emulate Altos implement 14  sectors per
track.

BFSExtendDisk(zone, disk, nDisks, nTracks) extends (i.e. adds pages to)
the filesystem on 'disk'.  Presumably 'nDisks' or 'nTracks' or  both is
bigger than the corresponding  parameters currently in disk.   A single
model 31 may be extended to a double model 31 or a single model 44 or a
double model  44, and a  single model  44 may be  extended to  a double
model 44.   The zone must  be capable of  supplying about 750  words of
storage.

0  = BFSClose(disk,  dontFree[false]) destroys  the disk  object  in an


Disks & Bfs                 April 10, 1982                           15




orderly  way.  If  dontFree is  true,  the ddMgr  for the  disk  is not
destroyed; presumably it is still in use by the other disk.  (Note that
this procedure is the one invoked by the CloseDisk generic operation.)

BFSWriteDiskDescriptor(disk) insures that any important state  saved in
memory is correctly written on the disk.

virtualDA  = BFSFindHole(disk,  nPages) attempts  to find  a contiguous
hole nPages long in disk.   It returns the virtual disk address  of the
first page of a hole if successful, else -1.

BFSTryDisk(drive, track, sector[0]) returns  true if a seek  command to
the specified track  on the specified  drive is successful.   Note that
the drive argument can contain an imbedded partition number.   Seeks to
track zero  will fail  if the  drive is  not on  line.  Seeks  to track
BFS31NTracks+1 will fail if the drive is a model 31.



10. Implementation -- Tfs


Operation and implementation of  the Trident T80 disks is  described in
separate documentation under  the heading "TFS/TFU" in  Alto Subsystems
documentation.



11. BFSTest


BFSTest is the test and utility program for the Basic File  System.  It
performs many of the same functions as TFU (in fact much of its code is
lifted from  TFU), except  that commands which  are better  provided by
other subsystems such as the Executive (copy, rename, delete, etc.) are
omitted.   It has  a conventional  command scanner  and  implements the
following commands.

ERASE formats one or more disks as an Alto file system.  It asks you to
     specify the number of  disks, cylinders and sectors.   Any sectors
     marked  "incorrigable" (by  the  CERTIFY command,  below)  are not
     included in  the file  system and will  never again  cause trouble
     unless the disk  is erased with DiEx.   The OS erase  command also
     preserves this bad spot information.

EXERCISE creates, deletes, reads,  writes and positions files  the same
     way that normal programs  do, and checks the results  which normal
     programs do not do.  These high-level operations cause patterns of
     disk commands  which are quite  different from those  generated by
     lower-level tests such as DiEx.

     Exercise assumes  that a  file system exists  on DP0  (and perhaps
     extends  on  to  DP1).   It  creates  as  many  test  files (named
     Test.001, Test.002, ...) as  will fit in the file  system, filling
     each file with a carefully chosen test pattern.  When it  is done,
     it  deletes all  of its  files.  One  'pass' consists  of stepping
     through the test files, performing a randomly chosen  operation on
     the file, and checking the results.  The duration  and throughness
     of  a pass  depends on  the  amount of  free space  on  the disks.


Disks & Bfs                 April 10, 1982                           16




     Ideally, the  disk(s) under  test have just  been erased  with the
     ERASE command, below.

     While running, exercise looks for commands from the keyboard.  The
     current commands are:


          Q Quit             Delete all test files and stop.
          S StopOnError      Wait until a character is typed.

     All test files are  100 pages long.  Each  page of a file  has the
     page number in its first and last words and a data pattern  in the
     middle 254 words.  The data pattern is constant throughout a file,
     consisting of  a single  one-bit in a  word of  zeros or  a single
     zero-bit  in a  word of  ones.  Files  are read  and  written with
     ReadBlock  and  WriteBlock  using buffers  whose  lengths  are not
     multiples of the page size.  The operations are:

          Write         Write the entire file with the data pattern.

          Read          Read the entire file checking the data pattern.

          Delete        Delete the file, create it again and then write
                        it.

          Copy          Copy  the file  to some  other  randomly chosen
                        file.   If  both disks  are  being  tested, one
                        third of  the time pick  a destination  file on
                        the other disk.

          Position      Position to twenty randomly chosen pages in the
                        file.  Check that the first word of the page is
                        indeed the page number.  One third of  the time
                        dirty the stream by writing the page  number in
                        the last word of the page.

CERTIFY tests  one or  more disks for  bad spots  and marks  such pages
     "incorrigable" so  that they  will not  be included  in subsequent
     file systems.   Ideally, a disk  should be certified  before being
     used.  Certify can be stopped at any time by hitting any key.

     A  bad spot  is  any sector  which  gives three  or  more checksum
     errors.  If the Scavenger encounters a bad spot, it will also mark
     the sector incorrigable.  Alto and Dorado disks almost  never have
     bad spots; but Dolphin  disks typically have a few,  so CERTIFYing
     is  a  must  for  trouble-free  operation.   Note  that  bad  spot
     information  will  be  lost  if a  certified  pack  is  written by
     COPYDISK or DIEX.

CREATEFILE attempts to  create a contiguous  file of a  specified size.
     If it can't, it creates a file with a minimum number of page runs.

PARTITION allows you  to set the disk  partition on which  BFSTest will
     operate.  This command is only available on Dolphins and Dorados.

A thorough, high-level "acceptance  test" for the disk subsystem  of an
Alto or Alto-emulating machine  (i.e. a Dolphin or Dorado)  consists of
at least  100 passes of  CERTIFY with  less than 5  bad spots  (any bad
spots on cylinder 0  are unacceptable), followed by ERASE,  followed by
at least 10 passes of EXERCISE with no errors.