9. Other servers
IFS contains various servers that provide essential services to other hosts on the directly-connected Ethernet. These include the ‘miscellaneous’ servers (principally boot, name, and time), the Leaf page-level access server, the CopyDisk server, and the LookupFile server.
The boot, name, and time server functions duplicate those provided by gateway systems, so in a network with at least one gateway, it is not necessary for IFS to provide these services. But in a network that includes an IFS and no gateways, it is necessary for the IFS to provide the services. Even in networks that do have one or more gateways, running the IFS miscellaneous servers may be advantageous because it improves the availability of those services. (Also, it should be noted that the IFS boot server is noticeably faster than the boot servers of existing gateway systems.)
Since running the miscellaneous servers may slightly degrade the performance of an IFS in its principal functions, means are provided to turn them off (the Change System-Parameters command, described in section 7). When a file system is first created, all the miscellaneous servers are disabled.
IFS participates in the protocols for automatic distribution and maintenance of the date and time, the network directory, and the common boot files. When IFS is started up for the first time, and thereafter whenever any changes are distributed, IFS obtains all necessary files from neighboring servers (gateways or other IFSs). The name server data base is maintained even if the IFS name server is disabled, because IFS requires it for its own internal purposes (principally mail forwarding).
9.1. Name server
The name server data base is kept as file ‘<System>Pup-network.directory’; a new version is created and older versions deleted whenever a new file is distributed. If there are no other name servers on the directly-connected Ethernet, you must use the BuildNetworkDirectory procedure to install new versions of the network directory.
9.2. Boot server
The boot files are kept in files ‘<System>Boot>number-name.boot’, where name is the name of the boot file and number is its boot file number in octal (for example, ‘<System>Boot>4-CopyDisk.boot’). Standard boot files have centrally-assigned boot file numbers less than 100000 octal, and are distributed automatically. Non-standard boot files have boot file numbers greater than or equal to 100000 octal and are not distributed automatically.
Ordinarily, IFS will obtain and maintain its own copy of all standard boot files maintained by any other boot server on the same Ethernet. This is the appropriate mode of operation for most boot servers. However, in some situations it is desirable for an IFS boot server to maintain only a subset of all available boot files. The Disable New-boot-files sub-command of Change System-parameters may be used to enter this mode; subsequently, IFS will not obtain any new boot files, but will continue to maintain and update any boot files that it already has. Additionally, boot files with numbers in the range 40000 to 77777 octal will always be managed in this fashion, regardless of the setting of the New-boot-files switch. Also, a boot file will not participate in the update protocol if it has a different number than the correspondingly-named boot file in other boot servers. By this means, special versions of standard boot files may be maintained on particular servers without interfering with the update of the standard versions on all other servers.
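To make these conventions concrete, here is a minimal sketch in Python (the function names are illustrative, not part of IFS) that parses a boot file name and applies the update rules just described. The name-mismatch rule (a file whose number differs from the same-named file on other servers does not participate) is noted in a comment but not modeled.

    import re

    BOOT_NAME = re.compile(r'^([0-7]+)-(.+)\.boot$', re.IGNORECASE)

    def parse_boot_file(leaf_name):
        # Split 'number-name.boot' into (octal boot file number, name),
        # e.g. '4-CopyDisk.boot' -> (0o4, 'CopyDisk').
        m = BOOT_NAME.match(leaf_name)
        if m is None:
            raise ValueError('not a boot file name: ' + leaf_name)
        return int(m.group(1), 8), m.group(2)

    def auto_update(number, new_boot_files_enabled, already_held):
        # Rules from the text (all numbers octal). A number mismatch with
        # other servers' same-named file would also exclude it (not modeled).
        if number >= 0o100000:
            return False                # non-standard: never distributed
        if 0o40000 <= number <= 0o77777:
            return already_held         # maintained only if already present
        if not new_boot_files_enabled:
            return already_held         # Disable New-boot-files mode
        return True                     # standard file: acquired and updated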
You may install or delete boot files by manual means (e.g., FTP), keeping in mind the file name and boot file number conventions described above.
Additionally, the boot server supports the MicrocodeBoot protocol, used to boot-load microcode on Dolphins and Dorados. The microcode files use a different numbering scheme from normal boot files. For purposes of boot file update, microcode file number n is assigned boot file number 3000B+n.
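As a worked example of this mapping (function name illustrative): microcode file number 2 updates under boot file number 3002 octal.

    def microcode_boot_number(n):
        # Boot file number used in the update protocol for microcode file n.
        return 0o3000 + n

    assert microcode_boot_number(2) == 0o3002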
The boot server is also capable of boot-loading Sun workstations. The Sun boot protocol identifies boot files by name rather than by number. However, Sun boot files must still be assigned numbers to control the boot file update process, as described previously. Users need not mention these numbers when invoking boot files.
The various boot protocols are documented in [Maxc]<Pup>EtherBoot.press.
9.3. Time server
You should not enable the time server unless you have first calibrated and corrected the Alto clock, using the procedure described in section 7.
9.4. Leaf server
IFS contains a server for the ‘Leaf’ page-level access protocol, which permits random access to parts of IFS files as opposed to the transfer of entire files. There are several user programs that take advantage of this capability, though these programs are still experimental and not widely available.
At present, the Leaf software is being made available on a ‘use at your own risk’ basis. While it is included in the standard IFS release, it is enabled only on some file servers. Leaf was developed by Steve Butterfield, and is presently maintained by Ted Wobber <Wobber.PA> rather than by Taft and Boggs, who are responsible for the remainder of the system. Inquiries and trouble reports concerning the use of the Leaf server should be directed to Ted.
For performance reasons, the Leaf server should not be enabled in an IFS that also supports heavy FTP, Mail, CopyDisk and Telnet traffic, and the IFS Alto should have at least 192K of memory.
The Leaf server is enabled and disabled by sub-commands to the Change System-parameters command (section 7). However, for the Leaf server, the IFS software has to do a substantial amount of memory management at startup time which is impractical to perform during normal operation. Therefore, the Enable/Disable Leaf-server commands do not take effect immediately but rather are deferred until the next restart of IFS. More precisely, the Enable Leaf-server command is deferred until the next restart unless Leaf was already enabled at the last restart or IFS was last restarted with the /L switch. The Disable Leaf-server command takes effect after a delay of at most 3 minutes, but memory resources consumed by the Leaf server are not reclaimed until the next restart of IFS.
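The deferral rules reduce to a small decision procedure, sketched here (names are illustrative; 'resources_reserved' means Leaf was enabled at the last restart or IFS was last restarted with the /L switch):

    def leaf_command_effect(command, resources_reserved):
        # When does an Enable/Disable Leaf-server command take effect?
        if command == 'Enable Leaf-server':
            return 'immediately' if resources_reserved else 'at next restart'
        if command == 'Disable Leaf-server':
            # Service stops within about 3 minutes, but Leaf's memory is
            # not reclaimed until the next restart of IFS.
            return 'within 3 minutes; memory reclaimed at next restart'
        raise ValueError('unknown command: ' + command)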
9.5. CopyDisk server
IFS contains a server for the CopyDisk protocol, which permits copying disk packs to and from an IFS. A CopyDisk connection counts the same as an FTP, Mail, or Telnet connection when IFS decides whether to create a job in response to a request-for-connection. The load placed on the system by a CopyDisk job is about the same as that of an FTP job, except that transfers between disk and net will typically last much longer (minutes rather than seconds).
The CopyDisk server is enabled and disabled by sub-commands to the Change System-parameters command (section 7).
9.6. LookupFile server
The LookupFile server provides a means of verifying the existence of a file and determining its creation date and length, using a protocol that is substantially less expensive than either FTP or Leaf. Unlike FTP or Leaf, this server provides information to unauthenticated clients. If available, this server is used heavily by the file cache validation machinery in Cedar, with performance considerably better (and imposing less load on the server) than the corresponding operation performed via FTP.
The LookupFile server is enabled and disabled by sub-commands to the Change System-parameters command (section 7). The LookupFile protocol is documented in [Maxc]<Pup>LookupFile.press.
10. Adding packs to the file system
The capacity of an existing file system may be increased by adding more packs to it. This may be accomplished by the following procedure.
Initialize and test a pack using ‘TFU Certify’ in the normal fashion (section 3). Then, with IFS running, mount that pack on any free drive and issue the command:
! Extend (file system) Primary (by adding drive) d [Confirm]
‘Primary’ is the name of the file system you are extending, and d is the drive on which the new pack is mounted. IFS now initializes this pack, an operation that takes about 2 minutes for a T-80 and 7 minutes for a T-300. When it completes, the new pack has become part of the file system.
Note that there is no corresponding procedure for removing a pack from a file system. To decrease the number of packs in a file system, it is necessary to dump it by means of the backup system, initialize a new file system, and reload all the files from backup. This procedure is also required to move the contents of a file system from T-80 to T-300 packs.
Note also that adding packs to a file system does not increase the amount of directory space available. The size of the directory is determined when you first create the file system; there is no straightforward way (short of dumping and reloading) to extend it. (More precisely, while the software will attempt to extend the directory automatically if it overflows, this will significantly degrade subsequent performance, and too many such extensions will eventually cause the system to fail entirely.) Therefore, it is important that you allocate a directory large enough for all expected future needs. Experience has shown that 1000 directory pages are required for every 25,000 files in the file system, but this is highly dependent on a number of parameters including average file name length.
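A rough sizing calculation using the rule of thumb above (the slack factor is an assumption, added because the directory cannot easily be extended later):

    def estimate_directory_pages(expected_files, slack=1.5):
        # ~1000 directory pages per 25,000 files, padded for future growth.
        return int(expected_files / 25000 * 1000 * slack)

    estimate_directory_pages(100000)    # -> 6000 pages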
11. Backup
There are three facilities available for assuring reliability of file storage and for recovering from various sorts of disasters.
The first facility is the IFSScavenger program. It is analogous to the standard Alto Scavenger subsystem. It reads every page in the file system, makes sure that every file is well-formed, and checks for consistency between files and directories. For safest operation, it should be run after every crash of the IFS program. However, since it takes a long time to run, in practice it should only be run when major file system troubles are suspected (in particular, when IFS falls into Swat complaining about disk or directory errors). The IFSScavenger is described in a separate memo, available as <IFS>IFSScavOp.Bravo (or .Press) on Maxc1.
The second facility is an on-line incremental backup system that is part of the IFS program itself. It operates by copying files incrementally to a backup file system (ordinarily a single disk pack) mounted on an extra drive. The file system is available to users while the backup operation is taking place (though backup should be scheduled during periods of light activity to avoid serious performance degradations). Use of the incremental backup system requires that there be an additional disk drive connected to the Alto, over and above the drives needed for the primary file system itself. The backup system is described in the next section.
The third facility is the CopyDisk program. To back up the file system, one must take IFS down and copy each of the file system packs onto backup packs. On a machine with multiple disk drives, one may copy from one drive to another, an operation that takes about 4 minutes per T-80 and 15 minutes per T-300 if the check pass is turned off. One may also copy disks over the Ethernet to another Alto-Trident system, but this takes about five times as long.
At PARC we use the Scavenger and the Backup system; we no longer use CopyDisk for backing up IFS. Regular operation of either the Backup system or CopyDisk is essential for reliable file storage. We have observed several instances of Trident disk drive failures that result in widespread destruction of data. It is not possible to recover from such failures using only the IFSScavenger: the IFSScavenger repairs only the structure of a file system, not its contents.
11.1. Backup system operation
The backup system works in the following way. Periodically (e.g., every 24 hours), a process in IFS starts up, checks to make sure a backup pack is mounted, and sweeps through the primary file system. Whenever it encounters a file that either has never been backed up before or was last backed up more than n days ago (a reasonable n is 30), it copies the file to the backup pack and marks the file as having been backed up now. Human intervention is required only to change backup packs when they become full.
The result of this is that all files are backed up once within 24 hours of their creation, and thereafter every n days. Hence every file that presently exists in the primary file system is also present on a backup pack written within the past n days. This makes it possible to re-use backup packs on an n-day cycle.
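The per-file decision made during the sweep amounts to the following (a sketch, assuming each file carries a 'last backed up' timestamp as described):

    from datetime import datetime, timedelta

    def needs_backup(last_backed_up, n_days=30, now=None):
        # Copy a file if it has never been backed up, or if its last
        # backup is more than n days old; otherwise skip it.
        now = now or datetime.now()
        return (last_backed_up is None
                or now - last_backed_up > timedelta(days=n_days))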
Operation of the backup system has been made relatively automatic so as to permit it to run unattended during the early morning hours when the number of users is likely to be small. This is important because system performance is degraded seriously while the backup system is running.
11.2. Initializing backup packs
To operate the backup system, you need a disk drive and some number of packs dedicated to this purpose. The number of packs required depends on the size of your primary file system, the file turnover rate, and the backup cycle period n. The packs should have their headers and labels initialized using ‘TFU Certify’ in the normal fashion (section 3). Then they must each be initialized for the backup system as follows.
With IFS running, mount a backup pack on the extra drive. Connect to IFS from some other Alto using Chat, log in, enable, issue the Initialize command, and go through this dialogue:
! Initialize (file system type)
Answer ‘Backup’.
Do you really want to create a file system?
Answer ‘y’.
Number of disk units:
Answer ‘1’.
Logical unit 0 = Disk drive:
Type the physical unit number of the drive on which the backup pack is mounted.
File system ID:
Type some short name that may be used to uniquely identify the pack, e.g., ‘Backup1’, ‘Backup2’, etc. No spaces are permitted in this identifier. It should be relatively short, since you will have to type it every time you mount the pack. (You should mark this name on the pack itself, also.)
File system name:
Type some longer identifying information, e.g., ‘Parc IFS Backup 1’, or ‘Serial number xxxx’, or something.
Directory size (pages):
Type RETURN. (The default of 1000 pages is plenty.)
Ok? [Confirm]
Answer ‘y’ if you want to go ahead, or ‘n’ if you made a mistake and wish to repeat the dialogue.
IFS now initializes the backup file system, an operation that takes about 2 minutes for a T-80 and 7 minutes for a T-300. The message ‘Done’ is displayed when it is finished.
11.3. Setting backup parameters
The next step is to set the backup parameters, an operation that generally need be done only once. Issue the Backup command to enter the backup system command processor (whose prompt is ‘*’), then the Change command. It will lead you through the following dialogue:
* Change (backup parameters)
Start next backup at:
Enter the date and time at which the next backup run is to be started, in the form ‘7-Oct-77 02:00’.
Stop next backup at:
Enter the date and time at which the next backup run is to stop if it has not yet completed, e.g., ‘7-Oct-77 05:00’.
Interval between backup runs (hours):
Type ‘24’.
Full file system dump period (days):
Enter the number of days between successive backups of existing files (the parameter n above). A good value is 30.
The backup system command processor is exited by means of the Quit command in response to the ‘*’ prompt.
11.4. Normal operation
The following commands are used during normal operation. All of them require that you first Enable and enter the backup system command processor by means of the Backup command.
* Status
Prints a message describing the state of the backup system. It will appear something like:
Backup system is enabled and waiting.
Backup scheduled between 7-Oct-77 02:00 and 7-Oct-77 05:00
File system Backup1 is available to backup system.
73589 free pages.
‘Enabled’ means that at the appropriate time the backup system will start up automatically; the opposite is ‘disabled’. The backup system becomes enabled when you mount a backup pack (see Mount command, below), and disabled when the backup system can no longer run due to some condition such as the backup pack being full.
‘Waiting’ means that the backup system is not presently running; the opposite is ‘running’. When it is running (or has been interrupted in the middle of a backup run for whatever reason), it will display an additional message of the form:
Presently working on file filename
as an indication of progress (files are backed up in alphabetical order).
The last lines display the status of the current backup pack (assuming one has been mounted; if several backup packs have been mounted, they are all listed). The possible states are ‘available’, ‘presently in use’, and ‘no longer usable’. In the last case, the reason for the non-usability is also stated, e.g., ‘Backup file system is full’.
* Enable (backup system)
* Disable (backup system)
Enables or disables running of the backup system. If Disable is issued while the backup system is actually running, it will stop immediately (within a few seconds). These commands are not ordinarily needed, because an Enable is automatically executed by Mount (see below) and a Disable is executed when the backup system finds that there are no longer any usable backup packs. The backup system also stops automatically if IFS is halted by the Halt command, but it is not disabled and will resume running when IFS is restarted.
* Mount (backup file system) name
Makes a backup pack known to the system. name is the file system ID of the backup pack (e.g., ‘Backup1’). The pack must be on-line.
If the file system is successfully mounted, a message appears in the form:
Backup1 (Parc IFS Backup 1),
initialized on 6-Oct-77 19:32, 273 free pages.
Is this the correct pack? [Confirm]
If this is the pack you intend to use, you should answer ‘y’. Then:
Do you want to overwrite (re-initialize) this pack? [Confirm]
Normally you will be mounting a backup pack that has either never been used before or was last used more than n (e.g., 30) days ago. In this case you should answer ‘y’. This will cause the backup pack to be erased (destroying all files stored in it) at the beginning of the next backup run.
If, however, you are re-mounting a partially-filled backup pack that was removed for some reason, you should answer ‘n’. The backup system will then not erase the backup pack but rather will simply copy additional files to it.
* Dismount (backup file system) name
Makes a previously mounted backup pack unavailable to IFS. This command may be issued only while the backup system is disabled (use the Disable command if necessary).
The normal operating procedure is very simple. Every day, issue the Enable and Backup commands to enter the backup system command processor, then issue the Status command. The status will indicate one of the following conditions:
1. ‘Enabled and waiting’, with one or more packs ‘available to backup system’. In this case you need not do anything.
2. ‘Disabled and waiting’, with one pack ‘no longer available to backup system’ because ‘Backup file system is full’. In this case, you should remove the backup pack, install another one (making sure it was last used more than n days ago), and declare it to IFS by means of the Mount command (above).
3. ‘Disabled and waiting’, with some other condition (e.g., ‘Can’t find logical unit 0’). You should correct the condition (most likely the required pack wasn’t mounted at the time the backup system last started to run), then issue the Mount command as above.
When done, issue the Quit command to exit the backup system command processor. It is a good idea to keep a record of the dates on which each backup pack was mounted and dismounted so that you know when a pack is available for re-use.
11.5. Restoring individual files from backup; listing backup files
Individual files may be restored from backup in the following manner. It is not a good idea to do this while the backup system is running.
Install the desired backup pack on any free drive. Issue the Enable and Backup commands to enter the backup command processor. Then go through the following dialogue:
* Restore (from backup pack) name
name (long-name) mounted
Restore:
file-designator
The name is the file system ID of the backup pack (e.g., ‘Backup1’). In response to ‘Restore:’, type the name of a file to be restored. ‘*’s are permitted, and the default version is ‘!*’. The name of each file is typed out as it is restored.
When all files matching file-designator have been restored, IFS will again prompt you with ‘Restore:’. You may either restore more files (from the same backup file system) or type RETURN to indicate that you are finished.
Files are restored from the backup system with precisely the attributes (version number, reference dates, etc.) they had when backed up. If a file already exists in the primary file system, IFS will refuse to overwrite it unless the version in the backup file system is newer.
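The overwrite rule amounts to a simple comparison (a sketch; it assumes that ‘newer’ means a higher version number of the same file name):

    def may_overwrite(existing_version, backup_version):
        # Restore refuses to overwrite an existing file unless the copy
        # in the backup file system is newer.
        return backup_version > existing_version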
It is also possible to examine the directory of a backup pack (or, indeed, any IFS file system you can mount in its entirety) by means of the following commands:
* OnLine (file system) name
name (long-name) mounted
Makes the secondary file system name available for use by the commands described below. If some other file system was already put on-line by a previous OnLine command, it is first put off-line.
* List (files) file-designator
This command is identical to the standard top-level List command (with all its sub-commands), but applies to the secondary file system most recently specified in an OnLine command rather than to the primary file system.
* OffLine
Makes unavailable the file system most recently specified in an OnLine command. This operation is performed automatically if you exit the Backup command by Quit or control-C.
After doing an OnLine, you may issue as many List commands as you want; control remains in the Backup command (unless you abort by control-C), and the secondary file system remains on-line. A Restore command will use the current secondary file system if there is one, and will automatically put it off-line when it is finished.
You must issue OffLine or an equivalent command before turning off the drive on which the secondary file system is mounted. Failure to do so will cause IFS to fall into Swat.
11.6. Reloading the entire file system from backup
If the primary file system is clobbered in a way that the Scavenger can’t repair, the following procedure may be used to recreate it from backup. If performed correctly, this procedure will restore the primary file system to its exact state at the time of the most recent backup run.
11.6.1. Complete reload
First, re-initialize the primary file system as described earlier (section 4). Then connect to IFS from another Alto using Chat, login as System (password IFS), and issue the Enable command. It is advisable at this point to disable logins with the Disable Logins sub-command of the Change System-parameters command so as to prevent users from accessing the file system while you are reloading it.
Mount (on any free drive) the most recent backup pack, i.e., the one most recently written on by the backup system (this is very important). Then:
* Reload (file system)
Reload the entire file system? [confirm] yes
Note: mount the LAST backup pack first.
Mount backup pack:
name
The name is the ID of the backup pack you have mounted. IFS will now proceed to restore files from the backup pack to the primary file system. When it is done, it will again ask you to ‘Mount backup pack:’, at which point you should mount the next most recent backup pack. Repeat this procedure until you have mounted all packs written within the past n days. When you are done, type control-C to terminate the reload process.
IFS will list out the files as they are restored. (To disable the pause at the end of each page, type ahead one space.) You will notice that not all files are restored. In particular:
Files that were backed up at some time but no longer existed at the time of the last backup are not restored. (The listing will say such a file is ‘deleted’.)
Files already restored from a more recent backup are not restored from an earlier one. (The listing will say ‘already exists’.)
It is important to understand the difference between Restore and Reload. Restore simply copies the specified files from the backup pack to the primary file system. Reload, on the other hand, attempts to recreate the state of the primary file system as of the most recent backup. To this end, Reload will restore only those files that actually existed at the time of the most recent backup run, and will skip files that once existed (and were backed up) but were subsequently deleted.
It is essential that the last backup pack be reloaded first. Failure to heed this instruction will cause some files not to be reloaded that should have been, and vice versa. If the reload is interrupted for any reason and must be restarted, you must again start by reloading the last backup pack (even though all files from that pack may have been reloaded already), and then skip to the pack whose reload was interrupted. This is because the decision whether or not to reload each file is made on the basis of the last state of the file system as recorded on the most recent backup pack.
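The per-file reload decision can be sketched as follows (illustrative; the real decision is driven by the file system state recorded on the most recent backup pack, which is why that pack must be reloaded first):

    def should_reload(name, existed_at_last_backup, already_restored):
        # Recreate the state as of the most recent backup run.
        if name not in existed_at_last_backup:
            return False    # listing says 'deleted'
        if name in already_restored:
            return False    # listing says 'already exists'
        return True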
Reloading the file system restores all system and backup parameters to their former state, so long as the System directory is one of those restored. If you are using the backup system to move files en masse from one file server to another, you should check to make sure that the system parameters that are restored are suitable for the new system. Also, after completing a reload, it is necessary to halt and restart IFS to ensure that all system parameters take effect.
11.6.2. Partial reload
Ordinarily you should answer ‘yes’ to the question ‘reload the entire file system?’. However, there are situations when you might wish to reload only some of the directories, such as when moving a group of users’ files from one IFS to another. The easiest way (though not the only way) to accomplish this is to reload those directories from the original IFS’s backup packs onto the new IFS.
If you answer ‘no’ to ‘reload the entire file system?’, IFS will ask you to type in the names of the directories to be reloaded, separated by spaces and terminated by RETURN. You should type these names carefully, since IFS does not error-check them and there is no way to go back and make a correction. (Do not type angle brackets around the directory names.) After you type RETURN, the remainder of the reload process will take place as described above, but only the directories you specified will be restored and the rest will be skipped.
11.6.3. Reloading damaged files
When the IFSScavenger is run to repair a broken file system, it may find one or more files whose contents are damaged. The IFSScavenger is capable of repairing only the file system’s structure, not its contents. When it detects a damaged file, it marks the file as being damaged, sets the file’s protection to empty so as to discourage access by users, and puts an error report in the IFSScavenger.log file. (Certain kinds of damage cause the file to be deleted altogether; this is also noted in the log file.)
When only a few files are damaged or deleted, the easiest recovery procedure is to restore them individually using the Restore command described in section 11.5. But when many files are involved, it is better to use the following procedure, which is relatively automatic but quite time-consuming.
The procedure is simply to use the Reload command described in section 11.6.1, but without first initializing the file system. Reload will consider all the files on all the backup packs you mount (starting with the most recent one), but will copy only those files that are either damaged or missing in the primary file system. This procedure may be carried out while the file server is available to users.
Note that this operation will also reload copies of any files that were deliberately deleted by users since the most recent run of the backup system. This is because the Reload process has no way of determining whether a missing file was deleted by a user or by the IFSScavenger. After completing this procedure, you should warn users that recently-deleted files may have been resurrected.
11.7. Repeating backup runs
It may happen that you want to repeat one or more of the most recent backups—say, because the current pack suffered hard errors or irreparable damage, and you wish to repeat the backup of the affected files onto a fresh backup pack. This is controlled by the following commands:
* Repeat (backups later than) date
During the next run of the backup system, IFS will back up all files that were last backed up more recently than date, in addition to the files it normally backs up.
* Don’t Repeat (backups)
Cancels the Repeat command. The Repeat command is also cancelled automatically upon successful completion of a backup run.
12. Accounting
Accountant.run is a program which collects accounting and administrative information from a running IFS. It retrieves copies of all of the Directory Information Files (DIFs) from a running IFS and produces a text file containing per-directory and system-wide information.
In order to run Accountant, you must be a wheel, since the DIFs which Accountant reads are protected. Note that you run this program on some other Alto, not on the IFS Alto.
When first started up, Accountant asks you for the name of the IFS that it is to connect to. It then asks three questions: ‘Generate accounts listing?’, ‘Generate group membership summary?’, and ‘Generate disk usage listing?’; for each one, if you answer ‘yes’ then it requests an output file name (on your local Alto disk). It then connects to the IFS and produces the requested output.
An accounts listing consists of the names and attributes of all directories in the system, including disk page limit and current usage. At the end of the listing are the totals of page limits and current usages.
A group membership summary shows the memberships of each of the IFS user groups. This information is valid only for non-Grapevine members of non-Grapevine groups. The group membership summary is useful in managing an IFS that does not use Grapevine for access control, and is also useful when converting an existing IFS from non-Grapevine to Grapevine access control. Additionally, the summary includes, for each group, a list of directories whose connect, create, or default file protections refer to the group; this is useful in determining what a group is used for and whether it is still in active use.
A disk usage listing includes, for each directory, the number of files and pages accounted for by ‘old’ versions (all but the most current version of files for which multiple versions exist) and a histogram of time since most recent access. This information is useful for discovering obsolete files and directories.
The accounts listing and group membership summary are generated fairly quickly—15 minutes or so for a large IFS (4 T-300 disks). The disk usage listing takes a long time—2 to 3 hours for a large IFS. All three listings can be generated simultaneously; however, due to peculiarities of the FTP protocol, generating a disk usage listing at the same time as either or both of the others is likely to take longer than generating them separately.
13. Miscellaneous
13.1. Disk pack identification
If you forget the ID of some Trident pack (e.g., a backup pack), there is no way to ‘Mount’ it for the backup system. This is why it is a good idea to mark the ID on the pack itself (not on its plastic cover, which is interchangeable with other packs). A good place to mark it is on the plastic ring on the top of the pack. Do not affix a paper label: it will fly off and destroy the heads, the pack, or both.
There is, however, a command for finding out vital information about a pack. It is:
! What (is the pack on drive) d
where d is a drive number. If the pack is an IFS pack (primary or backup), this command will print out the vital parameters, including the ID. If the pack is not an IFS pack, it will say so.
13.2. Software performance
The IFS software strains the Alto’s capacity severely, particularly with respect to main memory. In combination with certain deficiencies of the BCPL runtime environment, this can lead to rather poor performance (in particular, excessive disk thrashing) when there are more than a few simultaneous users of the system.
Also, there are times when certain data structures and code segments cannot be swapped out. It is possible for the system to deadlock if all of memory is occupied by such immovable objects. The symptom of this is that IFS ceases to respond to requests for service, the Alto screen looks normal (black with ‘IFS’ in the cursor), and the screen does not flash when you press the space bar. The possibility of deadlocks is the principal reason for imposing a limit on the number of simultaneous server processes. To make things worse, deadlocks frequently occur during major changes to the directory, thereby leaving it in an inconsistent state requiring the IFSScavenger to correct.
If the IFS Alto has only 64K of memory, IFS must both keep data and swap code using that memory. Considering that there is over 100K of swappable code and only 32K of memory available for swapping both code and data, this leads to serious disk thrashing. In the 64K configuration, the maximum possible Server-limit is 6 and the default is 5. Even with a limit of 5, memory deadlocks are likely (depending on usage patterns), and it may be necessary to reduce the Server-limit to 4 or even 3 in order to entirely prevent deadlocks. For all practical purposes, a 64K Alto should be considered an unsupported configuration.
If the IFS Alto has extended memory, the situation is much better. IFS has an ‘extended emulator’ that is able (with some restrictions) to execute BCPL code residing in extended memory, though it can still reference data only in the first 64K of memory. Consequently, performance is significantly improved, since most or all of the first 64K is available for data and code swapping is reduced or eliminated.
The IFS software running on an Alto with 128K of memory can support up to 8 simultaneous users, and with 192K or more up to 10 simultaneous users. It is believed that memory deadlocks are impossible with these configurations. Therefore, it is strongly recommended that IFS Altos have at least 128K of memory, and systems that serve large or highly demanding user communities should have 192K of memory. (The software does not presently take any advantage of memory beyond 192K.)
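Condensing the guidance above into a lookup (figures straight from the text; the 64K entry uses the default Server-limit and should be considered unsupported):

    # Simultaneous users supported, by Alto memory size in KB.
    SUPPORTED_USERS = {64: 5, 128: 8, 192: 10}

    def max_simultaneous_users(memory_kb):
        # Largest documented configuration not exceeding the given memory;
        # memory beyond 192K confers no additional benefit.
        eligible = [kb for kb in SUPPORTED_USERS if kb <= memory_kb]
        return SUPPORTED_USERS[max(eligible)] if eligible else None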
13.3. Interpreting system statistics
The IFS Executive’s Statistics command pours out various internal operating statistics, some having to do with hardware and some with software. Many are of interest only to IFS implementors, but all are explained here for completeness. An IFS administrator should examine these statistics periodically (say, once per day) to notice problems that may lead to progressive failures; this is particularly important in the case of memory and disk errors. Except where noted, all statistics cover the interval since IFS was last restarted.
If you terminate the Statistics command with SPACE rather than CR, you will be asked for an additional keyword (one of Directory, Disk, Mail, Memory, or Server); and only the specified subset of the system statistics will be displayed.
SmallZone overflows, bigZone overflows, overflow pages
IFS has a three-level memory storage allocator. SmallZone and bigZone are heap-type allocators for objects of less than 25 words and of 25 to 500 words, respectively. Objects larger than 500 words are allocated by removing one or more 1024-word pages from the VMem (virtual memory manager) pool. If one of the first two zones becomes exhausted, it recovers by borrowing space from the next larger zone.
It is normal to encounter up to 100 or so zone overflows per day, and for there to be a maximum of 2 or 3 VMem pages used to recover from bigZone overflows. More overflows are indicative of the need to change some compile-time parameters. If the ‘current’ number of overflow pages remains nonzero for any significant length of time, it is indicative of a bug (loss of allocated storage).
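The routing of an allocation request, per the description above (a sketch; the borrowing that follows an exhausted zone is what the overflow counts record):

    def allocation_level(size_words):
        # Objects < 25 words come from smallZone, 25..500 from bigZone;
        # anything larger takes whole 1024-word pages from the VMem pool.
        if size_words < 25:
            return 'smallZone'
        if size_words <= 500:
            return 'bigZone'
        pages = -(-size_words // 1024)      # ceiling division
        return 'VMem pool (%d page(s))' % pages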
Net blocks allocated minus blocks freed
This is simply the number of memory storage blocks presently allocated. If there is no system activity besides your Chat connection, this should be more-or-less constant. If it increases slowly over time, storage is being lost.
PBIs, PBI overflows
The Pup software maintains a pool of packet buffers (PBIs) that are shared among all active servers. The first number displayed is the normal number of PBIs, which is constant for a given IFS release and Alto memory configuration. Under certain circumstances (particularly when connections are made through slow links), the system runs out of PBIs; when this happens, additional PBIs are temporarily allocated (from bigZone) to cover the shortage. (Frequently this will cause bigZone to overflow as well.)
VMem buffers, buffer shortages
Approximately half of Alto memory is turned over to the VMem package, which manages it as a buffer pool of 1024-word pages and implements a software virtual memory for accessing various objects on the disk, including code overlays (if not resident in extended memory), directories, and bit tables. The number of VMem buffers is constant for a given IFS release and Alto memory configuration.
If the VMem package receives a request that it can’t satisfy because all buffers are in use by locked objects (or have been removed to service a zone overflow), it increments the ‘buffer shortages’ count (vMemBufferShortages, accessible from Swat) and then waits, in the hope that some other process will run to completion and release some buffers. Sometimes this works. On other occasions, all processes in the system get into this state and the system is deadlocked.
VMem reads and writes
This table contains the number of swap reads and writes for each of four main types of objects managed by the VMem package: code overlays, VFile pages (virtually accessed files, principally the IFS directory B-Tree), DiskDescriptor (disk bit map) pages, and Leaf pages (file pages accessed via the Leaf server).
Overlays read from XM and from disk
If the Alto has extended memory, this indicates how many overlay reads have been satisfied by reading from extended memory rather than from the disk. Most overlays are executed directly by the extended emulator, but under certain conditions overlays must be swapped into the first 64K before execution. There should be virtually no overlay swapping (from either XM or disk) on a machine with 192K of memory or more.
Main memory errors
For an Alto-II, if any main memory errors have occurred since IFS was last restarted, the offending memory card and chip numbers are reported. These are single-bit errors that have been corrected by the hardware, so they are not cause for immediate alarm. However, when such errors occur, you should schedule hardware maintenance for replacement of bad chips at the next convenient time. This will reduce the likelihood that a single-bit error develops into an uncorrectable error, causing the server to crash.
Disk statistics (cumulative)
This table contains operating statistics for each Trident disk unit, and, in the case of T-300 disks, each of the two logical file systems on the unit. All disk statistics are cumulative from the time the file system was created.
File system    File system name and logical unit number within that file system. Logical unit 0 contains the IFS directory and the code swapping region.
Transfers    Number of pages transferred to and from the unit.
Err    The number of errors of all kinds except data-late errors and label check errors knowingly caused by the software. (See below for more information.)
ECC    The number of data errors detected by the Error Correction Code. (See below for more information.)
Fix    The number of ECC errors that have been corrected. The ECC permits correcting error bursts up to 11 bits long.
Rest    The number of times a head-restore operation has been performed. This is done as a last-resort measure when an uncorrectable error persists through 8 retries.
Unrec    The number of errors unrecoverable after 16 retries. This usually causes IFS to fall into Swat and report an unrecoverable disk error. This may be indicative of a hardware problem or of an inconsistency in the structure of the file system. Running the IFSScavenger can tell you which.
BTerr    The number of times the bit table has been found to be incorrect, i.e., to claim that a page is free when it isn’t. This in itself is a non-fatal error, but it may be indicative of serious hardware or software problems. On the other hand, it can also be caused by restarting IFS after a crash without first running the IFSScavenger.
Free    The number of free pages on the logical unit. IFS always allocates new files on the emptiest unit, and every file is contained completely within one unit. The software does not always recover from running out of disk space (i.e., it may fall into Swat), so be careful not to let the amount of free space get too low.
Disk drive statistics (since last restart)
This table corresponds to a portion of the previous table, but it is reset every time IFS is restarted, and the software-related information is omitted. Progressive hardware problems are more evident in this table than the previous one.
Transfers    Number of pages transferred to and from the unit.
Err    The number of errors of all kinds except data-late errors and label check errors knowingly caused by the software. It is normal for these to occur at a low rate; they are caused by known hardware problems in the controller and disk drives, and by random electronic glitches. A sudden jump in this error count not accompanied by a corresponding jump in the ECC count is grounds for suspicion of a non-data-related problem (e.g., positioning errors or momentary not-ready conditions).
ECC    The number of data errors detected by the Error Correction Code. These should be extremely infrequent, especially on T-300 drives, which are very reliable if properly maintained. A sudden jump in the rate of ECC errors is grounds for suspicion of a hardware problem.
/10↑10 bits    The ECC error rate per 10↑10 bits transferred. The Century Data specification for T-300 drives is one recoverable ECC error per 10↑10 bits; but this assumes perfect testing of disk packs, no interchanging of packs between drives, etc. Nevertheless, this statistic is useful for comparison between drives in one system, or between different systems.
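The figure in the last row can be reproduced from the Transfers and ECC counts, as sketched below (the page and word sizes, Trident pages of 1024 16-bit words, are assumptions not stated in this section):

    def ecc_errors_per_1e10_bits(ecc_errors, pages_transferred,
                                 words_per_page=1024, bits_per_word=16):
        # Error rate normalized to 10^10 bits transferred.
        bits = pages_transferred * words_per_page * bits_per_word
        return ecc_errors * 1e10 / bits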
Directory statistics
These include the total number of files in the system; the number of pages actually in use by the directory B-Tree; and (most important) the number of runs (fragments) comprising the directory file <System>IFS.dir. Ordinarily there should be only one run; however, if the directory has grown larger than its preallocated size (as discussed in section 10), the additional required pages will cause more runs to be created. (This can also occur if IFS.dir suffers hard disk errors and is destroyed and recreated by the IFSScavenger.) If the number of runs exceeds 40, IFS must be started with one or more /F switches, as described in section 4; otherwise it will crash occasionally due to running out of space in its file map.
Server statistics
These show, for each type of server, the number of connection requests that have been accepted and the number that have been refused due to the system reaching the Server-limit.
Mail statistics
These are described in section 8.1. They are cumulative since the file system was created or the mail statistics were last reset explicitly.
14. Known bugs and what to do about them
No bugs are presently known to exist in IFS version 1.37.
15. Revision history
IFS was designed during November and December of 1976. It was storing and retrieving files in January 1977, and was ‘released to friends’ in June of 1977 (releases through 1.02). The first ‘official’ release to the Alto user community was:
Version 1.03; August 4, 1977
Procedures added for running Triex, for blessing your Trident controller, and for halting IFS in a cleaner manner than before.
Version 1.07; September 3, 1977
/V switch added for startup-time directory B-Tree verification.
Version 1.08; October 5, 1977
Backup system released.
Version 1.10; November 1, 1977
Full T-300 support; ‘Extend’ command for adding packs to an existing file system; Triex and TFU operating procedures changed; IFSScavenger released.
Version 1.12; November 10, 1977
Performance improvements in both IFS and IFSScavenger; procedures for initializing and testing a disk pack again changed (Triex eliminated from procedure).
Version 1.14; February 21, 1978
File protections implemented; command added to disable logins; backup system bugs fixed.
Version 1.15; March 4, 1978
Converted to new time standard; ‘What’ command added; automatic SetTime at startup; obscure directory ordering bug fixed.
Version 1.18; November 15, 1978
Mail server added; limited support for extended memory; Change System-Parameters command added, with sub-commands to change clock correction, limit the number of simultaneous servers, reset the time, and enable and disable service; Logins command removed; screen flashes if you hit space bar and system is operating normally; Accountant program released; documentation on system performance and interpreting Statistics output; file IFS.Ov is no longer part of the release.
Version 1.21; July 16, 1979
Mail forwarding and Press file printing implemented; miscellaneous servers (name, time, and boot) added; Change System-Parameters sub-commands modified; protection groups may now be ‘owned’ by individual (non-wheel) users; Change Group-Membership and Show Group-Membership commands added to permit users to manipulate group membership; more statistics.
Version 1.23; January 13, 1980
Conforms to new file creation date standard; files-only directory’s owner is permitted connect access; mail server supports remote distribution list retrieval; privileged Telnet Mail command added, which brings together the mail related commands previously scattered among other Telnet commands.
Version 1.24; March 8, 1980
Extended emulator included, enabling substantially improved performance and more simultaneous users if the IFS Alto has extended memory (see section 13.2); new command file available to construct an IFS Operation disk from scratch (section 2); a few additional statistics are kept and displayed by the Statistics command (section 13.3); sub-command added to privileged Mail command to reset the mail statistics (section 8.1); optional Leaf page-level access server included on a ‘use at your own risk’ basis (section 13.4). Note: due to a format change, the mail statistics will be reset automatically the first time IFS 1.24 is started.
Version 1.25; May 19, 1980
CopyDisk server added (see section 13.5); new commands to display and cancel press printing requests (section 7); new commands to repeat the backup of files (section 11.7).
Version 1.27; September 6, 1980
Mail server changes, required for compatibility with the Grapevine servers; Printed-by sub-command added to Print command; a few bugs fixed.
Version 1.28; January 3, 1981
Rename command defaults new file name; Print command has new sub-commands Duplex and Password; backup system can reload selected directories rather than the entire file system (section 11.6); boot server can now boot-load microcode for Dolphins and Dorados; the boot server’s automatic acquisition of new boot files can be disabled (section 9); Accountant program augmented to produce disk usage listing (section 12); mail server changes, required for compatibility with the Grapevine servers (section 8.1). Note: the Leaf protocol has undergone minor changes that may require Leaf clients to be changed correspondingly; implementors of software that uses Leaf should contact Ted <Wobber.PA> for information.
Version 1.30; January 28, 1981
The Change Protection command has been generalized to Change Attributes, with new sub-commands Backup, Byte-size, and Type. The Backup command has new sub-commands OnLine, OffLine, and List (section 11.5). Some hardware error checks have been added: the Control RAM and S-registers are tested during startup (section 5), and corrected single-bit main memory errors are recorded (section 13.3). Additionally, a number of bugs have been fixed.
Version 1.31; May 10, 1981
The action of the /A switch has changed. The directory statistics (section 13.3) and the meaning of the /F switch are now documented. Mail system changes have been made to facilitate conversion of an IFS-based registry to Grapevine; in particular, disabling a user’s mail capability causes his in-box to be destroyed next time it is read rather than immediately; and the forwarder now understands about registry names that map to multiple addresses (section 8). The Leaf server now supports a ‘multipleWriters’ mode of access; consult Ted <Wobber.PA> for details. The FTP server now deals in date properties that include time zones; in conjunction with a new release of FTP.run, this enables file date properties to be transferred correctly between different time zones.
Version 1.33; June 29, 1981
The purpose of this release is principally to fix two long-standing and notorious bugs: the ‘B-Tree delete bug’ and the ‘infinite return-to-sender bug’. Additionally, the Trident disk software’s handling of recoverable disk errors has changed somewhat; in particular, the rate of non-ECC errors is substantially reduced. Some additional disk error information is displayed by the Statistics command (section 13.3).
Version 1.35; December 11, 1981
IFS can be configured to use Grapevine for authentication and access control; this replaces the user name and group structure maintained by IFS, and eliminates the need for Guest accounts (section 6). You should read the new ‘How to use IFS’ document, as it includes an explanation of how IFS uses Grapevine that is not repeated here. A Show System-Parameters command has been added. The Create and Change Directory-Parameters commands now have the same set of sub-commands (section 6). Reloading the file system from backup now properly restores all system parameters instead of resetting them to default values (section 11.6). Accountant generates a group membership summary (section 12). A new version of the IFSScavenger accompanies this release. Note: for proper error reporting, you should obtain the latest [Maxc2]<Alto>Sys.errors.
March 14, 1982 (documentation update only)
A summary of known bugs has been added (section 14). There is now a separate document describing access control policies and procedures in considerably more detail; please obtain and read [Maxc]<IFS>AccessControls.press.
Version 1.36; May 13, 1982
This is principally a maintenance release to fix a number of bugs found in previous releases. Functional changes are as follows. In an IFS that uses Grapevine group checking, a user who is not registered in Grapevine but does have a login directory on the IFS is no longer automatically a member of World; his membership in World is now controlled just the same as membership in other groups (using the Change Group-Membership command). The Backup Reload command may now be used to repair damage detected by the IFSScavenger (section 11.6.3). The FTP server supports some recent extensions to the FTP protocol that permit substantially improved performance in certain operations (particularly enumerations, which in certain cases are now over 10 times as fast as before); some changes to client software are required to take full advantage of this improvement.
Version 1.37; October 3, 1982
This release introduces some minor new features. The boot server can now boot-load Sun workstations; also, new boot files installed by manual means (e.g., FTP) are now noticed immediately instead of after a delay of up to 8 hours (section 9.2). A new server, LookupFile, is included and may optionally be enabled (section 9.6). The backup system now automatically fixes any incorrect directory page usage totals which it encounters. Additionally, internal changes in the VMem software have resulted in a modest performance improvement and elimination of the long-standing ‘Can’t flush locked page’ bug.