RFC 683, NIC 32251

FTPSRV - TENEX FTP EXTENSIONS FOR PAGED FILES

R. Clements - BBN - 3 April 75

1 Introduction

In response to a long-known need for the ability to transfer TENEX paged files over the net via FTP, the TENEX FTP implementation has been extended.

This implementation is an extension to the "OLD" protocol (RFC 354). It was built after useful discussions with Postel, Neigus, et al. I do not mean to imply that they agreed that this implementation is correct, nor for that matter do I feel it is correct. A "correct" implementation will be negotiated and implemented in the "NEW" protocol (RFC 542), if funding ever appears for that task.

2 The Problem(s)

This extension attacks two separate problems: network reliability, and the incompatibility of the TENEX disk file format with FTP. A checksummed and block-sequence-numbered transmission mode is seriously needed, in my opinion. This mode should also allow data compression. It is also necessary to handle paged, holey TENEX files. This latter problem, seriously needed for NLS, is the motivation for the current extension.

The former problem requires a new MODE command, if done correctly; probably two MODEs, to allow data compression in addition to checksumming. Actually, I think that is the tip of an iceberg which grows as 2**N for additional sorts of modes, so maybe some mode combination system needs to be dreamed up. Cf. the AN, AT, AC, EN, ET, EC TYPEs. Also, one should be able to use MODE B and MODE C together (NEW protocol) to gain both the compression and restart facilities if one wanted.

The second problem, TENEX files, is probably a new kind of STRUcture. However, it should be possible to send a paper tape to a disk file, or vice versa, with the transfer looking like a paged file; so perhaps we are dealing with a data representation TYPE. This argument is a bit strained, though, so a paged STRUcture is quite likely correct.
I admit to feeling very unsure about what is a MODE, what is a TYPE and what is a STRUcture.

3 The (Incorrect) choices made

Having decided that new MODEs and STRUctures were needed, I instead implemented the whole thing as a single new TYPE. After all, I rationalize, checksumming the data on the network (MODE) and representing the data in the processing system as a checksummed TYPE are really just a matter of where you draw the imaginary line between the net and the data. Also, a single new TYPE command reduced the size of the surgery required on the FTP user and server programs.

4 Implementation details

The name of the new TYPE is "XTP". I propose this as a standard for all the Key Letter class of FTP commands: the "X" stands for "experimental" -- agreed on between cooperating sites. The letter after the "X" is signed out from the protocol deity by an implementor for a given system. In this case, "T" is for TENEX. Subsequent letter(s) distinguish among possibly multiple private values of the FTP command. Here "P" is "Paged" type.

TYPE XTP is only implemented for STRU F, BYTE 36, and MODE S.

Information of TYPE XTP is transferred in chunks (I intentionally avoid the words RECORD and BLOCK) which consist of a header and some data. The data in a chunk may be part of the data portion of the file being transferred, or it may be the FDB (File Descriptor Block) associated with the file.

5 Diversion: the TENEX Disk File

For those not familiar with the TENEX file system, a brief dissertation is included here to make the rest of the implementation meaningful.

A TENEX disk file consists of four things: a pathname, a page table, a (possibly empty) set of pages, and a set of attributes.

The pathname is specified in the RETR or STOR verb. It includes the directory name, file name, file name extension, and version number.

The page table contains up to 2**18 entries. Each entry may be EMPTY, or may point to a page.
If it is not empty, there are also some page-specific access bits; not all pages of a file need have the same access protection.

A page is a contiguous set of 512 words of 36 bits each.

The attributes of the file, in the FDB, contain such things as creation time, write time, read time, writer's byte-size, end of file pointer, count of reads and writes, backup system tape numbers, etc.

NOTE: there is NO requirement that pages in the page table be contiguous. There may be empty page table slots between occupied ones. Also, the end of file pointer is simply a number. There is no requirement that it in fact point at the "last" datum in the file. Ordinary sequential I/O calls in TENEX will cause the end of file pointer to be left after the last datum written, but other operations may cause it not to be so, if a particular programming system so requires.

In fact both of these special cases, "holey" files and end-of-file pointers not at the end of the file, occur with NLS data files. These files were the motivation for the new TYPE.

6 Meanwhile, back at the implementation,...

Each chunk of information has a header. The first byte, which is the first word (since TYPE XTP is only implemented for BYTE 36) of the chunk, is a small number, currently 6, which is the number of following words which are still in the header. Next come those six words, and then come some data words.

The six header words are:

Word 1: a checksum. This is a one's complement sum (magnitude and end-around carry) of the six header words and the following data words (but not the leading "6" itself). The sum of all words including the checksum must come out + or - zero.

Word 2: A sequence number. The first chunk is number 1, the second is number 2, etc.

Word 3: NDW, the number of data words in this chunk, following the header. Thus the total length of the chunk is 1 (the word containing NHEAD) + NHEAD + NDW. The checksum checks all but the first of these.

Word 4: Page number.
If the data is a disk file page, this is the number of that page in the file's page map. Empty pages (holes) in the file are simply not sent. Note that a hole is NOT the same as a page of zeroes.

Word 5: ACCESS. The access bits associated with the page in the file's page map. (This full word quantity is put into AC2 of an SPACS by the program reading from net to disk.)

Word 6: TYPE. A code for what type of chunk this is. Currently, only type zero for a data page, and type -3 for an FDB are sent.

After the header are NDW data words. NDW is currently either 1000 octal for a data page or 25 octal for an FDB. Trailing zeroes in a disk file page will soon be discarded, making NDW less than 1000 octal in that case. The receiving portions of FTP server and user will accept these shortened pages. The sender doesn't happen to send them that way yet.

Verification is performed such that an error is reported if any of the following holds:

   The checksum fails,

   The sequence number is not correct,

   NDW is unreasonable for the given chunk type, or

   The network file ends at some point other than immediately following the data portion of an FDB chunk.

7 Closing comments

This FTP server and user are in operation on all the BBN systems and at some other sites -- the user being more widely distributed since fewer sites have made local modifications to the user process.

I believe the issues of checksumming and sequencing should be addressed for the "NEW" protocol.

I hope the dissertation on TENEX files has been useful to users of other systems. It may explain my lack of comprehension of the "record" concept, for example. A TENEX file is just a bunch of words pointed to by a page table. If those words contain CRLF's, fine -- but that doesn't mean "record" to TENEX.

I think this RFC also points out clearly that net data transfers are implemented like the layers of an onion: some characters are packaged into a line. Some lines are packaged into a file. The file is broken into other manageable units for transmission.
Those units have compression applied to them. The units may be flagged by restart markers (has anyone actually done that?). The compressed units may be checksummed, sequence numbered, date-and-time stamped, and flagged special delivery. On the other end, the process is reversed. Perhaps MODE, TYPE, and STRU don't really adequately describe the situation.

This RFC was written to allow implementors to interface with the new FTP server at TENEX sites which install it. It is also really a request for comments on some of these other issues.
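
As an illustration of the chunk format described in section 6, the following is a minimal Python sketch, not the TENEX implementation. It models 36-bit words as Python integers, computes the one's complement checksum with end-around carry, builds chunks for a holey file, and applies the per-chunk verification rules. All function and constant names are hypothetical. Several details are assumptions that the text above does not settle: that the type code -3 is stored as a 36-bit two's complement word, that data pages are sent in ascending page order with a single FDB chunk last, that the FDB chunk carries zero in its page and access words, and the exact bounds used for a "reasonable" NDW.

```python
from typing import Dict, List

MASK = (1 << 36) - 1            # one 36-bit word; all ones is "minus zero"
NHEAD = 6                       # header words following the leading count word
TYPE_DATA, TYPE_FDB = 0, -3     # chunk type codes named in the text
PAGE_WORDS, FDB_WORDS = 0o1000, 0o25


def oc_add(a: int, b: int) -> int:
    """Add two 36-bit words in one's complement with end-around carry."""
    s = a + b
    return (s & MASK) + (s >> 36)


def oc_sum(words: List[int]) -> int:
    total = 0
    for w in words:
        total = oc_add(total, w)
    return total


def build_chunk(seq: int, page: int, access: int, ctype: int,
                data: List[int]) -> List[int]:
    # ASSUMPTION: a negative type code is stored as 36-bit two's complement.
    body = [seq, len(data), page, access, ctype & MASK, *data]
    checksum = oc_sum(body) ^ MASK   # one's complement negation of the sum
    return [NHEAD, checksum, *body]


def verify_chunk(chunk: List[int], expected_seq: int) -> dict:
    """Check one chunk; raise ValueError on any of the per-chunk errors."""
    if len(chunk) < 1 + NHEAD or chunk[0] != NHEAD:
        raise ValueError("bad header length")
    _checksum, seq, ndw, page, access, raw_type = chunk[1:1 + NHEAD]
    data = chunk[1 + NHEAD:]
    # Everything except the leading count word must sum to + or - zero.
    if oc_sum(chunk[1:]) not in (0, MASK):
        raise ValueError("checksum failure")
    if seq != expected_seq:
        raise ValueError("sequence number out of order")
    ctype = raw_type - (1 << 36) if raw_type >> 35 else raw_type
    # ASSUMPTION: "reasonable" means at most one page of data words for a
    # data chunk (shortened pages accepted) and exactly 25 octal for an FDB.
    if ctype == TYPE_DATA:
        reasonable = ndw <= PAGE_WORDS
    elif ctype == TYPE_FDB:
        reasonable = ndw == FDB_WORDS
    else:
        reasonable = False
    if not reasonable or ndw != len(data):
        raise ValueError("unreasonable NDW")
    return {"seq": seq, "page": page, "access": access,
            "type": ctype, "data": data}


def chunks_for_file(pages: Dict[int, List[int]], fdb: List[int],
                    access: int = 0) -> List[List[int]]:
    """Emit one chunk per existing page (holes are skipped), then the FDB."""
    # ASSUMPTION: ascending page order, one access value, FDB chunk last.
    out = []
    for seq, number in enumerate(sorted(pages), start=1):
        out.append(build_chunk(seq, number, access, TYPE_DATA, pages[number]))
    out.append(build_chunk(len(out) + 1, 0, 0, TYPE_FDB, fdb))
    return out
```

A receiver would call verify_chunk on each chunk in turn with an incrementing expected sequence number, and would additionally report an error if the stream ends anywhere other than directly after an FDB chunk. The check accepts both 0 and all ones as a valid total because one's complement arithmetic has two representations of zero.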