== Notice ==

This document is on hold. Its format may be used for saving meta-data inside the encrypted file itself, but it will not be used for the "filelist" file. The filelist file will only contain a simple filename conversion. This document is here purely for archival reasons.

== The "filelist" File ==

If the -m (--meta-encrypt) option is given, the file names, as well as other meta-data about the files, are garbled. All files are created with standard permissions (according to the umask on Unix-like systems). Their names are turned into a random stream of characters, and their ownership is that of the user running rsyncrypto.

The real information about all of the above is stored inside a special file, called "filelist". Of course, filelist is encrypted according to the standard rsyncrypto encryption. It is treated in a special way: it does not go through the same series of name changes as the rest of the files. This is done so that the file list itself can be identified and restored first, in case of a restore where the symmetric keys themselves have been lost.

The format of filelist was designed to allow maximal coverage of OS-specific features, while not giving up on platform interoperability. Also, in order to keep parsing of filelist simple, it is in binary format.

== Number Format ==

Whenever a multi-byte value is mentioned in the file format, this number is stored in network byte order, as used in TCP/IP. Network byte order is also known as "Big Endian".

== File Format ==

Filelist contains a sequence of "chunks". Each chunk describes a different file that is in the backup. The chunks are sorted according to the encrypted file name. This was necessary for leaking as little information as possible about the real file name, while forcing a uniform order to maintain rsync and rsyncrypto efficiency.

The first four bytes of filelist are a magic number identifying it and its version. The magic number for version 1 is 0x15c8fe60.
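As an illustration of the number format and the magic number described above, the following is a minimal Python sketch of how a reader might check the first four bytes of a decrypted filelist. The names `FILELIST_MAGIC_V1` and `read_magic` are hypothetical and do not come from rsyncrypto; only the value 0x15c8fe60 and the big-endian layout are taken from this document.

```python
import struct

# Value taken from this document; the constant name is hypothetical.
FILELIST_MAGIC_V1 = 0x15C8FE60


def read_magic(data: bytes) -> int:
    """Return the filelist version implied by the first four bytes."""
    if len(data) < 4:
        raise ValueError("filelist is too short to hold a magic number")
    # ">I" reads one unsigned 32-bit value in network (big-endian) byte order.
    (magic,) = struct.unpack(">I", data[:4])
    if magic != FILELIST_MAGIC_V1:
        raise ValueError(f"unknown filelist magic 0x{magic:08x}")
    return 1
```

A reader would call `read_magic` on the start of the decrypted file and begin parsing chunks at offset 4.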
After the magic number, the rest of the file is just a series of chunks, one after the other. Each chunk describes a certain file entry. The files must be ordered such that all files that reside in a single directory are listed together. In addition, file entries inside a directory must be sorted alphabetically. Writers may order the directories in the file in any order they wish, and readers must not assume any specific order of the directories themselves. Writers may use an algorithm that prepends the nesting level of the entry to the file name, and then sorts in the standard way (this gives breadth-first tree scan order). Readers must not assume that this is the algorithm used.

== Chunk Format ==

Each chunk is composed of a series of data blocks. Data blocks come in two types: mandatory blocks, which must appear in each and every chunk, and optional blocks, which may or may not be present. For ease of parsing, the mandatory blocks must appear first in the chunk, in increasing block type order. The end of a chunk is marked using a special "end of chunk" data block.

== Data Block Format - Overview ==

The first two bytes of each data block are the block's size. The next two bytes are the block's type identification. Types 0000 (platform), 0001 (original file) and 0002 (encoded file name) are mandatory. In addition, certain file types imply additional mandatory blocks. Implementations that do not support said file types, however, need not handle the derived mandatory blocks. The actual block data follows these four header bytes.

A reader confronted with an unknown block type should simply skip that block, and continue with the next block. A writer must always issue all mandatory blocks for the file version generated by it (as determined by the magic number at the start of the file).

All strings are null terminated. Non-ASCII characters are specified in either local encoding or UTF-8 encoding.
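The block header layout and the "skip unknown types" rule can be sketched as follows. This is an illustrative Python sketch, not rsyncrypto code: the names `KNOWN_TYPES`, `read_block` and `known_blocks` are hypothetical. It also assumes that the size field counts the four header bytes, which is what the fixed length of 4 on the End of Chunk block suggests.

```python
import struct

# The three mandatory block types named in the overview (set name is hypothetical).
KNOWN_TYPES = {0x0000, 0x0001, 0x0002}


def read_block(buf: bytes, offset: int):
    """Return (block_type, payload, next_offset) for the block at offset."""
    if offset + 4 > len(buf):
        raise ValueError("truncated block header")
    # Two big-endian 16-bit fields: size first, then type.
    length, block_type = struct.unpack_from(">HH", buf, offset)
    # Assumption: the size field includes the 4 header bytes.
    if length < 4 or offset + length > len(buf):
        raise ValueError("bad block length")
    payload = buf[offset + 4 : offset + length]
    return block_type, payload, offset + length


def known_blocks(buf: bytes, offset: int = 0):
    """Collect blocks of known type, silently skipping unknown ones."""
    found = []
    while offset < len(buf):
        block_type, payload, offset = read_block(buf, offset)
        if block_type in KNOWN_TYPES:
            found.append((block_type, payload))
    return found
```

Because every block carries its own size, a reader can step over a block it does not understand without knowing anything about its contents.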
All blocks start on a file offset that is 4-byte aligned. If a natural block size is not a multiple of 4, writers must pad the block with zero (null) bytes. The block length must include the padding, and must be a multiple of 4.

=== Mandatory Blocks ===

== Block 0000 - Platform ==

Block structure:
  2 bytes : block length
  2 bytes : block type, always 0000
  1 byte  : platform code
  1 byte  : directory separator character
  string  : Textual platform name (uname -o or equivalent) - ASCII

The Platform block holds an identification of the OS from which the file was taken. The platform codes are defined in "filelist_format.h". Some codes:
  Posix - 03
  Win32 - 0B
  MacOS - 07

== Block 0001 - Original File Name ==

  2 bytes : block length
  2 bytes : block type, always 0001
  1 byte  : File type
  1 byte  : File name encoding
  string  : The name of the file

File type is the type of the file, according to the following

File name encoding values:
  00 - File name encoding is unknown
  01 - File name is UTF-8 encoded

All other values are reserved. Readers should treat all other values as "00". Writers must not generate them.

The file name field uses the same directory separator as defined in the platform block.

Comment - this specification leaves no room for a multi-byte directory separator. The author is not aware of any platform in which a directory separator requires more than one byte in UTF-8 (i.e. a non-ASCII directory separator).

== Block 0002 - Encoded File Name ==

  2 bytes : block length
  2 bytes : block type, always 0002
  string  : The name of the file (ASCII)

The encoded file name must be ASCII encodable. It must not contain any directory separators.
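To make the padding rule and the three mandatory block layouts concrete, here is a hedged Python sketch of the writer side. All function names are hypothetical, and the sketch assumes that the block length counts the four header bytes. The file type values are not listed in this document, so `file_type` is passed through without interpretation.

```python
import struct


def make_block(block_type: int, payload: bytes) -> bytes:
    """Prefix payload with length and type, zero-padded to a multiple of 4."""
    length = 4 + len(payload)      # assumption: length includes the header
    padding = -length % 4          # 0..3 null bytes to reach 4-byte alignment
    if length + padding > 0xFFFF:
        raise ValueError("block does not fit a 16-bit length field")
    header = struct.pack(">HH", length + padding, block_type)
    return header + payload + b"\x00" * padding


def platform_block(code: int, separator: str, name: str) -> bytes:
    """Block 0000: platform code, one-byte separator, ASCII platform name."""
    sep = separator.encode("ascii")
    if len(sep) != 1:
        raise ValueError("directory separator must be a single ASCII character")
    return make_block(0x0000, bytes([code]) + sep + name.encode("ascii") + b"\x00")


def original_name_block(file_type: int, name: str) -> bytes:
    """Block 0001: file type, encoding 01 (UTF-8), null-terminated name."""
    return make_block(0x0001, bytes([file_type, 0x01]) + name.encode("utf-8") + b"\x00")


def encoded_name_block(name: str) -> bytes:
    """Block 0002: null-terminated ASCII name without directory separators."""
    return make_block(0x0002, name.encode("ascii") + b"\x00")
```

For example, `platform_block(0x03, "/", "GNU/Linux")` builds a Posix platform block using the code listed above. Computing the padding in one helper keeps every block type aligned without repeating the arithmetic.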
=== Optional Blocks ===

== Block 0003 - Posix File Permission ==

  2 bytes : block length
  2 bytes : block type, always 0003
  4 bytes : UID
  4 bytes : GID
  4 bytes : file mode

=== Administrative Blocks ===

== Block FFFE - NOP ==

  2 bytes : block length
  2 bytes : block type, always FFFE
  Any number of bytes : random data

Readers should skip this block type without processing it.

== Block FFFF - End of Chunk ==

  2 bytes : block length, always 4
  2 bytes : block type, always FFFF

Writers must place this block at the end of each chunk. Readers should assume that any data after this block is the beginning of the next chunk.
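Putting the administrative blocks together, a reader might split the block stream into chunks as sketched below. This Python sketch is illustrative only: the function and constant names are hypothetical, the block length is assumed to include the four header bytes, and UID, GID and file mode are assumed to be unsigned, which this document does not state.

```python
import struct

# Type codes from this document; the constant names are hypothetical.
POSIX_PERMISSION = 0x0003
NOP = 0xFFFE
END_OF_CHUNK = 0xFFFF


def split_chunks(buf: bytes, offset: int = 0):
    """Split buf into chunks; each chunk is a list of (type, payload) pairs."""
    chunks, current = [], []
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated block header")
        length, block_type = struct.unpack_from(">HH", buf, offset)
        # Lengths include the header (assumed) and must be multiples of 4.
        if length < 4 or length % 4 or offset + length > len(buf):
            raise ValueError(f"bad block length {length} at offset {offset}")
        payload = buf[offset + 4 : offset + length]
        offset += length
        if block_type == END_OF_CHUNK:
            chunks.append(current)   # the next block starts a new chunk
            current = []
        elif block_type != NOP:      # NOP blocks are skipped unprocessed
            current.append((block_type, payload))
    if current:
        raise ValueError("last chunk has no End of Chunk block")
    return chunks


def parse_posix_permission(payload: bytes):
    """Decode a block 0003 payload into (uid, gid, mode), assumed unsigned."""
    if len(payload) != 12:
        raise ValueError("Posix permission payload must be 12 bytes")
    return struct.unpack(">III", payload)
```

For a complete filelist, `split_chunks` would be called with `offset=4` so that parsing starts right after the magic number.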