Particle

Design concepts for the particle file format.

Primary design goals:

  • Should cover "manifest" files (complete metadata, but no payload), "full data" files (complete metadata and payload), and "delta data" files (partial metadata plus new payload)
  • Each directory tree should be identified by a cryptographic checksum that is generated from the data and metadata
  • Reproducible: regardless of how data is transported/packed/unpacked, payload and metadata of identical file system trees must result in the same particle file and checksum
  • Fuzzy verification: must be able to deal with the fact that not all file systems provide the same feature set (some lack nsec timestamps, others btrfs subvols, hard links, ...)
  • Must be streamable, i.e. not require random access to particle file when packing or unpacking
  • Should be possible to generate from all common Linux file systems, and be applicable to all common Linux file systems
  • Extensible
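
The checksum, reproducibility and streaming goals above can be illustrated with a minimal sketch. It is not the particle format itself: the choice of SHA-256, the field layout, and the names tree_digest, _feed and _frame are all assumptions made for illustration. The sketch walks a tree in a single pass, sorts directory entries by name so that readdir() order does not affect the result, and only hashes mtime (truncated to whole seconds) when asked to, as a crude stand-in for fuzzy verification.

```python
import hashlib
import os
import stat
import struct


def _frame(data: bytes) -> bytes:
    # Length-prefix each variable-sized field so adjacent fields stay unambiguous.
    return struct.pack("<Q", len(data)) + data


def _feed(h, path: bytes, with_mtime: bool) -> None:
    st = os.lstat(path)
    # Metadata: only file type and permission bits (st_mode) in this sketch.
    h.update(struct.pack("<I", st.st_mode))
    if with_mtime:
        # Truncate to whole seconds so file systems without nsec timestamps agree.
        h.update(struct.pack("<q", st.st_mtime_ns // 1_000_000_000))
    if stat.S_ISDIR(st.st_mode):
        # Sort by raw name so the result does not depend on readdir() order.
        for name in sorted(os.listdir(path)):
            h.update(_frame(name))
            _feed(h, os.path.join(path, name), with_mtime)
        h.update(_frame(b""))  # an empty name marks the end of the directory
    elif stat.S_ISLNK(st.st_mode):
        h.update(_frame(os.fsencode(os.readlink(path))))
    elif stat.S_ISREG(st.st_mode):
        h.update(struct.pack("<Q", st.st_size))
        with open(path, "rb") as f:
            # Stream the payload in chunks; no seeking is needed.
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)


def tree_digest(path, with_mtime: bool = False) -> str:
    """Hex digest over the metadata and payload of everything below path."""
    h = hashlib.sha256()  # assumption: any cryptographic hash would do
    _feed(h, os.fsencode(path), with_mtime)
    return h.hexdigest()
```

Two trees with the same names, modes and contents produce the same digest no matter in which order their entries were created, while changing a single payload byte changes the digest. Device nodes, xattrs, ACLs and the other objects listed in this document are left out of the sketch.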

Secondary design goals:

  • Files should be composable: a particle file of a directory is just the concatenation of its metadata plus the particle files of its directory entries
  • Should cover all kinds of modern and exotic file system objects (btrfs subvols, chattr flags, btrfs crc32, xattr, acls, nsec times, sparse files, hard links, ...)
  • Delta image generation should be efficient on btrfs snapshots
  • Content addressable: if you have a file with random access (instead of just stream access), it should be efficient to jump to specific files
  • Delta compression
  • Should be able to cover individual files as well as directory trees
  • Should support references to external particles for directory subtrees
  • Should support universal addressing of files (as particle hash + relative path)
  • Delta should be efficient to apply to btrfs subvolumes, and should be transformable into unionfs delta trees

Prior art:

  • tar (in all its versions), pax
  • cpio
  • docker's tarball series
  • zip
  • BSD's mtree
  • Atomic's/OSTree's data transfer
  • vcdiff/xdelta
  • ChromeOS' Courgette assembly delta stuff
  • btrfs send/recv
  • UNIX "dump"
  • rsync
  • rdiff
  • zsync (http://zsync.moria.org.uk/paper/)
  • qcow2 delta
  • Microsoft RDC (http://research.microsoft.com/en-us/um/people/gurevich/Opera/183.pdf)
  • bsdiff (http://www.daemonology.net/papers/bsdiff.pdf)
  • lbfs (http://cis.poly.edu/cs623/lbfs.pdf)
  • TarSum (https://github.com/docker/docker/blob/master/pkg/tarsum/tarsum_spec.md)
  • Continuity (https://github.com/stevvooe/continuity)
  • Mozilla Archives (https://wiki.mozilla.org/Software_Update:MAR)
  • tar-split (https://github.com/vbatts/tar-split)