Saturday, 2 January 2010

Linux & BTRFS: an example of layout

BTRFS is one of the most interesting filesystems in the Linux ecosystem. It has a lot of features [1], and for me the most interesting one is its "snapshot" capability.

In my opinion "snapshot" is not the correct name. A snapshot is a static copy taken at a specific time, but BTRFS snapshots are "writable". So "fork" would be a better name than "snapshot". To avoid confusion, however, I will keep using the name "snapshot".

To evaluate the BTRFS snapshot capability, I switched my Ubuntu system from ext3 to btrfs. Here I describe how I did it.

First of all, I have to say that BTRFS is still in development: don't use it in a production environment. In fact I hit a nasty bug [2], which led to an OOPS. Even so, I have never seen a kernel crash or lost any valuable data. In any case I left my home directory on ext3.
Finally, I have to point out that I had problems even with ext4, but that was an Ubuntu bug (see the ext4 and jaunty problem [3]).

Objectives:

For this study my goals were:
  • to create a snapshot
  • to be able to access the snapshot
  • to be able to switch the system to an old snapshot, temporarily or permanently

Before describing the solution, I have to introduce how the BTRFS tools manage snapshots.

Creating and destroying snapshots


A BTRFS filesystem may be partitioned into "subvolumes". When a BTRFS filesystem is created, it consists of a single volume called "." (dot). After creation, the filesystem may be populated with other subvolumes. Every subvolume lives inside the filesystem and may be renamed, moved or destroyed.

btrfsctl is the tool that creates and destroys subvolumes:
  • Create a subvolume named "foo" under the directory /bar
    btrfsctl -S foo /bar

    Note that a subvolume behaves like a directory: it may be renamed or moved (but not destroyed) with the "mv" command.

  • Destroy a subvolume named "foo" under the directory /bar
    btrfsctl -D foo /bar

Subvolumes have two key properties:
  1. A subvolume can be mounted directly via the "subvol=" mount option. For example, given a subvolume named "foo" under the root of the filesystem, it can be mounted with the following syntax:
    mount -t btrfs -o subvol=foo /dev/sdX /mntPoint
    Note that a subvolume may be mounted this way only if it was created under the root of the BTRFS filesystem.

  2. A subvolume can be snapshotted.
    Remember: only a subvolume can be snapshotted. To create a snapshot (named "foo") of the subvolume "bar", the syntax is:
    btrfsctl -s foo /bar

Moreover, note that a snapshot of a subvolume does not extend to the subvolumes nested inside it.
Also note that taking a snapshot does not copy the files. A copy is made only when a file is modified, either in the original subvolume or in the snapshot.

Filesystem layout


Based on the information in the section above, I organized my filesystem as follows:
/                (root of the btrfs filesystem)
/rootfs          (root of the filesystem)
/snap-YYYYMMDD   (snapshot of the root of the filesystem)
/snap-YYYYMMDD   (2nd snapshot of the root of the filesystem)
/snap-YYYYMMDD   (another snapshot of the root of the filesystem)

Here "rootfs" is a subvolume containing the system (/sbin, /bin, /usr, etc.), and the "snap-..." entries are snapshots of the "rootfs" subvolume.
The key point is that the system is contained in a subvolume; the BTRFS root is used only to manage the snapshots and is not mounted as the system root.

My "/etc/fstab" contains lines like:
/dev/sdX   /            btrfs subvol=rootfs,defaults 
/dev/sdX /var/btrfs btrfs subvol=.,defaults

And in my grub config file there is a line like:
kernel          /vmlinuz root=/dev/sdX ro rootflags=subvol=rootfs
Note that the line above is specific to Debian/Ubuntu. On Fedora it should be:
kernel          /vmlinuz root=/dev/sdX ro rootfsflags=subvol=rootfs

How to snapshot


As explained above, the root of the BTRFS filesystem is mounted on /var/btrfs. Under this directory there are the rootfs subvolume and its snapshots. If I want to create a new snapshot I have to do:
# cd /var/btrfs
# btrfsctl -s snap-<date> rootfs
If I want to access an old file, I can pick it from the corresponding snap-<date> subvolume.
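
The two commands above can be wrapped in a small script. What follows is only a minimal sketch, not the script I actually use: the function names, the BTRFS_ROOT variable and the DRY_RUN switch are assumptions made for this example. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: BTRFS_ROOT, DRY_RUN and the function names are hypothetical.
BTRFS_ROOT="${BTRFS_ROOT:-/var/btrfs}"   # where the btrfs root is mounted
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the command, do not run it

snap_name() {
    # Build a name of the form snap-YYYYMMDD from today's date
    echo "snap-$(date +%Y%m%d)"
}

make_snapshot() {
    name="$(snap_name)"
    if [ "$DRY_RUN" = "1" ]; then
        echo "cd $BTRFS_ROOT && btrfsctl -s $name rootfs"
    else
        # Subshell, so the caller's working directory is left untouched
        ( cd "$BTRFS_ROOT" && btrfsctl -s "$name" rootfs )
    fi
}

make_snapshot
```

Running it with DRY_RUN=0 (as root) would execute btrfsctl for real instead of printing the command.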

How to switch to a snapshot


There are two methods to switch to a snapshot. In both cases I have to reboot the machine.
  1. rename the snapshot (permanent method)
    This method requires renaming the snapshot:
    # cd /var/btrfs
    # mv rootfs old-rootfs
    # mv snap-<date> rootfs
    Remember, I said that a subvolume may be renamed with a simple "mv" command. At the next reboot the system will use the "rootfs" subvolume, which is the renamed "snap-<date>" subvolume.
  2. use a different subvolume (temporary method)
    This method requires changing the grub boot entry. If I replace "subvol=rootfs" with "subvol=snap-<date>", the system will boot with the old snapshot as its root filesystem.
    The entry may be:
    1. edited at boot time (grub permits that)
    2. added as a further boot entry in the grub menu list, so that at boot time the user can select either the real filesystem or an old snapshot
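
For the temporary method, the kernel line of the extra boot entry can be derived from the regular one. The helper below is a hypothetical sketch (its name and the example snapshot name are mine, not part of grub or of the BTRFS tools); it assumes the snapshot name contains no "/" or "&" characters, which would confuse sed.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the subvol= part of a grub kernel line.
snapshot_kernel_line() {
    # $1 = original kernel line, $2 = name of the snapshot subvolume
    printf '%s\n' "$1" | sed "s/subvol=rootfs/subvol=$2/"
}

# Example with the Debian/Ubuntu style line and a made-up snapshot name
snapshot_kernel_line \
    "kernel          /vmlinuz root=/dev/sdX ro rootflags=subvol=rootfs" \
    "snap-20100102"
```

The printed line is the original one with "subvol=rootfs" replaced by "subvol=snap-20100102"; it still has to be placed by hand (or by a script) in a new entry of the grub menu list.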

Conclusion


For my box(es) I created a script which handles snapshot creation and deletion, and adds the entries to the grub config file.
I am studying how to manage the home directories in subvolumes. It may be useful to switch to a "system" snapshot without affecting the users' home directories, and vice versa. The idea is to create one subvolume per user. Every user should be able to create a snapshot of their own home.
On a desktop system this is easy and fun; on a server the space consumption also has to be evaluated. Remember that snapshotting doesn't copy the files: a copy is made only when a file is updated (COW semantics). That means that if I modify a file that has already been snapshotted, the old content is preserved for the snapshot and the space used grows, even if the file size does not change. This kind of problem is uncommon today, and it will take time before system administrators fully understand it and handle it properly.
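
To give an idea of the space effect, here is a back-of-the-envelope calculation. It is a simplified model, not a description of how BTRFS really allocates space: I assume that copy-on-write works on fixed 4 KiB blocks, and the function name is made up for the example.

```shell
#!/bin/sh
# Simplified model (an assumption): COW granularity is one 4 KiB block.
BLOCK_SIZE=4096

extra_bytes_after_overwrite() {
    # $1 = number of blocks overwritten in place after the snapshot was taken
    echo $(( $1 * BLOCK_SIZE ))
}

# The file keeps its size, but overwriting 250 of its blocks leaves
# the 250 old blocks alive in the snapshot.
extra_bytes_after_overwrite 250
```

Under this assumption, overwriting 250 blocks of a snapshotted file costs 1024000 extra bytes on disk, although the file is exactly as large as before.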


[1] See http://btrfs.wiki.kernel.org/index.php/Main_Page#Features
[2] See http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03588.html
[3] See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/330824
