Saturday, 2 January 2010

Linux & BTRFS: an example of layout

BTRFS is one of the most interesting filesystems in the Linux ecosystem. It has a lot of features [1], and for me the most interesting is its "snapshot" capability.

In my opinion "snapshot" is not the right name. A snapshot is a static copy taken at a specific time, but BTRFS snapshots are writable, so "fork" would be a better name. To avoid confusion, however, I will keep using the name "snapshot".

To evaluate the BTRFS snapshot capability, I switched my Ubuntu system from ext3 to btrfs. Here I describe how I did it.

First of all, I have to say that BTRFS is still in a development phase: don't use it in a production environment. In fact I experienced a nasty bug [2], which led to an OOPS [2]. Even so, I have never seen a kernel crash or lost any valuable data. In any case I left my home on ext3.
Finally, I have to point out that I had problems even with ext4, but that was an Ubuntu bug (see ext4 and jaunty problem [3]).

Objectives:

For this study my goals were:
  • to create a snapshot
  • to access the snapshot
  • to switch the system to an old snapshot, temporarily or permanently

Before presenting the solution, I have to explain how the BTRFS tools manage snapshots.

Creating and destroying snapshots


A BTRFS filesystem may be partitioned into "subvolumes". When a BTRFS filesystem is created, it consists of a single volume called "." (dot). After creation, the filesystem may be populated with other subvolumes. Every subvolume is placed inside the filesystem and may be renamed, moved or destroyed.

btrfsctl is the tool that creates and destroys subvolumes:
  • Create a subvolume named "foo" under the directory /bar
    btrfsctl -S foo /bar

    Note that a subvolume behaves like a directory: it may be renamed or moved (but not destroyed) with the "mv" command.

  • Destroy a subvolume named "foo" under the directory /bar
    btrfsctl -D foo /bar

Subvolumes have two key properties:
  1. A subvolume can be mounted directly via the "subvol=" mount option. For example, given a subvolume named "foo" under the root of the filesystem, it can be mounted with the following syntax:
    mount -t btrfs -o subvol=foo /dev/sdX /mntPoint
    Note that a subvolume can be mounted this way only if it was created under the root of the BTRFS filesystem.

  2. A subvolume can be snapshotted.
    Remember: only a subvolume can be snapshotted. To create a snapshot (named "foo") of the subvolume "bar", the syntax is:
    btrfsctl -s foo /bar

Moreover, a snapshot of a subvolume doesn't extend to the subvolumes nested inside it.
Note also that taking a snapshot doesn't copy the files. Data is copied only when a file is modified, either in the original subvolume or in the snapshot.

Filesystem layout


On the basis of the information of the paragraph above, I organized my filesystem as:
/                (root of the btrfs filesystem)
/rootfs (root of the filesystem)
/snap-YYYYMMDD (snapshot of the root of the filesystem)
/snap-YYYYMMDD (2nd snapshot of the root of the filesystem)
/snap-YYYYMMDD (another snapshot of the root of the filesystem)

Where "rootfs" is a subvolume containing the system (/sbin, /bin, /usr, etc.), and the "snap-..." entries are snapshots of the "rootfs" subvolume.
The key point is that the system lives in a subvolume; the BTRFS root is used only to manage the snapshots and is not mounted as the system root.

My "/etc/fstab" contains lines like:
/dev/sdX   /            btrfs subvol=rootfs,defaults 
/dev/sdX /var/btrfs btrfs subvol=.,defaults

And in my grub config file there is a line like:
kernel          /vmlinuz root=/dev/sdX ro rootflags=subvol=rootfs
Note that the line above is strictly Debian/Ubuntu specific. On Fedora it should be:
kernel          /vmlinuz root=/dev/sdX ro rootfsflags=subvol=rootfs
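
To check which subvolume a given kernel line selects, the "subvol=" value can be extracted with sed. This is only a minimal sketch: the function name subvol_from_cmdline is hypothetical, and it assumes the subvolume name contains no spaces or commas. It works with both spellings of the option shown above, because it only looks for "subvol=".

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs or grub).
# Reads kernel command line(s) on stdin and prints the value of the
# last "subvol=" option on each line; prints nothing if there is none.
subvol_from_cmdline() {
    sed -n 's/.*subvol=\([^ ,]*\).*/\1/p'
}
```

On a running system it could be used as "subvol_from_cmdline < /proc/cmdline" to see which subvolume was requested at boot.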

How to snapshot


As explained above, the root of the BTRFS filesystem is mounted on /var/btrfs. This directory contains the rootfs subvolume and its snapshots. To create a new snapshot I run:
# cd /var/btrfs
# btrfsctl -s snap-<date> rootfs
If I need an old version of a file, I can pick it from one of the snap-<date> subvolumes.
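
The two commands above could be wrapped in a small helper that builds the snap-YYYYMMDD name automatically. What follows is a minimal sketch under stated assumptions: the function names snapshot_name and make_snapshot and the variables BTRFS_ROOT and DRY_RUN are hypothetical, and the BTRFS root is assumed to be mounted on /var/btrfs as in the fstab shown earlier. With DRY_RUN=1 the btrfsctl command is only printed, so the naming can be checked without root privileges.

```shell
#!/bin/sh
# Hypothetical helper; all names here are illustrative assumptions.

# Mount point of the BTRFS root (the "subvol=." entry in /etc/fstab).
BTRFS_ROOT="${BTRFS_ROOT:-/var/btrfs}"

# Print a snapshot name of the form snap-YYYYMMDD.
# An optional first argument replaces today's date.
snapshot_name() {
    printf 'snap-%s\n' "${1:-$(date +%Y%m%d)}"
}

# Snapshot the "rootfs" subvolume under $BTRFS_ROOT.
# With DRY_RUN=1 the command is printed instead of executed.
make_snapshot() {
    name=$(snapshot_name "$1")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cd $BTRFS_ROOT && btrfsctl -s $name rootfs"
    else
        # Subshell, so the caller's working directory is left unchanged.
        ( cd "$BTRFS_ROOT" && btrfsctl -s "$name" rootfs )
    fi
}
```

For example, "DRY_RUN=1 make_snapshot" shows the command that would be run for today's date, while "make_snapshot" run as root actually creates the snapshot.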

How to switch to a snapshot


There are two methods to switch to a snapshot. In both cases I have to reboot the machine.
  1. Rename the snapshot (permanent method)
    This method requires renaming the snapshot:
    # cd /var/btrfs
    # mv rootfs old-rootfs
    # mv snap-<date> rootfs
    Remember that a subvolume may be renamed with a simple "mv" command. At the next reboot the system will use the "rootfs" subvolume, which is now the renamed "snap-<date>" subvolume.
  2. Boot from a different subvolume (temporary method)
    This method requires changing the grub boot entry. If I replace "subvol=rootfs" with "subvol=snap-<date>", the system will boot with the old snapshot as its root filesystem.
    The entry may be:
    1. edited at boot time (grub permits that)
    2. added as a further boot entry in the grub menu list, so that at boot time the user can select either the real filesystem or an old snapshot
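
The substitution used by the temporary method can be sketched with sed. This is an illustration only: the function name kernel_line_for_snapshot is hypothetical, and it assumes that the snapshot name contains no "/" or "&" characters (true for names of the form snap-YYYYMMDD) and that no other option on the line begins with "subvol=rootfs".

```shell
#!/bin/sh
# Hypothetical helper: derive the kernel line of a boot entry for a
# snapshot from the kernel line of the regular entry.
# $1    = name of the snapshot subvolume, e.g. snap-20100102
# stdin = one or more lines of a grub entry
kernel_line_for_snapshot() {
    sed "s/subvol=rootfs/subvol=$1/"
}
```

Feeding it the kernel line shown in the layout section together with a snapshot name prints the line to paste into an additional boot entry. The result still has to be added to the grub menu by hand or by a script.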

Conclusion


For my box(es) I created a script which handles snapshot creation and deletion, and adds the entries to the grub config file.
I am studying how to manage the home directories in subvolumes. It may be useful to switch to a "system" snapshot without affecting the user home directories, and vice versa. The idea is to create a subvolume per user. Every user should be able to create snapshots of their own home.
On a desktop system this is easy and fun; on a server the space consumption also has to be evaluated. As noted earlier, taking a snapshot doesn't copy the files: data is copied only when a file is updated (COW semantics). That means that if I change an already snapshotted file, the old content is kept alongside the new one, so disk usage grows even if the file size doesn't change. This kind of problem is uncommon today, and it will take time before system administrators fully understand it and handle it properly.


[1] See http://btrfs.wiki.kernel.org/index.php/Main_Page#Features
[2] See http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg03588.html
[3] See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/330824

5 comments:

  1. This is a very interesting use of btrfs.
    Since I believe that btrfs or ZFS is the future, I set up one of my laptops with Ubuntu 9.10 and wanted to use btrfs as the root file system.

    It works well (except hibernation) if I don't use a snapshot or a subvolume to boot from. If I choose rootfs as the boot subvolume, as in your example, HAL, dbus and rsyslog constantly crash.

    I use the same kernel as you do, 2.6.31-17-generic (I compared it with your thread on linux-btrfs).

    Did you do anything special, like patching the kernel?

    Cheers,
    Thomas

  2. Hello Thomas,

    I used a patched btrfs module based on the latest (at the time) btrfs git tree. But AFAICT that shouldn't matter. With the standard Ubuntu module you only lose the ability to delete a snapshot.
    However, I now use 2.6.32-10-generic + the git commit "3a1abec".

    One problem that I encountered was an incorrect permission bit on the root of the filesystem. By default btrfs creates a subvolume with the following permission mask:

    $ sudo ls -ld subvolume/.
    total 0
    drwx------ 1 root root 0 2010-01-20 22:29 .

    That prevents a standard user from accessing the filesystem.

    BR
    G.Baroncelli

  3. Hi Goffredo,

    You're a master :) ... that did indeed solve my problem. I set the permissions to drwxr-xr-x and off it goes.

    The strange thing was that it worked when I converted an ext3 filesystem to btrfs using the btrfs-convert tool.
    I had checked the permissions on the top-level directories, but it never came to my mind to check the root dir itself!

    I'm now on 2.6.32.3 and still have had no luck with deleting snapshots or subvolumes. I'll have to go higher, I assume.

    Thanks a lot!
    Thomas

  4. 2.6.32 has support for snapshot removal.

    The syntax of the btrfsctl utility is very ugly:

    btrfsctl -D 'snapname' 'dir'

    where
    'snapname' is the name of the snapshot/subvolume
    'dir' is the directory containing the snapshot/subvolume

  5. I was indeed not getting the syntax, which is - as you mentioned - very ugly.

    Your proposal on the btrfs mailing list is a very good one and for sure necessary.

    Thanks for your help and patience.
