Sunday, 27 May 2012

An 8E filesystem

Update 2012-10-06.

Introduction

This short post shows how to create an 8 exabyte filesystem on a Linux box.
8 exabytes are 2^63 bytes (strictly speaking these are binary units, i.e. exbibytes). See this Wikipedia link [1] to learn more about the unit prefixes. In other words, 8 exabytes are 8192 petabytes, which are 8388608 terabytes.
Of course you will not be able to obtain a real 8E filesystem, but you will be able to simulate one.
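
The unit conversion above can be checked with plain shell arithmetic. This is only a small sketch; the variable names are mine, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# 8 EiB expressed in smaller binary units (each step is a factor of 1024).
eib=8
pib=$((eib * 1024))        # petabytes (PiB)
tib=$((pib * 1024))        # terabytes (TiB)
echo "$eib EiB = $pib PiB = $tib TiB"

# One EiB is 2^60 bytes; 8 EiB would be 2^63, which no longer fits in a
# signed 64-bit integer, so only the single-EiB value is printed here.
one_eib_bytes=$((1 << 60))
echo "1 EiB = $one_eib_bytes bytes"
```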

Prerequisites

The following prerequisites are needed:
  • a Linux box with a modern kernel (I tested with a 3.4 kernel),
  • a BTRFS filesystem.

Creating the file system

The idea is to create a "sparse" file [3], to create the filesystem on it, and then to mount it through a loopback device.
  • Create the "sparse" file
    dd if=/dev/zero of=8E-file bs=1 count=1 seek=$(((1<<63)-2))
    1+0 records in
    1+0 records out
    1 byte (1 B) copied, 0.000268343 s, 3.7 kB/s
    ghigo@venice:/tmp$ ls -lh 8E-file 
    -rw-r--r-- 1 ghigo ghigo 8.0E Dec 25 23:15 8E-file
    
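    The trick here is the seek option of the dd command. This option moves the file pointer forward without writing any bytes (and so without consuming any disk space). Only the final byte is actually written, so the file has an apparent size of 2^63 - 1 bytes.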

  • Formatting the "sparse" file
    ghigo@venice:/tmp$ /sbin/mkfs.btrfs 8E-file
    WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
    WARNING! - see http://btrfs.wiki.kernel.org before using
    
    fs created label (null) on 8E-file
     nodesize 4096 leafsize 4096 sectorsize 4096 size 8.00EB
    Btrfs Btrfs v0.19
    
  • Creating the loopback device
    ghigo@venice:/tmp$ sudo losetup -f 8E-file 
    ghigo@venice:/tmp$ sudo losetup -a
    /dev/loop0: [0012]:583736 (/tmp/8E-file)
    
    Note that it is the file that has to be formatted, not the loopback device. This is because mkfs.btrfs issues a BLKDISCARD ioctl, which hangs mkfs.btrfs when the loopback device is used (or maybe it just needs a bit of time to process 8EB of data :-) ).
    Update 2012-10-06: Now mkfs.btrfs has the option '-T' to avoid issuing a BLKDISCARD ioctl.

  • Mount the loopback device and test it
    ghigo@venice:/tmp$ sudo mount /dev/loop0 /mnt/test
    ghigo@venice:/tmp$ df -h /mnt/test
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/loop0      8.0E   56K  8.0E   1% /mnt/test
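
To see the sparse-file trick at a harmless scale, here is a minimal sketch that repeats the dd step with a 64 MiB file and compares the apparent size with the space really allocated. The scratch file from mktemp and the variable names are my own choices, and stat -c assumes GNU coreutils; the exact number of allocated bytes depends on the underlying filesystem.

```shell
#!/bin/sh
# Small-scale version of the sparse-file trick (64 MiB instead of 8E).
f=$(mktemp)                          # scratch file, removed at the end
size=$((1 << 26))                    # 64 MiB apparent size

# Seek to the last byte and write only that one byte.
dd if=/dev/zero of="$f" bs=1 count=1 seek=$((size - 1)) 2>/dev/null

apparent=$(stat -c %s "$f")          # apparent size in bytes
blocks=$(stat -c %b "$f")            # number of allocated blocks
bsize=$(stat -c %B "$f")             # size in bytes of each such block
allocated=$((blocks * bsize))

echo "apparent size : $apparent bytes"
echo "allocated size: $allocated bytes"

rm -f "$f"
```

On a filesystem that supports sparse files, the allocated size should be a few KB at most, while the apparent size is the full 64 MiB.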
    

The maximum btrfs file size

The btrfs filesystem specification states that the maximum file size is 16EB; the same limit applies to the filesystem size.
However, the Linux kernel has a lower limit. In fact, I was never able to create a file (even a sparse one) larger than 8EB. The Linux kernel header include/linux/fs.h contains the following:

  /* Page cache limit. The filesystems should put that into their
     s_maxbytes limits, otherwise bad things can happen in VM. */
  #if BITS_PER_LONG==32
  #define MAX_LFS_FILESIZE \
          (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
  #elif BITS_PER_LONG==64
  #define MAX_LFS_FILESIZE 0x7fffffffffffffffUL
  #endif

This means that on 32-bit x86 (BITS_PER_LONG == 32) the maximum file size is about 8TB: with 4 KB pages, PAGE_CACHE_SIZE << 31 is 2^43 bytes. On an x86-64 machine, instead, the limit is 8EB (== 0x7fffffffffffffff, i.e. 2^63 - 1 bytes).
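
The two limits can be recomputed with shell arithmetic. This is only a sketch of the macro's arithmetic, not kernel code: the 4096-byte page size is an assumption (it is the usual value on x86), and the variable names are mine.

```shell
#!/bin/sh
page_size=4096                               # assumed PAGE_CACHE_SIZE (4 KB pages)

# BITS_PER_LONG == 32: (PAGE_CACHE_SIZE << 31) - 1
max32=$(( (page_size << 31) - 1 ))
echo "32-bit limit: $max32 bytes (about $(( (max32 + 1) >> 40 )) TB)"

# BITS_PER_LONG == 64: 0x7fffffffffffffff
max64=$(( 0x7fffffffffffffff ))
echo "64-bit limit: $max64 bytes"

# The dd command used earlier seeks to 2^63 - 2 and writes one byte,
# so the resulting file size is exactly the 64-bit limit.
seek=$((max64 - 1))
echo "dd seek offset: $seek"
```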

Note

The idea for this post came from a message on the btrfs mailing list [2].

Reference
