Saturday, 29 November 2014

Metacity window manager: changing the window border color

A few months ago I started to use MATE [1] as my desktop. I like it because it is simple and clean. My preferred theme is ClearLooks. Another setting that I adopt everywhere in Linux is the so-called mouse-over window focusing, coupled with raise-on-click disabled.
Basically I like to work with windows that are even partially obscured. The focus has to be independent of the window stacking order.
To raise a window I (left) click on its title bar. To send a window back I middle-click its title bar.
Metacity, the default window manager of MATE, allows that and I am happy.
To work well I have to find the title bar quickly. Unfortunately, designers today seem to think that the title bar of an unfocused window is not important, so its color is the same as that of the other GUI elements (like buttons)...
To solve this I had to make a small change to the theme file.
What I am describing here is valid for the Metacity window manager and worked for MATE, but I suppose it should also work for other desktop environments that use Metacity.
I started from the ClearLooks theme, but I suppose the same applies to other themes too.
  1. As a first step I copied the theme into my home directory:
    
    $ cd
    $ mkdir -p .themes
    $ cp -r /usr/share/themes/ClearLooks .themes/ClearLooksGray
    
    The name ClearLooksGray is arbitrary. Choose whatever you want.
  2. Now you have to edit the file .themes/ClearLooksGray/index.theme. You will notice that most of the lines look like Name[<country code>]=<something> or Comment[<country code>]=<something>. For simplicity I removed all these lines until I got:
    
    [X-GNOME-Metatheme]
    Name=ClearlooksGray
    Type=X-GNOME-Metatheme
    Comment=Attractive Usability
    Encoding=UTF-8
    GtkTheme=Clearlooks
    MetacityTheme=ClearlooksGray
    IconTheme=gnome
    
    Basically I removed all the Name/Comment pairs in the different languages and left only the necessary ones. Then I updated the Name and the MetacityTheme values to reflect the new name (ClearlooksGray).
  3. Then I changed the file .themes/ClearLooksGray/metacity-1/metacity-theme-1.xml: inside the <info> tag, I updated these tags:
    • name: changed the value to ClearLooksGray
    • authors: added my name
    • description: updated the value to reflect the new name

    Now the complex part. Inside the file you have to find a tag called <draw_ops name="bevel_unfocused">. This tag is responsible for drawing the unfocused title bar. Inside this tag there are the following lines:
    
            <gradient type="vertical" x="2" y="top_height/2" width="width-4" height="top_height/2-1">
                    <color value="shade/gtk:bg[NORMAL]/0.93"/>
                    <color value="shade/gtk:bg[NORMAL]/0.89"/>
            </gradient>
            <gradient type="vertical" x="2" y="2" width="width-4" height="top_height/2-2">
                    <color value="shade/gtk:bg[UNFOCUSED]/0.99"/>
                    <color value="shade/gtk:bg[UNFOCUSED]/0.95"/>
            </gradient>
    
    These have to be changed to:
    
            <gradient type="vertical" x="2" y="top_height/2" width="width-4" height="top_height/2-1">
                    <color value="shade/#cccccc/0.93"/>
                    <color value="shade/#cccccc/0.89"/>
            </gradient>
            <gradient type="vertical" x="2" y="2" width="width-4" height="top_height/2-2">
                    <color value="shade/#cccccc/0.99"/>
                    <color value="shade/#cccccc/0.95"/>
            </gradient>
    

    Basically I replaced both "gtk:bg[NORMAL]" and "gtk:bg[UNFOCUSED]" with "#cccccc". The same changes have to be repeated inside the tag <draw_ops name="bevel_maximized_unfocused">. This tag is responsible for drawing the unfocused title bar of a maximized window. So the original code is:
    
            <gradient type="vertical" x="0" y="top_height/2" width="width" height="top_height/2-1">
                    <color value="shade/gtk:bg[NORMAL]/0.93"/>
                    <color value="shade/gtk:bg[NORMAL]/0.89"/>
            </gradient>
            <gradient type="vertical" x="2" y="2" width="width-4" height="top_height/2-2">
                    <color value="shade/gtk:bg[UNFOCUSED]/0.99"/>
                    <color value="shade/gtk:bg[UNFOCUSED]/0.95"/>
            </gradient>
    

    and has to be changed to:
    
            <gradient type="vertical" x="0" y="top_height/2" width="width" height="top_height/2-1">
                    <color value="shade/#cccccc/0.93"/>
                    <color value="shade/#cccccc/0.89"/>
            </gradient>
            <gradient type="vertical" x="2" y="2" width="width-4" height="top_height/2-2">
                    <color value="shade/#cccccc/0.99"/>
                    <color value="shade/#cccccc/0.95"/>
            </gradient>
    

    #cccccc is a light gray in HTML color notation. You can of course choose a different color.
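The manual edits of steps 2 and 3 can also be scripted. What follows is only a sketch under assumptions: the function names strip_localized_keys and gray_unfocused_titlebar are made up for this example, and the second function replaces every gtk:bg[NORMAL] and gtk:bg[UNFOCUSED] found inside the two draw_ops tags, which may be slightly broader than editing only the gradient lines shown above. Both functions read a file on standard input and write the result on standard output, so nothing is modified in place.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of Metacity or MATE.

# Drop the localized Name[xx]=... and Comment[xx]=... lines of index.theme,
# keeping the plain Name= and Comment= entries.
strip_localized_keys() {
    sed -E '/^(Name|Comment)\[[^]]*\]=/d'
}

# Inside <draw_ops name="bevel_unfocused"> and
# <draw_ops name="bevel_maximized_unfocused">, replace the GTK background
# colors with a fixed color (default: #cccccc).
gray_unfocused_titlebar() {
    color="${1:-#cccccc}"
    sed -E "/<draw_ops name=\"bevel_(maximized_)?unfocused\">/,/<\/draw_ops>/ s/gtk:bg\[(NORMAL|UNFOCUSED)\]/${color}/g"
}
```

For example, `gray_unfocused_titlebar '#bbbbbb' < metacity-theme-1.xml > patched.xml` writes a patched copy that can be compared with the original using diff before replacing it.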
Below are two screenshots which highlight the difference: ClearLooksGray has a darker border, which I like more.

ClearLooksGray

ClearLooks [standard]

Friday, 22 August 2014

Booting a PowerMac

Preamble


I bought a used PowerMac. I have always been interested in this architecture: both OpenFirmware and the PowerPC processor fascinate me.
The machine is an old dual PowerPC @ 1GHz, equipped with 512MB of RAM. The model is the one called PowerMac MDD (Mirrored Drive Doors).


Because it was an old, used machine, it had some problems:
  • after a few minutes the system shut down unconditionally, both under Mac OS X and Linux. I solved this by replacing the CPU board; I suspect that a temperature sensor had drifted.
  • from time to time some applications crashed under OS X, and some packages under Linux were corrupted. I solved this by replacing the memory.
Apart from these problems, I am happy. And to be honest, I like to resolve these kinds of issues :-)

Upgrading Mac OS X


When I tried to upgrade Mac OS X from 10.3 to 10.4, I had to use a USB stick (the DVD reader didn't work well either): first I copied the DVD image to the USB stick, then I faced the problem of HOW to boot the machine from USB.
On a PC (i.e. an x86 one), it would be sufficient to select the USB stick as the source from the boot menu. But on this kind of Macintosh things are different... and more interesting!

OpenFirmware


Instead of a BIOS, these machines have "OpenFirmware". From a user's point of view, OpenFirmware seems less capable than a BIOS. But OpenFirmware has its strengths: for example it is a fully programmable environment, and it is also capable of accessing files inside a filesystem.
In this post I will introduce some OpenFirmware concepts, but it should definitely not be considered a course on OpenFirmware.

Access OpenFirmware


To access the OpenFirmware interface, you have to press 4 keys: CMD + OPTION + O + F. (To access a BIOS interface one key is usually enough, typically DEL!)
But there is another, more interesting way to access OpenFirmware: via telnet! This fact alone is enough to consider OpenFirmware a lot better than a classic BIOS!
To allow telnet access to the OpenFirmware console, once you have pressed the CMD-OPTION-O-F keys, you have to enter the following commands:
0 > dev /packages/telnet  ok
0 > " enet:telnet,192.168.10.47" io
Here 192.168.10.47 is the IP address used by the telnet server; of course this address has to be compatible with your network. The strings "0 >" and "ok" are the OpenFirmware prompt and its reply, not part of what you type. Pay attention to the space after the " (double quote).
Now, to access OpenFirmware, it is sufficient to telnet to the IP address 192.168.10.47.

Device tree


Usually on an OpenFirmware system, the hardware configuration is exported via a hierarchical structure which resembles a filesystem. This kind of structure is called the device tree.
To see the device tree, you have to use the ls and dev commands. ls is like the Unix ls command (or the DOS dir command), whereas dev is like the cd command.

0 > dev /  ok
0 > ls
ff87f638: /cpus
ff87f8e8:   /PowerPC,G4@0 
ff87fd10:     /l2-cache
ff87ff48:       /l2-cache
ff880ba8:   /PowerPC,G4@1
ff880fd0:     /l2-cache
ff881208:       /l2-cache
ff881518: /chosen
ff881730: /memory@0
ff8819d0: /openprom
ff881b78:   /client-services
ff882e60: /rom@ff800000
ff883060:   /boot-rom@fff00000
ff883290:   /macos
ff883398: /options
ff8834a0: /aliases
ff8854a8: /packages
ff885588:   /deblocker
ff885f20:   /disk-label
ff8869e8:   /obp-tftp
ff8902c8:   /telnet       
ff890bc8:   /mac-parts
ff892478:   /mac-files
ff895430:   /hfs-plus-files
ff89a4b0:   /fat-files
ff89c2a8:   /iso-9660-files
ff89d108:   /bootinfo-loader
ff89ed78:   /xcoff-loader
ff89f810:   /pe-loader
ff8a0260:   /elf-loader
ff8a1908:   /usb-hid-class
ff8a4498:   /usb-ms-class
ff8a70e8:   /usb-audio-class
ff914c60:   /sbp2-disk
ff917820:   /ata-disk
ff919d60:   /atapi-disk
ff91bef0:   /bootpath-search
ff922870:   /terminal-emulator
ff922980: /firewire-disk-mode
ff9383b0: /pseudo-hid 
ff9384b0:   /keyboard
ff938ba8:   /mouse
ff939140:   /eject-key
ff939610: /pseudo-sound
ff939940: /multiboot
ff94d758: /diagnostics
ff94d838: /nvram@fff04000
ff94e488: /uni-n@f8000000
ff94e960:   /i2c@f8001000
ff94f478:     /fan@58
ff9502f8:     /i2c-hwclock@ca
ff950ae0:     /temp-monitor@92
ff9511c8:     /cereal
ff9518f8: /pci@f0000000
ff99cde0:   /uni-north-agp@b
ff99d0f0:   /ATY,PheonixParent@10
ff9b9048:     /ATY,Pheonix_A
ff9baa50:     /ATY,Pheonix_B
ff952b58: /pci@f2000000   
ff9555e8:   /mac-io@17
ff95c0c8:     /interrupt-controller@40000
ff95c318:     /gpio@50
ff95c548:       /extint-gpio1@9
ff95c838:       /programmer-switch@11
ff95cae0:       /gpio5@6f
ff95cd00:       /extint-gpio15@67
ff95cf88:       /gpio6@70
ff95d1a8:       /extint-gpio16@68
ff95d498:       /extint-gpio14@66
ff95d720:       /gpio12@76
ff95d938:       /gpio11@75
ff95db50:     /escc-legacy@12000
ff95ddc0:       /ch-a@12004
ff95dfc0:       /ch-b@12000
ff95e1c0:     /escc@13000
ff95e448:       /ch-a@13020
ff95ee70:       /ch-b@13000
ff95f808:     /i2s@10000  
ff95fa38:       /i2s-a@10000
ff95fd68:         /sound
ff960600:     /timer@15000
ff9607f8:     /via-pmu@16000
ff963ec0:       /pmu-i2c
ff964ce0:         /i2c-hwclock@1d4
ff965590:         /i2c-hwclock@1c8
ff965e10:       /rtc
ff966508:       /power-mgt
ff9bea18:         /usb-power-mgt
ff966818:     /i2c@18000
ff9672d0:       /cereal
ff967a00:       /deq
ff967b40:     /ata-4@1f000
ff96a880:       /disk
ff96b0c0:     /ata-3@20000
ff96de00:       /disk
ff97a580:   /usb@18
ff9bc960:     /hub@1      
ff9bcb68:       /mouse@1
ff9bce88:       /device@3
ff9bd000:         /keyboard@0
ff9bd390:         /interface@1
ff982208:   /usb@19
ff9bc630:     /disk@1
ff953dd8: /pci@f4000000
ff989fd0:   /ata-6@d
ff98d1b8:     /disk
ff98d7f0:   /firewire@e
ff9979f8:   /ethernet@f
ff9beee0:     /ethernet-phy
ff955038: /vsp@f9000000
ff955328:   /veo@f9080000
ff955488:   /veo@f9180000
 ok
In the list above note the PCI/AGP/USB buses and their children. Note also, inside the "/packages" directory, the packages

ff895430:   /hfs-plus-files
ff89a4b0:   /fat-files
ff89c2a8:   /iso-9660-files
As the listing above shows, Apple's OpenFirmware is capable of accessing some filesystems. In this case these packages allow access to:
  • HFS+
  • FAT filesystem
  • ISO-9660 (the one used on CD-ROMs)
Pay attention also to the "disk" leaves (disk@N in a full path) below the ata-X and usb@X nodes: these represent the disks.

Another interesting command is devalias, which lists some devices and their aliases:

0 > devalias 
pci0                /pci@f0000000
agp                 /pci@f0000000
pci1                /pci@f2000000
pci2                /pci@f4000000
ui2c                /uni-n/i2c
ui2c-serial         /uni-n/i2c/cereal
keyboard            /pseudo-hid/keyboard
mouse               /pseudo-hid/mouse
sound               /pseudo-sound
eject-key           /pseudo-hid/eject-key
nvram               /nvram
enet                /pci@f4000000/ethernet
fw                  /pci@f4000000/firewire
pci                 /pci@f2000000
usb0                /pci@f2000000/usb@18
usb1                /pci@f2000000/usb@19
mac-io              /pci@f2000000/mac-io@17
mpic                /pci@f2000000/mac-io@17/interrupt-controller
hd                  /pci@f4000000/ata-6@d/disk@0
cd                  /pci@f2000000/mac-io@17/ata-3@20000/disk@0
ide0                /pci@f2000000/mac-io@17/ata-3@20000/disk@0
ide1                /pci@f2000000/mac-io@17/ata-3@20000/disk@1
ultra0              /pci@f4000000/ata-6@d/disk@0
ultra1              /pci@f4000000/ata-6@d/disk@1
scca                /pci@f2000000/mac-io@17/escc/ch-a
sccb                /pci@f2000000/mac-io@17/escc/ch-b
ki2c                /pci@f2000000/mac-io@17/i2c
ki2c-serial         /pci@f2000000/mac-io@17/i2c/cereal
via-pmu             /pci@f2000000/mac-io@17/via-pmu
rtc                 /pci@f2000000/mac-io@17/via-pmu/rtc
pi2c                /pci@f2000000/mac-io@17/via-pmu/pmu-i2c
wireless            /pci@f2000000/mac-io@17/@30000
ultra2              /pci@f2000000/mac-io@17/ata-4@1f000/disk@0
ultra3              /pci@f2000000/mac-io@17/ata-4@1f000/disk@1
cd1                 /pci@f2000000/mac-io@17/ata-3@20000/disk@1
fan                 /uni-n/i2c/fan
veo0                /vsp@f9000000/veo@f9080000
veo1                /vsp@f9000000/veo@f9180000
last-boot           /pci@f4000000/ethernet@f
screen              /pci@f0000000/ATY,PheonixParent@10/ATY,Pheonix_A
 ok

Typically hd is the first disk and ultra1 is the second disk; usb0/usb1 are the USB ports (my PowerMac has only two USB ports!).

Let's take a deeper look at the USB interface. From the device tree listing above, we know that under usb0 (/pci...usb@18) there are the Apple mouse and keyboard, whereas under usb1 (/pci...usb@19) there is a disk.
To access a disk, the full path looks like:
/pci@f2000000/usb@19/disk@1
(or /pci@f4000000/ata-6@d/disk@0 for an ATA disk)
Adding ":nn" at the end of these paths identifies a partition; for example, the dir command below shows the content of the filesystem inside the 2nd partition:

0 > dir /pci@f2000000/usb@19/disk@1:2,\ 

     Size/        GMT                      File/Dir
     bytes   date     time   TYPE CRTR     Name
        82  3/21/ 5  4:25:52              ._Install%20Mac%20OS%20X
     12292 10/10/ 5 22: 4:45              .DS_Store
            8/ 4/14 17:53:21              .Spotlight-V100
            8/ 4/14 18:49:47              .Trashes
            3/21/ 5  0: 1:29              .vol
            3/28/ 5  5:26:30              Applications
            3/28/ 5  6:42:12              bin
            3/23/ 5  2:36:40              dev
        11  3/28/ 5  6:23:29  slnk rhap   etc
            3/28/ 5  4:21:28              Install%20Mac%20OS%20X
            3/18/ 5 21: 4:34              Japanese%20-%20%u65e5%u672c%u8a9e
            3/28/ 5  6:32:40              Library
        11  3/28/ 5  6:23:29  slnk rhap   mach
   4313028  3/26/ 5 22:15:41              mach_kernel
            3/26/ 5  1: 0: 2              Optional%20Installs.mpkg
            3/28/ 5  6:23:29              private
            3/21/ 5  3: 7:27              Read%20Before%20You%20Install.app
            3/28/ 5  6:41:12              sbin
            3/28/ 5  5:26: 8              System
        11  3/28/ 5  6:23:31  slnk rhap   tmp
            3/28/ 5  5:30:43              usr
        11  3/28/ 5  6:23:32  slnk rhap   var
            3/21/ 5  0: 1:29              Volumes
            3/21/ 5  3: 7:27              Welcome%20to%20Tiger.app
            3/28/ 5  5:19:56              HFS+%20Private%20Data
 ok
Note the use of the backslash as the filesystem root (and path separator) after the comma.
To look inside a directory:

0 >dir /pci@f2000000/usb@19/disk@1:2,\System 

     Size/        GMT                      File/Dir
     bytes   date     time   TYPE CRTR     Name
            3/28/ 5  5:30:51              Installation
            3/28/ 5  7:13:30              Library
 ok

To take a look at the boot loader:

0 > dir /pci@f2000000/usb@19/disk@1:2,\System\Library\CoreServices 
     Size/        GMT                      File/Dir
     bytes   date     time   TYPE CRTR     Name
        82  3/28/ 5  6:55:33              ._Volume%20Name%20Icon
      1445  8/ 4/14 18:49:47  tbxj chrp   .disk_label
        23  8/ 4/14 18:49:47              .disk_label.contentDetails
    174276  3/28/ 5  6:55:35  tbxi chrp   BootX
            3/26/ 5  5:31:25              CharacterSets
....

Here BootX is the Mac OS X boot loader (note the "tbxi" in the 4th column, TYPE).

Booting

To boot from a USB stick, we have to do:

0 > boot /pci@f2000000/usb@19/disk@1:2,\System\Library\CoreServices\BootX
Where:
  • boot is the command to boot (what else?)
  • /pci@f2000000/usb@19/disk@1 is the path which identifies the USB stick.
  • :2 is the partition which contains the filesystem where the boot loader is
  • , (comma) is a separator (what else?)
  • \System\Library\CoreServices\BootX is the path of the boot loader inside the filesystem.
A few notes:
  • The slash (/) is the separator between device tree elements, whereas the backslash (\) is the separator between the elements of a filesystem path
  • the path of the disk and the partition number may vary depending on your configuration and disk partitioning.
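To make the syntax concrete, here is a small sketch that assembles such an argument from its three parts. It runs in an ordinary Unix shell, not inside OpenFirmware, and the function name of_path is an assumption made up for this example. It takes the device path, the partition number and a Unix-style file path, and converts the slashes of the file path into backslashes.

```shell
#!/bin/sh
# Sketch only: of_path is a hypothetical helper, not an OpenFirmware command.
# $1: device path or alias, e.g. /pci@f2000000/usb@19/disk@1
# $2: partition number,      e.g. 2
# $3: Unix-style file path,  e.g. /System/Library/CoreServices/BootX
of_path() {
    device=$1
    partition=$2
    # Inside the filesystem part, OpenFirmware uses backslashes.
    file=$(printf '%s' "$3" | tr '/' '\\')
    printf '%s:%s,%s\n' "$device" "$partition" "$file"
}
```

Calling `of_path /pci@f2000000/usb@19/disk@1 2 /System/Library/CoreServices/BootX` prints the same argument that is passed to boot above.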

To simplify things a bit, we can also use the devalias aliases. For example we can do:

0 > dir usb1/disk@1:2,\System\Library\CoreServices
because usb1 is an alias of /pci@f2000000/usb@19 (see above). Note the use of the backslash as separator instead of the slash, which is used as separator of the device tree. So if we want to look at the 9th partition of the first disk (alias hd), we do:

0 > dir hd:9,\
Here "hd" identifies the first disk, ":9" the 9th partition, and ",\" the root of the filesystem.
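The alias substitution can be illustrated with a short sketch. It runs in an ordinary Unix shell, works on a saved copy of the devalias output, and is only a simplified model of what the firmware does; the function name expand_devalias and the environment variable OF_PATH are made up for this example. The part of the argument before the first "/" or ":" is looked up in the alias table and, if found, replaced with the full device path.

```shell
#!/bin/sh
# Sketch only: expand_devalias is a hypothetical helper; it is a simplified
# model of alias expansion, not the firmware's actual implementation.
# stdin: lines in the format printed by devalias ("alias  /device/path")
# $1:    a path that may start with an alias, e.g. 'hd:9,\'
expand_devalias() {
    # The path is passed through the environment so that awk does not
    # interpret its backslashes as escape sequences.
    OF_PATH="$1" awk '
        BEGIN {
            p = ENVIRON["OF_PATH"]
            n = match(p, "[/:]")                 # first "/" or ":" in the path
            head = n ? substr(p, 1, n - 1) : p   # candidate alias
            tail = n ? substr(p, n) : ""         # rest of the path
        }
        head != "" && $1 == head { print $2 tail; found = 1; exit }
        END { if (!found) print p }              # no alias: print unchanged
    '
}
```

With the devalias listing above saved in a file, say aliases.txt (a hypothetical name), `expand_devalias 'hd:9,\' < aliases.txt` prints the full path of the 9th partition of the first disk.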

When using the "boot" command, instead of a full path you can pass "\\:tbxi". "\\" means the system folder, and ":tbxi" means a file of type "tbxi". So "\\:tbxi" is an alias for the Apple boot loader (which lives in the system folder and is marked with the tbxi file type). But these are Apple/HFS+ specific things.

Conclusion


This post highlighted some concepts about devices in OpenFirmware. It also showed how to boot a PowerMac from a USB stick.
In my opinion OpenFirmware is a very flexible firmware with a lot of capabilities.
I hope this helps a bit in understanding how to use these capabilities.

Goffredo Baroncelli
- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

Sunday, 15 June 2014

BTRFS and systemd-journal

Preface


The BTRFS filesystem and systemd are two new projects in the Linux ecosystem. Systemd is the new init system, which is becoming the default for most distributions; in fact even Debian and Ubuntu plan to switch to it.
The same is true for the BTRFS filesystem, which is considered the next-generation Linux filesystem.
In this post I will analyse a performance problem which happens when the systemd log system (journald) is used on a BTRFS filesystem.

Introduction


Recently I switched from Debian to Fedora, because I was interested in systemd, and the one shipped by Debian was quite old (204 vs 208). From the beginning I noticed that the Fedora boot time was longer than the Debian one. Initially I thought that the better Debian performance was due to Debian being lighter than Fedora (fewer services enabled). But after some tailoring the boot time was still longer.
However I didn't care much about it, because the boot is performed only once or twice per day, so it didn't hurt me too much.
But recently I looked at the bug "Bug 1006386 - Journal flushing often slow, can prevent system booting correctly" [1], which reports a performance problem due to the flushing of the journal to permanent storage. So I decided to investigate this issue a bit more deeply.

Systemd journal


Systemd introduced a new log daemon called journald. It has some nice properties; one of the most important is that it is started from the very beginning (it is even present in the initramfs). Because no persistent storage is available during the early stages of the boot, all the logs are stored in a tmpfs filesystem, and only when permanent storage becomes available is all the information flushed to disk.
Note that other options are available, but this is the Fedora default.
It seems that during this flushing, all daemons which are trying to log something are blocked by journald.
My tests revealed that this flushing may require a long time. This is due to the COW nature of the BTRFS filesystem, which doesn't behave well with the log file structure of journald.
However, defragmenting the journald log file reduces the boot time.
Doing
# btrfs filesystem defragment /var/log/journal/189323cd4cc348159b9fd5b32b566b05/system.journal
led to a boot time reduction of 20 seconds on three different machines. This is a huge value, which suggested that I perform further tests. (Note: 189323cd4cc348159b9fd5b32b566b05 is the machine-id and it is likely different on each machine.)
Of course I have to point out that this result was due to the bad interaction between BTRFS and the journald log file. Other setups might lead to different results.

My tests


I took an old machine (a P4 2.5GHz with 512MB of RAM) with a fresh installation of Fedora 20 and I measured the boot time over several reboots (up to 70). The results were very impressive. I tested the following scenarios:
  1. standard (without defragmenting any file, without readahead)
  2. defragment the journal file at the end of the boot
  3. defragment the journal file before the flushing
  4. mark as NOCOW the journald log file
  5. enable the systemd-readahead
  6. remove the fsync(2) call from journald
  7. remove the posix_fallocate(3) call from journald
  8. do a defrag when posix_fallocate(3) is called
  9. do a defrag when the journal log file is opened
Each batch of tests started with an empty log file. The time measured was the boot time as reported by systemd-analyze (the userspace time). Each chart also reports the number of extents as reported by the filefrag command.
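As an illustration of how the extent count can be collected after each boot, here is a minimal sketch. It assumes that filefrag prints its usual summary line in the form "<file>: N extents found"; the function name extent_count is made up for this example and is not the script used for the tests.

```shell
#!/bin/sh
# Sketch only: extent_count is a hypothetical helper.
# stdin:  the output of filefrag, one summary line per file
# stdout: the number of extents of each file, one per line
extent_count() {
    sed -n -E 's/^.*: ([0-9]+) extents? found$/\1/p'
}
```

It could be used as `filefrag /var/log/journal/*/system.journal | extent_count`, next to the userspace time printed by systemd-analyze.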

1) Standard
This test was performed without any strategy to mitigate the performance problem. I repeated the reboot about 70 times, doing two defrags in the middle (at tests #37 and #53). At the beginning the boot time was less than 20 seconds, then it increased up to 55-60 seconds. Defragmenting the journal file helped, reducing the boot time by about 20 seconds. What was impressive is the number of extents: at the end of the test there were nearly 8000. The journald log file size was 64MB.


2) Defragment the journal at the end of the boot
The test was performed doing a defrag at the end of each boot (after 30 seconds). The number of reboots was 56. The boot times were always between 15s and 20s. The number of extents was between 2000 and 3000.

3) Defragment the journal before flushing the data to the disk
The test was performed doing a defrag before flushing the data to the disk. The number of reboots was 60. The boot times were between 15s and 20s. The number of extents was between 2000 and 3000.

4) Mark NOCOW the journald log file
This test was performed marking the journal file NOCOW. This disables the COW behaviour, increasing the speed. Unfortunately this also removes the checksumming of the file. The boot times were a bit lower than in the previous tests. The number of extents was two orders of magnitude lower (30 vs 3000).

5) Enable the systemd-readahead
All the tests above were performed with systemd-readahead disabled. Then Kai Krakow [2] pointed out to me that the systemd readahead service is able to defrag. So I also tested this configuration. The result was as bad as the first one. The reason is that systemd-readahead doesn't take care of the journald log file for some reason. Further investigation is needed.
Update 2014-06-17: systemd-readahead doesn't consider files bigger than 10M; this is the reason why system.journal is never defragmented.


6) Remove the fsync(2) system call from journald
I tweaked the source of systemd-journald, removing the fsync(2) call. But after 36 reboots I didn't notice any improvement, with the exception of a smaller number of extents.

7) Remove the posix_fallocate(3) call from journald

I tweaked the source of systemd-journald, removing the posix_fallocate(3) call. But after 34 reboots I didn't notice any improvement: the boot time was up to 40s and the number of extents was greater than in the previous test.


8) Do a defrag before calling posix_fallocate(3)

I tweaked the source of systemd-journald so that the file is defragmented before the posix_fallocate(3) call. The chart shows the number of extents dropping when posix_fallocate(3) is called. However the boot time reached a (bad) value of 50 seconds.

9) Do a defrag when the journal file is opened

I tweaked the source of systemd-journald in order to defrag the journal file each time it is opened. This test reached the same (good) result as test #3; the boot times were between 15 and 30 seconds.

Conclusions


My tests confirmed the bad interaction between the systemd log daemon and the BTRFS filesystem. The log file fragments quickly and the performance decreases (see test #1).
With a periodic defragmentation the boot times don't increase too much (see tests #2, #3 and #9), whether the defragmentation is performed before the journal flushing or at the end of the boot.
Another good strategy is to mark the file NOCOW [3] (test #4); however it must be pointed out that the checksumming protection is also lost. This could be a limitation in a multi-disk (RAID) BTRFS scenario, because the checksum is used to discard a corrupted sector.
Systemd implements, in its readahead-* helpers, a defrag strategy which would alleviate this kind of problem, but I was not able to get it working properly (test #5). Further investigation is needed. However, I discovered that systemd-readahead ignores files greater than 10M; to understand that I had to look at the code.
I also tried to change the source of systemd-journald, removing the fsync() or the posix_fallocate() call, to verify whether these are a cause of the problem. But tests #6 and #7 seem to suggest that the problem is elsewhere.
In the last tests (#8 and #9) I tweaked the source in order to do the defrag from the journald daemon itself. Doing a defrag when the journal file is opened seems to give the same (good) results obtained in test #3.

I decided to adopt a strategy like test #2: I added a new job which defrags all the large files under /var (those above the -size threshold in the unit below) each day. So I covered other cases where files are highly fragmented. Below are my .service and .timer systemd units. Be aware that I am not a systemd expert, so I am open to suggestions on how to improve these units.

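# file defrag-var.service
[Unit]
Description=Defrag the /var subdirs

[Service]
Type=simple
# Defragment every regular file bigger than 5M under /var, staying on one filesystem.
# -print0 / -0 keep file names with spaces intact; -r does nothing when no file matches.
ExecStart=/bin/bash -c 'find /var/ -xdev -type f -size +5M -print0 | xargs -0 -r -n 1 btrfs filesystem defragment'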


# file /etc/systemd/system/defrag-var.timer 
[Unit]
Description=Defrag the /var subdirs

[Timer]
OnBootSec=1m
OnUnitActiveSec=1d

[Install]
WantedBy=multi-user.target
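Before enabling the timer, it may be useful to preview which files the job would touch. The following is a minimal sketch that reuses the same find expression as the service; the function name list_large_files and its two optional arguments are assumptions made for this example.

```shell
#!/bin/sh
# Sketch only: list_large_files is a hypothetical helper that prints the
# files the defrag job would select, without defragmenting anything.
# $1: directory to scan (default: /var/)
# $2: size threshold in find syntax (default: +5M, as in the service)
list_large_files() {
    dir="${1:-/var/}"
    size="${2:-+5M}"
    find "$dir" -xdev -type f -size "$size"
}
```

Running `list_large_files` as root lists the candidates under /var; `list_large_files /var/log +10M` restricts the preview to bigger files in one subdirectory.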


References

