Bcache


Setting up Bcache on an existing system

When setting up Bcache on an existing system, there are two possible approaches: one can try to convert the existing partitions in place using the blocks tool, or one can create new partitions and migrate the data to them.

In my case, I wanted to enable caching for a whole LVM volume group (effectively caching the physical volume), so blocks did not work for me. However, the existing volume group still had a lot of free space, so I created a logical volume that acts as the physical volume for a new volume group and used Bcache with that logical volume.

Here is a step-by-step guide of what I did (on a system running Ubuntu 16.04 LTS). On the original system, I have one volume group (named vg0) that is backed by a single physical volume (/dev/md1). The device that is supposed to be used as the cache device is /dev/md2. All volume groups use the default extent size of 4 MiB.
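
To double-check the starting layout before making any changes, the usual MD and LVM status commands are all that is needed:

cat /proc/mdstat
pvs
vgs
lvs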

First, I create the logical volume that serves as the backing device for the new bcache device:

lvcreate -L 32G -n vg2-backend vg0

I also create a volume group for the cache device. This is not necessary, but it makes the setup a bit more flexible (e.g. it is possible to put the swap partition completely on the cache device).

pvcreate /dev/md2
vgcreate vg1 /dev/md2
lvcreate -L 4G -n vg2-cache vg1
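
As an aside, because vg1 is an ordinary volume group, it can hold additional volumes of its own, for example a swap volume that lives entirely on the SSDs (the name and size here are purely illustrative):

lvcreate -L 2G -n swap vg1
mkswap /dev/vg1/swap
swapon /dev/vg1/swap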

Now, we have a backing device (/dev/vg0/vg2-backend) and a cache device (/dev/vg1/vg2-cache) that can be used to create a bcache device.

The backing device resides on a (Linux) RAID-6 array with a chunk size of 512 KiB (and four drives, so a stripe size of 1 MiB). I am not sure whether the block size matters much, but according to the bcache documentation it is very important that the data offset (the position where the data starts after the super-block) is a multiple of the RAID stripe size. I chose an offset of 4 MiB, so the data is not only aligned to the stripe size, but also to the LVM extents (I am not sure whether the latter matters, but it certainly does not hurt). Please note that unlike the other numbers, the offset is specified as a number of 512-byte sectors, not as a number of bytes. The bucket size is most likely not used for a backing device, but I still specified it to be the same as the block size.

make-bcache -B -b 524288 -w 524288 -o 8192 /dev/vg0/vg2-backend
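
For reference, the value for -o is simply the desired 4 MiB offset converted to 512-byte sectors, and one can quickly verify that it is a whole number of 1 MiB stripes:

echo $(( 4 * 1024 * 1024 / 512 ))      # 4 MiB in 512-byte sectors: prints 8192
echo $(( 8192 * 512 % (1024 * 1024) )) # offset modulo the 1 MiB stripe size: prints 0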

The caching device is a RAID-1 array of two SSDs, so there is no relevant chunk or stripe size. For this reason, I use a block size of 4 KiB, which should be okay for most SSDs. The bucket size should be chosen according to the SSDs' erase block size. Unfortunately, I do not know the erase block size, so I simply chose 4 MiB because it seems that there are currently no SSDs with a larger erase block size. Regarding the data offset, I again wanted it to be aligned with the LVM extents (even though this most likely does not matter):

make-bcache -C -b 4194304 -w 4096 -o 8192 /dev/vg1/vg2-cache
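
Before attaching anything, it does not hurt to look at the super-blocks that have just been written (the exact set of fields printed by bcache-super-show varies a bit between versions of bcache-tools):

bcache-super-show /dev/vg0/vg2-backend
bcache-super-show /dev/vg1/vg2-cache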

Finally, the cache device has to be attached to the bcache device that was created for /dev/vg0/vg2-backend. If there are no other bcache devices, this should be /dev/bcache0. We can find out the cache-set UUID of the caching device by issuing

bcache-super-show /dev/vg1/vg2-cache | grep cset.uuid

We then send this UUID to /sys/block/bcache0/bcache/attach:

echo UUID >/sys/block/bcache0/bcache/attach
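
Normally, udev registers both devices automatically, so /dev/bcache0 appears on its own; if it does not, writing the device paths to /sys/fs/bcache/register makes the kernel pick them up. The UUID lookup and the attach can also be combined into one small snippet (a sketch; it assumes the bcache device really is bcache0):

# Extract the cache-set UUID from the cache device's super-block ...
CSET_UUID=$(bcache-super-show /dev/vg1/vg2-cache | awk '/cset.uuid/ {print $2}')
# ... and attach that cache set to the backing device.
echo "$CSET_UUID" >/sys/block/bcache0/bcache/attach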

By default, the bcache device works in write-through mode. If we want to enable write-back mode, we have to do this explicitly:

echo writeback >/sys/block/bcache0/bcache/cache_mode
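
The currently active mode can be read back from the same sysfs attribute; the active mode is shown in square brackets:

cat /sys/block/bcache0/bcache/cache_mode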

Now, we can create the physical volume for our new volume group. We use a data alignment of 4 MiB, so that we are effectively aligned with the LVM extents of the original physical volume.

pvcreate --dataalignment 4m /dev/bcache0
vgcreate vg2 /dev/bcache0
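
To verify that the data area of the new physical volume really starts at the intended offset, pvs can print the start of the physical extents; with the options used above it should report 8192 sectors (4 MiB):

pvs -o +pe_start --units s /dev/bcache0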

The new volume group can be used to create logical volumes in the usual way. The data from the existing logical volumes can be copied over to the new ones (e.g. using dd). When booting the system from a live CD (e.g. to move partitions that would otherwise be in use), beware of two problems when using Ubuntu 16.04 LTS (Xenial Xerus): First, the server installer CD does not contain the bcache kernel module, so it is not possible to access the data on the bcache device. Second, the desktop live DVD contains the module, but the kernel on the DVD (at least for Ubuntu 16.04.1 LTS) has a bug that in my case made it impossible to activate the bcache device. For these reasons, I recommend using a live DVD with Ubuntu 16.10.
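
Copying a single volume might look like the following; the source volume name (/dev/vg0/root) and the size are only examples and have to match your own layout, and the target volume must be at least as large as the source (neither volume should be mounted while copying):

lvcreate -L 20G -n root vg2
dd if=/dev/vg0/root of=/dev/vg2/root bs=4M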

When moving the root file-system to the new volume group, one has to be aware of a problem with the initial RAM FS. The scripts that are part of the lvm2 package and that are included by initramfs-tools only try to activate the volume holding the root file-system. With the kind of setup described in this guide, this is not sufficient because that volume can only be activated after activating the backend and cache volumes for the bcache device. For this reason, I added a script /etc/initramfs-tools/scripts/local-top/lvm2-custom:

PREREQ="mdadm mdrun multipath"

prereqs()
{
       echo "$PREREQ"
}

case $1 in
prereqs)
        prereqs
       exit 0
        ;;
esac

if [ ! -e /sbin/lvm ]; then
       exit 0
fi

lvchange_activate() {
        lvm lvchange -aay -y --sysinit --ignoreskippedcluster "$@"
}

lvchange_activate /dev/vg0/vg2-backend
lvchange_activate /dev/vg1/vg2-cache
lvchange_activate /dev/vg2/root

The last line would not be necessary if this script was run before the lvm2 script, but there is no way to guarantee this, so simply adding it to this script is easier.

After adding the script and making it executable, one has to update the initial RAM FS:

update-initramfs -u -k all
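
Whether the bcache module and the custom script actually ended up in the initial RAM FS can be checked with lsinitramfs (the image name may differ depending on the kernel you boot):

lsinitramfs /boot/initrd.img-$(uname -r) | grep -E 'bcache|lvm2-custom'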

After making more space available in volume group vg0, the backing device can be grown with lvresize. After a reboot, the new size is automatically reflected by the /dev/bcache0 device, and the space can be made available to vg2 by running pvresize on /dev/bcache0.
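
As a concrete example (the amount is arbitrary):

lvresize -L +32G /dev/vg0/vg2-backend
# reboot so that /dev/bcache0 picks up the new size, then:
pvresize /dev/bcache0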