## Setting up Bcache on an existing system

When setting up Bcache on an existing system, there are two possible ways: one can try to migrate the existing partitions in place using [blocks](https://github.com/g2p/blocks), or one can create new partitions and migrate the data to them.

In my case, I wanted to enable caching for a whole LVM volume group (effectively caching the physical volume), so blocks did not work for me. However, the existing volume group still had a lot of free space, so I created a logical volume that acts as the physical volume for a new volume group and used Bcache with that logical volume.

Here is the step-by-step guide of what I did (on a system running Ubuntu 16.04 LTS). On the original system, I have one volume group (named `vg0`) that is backed by a single physical volume (`/dev/md1`). The device that is supposed to be used as the cache device is `/dev/md2`. All volume groups use the default extent size of 4 MiB.
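
Before making any changes, it can help to confirm this starting layout. The following commands are just a read-only sanity check, assuming the device and volume group names given above:

```bash
# Show the existing physical volumes, volume groups, and RAID arrays
pvs
vgs
cat /proc/mdstat

# Confirm the 4 MiB extent size of the existing volume group
vgdisplay vg0 | grep "PE Size"
```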

First, I create the logical volume that serves as the backing device for the new bcache device:

```bash
lvcreate -L 32G -n vg2-backend vg0
```

I also create a volume group for the cache device. This is not necessary, but it makes the setup a bit more flexible (e.g. it is possible to put the swap partition completely on the cache device).

```bash
pvcreate /dev/md2
vgcreate vg1 /dev/md2
lvcreate -L 4G -n vg2-cache vg1
```
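
At this point, a quick (optional) check confirms that both new logical volumes exist with the expected sizes:

```bash
# List the backing and cache volumes created above
lvs -o lv_name,vg_name,lv_size vg0 vg1
```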

Now, we have a backing device (`/dev/vg0/vg2-backend`) and a cache device (`/dev/vg1/vg2-cache`) that can be used to create a bcache device.

The backing device resides on a (Linux) RAID-6 array with a chunk size of 512 KiB (and four drives, so a stripe size of 1 MiB). I am not sure how much the block size matters, but according to the [bcache documentation](https://www.kernel.org/doc/Documentation/bcache.txt) it is very important that the data offset (the position where the data starts after the super-block) is a multiple of the RAID stripe size. I chose an offset of 4 MiB, so the data is not only aligned to the stripe size, but also to the LVM extents; I am not sure whether this additional alignment matters, but it certainly does not hurt. Please note that, unlike the other numbers, the offset is specified as a number of 512-byte sectors, not as a number of bytes. The bucket size is most likely not used for a backing device, but I still specified it to be the same as the block size.
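
To make the unit conversions explicit, here is the arithmetic behind the numbers passed to `make-bcache` below (just a sanity check, not a required step):

```bash
# Block (-w) and bucket (-b) size are given in bytes: 512 KiB
echo $((512 * 1024))            # 524288
# The data offset (-o) is given in 512-byte sectors: 4 MiB
echo $((4 * 1024 * 1024 / 512)) # 8192
```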

```bash
make-bcache -B -b 524288 -w 524288 -o 8192 /dev/vg0/vg2-backend
```
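
To double-check the result, `bcache-super-show` prints the superblock that was just written; the data offset should show up as 8192 sectors (the exact field names may vary between versions of bcache-tools):

```bash
# Inspect the freshly written bcache superblock on the backing device
bcache-super-show /dev/vg0/vg2-backend
```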

The caching device is a RAID-1 of two SSDs, so there is no relevant chunk or stripe size. For this reason, I use a block size of 4 KiB, which should be okay for most SSDs. The bucket size should be chosen according to the SSDs' erase block size. Unfortunately, I do not know the erase block size, so I simply choose 4 MiB because it seems that there are currently no SSDs with a larger erase block size. Regarding the data offset, I again wanted it to be aligned with the LVM extents (even though this most likely does not matter):

```bash
make-bcache -C -b 4194304 -w 4096 -o 8192 /dev/vg1/vg2-cache
```
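
If you want at least a hint about a suitable bucket size for your own SSDs, the kernel exposes some I/O topology information for each disk. These values often do not reveal the true erase block size, but they are worth a quick look; `sda` is a placeholder for one of the SSDs in the RAID-1:

```bash
# I/O topology hints reported by the kernel for one of the SSDs
cat /sys/block/sda/queue/optimal_io_size
cat /sys/block/sda/queue/discard_granularity
```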

Finally, the cache device has to be attached to the bcache device that was created for `/dev/vg0/vg2-backend`. If there are no other bcache devices, this should be `/dev/bcache0`. We can find out the cache-set UUID of the caching device by issuing

```bash
bcache-super-show /dev/vg1/vg2-cache | grep cset.uuid
```
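
If there is any doubt which `bcacheN` device belongs to which backing device, `lsblk` shows the bcache device as a child of its backing volume:

```bash
# The bcache device appears underneath its backing logical volume
lsblk /dev/vg0/vg2-backend
```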

We then send this UUID to `/sys/block/bcache0/bcache/attach`:

```bash
echo UUID >/sys/block/bcache0/bcache/attach
```
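
Both steps can also be combined so that the UUID does not have to be copied by hand; afterwards, the `state` attribute should no longer report `no cache` (this is just a convenience sketch using the same device names as above):

```bash
# Extract the cache-set UUID and attach it in one step
cset_uuid=$(bcache-super-show /dev/vg1/vg2-cache | awk '/cset.uuid/ {print $2}')
echo "$cset_uuid" > /sys/block/bcache0/bcache/attach

# Should now report "clean" (or "dirty") instead of "no cache"
cat /sys/block/bcache0/bcache/state
```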

By default, the bcache device works in write-through mode. If we want to enable write-back mode, we have to do this explicitly:

```bash
echo writeback >/sys/block/bcache0/bcache/cache_mode
```
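
Reading the same attribute back shows all available modes, with the active one in square brackets:

```bash
cat /sys/block/bcache0/bcache/cache_mode
# Expected output: writethrough [writeback] writearound none
```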

Now, we can create the physical volume for our new volume group. We use a data alignment of 4 MiB, so that we are effectively aligned with the LVM extents of the original physical volume.

```bash
pvcreate --dataalignment 4m /dev/bcache0
vgcreate vg2 /dev/bcache0
```
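
To verify the alignment, the `pe_start` field of the new physical volume should report 4 MiB:

```bash
# The start of the data area (pe_start) should be 4.00m
pvs -o pv_name,pe_start /dev/bcache0
```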

The new volume group can be used to create logical volumes in the usual way. The data from the existing logical volumes can be copied over to the new ones (e.g. using `dd`, as shown below). When booting the system from a live CD (e.g. to move partitions that would otherwise be in use), beware of two problems when using Ubuntu 16.04 LTS (Xenial Xerus): First, the server installer CD does not contain the bcache kernel module, so it is not possible to access the data on the bcache device. Second, the desktop live DVD contains the module, but the kernel on the DVD (at least for Ubuntu 16.04.1 LTS) has a bug that in my case made it impossible to activate the bcache device. For these reasons, I recommend using a live DVD with Ubuntu 16.10.
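
As an example of the copy step, migrating a root volume could look roughly like this. The volume name `root`, its size, and the assumption that the old root volume lives at `/dev/vg0/root` are placeholders; make sure the source volume is not mounted (or is mounted read-only) while copying:

```bash
# Create the target volume in the new volume group and copy the data over
lvcreate -L 20G -n root vg2
dd if=/dev/vg0/root of=/dev/vg2/root bs=4M status=progress
```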

When moving the root file-system to the new volume group, one has to be aware of a problem with the initial RAM FS. The scripts that are part of the `lvm2` package and that are included by `initramfs-tools` only try to activate the volume holding the root file-system. With the kind of setup described in this guide, this is not sufficient because that volume can only be activated after activating the backend and cache volumes for the bcache device. For this reason, I added a script `/etc/initramfs-tools/scripts/local-top/lvm2-custom`:

```bash
#!/bin/sh
# Activate the LVM volumes needed for the bcache-backed root file-system.

PREREQ="mdadm mdrun multipath"

prereqs()
{
    echo "$PREREQ"
}

case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac

if [ ! -e /sbin/lvm ]; then
    exit 0
fi

lvchange_activate() {
    lvm lvchange -aay -y --sysinit --ignoreskippedcluster "$@"
}

# Activate the backing and cache volumes first, then the root volume on top.
lvchange_activate /dev/vg0/vg2-backend
lvchange_activate /dev/vg1/vg2-cache
lvchange_activate /dev/vg2/root
```

The last line would not be necessary if this script were run before the `lvm2` script, but there is no way to guarantee this, so simply adding it to this script is easier.
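
For completeness, making the script executable is a single command (the path matches the one used above):

```bash
chmod +x /etc/initramfs-tools/scripts/local-top/lvm2-custom
```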

After adding the script and making it executable, one has to update the initial RAM FS:

```bash
update-initramfs -u -k all
```
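
To make sure that both the custom script and the bcache kernel module actually ended up in the image, the generated initramfs can be inspected (adjust the kernel version as needed):

```bash
# Check that the script and the bcache module are included in the initramfs
lsinitramfs /boot/initrd.img-$(uname -r) | grep -E 'lvm2-custom|bcache'
```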

After making more space available in volume group `vg0`, the backing device can be grown with `lvresize`. After a reboot, the new size is automatically reflected by the `/dev/bcache0` device and the space can be made available to `vg2` by running `pvresize /dev/bcache0`.
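
Put together, the growth procedure looks roughly like this (the additional 16 GiB is just an example amount):

```bash
# Grow the backing logical volume (example: add 16 GiB)
lvresize -L +16G /dev/vg0/vg2-backend

# After a reboot, /dev/bcache0 reflects the new size; grow the PV and check
pvresize /dev/bcache0
vgs vg2
```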