Encrypted Linux setup across an SSD and an HDD


This post describes how you can install an encrypted Linux system across an SSD and an HDD, utilizing the SSD’s speed without a complicated partition setup. The storage setup we’re going to use looks something like this:

/boot: ESP on /dev/sda1
swap: dm-crypt on SwapVG, on /dev/sda2
/: LUKS2 on RootVG, on /dev/sda3 and /dev/sdb1
Disks: /dev/sda (smallish SSD), /dev/sdb (largish HDD)
Storage layout, from mount points down to physical devices.

The first partition on the SSD is the ESP, which we directly use as /boot. We also have a swap partition on the SSD, whose encryption is reinitialized on each boot, and the rest of the space is shared with the HDD to form one large LVM Volume Group, onto which the rest of the system is installed within a LUKS encryption layer. To make use of the SSD’s improved speed, we set up a cache LV within the root VG, where the data is mainly stored on the HDD but the SSD is used to cache frequently used blocks and metadata. A more detailed view of this part of the layout is:

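cryptroot: the cache LV, made up of cryptroot_cache and cryptroot_corig
cryptroot_cache: the cache pool on /dev/sda3, made up of cryptroot_cache_cmeta (metadata) and cryptroot_cache_cdata (data)
cryptroot_corig: the origin LV on /dev/sdb1
All of these live in RootVG, whose PVs are /dev/sda3 and /dev/sdb1. Apart from cryptroot itself, the LVs are internal.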

A note before we start: I used this setup to install a system with Arch Linux, which is a distribution with a manual, shell-based installation process. The basic setup should work for any other distribution as well, but I don’t know how well it integrates with other installers, especially graphical ones.

Also, you’ve backed up everything, right? ☺

Partitioning

First, we need to produce the physical partitions to use. Do this with any partition manager you like; I found cgdisk to be fairly pleasant to use, but if you’re in a graphical setup during this step, gparted is even better.

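In any event, you should end up with the following partitions (using GPT, not MBR):

/dev/sda1: the ESP, later mounted as /boot
/dev/sda2: swap (the PV for SwapVG)
/dev/sda3: the rest of the SSD (a PV for RootVG)
/dev/sdb1: the HDD (the other PV for RootVG)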
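If you would rather script the partitioning than click through it, sgdisk (from the same package as cgdisk) can do it non-interactively. The following is only a sketch that prints the commands instead of running them. The 512M ESP and 16G swap sizes and the function name print_partition_commands are placeholders of mine, not values from my actual setup.

```shell
#!/bin/sh
# Print (do not run) sgdisk commands for the four partitions used in this post.
# ASSUMPTION: the default sizes below are illustrative placeholders.
print_partition_commands() {
    ssd=$1                 # e.g. /dev/sda
    hdd=$2                 # e.g. /dev/sdb
    esp_size=${3:-512M}    # hypothetical ESP size
    swap_size=${4:-16G}    # hypothetical swap size
    # GPT type codes: ef00 = EFI system partition, 8e00 = Linux LVM.
    # A start of 0 means "first free sector"; an end of 0 means "rest of the disk".
    printf 'sgdisk --new=1:0:+%s --typecode=1:ef00 %s\n' "$esp_size" "$ssd"
    printf 'sgdisk --new=2:0:+%s --typecode=2:8e00 %s\n' "$swap_size" "$ssd"
    printf 'sgdisk --new=3:0:0 --typecode=3:8e00 %s\n' "$ssd"
    printf 'sgdisk --new=1:0:0 --typecode=1:8e00 %s\n' "$hdd"
}

print_partition_commands /dev/sda /dev/sdb
```

Review the printed commands carefully before pasting them into a shell: they rewrite partition tables.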

LVM setup

We can start with the swap setup, since that’s easier.

Swap

Initialize the Physical Volume:

pvcreate /dev/sda2

Create the Volume Group:

vgcreate SwapVG /dev/sda2

Create the Logical Volume:

lvcreate -l 100%FREE -n cryptswap SwapVG

Root

Initialize the Physical Volumes:

pvcreate /dev/sda3 /dev/sdb1

Create the Volume Group:

vgcreate RootVG /dev/sda3 /dev/sdb1

2021-07-27 note: Recent versions of LVM may warn at this step about different block sizes on the two PVs. If I understand correctly, mixing different block sizes in a regular LV is not supported, but for a cache LV, it seems to work.

Create the “origin” Logical Volume, only on the HDD:

lvcreate -l 100%PVS -n cryptroot RootVG /dev/sdb1

Create the “cache pool” Logical Volume, only on the SSD:

lvcreate --type cache-pool -l 100%PVS -n cryptroot_cache RootVG /dev/sda3

Combine them into a single cache Logical Volume:

lvconvert --type cache --cachepool RootVG/cryptroot_cache RootVG/cryptroot

2022-07-03 note: I would now suggest adding --cachemode writeback to this command; see my follow-up blog post.
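To make the cache mode explicit, the lvconvert call can be wrapped in a small helper. This is a sketch only: cache_convert_command is a hypothetical name, it prints the command rather than running it, and it deliberately accepts only the two modes mentioned in this post.

```shell
#!/bin/sh
# Print the lvconvert command for the given cache mode without executing it.
# ASSUMPTION: the default of writethrough matches the mode shown in the
# dmsetup status output later in this post.
cache_convert_command() {
    mode=${1:-writethrough}
    case $mode in
        writethrough|writeback) ;;
        *)
            printf 'unsupported cache mode: %s\n' "$mode" >&2
            return 1
            ;;
    esac
    printf 'lvconvert --type cache --cachemode %s --cachepool RootVG/cryptroot_cache RootVG/cryptroot\n' "$mode"
}

cache_convert_command writeback
```

Printing first and pasting afterwards makes it harder to convert the wrong volume because of a typo.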

Encryption setup

Set up LUKS on the new root Logical Volume:

cryptsetup luksFormat --type luks2 /dev/RootVG/cryptroot

Unlock it so we can work with it:

cryptsetup open /dev/RootVG/cryptroot root

Create the file system (choose what you want, I like btrfs):

mkfs.btrfs -L / /dev/mapper/root

Mount it somewhere so you can continue the rest of the installation:

mount /dev/mapper/root /mnt

Boot setup

We also need to set up the ESP:

mkfs.fat -F32 -n BOOT /dev/sda1

And mount it under /mnt for the rest of the installation:

mkdir /mnt/boot
mount /dev/sda1 /mnt/boot

Installation

At this point, you should be ready to continue with the installation of your system: install basic packages into /mnt (debootstrap/pacstrap/…), chroot there, set up locale, hostname, non-root users, bootloader, etc.

At some point, you will need to register the swap Logical Volume. Append the following line to /etc/crypttab:

swap	/dev/SwapVG/cryptswap	/dev/urandom	swap,cipher=aes-xts-plain64,size=256

And the following line to /etc/fstab:

/dev/mapper/swap	none	swap	sw	0	0
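If you end up running the installation steps more than once, plain appending will duplicate these lines. A minimal sketch of an idempotent append follows. The helper name append_once is my own invention, and the demo writes to a scratch file rather than to anything under /etc.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
append_once() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file; the second call is a no-op.
demo=$(mktemp)
crypttab_line=$(printf 'swap\t/dev/SwapVG/cryptswap\t/dev/urandom\tswap,cipher=aes-xts-plain64,size=256')
append_once "$demo" "$crypttab_line"
append_once "$demo" "$crypttab_line"
cat "$demo"
rm -f "$demo"
```

On the real system you would call it with /etc/crypttab and /etc/fstab as the first argument.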

You also need to set up the kernel command line so that the root volume can be decrypted by the initramfs. The process for this varies by initramfs, but in systemd-based ones you need to add this to the kernel command line:

rd.luks.name=UUID=root root=/dev/mapper/root

where UUID is the UUID of the cryptroot Logical Volume, which you can discover with the following command:

find /dev/disk/by-uuid/ -lname "../$(readlink /dev/RootVG/cryptroot)" -printf '%f\n'
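As an alternative to the find command, cryptsetup luksUUID prints the UUID of a LUKS device directly. The sketch below turns such a UUID into the two kernel arguments and refuses input that does not look like a UUID. The function name luks_root_args is hypothetical, and the sample UUID is the one from my own boot entry further down.

```shell
#!/bin/sh
# Build "rd.luks.name=<uuid>=<name> root=/dev/mapper/<name>" from a LUKS UUID.
luks_root_args() {
    uuid=$1
    name=${2:-root}
    # Accept only the canonical 8-4-4-4-12 hexadecimal form.
    if ! printf '%s\n' "$uuid" |
        grep -Eqx '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'; then
        printf 'not a UUID: %s\n' "$uuid" >&2
        return 1
    fi
    printf 'rd.luks.name=%s=%s root=/dev/mapper/%s\n' "$uuid" "$name" "$name"
}

# On the installed system, the UUID could come straight from cryptsetup:
#   luks_root_args "$(cryptsetup luksUUID /dev/RootVG/cryptroot)"
luks_root_args ab1d62d5-8dd5-4866-bec2-b8ab3bb70fea
```

The validation is there because leaving the literal placeholder UUID on the command line is an easy mistake to make.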

And that’s it! After the installation, a reboot should hopefully drop you into the new, encrypted system. If it doesn’t work, you’ll need to get into some kind of rescue shell (probably from the installation medium) and debug from there. You can find the commands to unlock and mount the volumes above, in the “encryption setup” and “boot setup” sections (skip the luksFormat and mkfs parts, of course).
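For reference, the rescue sequence can be collected into one small script. This is a sketch under a few assumptions: the run wrapper, the DRY_RUN variable and the rescue_mount function are names I made up, the script only prints the commands by default, and the vgchange step assumes the rescue environment has not already activated the Volume Group.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are printed; with DRY_RUN=0 they run.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rescue_mount() {
    # ASSUMPTION: the LVs may not be active yet in the rescue environment.
    run vgchange -ay RootVG
    run cryptsetup open /dev/RootVG/cryptroot root
    run mount /dev/mapper/root /mnt
    run mount /dev/sda1 /mnt/boot
}

rescue_mount
```

Once everything is mounted, you can chroot into /mnt again and fix whatever went wrong.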

Full Arch Linux Installation

What follows are the notes I took while installing my own system using the setup described above. I’m mainly including them for my own future reference, but perhaps you’ll find parts of them useful as well.

Booting

Download the ISO, write it to a USB flash drive, boot from it.

I use the Neo keyboard layout, so first of all load that.

wget lucaswerkmeister.de/neo.map.gz
loadkeys neo

Partitioning

Used cgdisk to produce the partition layout described in the Partitioning section above.

Afterwards, I needed a partprobe /dev/sda to make the kernel re-read the SSD’s partition table. (For /dev/sdb, this was not necessary – apparently cgdisk tries to do that itself but failed in the case of /dev/sda? Not sure.)

LVM setup

General structure: as in the storage layout at the top of this post.

Swap

pvcreate /dev/sda2
vgcreate SwapVG /dev/sda2
lvcreate -l 100%FREE -n cryptswap SwapVG

Root

pvcreate /dev/sda3
pvcreate /dev/sdb1
vgcreate RootVG /dev/sda3 /dev/sdb1
lvcreate -l 100%PVS -n cryptroot RootVG /dev/sdb1
lvcreate --type cache-pool -l 100%PVS -n cryptroot_cache RootVG /dev/sda3
lvconvert --type cache --cachepool RootVG/cryptroot_cache RootVG/cryptroot

Encryption setup

We’re not getting into fstab/crypttab territory yet, that’s later in the installation process.

cryptsetup luksFormat --type luks2 /dev/RootVG/cryptroot
cryptsetup open /dev/RootVG/cryptroot root
mkfs.btrfs -L / /dev/mapper/root
systemd-mount --discover /dev/mapper/root /mnt

Boot setup

The VFAT file system on the ESP actually survived the repartitioning (I guess the old and new partition started at the same offset, and I never wiped it), but let’s recreate it to avoid confusion.

mkfs.fat -F32 -n /boot /dev/sda1
mkdir /mnt/boot
systemd-mount --discover /dev/sda1 /mnt/boot
ls /mnt/boot # empty

Installation

We can now continue with a standard Arch installation for a while.

Select the mirrors

The default /etc/pacman.d/mirrorlist starts with two servers in Germany, so I think that’s good enough for me. I can improve it with reflector after the install is done.

Install packages

Might as well install some extra groups now, so:

pacstrap /mnt base{,-devel} gnome{,-extra} texlive-most

fstab

We’ll probably have to tweak or replace it later, but doesn’t hurt:

genfstab -U /mnt | tee /dev/stderr >> /mnt/etc/fstab

(Turns out it didn’t need replacing, just a small tweak for swap later.)

chroot

arch-chroot is a fancy wrapper around chroot; I’m not sure what it does, but it seems to at least enter new namespaces and mount API file systems.

arch-chroot /mnt

Time zone

I’m not sure what the hwclock part is for, but I assume it doesn’t hurt.

ln -sf /usr/share/zoneinfo/Europe/Berlin /etc/localtime
hwclock --systohc

Locale

sed -i '/^#de_DE\.UTF-8/ s/^#//; /^#en_US\.UTF-8/ s/^#//;' /etc/locale.gen
locale-gen
printf 'LANG=en_US.UTF-8\n' > /etc/locale.conf
printf 'KEYMAP=neo\n' > /etc/vconsole.conf

We also need to store the Neo keyboard layout somewhere so that it can be included in the initramfs later. (There’s an AUR package for it, but we’re nowhere near AUR helpers yet, so we do it manually.)

mkdir /usr/share/kbd/keymaps/neo/
curl -o/usr/share/kbd/keymaps/neo/neo.map.gz https://lucaswerkmeister.de/neo.map.gz

Hostname

I’ll set the pretty hostname later, once we’re properly booted and systemd-hostnamed is running.

printf 'theoden.lucaswerkmeister.de\n' > /etc/hostname

Root password

passwd

Boot loader

I prefer to use EFISTUB, where the firmware directly executes the kernel with the necessary arguments. We can set this up using efibootmgr. We also need to specify the right parameters to get encryption working.

pacman -S efibootmgr
efibootmgr \
    --disk /dev/sda \
    --part 1 \
    --create \
    --label 'Arch Linux' \
    --loader /vmlinuz-linux \
    --unicode 'initrd=initramfs-linux.img cryptdevice=/dev/RootVG/cryptroot:root root=/dev/mapper/root' \
    --verbose

Note: I later changed to systemd-based initramfs instead, so now the bootloader is:

sudo efibootmgr \
    --disk /dev/sda \
    --part 1 \
    --create \
    --label 'Arch Linux systemd' \
    --loader /vmlinuz-linux \
    --unicode 'initrd=initramfs-linux.img rd.luks.name=ab1d62d5-8dd5-4866-bec2-b8ab3bb70fea=root root=/dev/mapper/root' \
    --verbose

2018-07-22 note: It turns out that Intel microcode updates require an additional initrd to be read, so now the bootloader is:

sudo efibootmgr \
    --disk /dev/sda \
    --part 1 \
    --create \
    --label 'Arch Linux systemd ucode' \
    --loader /vmlinuz-linux \
    --unicode 'initrd=intel-ucode.img initrd=initramfs-linux.img rd.luks.name=ab1d62d5-8dd5-4866-bec2-b8ab3bb70fea=root root=/dev/mapper/root' \
    --verbose

2018-08-21 note: I also want to use the unified cgroup hierarchy (cgroups v2), so now the bootloader is:

sudo efibootmgr \
    --disk /dev/sda \
    --part 1 \
    --create \
    --label 'Arch Linux systemd ucode cgroupsv2' \
    --loader /vmlinuz-linux \
    --unicode 'initrd=intel-ucode.img initrd=initramfs-linux.img rd.luks.name=ab1d62d5-8dd5-4866-bec2-b8ab3bb70fea=root root=/dev/mapper/root systemd.unified_cgroup_hierarchy' \
    --verbose

2021-07-15 note: I now have an AMD CPU; I don’t want to forget to change the microcode on the next CPU change, so let’s just load both, it doesn’t hurt. Now the bootloader is:

sudo efibootmgr \
    --disk /dev/sda \
    --part 1 \
    --create \
    --label 'Arch Linux systemd ucode2 cgroupsv2' \
    --loader /vmlinuz-linux \
    --unicode 'initrd=amd-ucode.img initrd=intel-ucode.img initrd=initramfs-linux.img rd.luks.name=ab1d62d5-8dd5-4866-bec2-b8ab3bb70fea=root root=/dev/mapper/root systemd.unified_cgroup_hierarchy' \
    --verbose

2023-03-12 note: Following several other changes, my command to add the boot loader entry is now:

sudo efibootmgr \
    --disk /dev/nvme1n1 \
    --part 1 \
    --create \
    --label 'Arch Linux systemd ucode2' \
    --loader /vmlinuz-linux \
    --unicode 'initrd=amd-ucode.img initrd=intel-ucode.img initrd=initramfs-linux.img rd.luks.name=9192457b-77e7-4080-a3fa-6adc3d08c57e=root root=/dev/mapper/root systemd.unified_cgroup_hierarchy' \
    --verbose
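Since the command line kept growing with every note above, it may be less error-prone to assemble it from its parts. A minimal sketch, where build_cmdline is a hypothetical helper and the mapper name root is hard-coded as in the rest of this post:

```shell
#!/bin/sh
# Join any number of initrd images and the root arguments into one command line.
# Usage: build_cmdline <luks-uuid> [initrd-image ...]
build_cmdline() {
    uuid=$1
    shift
    cmdline=
    for img in "$@"; do
        # The images are loaded in the order given, microcode first.
        cmdline="${cmdline}initrd=${img} "
    done
    printf '%srd.luks.name=%s=root root=/dev/mapper/root\n' "$cmdline" "$uuid"
}

build_cmdline 9192457b-77e7-4080-a3fa-6adc3d08c57e \
    amd-ucode.img intel-ucode.img initramfs-linux.img
```

The result can then be passed to efibootmgr as --unicode "$(build_cmdline …)", with any extra kernel parameters appended after it.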

We also need to set up the initramfs so that it will be able to load and decrypt the root volume.

sed -i '
  /^HOOKS=/ {
    # insert hooks to get a working early console
    s/autodetect modconf/autodetect keyboard keymap consolefont modconf/
    # insert hooks for loading and decrypting the root volume
    s/block filesystems/block lvm2 encrypt filesystems/
    # remove keyboard block near the end (already inserted earlier)
    s/filesystems keyboard fsck/filesystems fsck/
  }
' /etc/mkinitcpio.conf
pacman -S linux # easiest way to regenerate initramfs that I know

Again, that was for busybox-based initramfs, for systemd-based I changed it to:

sed -i '
  /^HOOKS=/ {
    # use systemd
    s/base udev autodetect/base systemd autodetect/
    # use sd-vconsole instead of keymap and consolefont
    s/keyboard keymap consolefont modconf/keyboard sd-vconsole modconf/
    # use sd-encrypt and sd-lvm2 instead of lvm2 and encrypt
    s/block lvm2 encrypt filesystems/block sd-encrypt sd-lvm2 filesystems/
  }
' /etc/mkinitcpio.conf
pacman -S linux # easiest way to regenerate initramfs that I know
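Because sed -i edits the file in place, it can be reassuring to try the substitutions on a sample line first. The sketch below wraps both edits as filters on standard input. The function names are made up, and the sample HOOKS line is my assumption of what the default looked like at the time; check your own /etc/mkinitcpio.conf.

```shell
#!/bin/sh
# First edit: busybox-based initramfs with early keyboard, LVM and encryption.
to_busybox_hooks() {
    sed '
      /^HOOKS=/ {
        s/autodetect modconf/autodetect keyboard keymap consolefont modconf/
        s/block filesystems/block lvm2 encrypt filesystems/
        s/filesystems keyboard fsck/filesystems fsck/
      }
    '
}

# Second edit: switch the result of the first edit to the systemd-based hooks.
to_systemd_hooks() {
    sed '
      /^HOOKS=/ {
        s/base udev autodetect/base systemd autodetect/
        s/keyboard keymap consolefont modconf/keyboard sd-vconsole modconf/
        s/block lvm2 encrypt filesystems/block sd-encrypt sd-lvm2 filesystems/
      }
    '
}

# ASSUMPTION: this is the default HOOKS line the substitutions were written for.
sample='HOOKS=(base udev autodetect modconf block filesystems keyboard fsck)'
printf '%s\n' "$sample" | to_busybox_hooks | to_systemd_hooks
```

If the printed line is unchanged, the patterns did not match your config, and the in-place edit would have silently done nothing. As far as I know, mkinitcpio -P also regenerates the images for all presets without reinstalling the kernel package.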

And we also need to set up the encrypted swap, which will be re-initialized each boot.

printf 'swap\t/dev/SwapVG/cryptswap\t/dev/urandom\tswap,cipher=aes-xts-plain64,size=256\n' >> /etc/crypttab
printf '/dev/mapper/swap\tnone\tswap\tsw\t0\t0\n' >> /etc/fstab

Reboot

And now we should be ready to reboot! Exit the chroot shell, and then:

systemctl stop /mnt/boot
systemctl stop /mnt
systemctl reboot

Further setup

The system automatically boots the Arch Linux entry (it was the first entry in the BootOrder according to efibootmgr; YMMV), and after entering the root disk password, we are presented with… a getty. Let’s log in as root and fix that.

systemctl enable gdm
systemctl reboot

That’s better. GDM unfortunately doesn’t use Neo, but we can fix that later. But let’s stay in the TTY for now while we create the non-root user.

useradd lucas
mkdir ~lucas
passwd lucas

To populate the home directory, I want to clone a git repository, which requires networking.

systemctl enable --now systemd-networkd
systemctl enable --now NetworkManager
# wait a few seconds
pacman -S git
git clone https://github.com/lucaswerkmeister/home.git ~lucas
git -C ~lucas config remote.origin.url ghlw:home.git
chown -R lucas: ~lucas

And that’s the point where I stopped logging things. I think I did another reboot pretty soon, and after that, the system was essentially working and I could start getting all my files back from the backups I’d done earlier.

Addendum: performance

I found out how you can get some statistics about the cache device, so let’s have a look at how it’s performed over the past two months.

$ sudo dmsetup status RootVG-cryptroot
0 1953513472 cache 8 2718/11264 256 837191/902368 6977701 9842187 21079799 518402 0 10 0 2 metadata2 writethrough 2 migration_threshold 2048 smq 0 rw -

You can find explanations for all those numbers in the Linux kernel documentation, but the most important ones are: 6 977 701 read hits, 9 842 187 read misses; 21 079 799 write hits, 518 402 write misses. So about 41% of reads and 98% of writes are hitting the cache. I’m not sure how meaningful the number for writes is, for two reasons. First, I don’t understand the smq cache policy: its documentation is completely useless, because it was written as a comparison to the original mq policy, whose documentation was subsequently removed when mq was made an alias for smq. Second, I think the “writethrough” operating mode means those writes still have to wait for the HDD. The number for reads looks pretty good to me, though. (Those numbers persist after a reboot, by the way, so I assume they apply to the lifetime of the cache device, not just to the current boot.)
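The percentages can be computed straight from the status line. A small sketch, assuming the field order of the status output shown above (read hits, read misses, write hits and write misses in fields 8 to 11); cache_hit_rates is a name I made up:

```shell
#!/bin/sh
# Read "dmsetup status" lines on stdin and print hit rates for cache targets.
cache_hit_rates() {
    awk '$3 == "cache" {
        rhit = $8; rmiss = $9; whit = $10; wmiss = $11
        # Skip devices that have not seen any reads or writes yet.
        if (rhit + rmiss > 0 && whit + wmiss > 0)
            printf "read %.1f%% write %.1f%%\n",
                100 * rhit / (rhit + rmiss), 100 * whit / (whit + wmiss)
    }'
}

# The status line from above, used as sample input.
sample='0 1953513472 cache 8 2718/11264 256 837191/902368 6977701 9842187 21079799 518402 0 10 0 2 metadata2 writethrough 2 migration_threshold 2048 smq 0 rw -'
printf '%s\n' "$sample" | cache_hit_rates   # prints: read 41.5% write 97.6%
```

On a live system, the input would come from sudo dmsetup status RootVG-cryptroot instead of the sample string.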

2022-07-03 note: I’ve written another blog post specifically about the cache behavior, now that I think I understand it better.