Adding a New HDD to a Debian System

Last updated on: 15 November, 2021

Summary: Some ramblings mixed with instructions explaining how to add a new encrypted hard disk to a Debian system.

Situation

For some time I kept running low on hard disk space when working on projects that involve non-linear editing of video sequences. Video files transcoded to DNxHD would easily balloon and take up almost all available storage space. With pure 3D projects this rarely happened, usually only with simulations that I needed to cache, or when I had to deliver longer offline-rendered sequences. That was one reason I had not looked into expanding my storage space.

The other reason is that I don’t like changing hard disks too frequently. My experience with these products taught me that there’s always a high probability of an HDD breaking in the first several months after purchase. And when it breaks mid-project, it’s a waste of precious hours that will have to be spent on restoring backups.

But if a new HDD survives for that period, it will usually serve reliably for many years to come. With statistical data about hard disk failure rates I can probably be proven wrong, but this still wouldn’t convince me, because that’s how it was and still is since my first 120 MB WD Caviar. Call me superstitious. I buy new disks only if I find myself constantly running out of storage space, or if S.M.A.R.T. starts reporting worrying diagnostic information.

My current workhorse drives (not counting system or backup disks) are two WD6400AAKS HDDs, which I’ve been using for over a decade. Heavily utilized, they’ve lived through two generations of my workstations and their storage capacity was sufficient for working with normal 3D projects up to this day. You might think that 640 GB per disk is not a great amount of space, but in a pipeline, where completed assets are immediately pushed to other drives, this was more than enough. I didn’t complain, so there was no need for an upgrade. But the time has come to start expanding the storage.

I have plenty of space in my full tower case and 8 SATA connectors on the motherboard (with two of them still free), so I decided to keep the old production drive and use it as additional storage for shared resources. Or as a cache disk. The new drive would take over the old one’s logical mount point, and the old one would be mounted elsewhere in the file system.

Deciding on The Brand And Model

When it comes to hard drive brands, I always held HGST (a subsidiary of Hitachi) in high esteem. They produced high quality enterprise-grade drives, and although their hardware wasn’t cheap, it gave you some comfort of knowing that the probability of a drive crashing all of a sudden is lower than with standard desktop drives. I have a 2 TB Ultrastar disk mounted as one of the backup drives in my workstation, and so far it has operated reliably (knocks on wood). The company was later acquired by Western Digital and, according to Wikipedia, made defunct in 2018, though you can still see some of their drives on the market. The new Ultrastar series of hard drives is now produced under the WD brand, but I’m not sure if the technology and resilience of those disks remain the same. I guess checking Backblaze reports from the years before and after the acquisition would shed some light on this matter.

HGST was a Japanese company, and it goes without saying that Japanese people are very dedicated to their work. Seeing them bought and then extinguished by a US company was saddening, so as a form of a small personal protest, I decided to buy a drive of another Japanese brand — Toshiba. Besides, I didn’t need another enterprise-quality drive.

My twin WD6400AAKS are 7200 rpm desktop drives. It seems that their counterparts in the Toshiba world are the P300 models, which come in 5400 rpm and 7200 rpm variants, and as either shingled magnetic recording (SMR) or conventional magnetic recording (CMR) HDDs. I’m not going to explain those terms in detail as they can be easily looked up on the Internet, but the main difference between the two is that SMR disks can pack more storage space at the cost of significantly reduced writing speed when compared to CMR disks. For a production drive I needed a fast one of medium capacity, so the 2 TB HDWD120EZSTA or HDWD120UZSVA CMR drives were good candidates. The only difference between the two is that the EZSTA comes in a retail box, while the UZSVA comes in bulk packaging. I bought the former, but only to reduce the possibility of disk damage in transport. I’ve seen far too many videos of couriers throwing parcels around or forcing oversized packages into parcel locker compartments that are too small. Better safe than sorry.

Now, as I’m writing this post, I realize that the P300’s model ID prefix, “HDWD”, might indicate that the drive is somehow tied to Western Digital. I skimmed the Internet in search of an answer and found decade-old information about some kind of deal that both companies were about to strike at that time. WD was to buy Toshiba’s subsidiary manufacturer of 2,5" HDDs, and in exchange Toshiba would get a production line for manufacturing 3,5" drives. Or something like that. Oh, well…

Drive Preparation

When configuring the drive and my system, I relied heavily on the marvelous Arch Wiki. It has a whole chapter on every topic related to this subject.

My plan was as follows:

Make a backup of the production drive (duh!).
Connect the new drive to a free SATA port and to power (this turned out to be troublesome, but more on this later).
Create a new partition.
Encrypt it.
Change mounting point of the old drive.
Update all symlinks and important config files that are pointing to the old mount location.
Configure crypttab and fstab to enable automatic partition mounting.

I have scripts set up for creating daily, weekly, monthly and quarterly backups of all important drives, so the first step was a breeze. I just had to wait several seconds for rsync to do its job of bringing yesterday’s backup up to date.

The second step turned out to be more complicated than I thought. While I had no issues finding the two empty Marvell SATA 6 Gb/s ports, finding a free SATA power cable was a different thing, since all of them were already taken. I even thought of removing a DVD burner to free up one power plug, which would leave me with only a Blu-ray burner. This wouldn’t be a bad move, because this DVD drive is older than the BRD, and I don’t use it too often. Still, the layout of my PC case would prevent me from connecting the hard disk with that cable anyway, because the second power plug of this cord was only several centimeters away from the plug connected to the other disc drive. Both drives are mounted in the top compartment of the case and even by moving the BRD to a lower mount point, there was no way I could reach the HDD cages with the second plug.

Fortunately my PSU (Corsair AX 750) is modular, and I remembered that I still should have some spare cables stashed in a bag that came with the hardware. It took several hours to pinpoint the location of the box the PSU originally shipped in, but I eventually found the cord and plugged it in. Lucky me, because new cable sets for Corsair power supply units are not cheap.

At this point, if this was a drive which previously had some data on it whether encrypted or not, I would probably run a secure wipe and return the next morning after having a good sleep. But because this was a brand-new HDD, wipe was unnecessary.

Partitioning The Drive

Partitioning, formatting, duplicating and other operations on drives on GNU/Linux always give me chills, because by making a simple mistake, like entering a wrong letter, one can easily nuke an entire file system. Therefore, I always triple-check all potentially devastating commands and think five times before issuing them. I even had ideas of performing these operations from QEMU/KVM, by passing the physical drive I intend to operate on directly to a guest virtual machine as this, I believe, would prevent me from doing any damage to the host’s file system in case of a human error.

But this time I got the guts up and partitioned the drive using normal means.

Beware!

If you, dear reader, want to follow the procedure, take extra care and don’t blame me if your system explodes or your data is irrevocably erased.

So, the first thing I had to do was find out under which block device the operating system had registered the drive. A useful command for printing information about connected drives is lsblk:

lsblk -io KNAME,TYPE,SIZE,MODEL

This returns something similar to:

KNAME TYPE    SIZE MODEL
sda   disk    1.8T HGST HUS724020ALA640
sda1  part    1.8T 
sdb   disk  167.7G INTEL SSDSC2CW180A3
sdb1  part    487M 
sdb2  part   29.8G 
[...]
sde   disk    1.8T TOSHIBA HDWD120
[...]

With lsblk I could print partition mount points of all drives:

NAME         MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda            8:0    0 167.7G  0 disk
├─sda1         8:1    0   487M  0 part  /boot/efi
[...]
sde            8:64   0   1.8T  0 disk
sr0            11:0   1  1024M  0 rom
sr1            11:1   1  1024M  0 rom

In my case the new drive was designated as /dev/sde (yours can be different), but for this tutorial let’s assume that the drive is /dev/vdc. I already have my drive partitioned and encrypted, and I don’t want to break it, so in this tutorial I’ll be operating on a virtual drive, and drives of this kind (virtio block devices) are registered as vd*.
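
Since picking the wrong device name is the costliest mistake in this whole procedure, a small read-only helper can narrow down the candidates. The sketch below is not part of my original workflow; the function name unpartitioned_disks is made up for illustration, and it assumes that lsblk lists each disk's partitions directly after the disk itself. It only prints disks that carry no partitions, which is what a factory-fresh drive looks like:

```shell
#!/bin/sh
# Print disks that have no partitions, given `lsblk -no KNAME,TYPE` output on stdin.
# Assumption: partitions are listed right after the disk they belong to.
# Caveat: a disk formatted without any partition table also shows up here.
unpartitioned_disks() {
    awk '
        $2 == "disk" {
            # A new disk starts: report the previous one if it had no partitions.
            if (disk != "" && !has_part) print disk
            disk = $1
            has_part = 0
            next
        }
        $2 == "part" { has_part = 1 }
        END { if (disk != "" && !has_part) print disk }
    '
}

# Usage on a live system (read-only, no root needed):
#   lsblk -no KNAME,TYPE | unpartitioned_disks
```

Treat the result as a hint only and still compare the model and size columns by eye before touching anything.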

For partitioning, I personally prefer parted over fdisk, but it shouldn’t matter what you choose because nowadays, both programs have support of GPT (GUID Partition Table), so the choice is arbitrary. Starting parted and telling it to operate on a specific drive is simple, but requires root privileges:

parted /dev/vdc

This launches the program’s interface (I marked cursor position with underscore character).

GNU Parted 3.4
Using /dev/vdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) _

The program informs that help is available via the help command, but it’s also possible to use the shorthand h. In fact, most commands are reachable by their shorter forms. Also, h followed by a command name prints information about that specific command.

First thing to do in parted was to ask it about information on the disk the program was told to operate on. I always do it because it lets me check physical sector size and double-check that I have not made a mistake while entering the block device file name and I’m indeed modifying the correct disk. The print command (shorthand: p) is responsible for displaying information about the disk and its partitions.

(parted) p
Error: /dev/vdc: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 8590MB
Sector size (logical/physical): 512B/4096B
Partition Table: unknown
Disk Flags:
(parted) _

The error on the first line means that the hard disk does not have a disk label yet. The label will need to be created in the next step.

The Model of a P300 Toshiba disk will be displayed as ATA TOSHIBA HDWD120 (scsi). Its size would also be different: 2000 GB instead of 8 GB. Drive’s physical sector size is 4 KB (this will be important later). Finally, parted informs that the drive does not have any partition table present, which is understandable as it was not created yet.
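
A side note on the numbers: parted reports sizes in decimal units (MB, GB), while lsblk uses binary ones, which is why the same 2 TB disk shows up as 2000GB in one tool and 1.8T in the other. The sketch below illustrates the arithmetic. The function names are made up, the 2 TB figure is taken as a nominal 2×10^12 bytes (an assumption, as real drives differ slightly), and the TiB conversion truncates instead of rounding:

```shell
#!/bin/sh
# Convert a byte count to tebibytes (2^40 bytes) with one decimal place, truncated.
bytes_to_tib() {
    tenths=$(( $1 * 10 / 1099511627776 ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))T"
}

# Convert a byte count to decimal megabytes (10^6 bytes), rounded to nearest.
bytes_to_mb() {
    echo "$(( ($1 + 500000) / 1000000 ))MB"
}

bytes_to_tib 2000000000000   # nominal 2 TB drive, as lsblk would show it
bytes_to_mb 8589934592       # 8 GiB virtual drive, as parted would show it
```

This needs a shell with 64-bit arithmetic, which is the norm on current GNU/Linux systems.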

Before creating a new partition, I created a disk label (a.k.a. partition table). There’s a nice selection of partition table types to choose from, but for an ext4 file system, I was interested only in gpt, as it offers better disk managing capabilities than the legacy MBR layout (which parted calls msdos):

(parted) mklabel gpt
(parted) p
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 8590MB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:
(parted) _

Note that it’s also possible to use a shorthand command mkl.

With the partition table present on the disk, I proceeded to create the actual partition. I needed only one, which would take up the full disk space. The structure of the mkpart command (shorthand: mkp) is:

mkpart PART-TYPE [FS-TYPE] START END

START and END parameters can be supplied as percentages of total disk space. If I need the drive to hold only one partition, I usually start from 0% and end with 100%. This way the program handles the partition alignment.

Note that while MBR partitions can be primary, logical and extended, there’s no such division when it comes to GPT disks. Here PART-TYPE acts merely as the GPT partition name (it ends up in the Name column of the print output). The partition was intended to hold my projects, so I named it accordingly:

(parted) mkpart "projects" ext4 0% 100%
(parted) p
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 8590MB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name       Flags
 1      1049kB  8589MB  8588MB  ext4         projects
(parted) _

For the real Toshiba HDWD120, the result looked like this:

Model: ATA TOSHIBA HDWD120 (scsi)
Disk /dev/sde: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name      Flags
 1      1049kB  2000GB  2000GB               projects

That’s it, the partition was created and the drive was ready to be encrypted.
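
The 1049kB start shown above is 1 MiB, i.e. sector 2048 in 512-byte units, which is a multiple of the 4096-byte physical sector. For the cautious, here is a minimal sketch of that alignment check. The function name is_aligned is made up for illustration, and the start sector has to be supplied by you:

```shell
#!/bin/sh
# Succeed if a partition starting at START_SECTOR (counted in LOGICAL-byte
# sectors) begins on a PHYSICAL-byte sector boundary.
is_aligned() {
    start_sector=$1
    logical=$2
    physical=$3
    [ $(( start_sector * logical % physical )) -eq 0 ]
}

# Example: sector 2048 with 512-byte logical and 4096-byte physical sectors.
if is_aligned 2048 512 4096; then
    echo "aligned"
else
    echo "misaligned"
fi
```

On a live system the start sector should be readable from /sys/block/sde/sde1/start, and parted has its own check as well: align-check optimal 1.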

Encryption

Many years ago I experienced a head crash of a 1 TB Seagate Barracuda drive. It happened several weeks after I bought the disk, so the hardware was still under warranty. However, having already copied all of my projects to this drive, some of them under NDA, I could not simply return the disk to the store with all that stuff residing unencrypted on magnetic platters. I had to destroy the drive, and since then I have encrypted every single drive that I buy, and I strongly encourage everyone to do the same.

To encrypt the drive I first needed to find out whether I was dealing with an Advanced Format drive or a standard one.

OK. I lied. I knew it right from the start, long before making the purchase. Toshiba informs about it on P300’s product page. The difference which distinguishes those drives from older technology is that in AF disks physical sector size exceeds 512 bytes. In Advanced Format drives sector size is 4096 bytes (4KB), which is supposed to improve error correction and increase bit density per track, resulting in average format efficiency gains of 8.6% (up to 11%). Parted confirms that the drive is indeed using Advanced Format:

Sector size (logical/physical): 512B/4096B

You may wonder why its logical sector size is only 512 bytes. This simply means that the drive offers a compatibility emulation layer for operating systems that do not support 4 KB physical sector sizes. That’s something I don’t necessarily require, as native AF support has been present in the Linux kernel for several major versions.
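
The same two values can be read without starting parted, as the kernel exposes them in sysfs. Below is a minimal sketch. The function name sector_sizes and the SYSFS_ROOT variable are my own inventions (the latter exists only so the function can be tried against a fake directory tree), while the logical_block_size and physical_block_size files are what the kernel provides:

```shell
#!/bin/sh
# Print "logical/physical" sector sizes in bytes for a whole disk (e.g. sde).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
sector_sizes() {
    queue="${SYSFS_ROOT:-/sys}/block/$1/queue"
    logical=$(cat "$queue/logical_block_size") || return 1
    physical=$(cat "$queue/physical_block_size") || return 1
    echo "${logical}B/${physical}B"
}

# Usage on a live system (read-only, no root needed):
#   sector_sizes sde
```

A quicker alternative is lsblk -o NAME,LOG-SEC,PHY-SEC, which prints both sizes for all drives at once.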

To encrypt an AF partition, I needed to pass a proper --sector-size option to cryptsetup:

cryptsetup luksFormat --sector-size 4096 /dev/vdc1

I did not provide any additional options to cryptsetup and just went with the defaults. Naturally, you may want to use different settings.

Before proceeding further, the program displays a warning about a destructive action it’s about to perform and asks for user confirmation. Then for a passphrase which will be used for unlocking the partition.

WARNING!
========
This will overwrite data on /dev/vdc1 irrevocably.

Are you sure? (Type 'yes' in capital letters): YES
Enter passphrase for /dev/vdc1:
Verify passphrase:

With LUKS container set up, I could open it for formatting:

cryptsetup open /dev/vdc1 projects

This command mapped the decrypted partition to /dev/mapper/projects. I could then format it with the standard ext4 formatting tools, but in the case of an AF drive, I explicitly set the file system block size (the -b option) to 4096 bytes, so that it matches the sector size of the LUKS container:

mkfs.ext4 -L projects -b 4096 /dev/mapper/projects

The final step was to close the decrypted partition:

cryptsetup luksClose /dev/mapper/projects

Auto-Mount on Boot

At this point I could already mount the partition manually, but doing this after each boot would be cumbersome and simply annoying. I expect the system to do its job and always auto-mount it for me. This required a few more steps, which were mostly limited to editing a couple of configuration files. I started by acquiring the UUID of the new encrypted partition:

blkid | grep "vdc"

/dev/vdc1: UUID="a13db7f0-2acf-4764-9873-2c7c7f8946be" TYPE="crypto_LUKS" PARTLABEL="projects" PARTUUID="dba8b379-60d1-4a64-8d8b-8c94131f7cb9"

I copied the UUID of the partition and pasted it into a new line of /etc/crypttab.

# /etc/crypttab
# <target name>  <source device>                              <key file>  <options>
projects         UUID="a13db7f0-2acf-4764-9873-2c7c7f8946be"  none        timeout=90

The structure and purpose of crypttab configuration file is very nicely explained in man crypttab. To keep it short, it’s a file that contains information about encrypted devices. All lines in the file are processed sequentially, so their order matters. Available fields (columns) of each encrypted device are:

target name is the name of the device node under /dev/mapper through which the decrypted container will be exposed. Providing projects as the target tells the system to create the /dev/mapper/projects virtual device.
source device is the encrypted block device. For a partition on a GPT drive this should be its UUID.
key file is used when the container is configured to be unlocked with key files. In my case it’s a passphrase, therefore key file is set to none.
options holds a comma-separated list of options. Consult man crypttab for more information. The timeout which I use sets the maximum time the system will wait for a passphrase to be entered. ArchWiki states that setting it to more than 90 seconds is futile, as systemd has its own independent timer, which will make the system wait (by default) for a maximum of 90 seconds, unless that value is changed with x-systemd.device-timeout in fstab.
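
Copying a UUID by hand is error-prone, so the lookup and the crypttab entry can also be generated. The sketch below is only an illustration with made-up function names (uuid_from_blkid, crypttab_line), and the timeout=90 option is simply my setting from above. It writes the UUID without quotes, as the examples in man crypttab do; the quoted form shown earlier is what worked on my system:

```shell
#!/bin/sh
# Extract the UUID from one line of blkid output read on stdin.
# The leading space in the pattern keeps PARTUUID="..." from matching.
uuid_from_blkid() {
    sed -n 's/.* UUID="\([^"]*\)".*/\1/p'
}

# Print a crypttab entry: <target name> <source device> <key file> <options>.
crypttab_line() {
    printf '%s  UUID=%s  none  timeout=90\n' "$1" "$2"
}

# Usage on a live system (as root, printing only, nothing is written):
#   crypttab_line projects "$(blkid /dev/vdc1 | uuid_from_blkid)"
```

blkid can also print the bare value directly with blkid -s UUID -o value /dev/vdc1, which makes the sed step unnecessary. Either way, review the line before appending it to /etc/crypttab.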

Now that I had told Debian about my encrypted file system, I still needed to tell it to actually mount the partition at boot time. Like crypttab, fstab is well explained in the man pages. The order of lines also matters here, so I decided to insert the description of my new partition near the end of the file, just before the optical drives:

# /etc/fstab
# <file system>       <mount point>  <type>       <options>    <dump>  <pass>
[...]
/dev/mapper/projects  /mnt/projects  ext4         defaults     0       2
/dev/sr0              /media/dvdrom  udf,iso9660  user,noauto  0       0
/dev/sr1              /media/bluray  udf,iso9660  user,noauto  0       0

Some explanation of the fields:

If the drive is encrypted, the file system column describes its mapping location (as specified in target name column from /etc/crypttab). For non-encrypted GPT drives it will be their UUID.
The mount point is a path on the root file system where the partition is to be mounted.
Type holds partition’s file system type (ext4 in my case).
Options is a comma-separated list of mount options (as described in man mount). I used defaults, which expands to rw, suid, dev, exec, auto, nouser, async.
Dump determines which file systems are to be dumped. I don’t use dump for backup, so I left it at 0.
Pass tells the operating system in what order it should run fsck to check file system errors. Disk containing the root partition should always be the first, so it gets the 1. All other drives should be set to 2. Equal number for non-root drives will allow them to be processed in parallel.
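
Before rebooting, it may be worth checking that the two files agree with each other, since a /dev/mapper name in fstab without a matching crypttab target can stall the boot. The following is a rough sketch with a made-up function name, check_mapper_targets. It takes the two file paths as arguments, so it can be tried on copies first. Note that systems with root on LVM also use /dev/mapper names that legitimately have no crypttab entry, so expect false alarms there:

```shell
#!/bin/sh
# Report every /dev/mapper/NAME used in an fstab file that has no target
# called NAME in a crypttab file. Returns non-zero if anything is missing.
check_mapper_targets() {
    fstab=$1
    crypttab=$2
    status=0
    for name in $(awk '$1 ~ "^/dev/mapper/" { sub("^/dev/mapper/", "", $1); print $1 }' "$fstab"); do
        # Commented-out crypttab lines start with "#", so they never match.
        if ! awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$crypttab"; then
            echo "missing in crypttab: $name"
            status=1
        fi
    done
    return $status
}

# Usage on a live system (read-only):
#   check_mapper_targets /etc/fstab /etc/crypttab
```

On Debian, running cryptdisks_start projects followed by mount /mnt/projects as root should exercise both entries without a reboot, though I went straight for the reboot myself.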

And that’s it. I rebooted my workstation and after GRUB I was prompted for the drive’s LUKS passphrase. After providing it with the correct phrase, the drive mounted correctly and became visible in the file system.

Epilogue

Well, I guess that’s a lot of text to describe a relatively simple operation. I don’t recommend rushing through potentially dangerous operations like those described, but if you take this post and cut out the cruft in the form of my ramblings and long explanations, you’ll notice that the process of partitioning and encryption is actually very simple and logical, and can be finished with several commands in less than two minutes. That is, if you don’t need to securely wipe the disk surface to erase existing data, which can easily take half a day.

Moreover, this process can be scripted, and then run with one push of the Enter key. Though personally, I would be hesitant to do so…
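
For the brave, here is roughly what such a script could look like. It is a sketch, not something I have run against a real disk: the names run, prepare_disk and DRY_RUN are invented for this example, the partition suffix assumes sdX/vdX naming, and by default it only prints the commands from this post instead of executing them:

```shell
#!/bin/sh
# Print each command (default) or execute it when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# prepare_disk DEVICE NAME: partition, encrypt and format one whole disk.
prepare_disk() {
    dev=$1
    name=$2
    case $dev in
        /dev/*) ;;
        *) echo "refusing: '$dev' is not a /dev path" >&2; return 1 ;;
    esac
    part="${dev}1"   # assumption: sdX/vdX naming; NVMe devices use a 'p1' suffix
    run parted --script "$dev" mklabel gpt mkpart "$name" ext4 0% 100%
    run cryptsetup luksFormat --sector-size 4096 "$part"
    run cryptsetup open "$part" "$name"
    run mkfs.ext4 -L "$name" -b 4096 "/dev/mapper/$name"
    run cryptsetup close "$name"
}

# Dry run: shows the five commands without touching any device.
prepare_disk /dev/vdc projects
```

Setting DRY_RUN=0 and running it as root would execute the commands for real, with cryptsetup still asking for confirmation and the passphrase. I would read the printed commands at least three times before doing that.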

Hopefully this post will be helpful to someone. And remember, if in doubt, consult man pages, your distribution’s manual, and naturally the well of all wisdom, the Arch Wiki.