January 03, 2017
Configure Backup Storage with ZFS
There are some neat filesystems out there, but none that really hold a candle to ZFS, specifically OpenZFS. You get data integrity, scalability, native snapshotting, and efficient storage, all underpinned by the blockchain-like checksum structure built into it. Oversimplifying the definition of a blockchain, it is essentially a chain of records in which each record carries a hash that lets you verify the one before it. ZFS does something similar: every data block it writes has an associated checksum, and that checksum is verified whenever the block is read, so the data handed back is the data that was written. Datto uses ZFS with Ubuntu and gets a lot of power as a result. We consistently try to change, update, and invent new ways to back up and restore using functionality that Ubuntu and ZFS provide. One example of this is the way backup storage can be configured using ZFS for better performance and easier configuration and management.
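To make the checksum idea concrete, here is a minimal sketch in plain shell. It is not how ZFS is implemented; it only illustrates the principle of recording a checksum at write time and comparing it at read time. The file name is hypothetical, and the POSIX cksum utility (a simple CRC) stands in for the checksum algorithms ZFS actually uses.

```shell
#!/bin/sh
# Illustration only: record a checksum when a block is written,
# then use it to detect corruption when the block is read back.
workdir=$(mktemp -d)
block="$workdir/block.bin"    # hypothetical stand-in for one data block

# "Write" a data block and record its checksum at write time.
printf 'backup data, version 1\n' > "$block"
stored=$(cksum < "$block")

# verify_block prints "ok" if the block still matches the stored
# checksum and "corrupt" otherwise.
verify_block() {
    current=$(cksum < "$1")
    if [ "$current" = "$2" ]; then echo "ok"; else echo "corrupt"; fi
}

before=$(verify_block "$block" "$stored")

# Simulate silent corruption by changing the block behind our back.
printf 'backup data, version X\n' > "$block"
after=$(verify_block "$block" "$stored")

echo "before corruption: $before"
echo "after corruption:  $after"
rm -rf "$workdir"
```

The point of the sketch is that the reader of the block never has to trust the storage medium: a mismatch between the stored and recomputed checksum is enough to know something went wrong.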
KZFS: A kernel-based version of ZFS, which is significantly more stable than zfs-fuse (another option for testing).
Loop Device: A way to use a file as a block device to then format, similar to plugging in a USB drive.
Using ZFS on Linux with Ubuntu 14.04:
Install KZFS (on 14.04 but not terribly different for other versions).
The OS is Linux and the storage array is using ZFS.
Using thin provisioned files mapped to loop devices to act as storage media.
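As a rough sketch of that last item, the script below creates seven thin provisioned (sparse) image files and defines a helper that maps them to loop devices. The directory, the file names, and the assumption that /dev/loop0 through /dev/loop6 are free are all mine, not part of the original setup. Creating the files needs no special privileges; attaching them with losetup needs root, so that step is wrapped in a function you call yourself.

```shell
#!/bin/sh
# IMG_DIR and the disk*.img names are hypothetical; adjust to taste.
IMG_DIR=${IMG_DIR:-$(mktemp -d)}
IMG_SIZE=${IMG_SIZE:-16G}

# Create seven sparse backing files. truncate sets the apparent size
# without allocating any data blocks on disk.
for i in 0 1 2 3 4 5 6; do
    truncate -s "$IMG_SIZE" "$IMG_DIR/disk$i.img"
done

# Compare the apparent size (ls) with the space actually used (du).
ls -lh "$IMG_DIR"
du -sh "$IMG_DIR"

# attach_loops maps each image to /dev/loopN. Needs root, and assumes
# /dev/loop0 through /dev/loop6 are not already in use.
attach_loops() {
    for i in 0 1 2 3 4 5 6; do
        sudo losetup "/dev/loop$i" "$IMG_DIR/disk$i.img"
    done
    sudo losetup -a    # list the attached loop devices
}
```

Once the files exist, running attach_loops makes them show up as block devices, much like plugging in seven USB drives.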
A ZFS storage array is made up of four things:
A Zpool: The overall storage capacity of your array, created from a group of VDEVs.
Virtual Devices (VDEVs): Logical devices in a Zpool. These include files, physical drives, mirrors, ZFS software RAID (raidz), hot spares, the L2 read cache (L2ARC), and the ZFS intent log (ZIL).
A ZFS Filesystem: A thin provisioned storage location with the capacity constraints of the Zpool.
A ZFS Snapshot: A point-in-time reference of data that existed within a ZFS filesystem.
Why are Zpools so cool?
They use one command to configure the RAID and filesystem. With many other setups, you have to boot the machine into the BIOS, set up all the RAID specifications, reboot to install the OS, and then configure the storage locations (different volumes or mount points within the OS).
Integrity checks can be performed without having to unmount the Zpool, meaning the check runs while regular operations are happening. Integrity checks also finish faster, since ZFS only needs to hash through used space.
Using software RAID, you can do an array rebuild with one command while regular operations are happening.
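The online integrity check mentioned above is called a scrub. Here is a small sketch of how you might start one and look at the result; the pool name mypool is a placeholder I made up, and the functions are thin wrappers around the standard zpool scrub and zpool status commands.

```shell
#!/bin/sh
# POOL is a placeholder; substitute the name of your own pool.
POOL=${POOL:-mypool}

# start_scrub kicks off an integrity check of the whole pool.
# The pool stays mounted and usable while it runs.
start_scrub() {
    sudo zpool scrub "$POOL"
}

# pool_health prints the pool layout, scrub progress, and any errors.
pool_health() {
    sudo zpool status -v "$POOL"
}

# quick_health prints a one-line verdict when the pool has no problems.
quick_health() {
    sudo zpool status -x "$POOL"
}

# Usage (needs root and an existing pool):
#   start_scrub && pool_health
```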
How do I create a Zpool?
Let’s say you have seven loop devices (loop0 through loop6), each backed by a 16GB sparse image. Five of them will be part of a raidz2 VDEV, which is effectively RAID 6 for ZFS, and loop6 will be part of a log VDEV. Loop0 will be used later. In the real world, you would ONLY use real disks or RAIDs for the VDEVs. However, in the example below I am using loop devices to show how disks would act.
COMMAND TEXT: sudo zpool create [poolName] raidz2 /dev/loop log /dev/loop6
So what just happened? Well, in one command we created a storage array with two parity disks and a specific location for the ZFS intent log, created a thin provisioned storage location for data (effectively a partition within the storage array), and mounted that partition to an empty directory. Unfortunately, they all share the same name:
Zpool Storage Array: iCanHazPool
ZFS Filesystem: iCanHazPool
Linux Mountpoint: /iCanHazPool
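For reference, here is a sketch of what the create step could look like when written out in full, followed by commands to see the three identically named pieces. I am assuming loop1 through loop5 are the five raidz2 members, since loop0 is held back for later and loop6 is the log device; swap in your own device names.

```shell
#!/bin/sh
POOL=${POOL:-iCanHazPool}

# Build the list of the five raidz2 members.
# Assumption: these are /dev/loop1 through /dev/loop5.
DATA_DEVS=""
for i in 1 2 3 4 5; do
    DATA_DEVS="$DATA_DEVS /dev/loop$i"
done

# create_pool builds a raidz2 VDEV plus a separate log VDEV.
create_pool() {
    # DATA_DEVS is left unquoted on purpose so that it splits into
    # five separate arguments.
    sudo zpool create "$POOL" raidz2 $DATA_DEVS log /dev/loop6
}

# show_pool prints the array layout, then the filesystem with its
# space usage and mountpoint.
show_pool() {
    sudo zpool status "$POOL"
    sudo zfs list -o name,used,available,mountpoint "$POOL"
}

# Usage (needs root and attached loop devices):
#   create_pool && show_pool
```

zpool status shows the Zpool and its VDEVs, while zfs list shows the filesystem and where Linux has mounted it.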
To show the replacement capabilities of ZFS, I triggered a swap of the “failed” loop2 with loop0 while writing random data to the /iCanHazPool mountpoint.
COMMAND TEXT: sudo zpool replace [poolName] [oldMedia] [newMedia]
ZFS only needs to work through the array space that is in use, and it ingests the new loop0 into the location where loop2 previously existed, on the fly. If there is a lot of data to get through, the resilvering process will run in the background until complete.
In ZFS, “resilvering” is the process of rebuilding data onto a new or replaced device. A “scrub” is the related process that checks all existing data against its checksums.
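If you want to follow along, the sketch below wraps the replace command and polls the pool until the resilver is done. The function names and the polling interval are my own choices; the check relies on zpool status reporting "resilver in progress" on its scan line while the rebuild is running.

```shell
#!/bin/sh
POOL=${POOL:-iCanHazPool}

# replace_device swaps a failed device ($1) for a new one ($2).
# Resilvering starts automatically and the pool stays online.
replace_device() {
    sudo zpool replace "$POOL" "$1" "$2"
}

# resilver_running succeeds while a resilver is still in progress.
resilver_running() {
    sudo zpool status "$POOL" | grep -q "resilver in progress"
}

# wait_for_resilver polls until the resilver finishes, then prints
# the final pool status.
wait_for_resilver() {
    while resilver_running; do
        sleep "${POLL_SECONDS:-10}"
    done
    sudo zpool status "$POOL"
}

# Usage (needs root and an existing pool):
#   replace_device /dev/loop2 /dev/loop0 && wait_for_resilver
```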
What is cool about ZFS Filesystems?
ZFS filesystems are thin provisioned storage locations that can be automatically mounted to an empty directory of the same name, in which you can then store anything your heart desires. Datto uses them to store backup data or as a NAS location. Since the filesystem is thin provisioned, space in the storage array is only allocated when you add data to the filesystem. This allows you to have any number of filesystems within a storage array, each used for a different purpose: a file share, a hypervisor datastore, and so on.
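Here is a minimal sketch of carving a few filesystems out of the pool. The filesystem names and the quota value in the usage comment are made up for illustration; the commands themselves are the standard zfs create, zfs set, and zfs list.

```shell
#!/bin/sh
POOL=${POOL:-iCanHazPool}

# make_fs creates a thin provisioned filesystem named $1 inside the
# pool. By default it is mounted automatically at /$POOL/$1.
make_fs() {
    sudo zfs create "$POOL/$1"
}

# cap_fs optionally limits how much pool space filesystem $1 may use.
cap_fs() {
    sudo zfs set quota="$2" "$POOL/$1"
}

# list_fs shows every filesystem in the pool with its space usage.
list_fs() {
    sudo zfs list -r -o name,used,available,mountpoint "$POOL"
}

# Usage (needs root and an existing pool; names are hypothetical):
#   make_fs backups
#   make_fs fileshare
#   cap_fs fileshare 4G
#   list_fs
```

Because nothing is allocated up front, every new filesystem initially reports the same available space as the pool itself, unless you cap it with a quota.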
The true power of ZFS is its ability to be snapshotted. We’ll explore ZFS snapshots in depth in my next blog post.