Reading Up On ZFS…

Attempting to implement ZFS, and wading through all the decisions that go into it, is enough to make any newbie give up (and/or get terribly confused)…

Unfortunately, Google does not help much, what with outdated information (e.g. old vs. new installation instructions), the different “package variations” of ZFS (e.g. Ubuntu-native ZFS packages vs. ZFS-native ones), and the endless arguments over the “correct number of disks” for RAIDZ/RAIDZ2 (mirrored vdevs only? RAIDZ? RAIDZ2? mirrored RAIDZ/RAIDZ2? what block size?)…

I read through all those articles linked above, including some other helpful(?) ones:

Some other “associated” links/articles/comments/posts that I came across:

My decision?

With all that information swimming around in my head, I finally decided on… *drum roll*

RAIDZ2 across all 6 drives, using 4KB-aligned partitions in a zpool with an ashift of 12 (i.e. 4KB sectors) and LZ4 compression on all datasets*… plus a prayer that no more than 2 drives fail during any failure-and-recovery period (or at least until a full off-line backup completes), together with a 4GB SLOG partition on the boot SSD**.
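
For concreteness, a rough sketch of what that could look like on the command line (the pool name “tank” and the ata-DRIVE* device names are placeholders rather than my actual hardware, and sgdisk’s default 1MiB alignment takes care of the 4K boundaries):

    # One 4K-aligned data partition per drive, ending 100MiB short of the end as a safety margin
    for d in ata-DRIVE1 ata-DRIVE2 ata-DRIVE3 ata-DRIVE4 ata-DRIVE5 ata-DRIVE6; do
        sgdisk -n 1:0:-100M -t 1:BF01 /dev/disk/by-id/$d
    done

    # RAIDZ2 across all 6 partitions, 4K sectors (ashift=12), LZ4 inherited by every dataset
    zpool create -o ashift=12 -O compression=lz4 tank raidz2 \
        /dev/disk/by-id/ata-DRIVE1-part1 /dev/disk/by-id/ata-DRIVE2-part1 \
        /dev/disk/by-id/ata-DRIVE3-part1 /dev/disk/by-id/ata-DRIVE4-part1 \
        /dev/disk/by-id/ata-DRIVE5-part1 /dev/disk/by-id/ata-DRIVE6-part1

    zpool status tank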

The alternative of a striped pool of 3 mirror vdevs carried the horror that any two drive failures within the same vdev would mean kissing my data goodbye…

Well, that was all the time I had – next up, actually installing and building the RAIDZ2!

* The thought did cross my mind to use 6x 4TB RAIDZ2 partitions (for static data) + 6x 2TB striped mirrored partitions (for fast AV file access), as sketched after this list; but:

  • I felt that the 50% premium in storage space lost to mirroring the AV files was not justified by either the value of that data or its access-speed requirements
  • the “mixed” mode performance was a huge unknown (i.e. could ZFS optimally handle “mixed mode” RAID access across all the spindles?)
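
For the record, that rejected “mixed mode” layout would have looked something like the sketch below (again, the device names, pool names and exact partition sizes are placeholders, not anything I actually built):

    # Roughly a 4TB + 2TB split on each 6TB drive
    for d in ata-DRIVE1 ata-DRIVE2 ata-DRIVE3 ata-DRIVE4 ata-DRIVE5 ata-DRIVE6; do
        sgdisk -n 1:0:+3725G -t 1:BF01 /dev/disk/by-id/$d   # ~4TB slice for static data
        sgdisk -n 2:0:0      -t 2:BF01 /dev/disk/by-id/$d   # remaining ~2TB for AV files
    done

    # RAIDZ2 over the big partitions, striped mirrors over the small ones
    zpool create -o ashift=12 -O compression=lz4 static raidz2 \
        /dev/disk/by-id/ata-DRIVE1-part1 /dev/disk/by-id/ata-DRIVE2-part1 \
        /dev/disk/by-id/ata-DRIVE3-part1 /dev/disk/by-id/ata-DRIVE4-part1 \
        /dev/disk/by-id/ata-DRIVE5-part1 /dev/disk/by-id/ata-DRIVE6-part1
    zpool create -o ashift=12 -O compression=lz4 av \
        mirror /dev/disk/by-id/ata-DRIVE1-part2 /dev/disk/by-id/ata-DRIVE2-part2 \
        mirror /dev/disk/by-id/ata-DRIVE3-part2 /dev/disk/by-id/ata-DRIVE4-part2 \
        mirror /dev/disk/by-id/ata-DRIVE5-part2 /dev/disk/by-id/ata-DRIVE6-part2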

** The reasoning for all of this was as follows:

  • sizing:
    • assuming a 300MB/s transfer rate (approximately the performance of a single internal SATA 7200RPM drive) and the periodic 5s ZIL flush, the needed capacity would actually be 300MB/s × 5s = 1.5GB, hence a 4GB SLOG should be sufficient (this sizing is carried into the sketch at the end of this footnote)
    • however, a saturated 10GbE connection would overflow a 4GB SLOG (i.e. 10Gb/s × 5s ÷ 8 bits/byte ÷ 2^30 bytes/GB ≈ 5.82GB)… but it will be a while before I get any further than an 802.3ad LACP 1GbE connection…
  • partition vs. whole device:
    • with a partition you can explicitly control the 4K alignment (both the start and the end), rather than relying on “automatic detection” by the ZFS utilities, which may or may not work
    • explicit partition sizing guarantees some safety margin below the “maximum” drive size (e.g. not all 6TB drives have exactly the same number of blocks/sectors), which matters when a slightly smaller replacement drive has to be swapped in
    • as per the ZIL sizing above, a whole SSD (even a 120GB one) would be way overkill (although it could arguably also double as L2ARC); and USB 3.0 thumbdrives would be both slower and likely to die faster (given the high write volume)
    • I do not have many spare SSDs, and more importantly, spare SATA ports, so a partition was my only choice
    • cloning the SLOG and moving it to another device when a replacement is needed is supposedly easier with a partition than with an entire device – supposedly, you cannot remove an SLOG without breaking the zpool (although this claims the contrary – I have not tested it yet)
    • using a partition for the SLOG that does not fill the entire SSD (instead of giving ZFS the whole device) ensures that the hardware/controller wear-leveling and other low-level optimisations can work at full effectiveness (see over-provisioning)
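
Putting the SLOG reasoning into concrete terms, something like the following sketch (the boot SSD’s device name and the partition number are placeholders, and it assumes there is unpartitioned space left on the SSD):

    # 4GB SLOG slice on the boot SSD
    # (sizing from above: ~300MB/s * 5s = ~1.5GB needed, so 4GB has headroom;
    #  a saturated 10GbE link would want closer to 5.8GB)
    sgdisk -n 4:0:+4G -t 4:BF01 /dev/disk/by-id/ata-BOOT-SSD
    zpool add tank log /dev/disk/by-id/ata-BOOT-SSD-part4
    zpool status tank

    # A log vdev can reportedly be removed again without harming the pool; untested by me:
    # zpool remove tank /dev/disk/by-id/ata-BOOT-SSD-part4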
