Attempting to implement ZFS, and to make all the decisions that go into it, is enough to make any newbie give up (and/or get terribly confused)…
Unfortunately, Google does not help much, what with old information (e.g. installation instructions: old vs. new; possibly outdated information), different “package variations” of ZFS (e.g. Ubuntu-native ZFS packages vs. ZFS-native), and the endless arguments over the “correct” number of disks for RAIDZ/RAIDZ2 (mirrored vdevs only? RAIDZ? RAIDZ2? mirrored RAIDZ/2? what block size?)…
I read through all the articles linked above, plus some other helpful(?) ones:
- https://wiki.ubuntu.com/ZFS (proven to be mostly incorrect/misleading info!)
- http://sphinx.freenas.org/zfsprimer.html#zfs-primer
- https://icesquare.com/wordpress/how-to-improve-zfs-performance/
- https://icesquare.com/wordpress/zfs-performance-mirror-vs-raidz-vs-raidz2-vs-raidz3-vs-striped/ (probably not realistic, since the benchmark uses pseudo-random files)
- https://github.com/zfsonlinux/zfs/wiki/faq
Some other “associated” links/articles/comments/posts that I came across:
- https://forums.freebsd.org/threads/21005/#post-124327 (might be useful to force 4K alignment?)
- https://icesquare.com/wordpress/how-to-improve-rsync-performance/
My decision?
With all that information swimming around in my head, I finally decided on… *drum roll*
RAIDZ2 across all 6 drives using 4KB-aligned partitions, with the zpool created with ashift=12 (i.e. 4KB sectors) and all datasets using LZ4 compression*… plus a prayer that no more than 2 drives fail during any failure-and-resilver window (or at least before a full off-line backup completes), together with a 4GB SLOG partition on the boot SSD**.
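For concreteness, here is a minimal sketch of what that decision might translate to on the command line – the pool name “tank” and the /dev/disk/by-id paths are placeholders, not my actual devices:

    # Sketch only: pool name and device paths are placeholders
    # RAIDZ2 across the 6 drives' 4K-aligned partitions, forcing 4K sectors (ashift=12)
    zpool create -o ashift=12 tank raidz2 \
        /dev/disk/by-id/ata-DRIVE1-part1 /dev/disk/by-id/ata-DRIVE2-part1 \
        /dev/disk/by-id/ata-DRIVE3-part1 /dev/disk/by-id/ata-DRIVE4-part1 \
        /dev/disk/by-id/ata-DRIVE5-part1 /dev/disk/by-id/ata-DRIVE6-part1

    # LZ4 compression set on the pool root is inherited by every dataset created under it
    zfs set compression=lz4 tank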
The thought of using a stripe of 3x mirrored vdevs carried the horror that any two drive failures within the same vdev meant kissing my data goodbye…
Well, that was all the time I had – next up, actually installing and building the RAIDZ2!
* The thought did cross my mind to use 6x 4TB RAIDZ2 partitions (for static data) + 6x 2TB striped mirrored partitions (for fast AV file access); but:
- I felt that the storage-space penalty for the AV files (i.e. 50% with mirroring) was not justified by either the value of that data or its access-speed requirements
- the “mixed mode” performance was a huge unknown (i.e. could ZFS optimally handle two different RAID layouts competing for the same spindles?)
** The reasoning for all of this was as follows:
- sizing:
- assuming a 300MB/s transfer rate (approximate single internal SATA 7200RPM drive performance) and the periodic 5s ZIL flush, the needed capacity would actually be 300MB/s * 5s = 1.5GB, hence a 4GB SLOG should be sufficient
- however, a 10GbE connection would overflow a 4GB SLOG (i.e. 10Gb/s × 5s ÷ 8 bits/byte ÷ 2^30 bytes/GB ≈ 5.82GB; see the quick sanity check after this list)… but it will be a while before I get any further than an 802.3ad LACP 1GbE connection…
- partition vs. whole device:
- you can explicitly control the 4K alignment (of both the start and the end), rather than relying on the ZFS utilities' “automatic detection”, which may or may not work (see the sketch after this list)
- explicit partition sizing guarantees a safety margin below the drive's stated capacity (e.g. not all 6TB drives have exactly the same number of blocks/sectors, which matters when a drive has to be replaced)
- as per the ZIL sizing above, a single SSD (even a 120GB one) would be way overkill (although arguably could also be used as L2ARC); and USB 3.0 thumbdrives would both be slower and likely die faster (given the high volumes of writes)
- I do not have many spare SSDs, and more importantly, spare SATA ports, so a partition was my only choice
- if a device ever has to be replaced, cloning the SLOG and placing it elsewhere is supposedly easier with a partition than with an entire device – supposedly, you cannot remove an SLOG without breaking the zpool (although this claims the contrary – I have not tested it yet)
- using a partition for the SLOG that does not fill the entire SSD (instead of giving ZFS the whole device) leaves the controller spare flash for wear-leveling and other low-level optimisations (see over-provisioning)
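As a quick sanity check of the SLOG sizing above (same assumptions: ~300MB/s per drive, the roughly 5s flush interval, and a 10Gb/s link running flat out):

    # SLOG sizing sanity check; the 300MB/s and 5s figures are the assumptions from above
    echo $((300 * 5))                          # single SATA drive: 1500MB, so 4GB is ample
    echo "10 * 1000^3 * 5 / 8 / 2^30" | bc -l  # saturated 10GbE: ~5.82GB, more than a 4GB SLOG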
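And a hedged sketch of the partitioning and SLOG steps the reasoning above points to – the device names (/dev/sdX for a data drive, /dev/sdY for the boot SSD), the 5580GiB data-partition size and the “zfsdata”/“slog” labels are all illustrative, not necessarily what I will end up using:

    # Sketch only: device names, sizes and labels are placeholders
    # Data drive: GPT partition starting at sector 2048 (1MiB, hence 4K-aligned),
    # sized a little below the drive's nominal 6TB to leave a replacement safety margin
    sgdisk -n 1:2048:+5580G -t 1:bf01 -c 1:zfsdata /dev/sdX

    # Boot SSD: a 4GB SLOG partition in the free space after the OS partitions,
    # leaving the remainder unpartitioned for wear-leveling / over-provisioning headroom
    sgdisk -n 0:0:+4G -t 0:bf01 -c 0:slog /dev/sdY

    # Attach the SLOG partition to the pool as a log vdev...
    zpool add tank log /dev/disk/by-partlabel/slog
    # ...and, if the "it can be removed" claim holds, detach it again with:
    zpool remove tank /dev/disk/by-partlabel/slog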