It did not take long for one of the cheap 300GB disks to fail.
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Dec 25 13:17:30 2016
        183G scanned out of 612G at 18.3M/s, 6h39m to go
        45.9G resilvered, 29.98% done

        NAME                                             STATE     READ WRITE CKSUM
        pool2                                            DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            replacing-0                                  OFFLINE      0     0     0
              luks-80ec16d0-3d1f-4ed5-a4f0-0ee564547280  OFFLINE     52   191     0
              luks-fd4c98cc-d1be-4812-ae33-3650fdc984c0  ONLINE       0     0     0  (resilvering)
            luks-e7868ceb-fe11-4e40-a5e3-b6e56129b380    ONLINE       0     0     0
            luks-31f81d2e-36eb-4213-b101-e17d13907df5    ONLINE       0     0     0
            luks-c265d6da-5c40-435b-9f95-c2512c5a8bc5    ONLINE       0     0     0

errors: No known data errors
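The reported ETA can be sanity-checked against the scanned/total figures above; a quick sketch, assuming the binary units ZFS reports (1G = 1024M):

```python
# Sanity check of the resilver ETA reported by zpool status above.
scanned_g = 183          # data scanned so far, in GiB
total_g = 612            # total data to scan, in GiB
rate_m_per_s = 18.3      # reported scan rate, MiB/s

remaining_m = (total_g - scanned_g) * 1024   # GiB -> MiB
eta_s = remaining_m / rate_m_per_s

hours = int(eta_s // 3600)
minutes = int(eta_s % 3600 // 60)
print(f"{hours}h{minutes}m to go")
```

This lands within a minute or two of the reported 6h39m; the small difference comes from zpool status rounding the scanned and rate figures.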
I had to take the 300GB disk from another pool until I get a replacement.
I think I’ll wait for the missing hardware for the additional 8 bays and then migrate from 4x300GB RAIDZ1 to something else. Maybe a 6-disk RAIDZ2 with an online spare. Better to use those disks that I have. Or maybe get a bunch of small SSD disks from work.
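Comparing the usable capacities of the two layouts, counting parity overhead only (a rough sketch; real ZFS usable space is slightly lower due to metadata and padding):

```python
# Rough usable-capacity comparison of the migration options above
# (parity overhead only; ignores ZFS metadata, slop space and padding).
def raidz_usable(disks, size_gb, parity):
    return (disks - parity) * size_gb

current = raidz_usable(4, 300, parity=1)   # 4x300GB RAIDZ1
option = raidz_usable(6, 300, parity=2)    # 6x300GB RAIDZ2 (spare not counted)

print(current, option)  # 900 1200
```

So the 6-disk RAIDZ2 would add a third more space while tolerating two disk failures instead of one.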
“Ordinary” spinning disks use PMR (perpendicular magnetic recording), but the technology is nearing its limits, and manufacturers have been developing new magnetic recording technologies to replace it.
HGST has its Helium drives, which utilize SMR technology in place of the more common PMR, and the Seagate Archive 8TB is also an SMR-based hard drive.
This of course was not mentioned on the local store’s web pages, but luckily I did some research into the drive.
It turns out these drives have internal firmware-based software to take care of all the intricacies of SMR, since it is somewhat complicated, so there should be no problems using them like any other hard drive; but because of their differences they may still behave differently.
That is a highly technical talk from HGST and some discussion with OpenZFS developers. I watched it yesterday before falling asleep and couldn’t really catch how SMR differs, but it seems that it is better at sequential writes than at random writes, which are problematic; reads, be they random or sequential, should be relatively fast.
But the bottleneck is random writes, for which I believe these drives have an internal “normal” PMR-type section that is used as a staging area, so as not to completely destroy the performance.
But since ZFS is a copy-on-write filesystem, random writes should not play that big of a part. That is my understanding.
But these drives are so cheap, and they are meant more or less for cold data, so I would probably go for the Seagate Archive 8TB myself; it is available from Germany for 233 €, which is pennies.
The cheapest 3TB drive that I can find is 88 €, so it is almost exactly the same price per gigabyte, but with fewer drives; or, with room for more 8TB drives, more storage capacity in a single machine.
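A quick check of that price-per-gigabyte claim, using decimal gigabytes as drive manufacturers count them:

```python
# Price per gigabyte for the two options mentioned above.
seagate_8tb = 233 / 8000   # €/GB for the 8TB Archive
cheap_3tb = 88 / 3000      # €/GB for the cheapest 3TB drive

print(f"{seagate_8tb:.4f} €/GB vs {cheap_3tb:.4f} €/GB")
```

The 8TB actually comes out marginally cheaper, roughly 2.91 versus 2.93 cents per gigabyte.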
HGST uses helium because of its lower density: less friction means less heat and more stable spinning platters, and hence a more reliable drive. I am not sure if Seagate uses helium, probably not, but they are still able to offer a 36-month (3-year) warranty.
So probably not for ordinary usage, but for cold storage these should be fine: place the data once, rarely if ever modify it, and read it once in a while.
Cache type and size: The drives use a persistent disk cache of 20GiB and 25GiB on the 5TB and 8TB drives, respectively, with high random write speed until the cache is full. The effective cache size is a function of write size and queue depth.
This here illustrates the basic operating principle:
To decrease the bit size further, SMR reduces the track width while keeping the head size constant, resulting in a head that writes a path several tracks wide. Tracks are then overlapped like rows of shingles on a roof.
So the write head writes over multiple tracks while writing one single track, and the tracks that get written over as a side effect must be copied to a different place prior to overwriting them.
When the drive gets full there are no empty track spaces left, and all new data must first copy tracks to safe places, which might have to copy data to safe places, which might have to copy data to safe places [et cetera], before the new data can be stored.
And all this because the write head cannot be made any smaller.
And what was said one paragraph before is said differently in this document as well, about the cascading relocations of data:
Modifying any of this data, however, would require reading and re-writing the data that would be damaged by that write, and data to be damaged by the re-write, etc. until the end of the surface is reached. This cascade of copying may be halted by inserting guard regions —tracks written at the full head width—so that the tracks before the guard region may be re-written without affecting any tracks following it, as shown in Figure 2.
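That cascade can be sketched as a toy model (entirely illustrative, not how real drive firmware works): modifying a track damages the next overlapped track, which must be rewritten, damaging the next, and so on until a guard track written at full head width stops the chain.

```python
# Toy model of the SMR rewrite cascade (illustrative only).
# Rewriting a track damages the next overlapped track, which must be
# rewritten in turn; a guard track (written at full head width) has no
# overlapped neighbor after it, so the cascade stops there.
def tracks_rewritten(modified, guards, total_tracks):
    """Set of tracks that must be rewritten when `modified` is updated."""
    rewritten = set()
    t = modified
    while t < total_tracks:
        rewritten.add(t)
        if t in guards:      # guard track: nothing after it is damaged
            break
        t += 1               # damage propagates to the next shingled track
    return rewritten

# Without a guard, modifying track 2 cascades to the end of the surface:
print(sorted(tracks_rewritten(2, guards=set(), total_tracks=10)))
# With a guard at track 5, the cascade is halted there:
print(sorted(tracks_rewritten(2, guards={5}, total_tracks=10)))
```

This is why the drives are divided into guarded bands: the worst-case rewrite is bounded by the band size instead of the whole surface.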
Looking into moving to some sort of Web-based personal storage system, such as https://pydio.com/en which looks extremely good on the surface; I haven’t had a chance to try it yet.
Another one that came up was ownCloud, but to be perfectly honest it doesn’t seem as finished and clean as the other one.
One thing all these must do is use the underlying filesystem and store files in some sensible way. By which I mean that in case anything ever goes wrong, there must be a simple way to migrate the data to another system or to ditch any management systems altogether.
In other words, any system worth considering should simply store the files on the filesystem and act as a Web-based GUI to it, with some intelligence such as indexing, searching and other things of that nature.
Or if not, then the system must be absolutely foolproof, exactly like any ordinary filesystem pretty much is.
But I think I am sold on Pydio, because if you look at what they say, it seems pretty awesome:
Because just as I am writing this text away from home, I am storing these images on the Mega.nz cloud so that the files are also available at home; Pydio has this exact same feature, which makes it extremely powerful!
So you can create your own personal storage cloud and use it everywhere, securely.
And they have had time to paste the license at the beginning but not enough to document the code properly. Also, the question is why they haven’t done this with some existing framework but instead rolled their own.
If all goes well I will be adding one or two extremely powerful and new servers in the coming months.
Those servers use 2.5″ disks, so the only question is how to implement a large-scale storage system. I have an old E6600-based server which would be perfectly fine if two 1Gbit connections were trunked together to get a 2Gbit iSCSI connection.
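A minimal sketch of such trunking with the Linux bonding driver, as a Debian-style /etc/network/interfaces fragment (the interface names and address here are placeholders, and the switch must support 802.3ad/LACP):

```
# /etc/network/interfaces fragment: bond two gigabit NICs with LACP.
# eth0/eth1 and the address are assumptions; requires the ifenslave package.
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100
```

One caveat: 802.3ad balances traffic per flow, so a single iSCSI TCP connection still tops out at 1Gbit; getting the full 2Gbit needs multiple sessions or iSCSI multipathing.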
2TB in the 2.5″ form factor seems to be the most cost-effective, and prices for 3TB are beyond economical. So if one server could take 4 disks, that would in a mirrored configuration give 2TB of storage, with some faster storage in the form of SSD left over for L2ARC and SLOG.
The old DL360 G3 would be dedicated to only work as firewall and traffic shaper and routing and switching would be moved to dedicated managed gigabit switches.
Also, all servers currently boot from NFS, which has proven to be good but problematic in case of a failure in that NFS server, which has the potential to either lock up or bring down all the other servers. So NFS would be removed in favor of an SSD-based mirrored ZFS root.
One question mark is my current networking setup, which relies heavily on Linux and would need to be ported to managed switches. It shouldn’t be a problem, though, since it is technically all VLAN-based, with some bridges with more specific rules; those would need to be addressed somehow.
Also something like pfSense could be considered. But with the firewall and router, if such a system is used, I would like to move from i386 to a 64-bit architecture, because currently there have been problems with not having enough memory. An HP ProLiant DL380 G5 might suit the purpose perfectly as a low-cost server.
Quad-gigabit PCIe network cards seem to be quite cheap, so with three slots it would act as a 12-port gigabit router. That would enable either the current Linux-based routing scheme or a transition to something like the BSD-based pfSense. BSD has a reputation of being a network-oriented system, and some studies have demonstrated that it performs extremely well as a router.
But one thing to remember with Linux/BSD-based routers is to make absolutely certain that the driver support for the network cards is perfect; otherwise the stack will fall apart. Dedicated routing hardware works perfectly because it has been built to match exactly what it was meant to be: a router and nothing more.
So if the new QEMU/KVM hypervisor would set me back 400 €, disks perhaps 500 €, a router 300 €, one or two additional small switches yet another 200 € and a 1400VA UPS 250 €, then the price tag would be 1 650 €, which isn’t too bad.
That cost would hopefully give me room for at least another 3 years, with 2TB of storage and the possibility to expand that storage to 14TB by using the router as an FC-based storage node, dropping 4 gigabit ports to accommodate the FC card.
Everything so far has gone simply by clicking buttons and choosing what to install. This is a complete 180° difference from Linux.
IIS was installed by choosing features, and PHP can be installed similarly easily with a Microsoft-produced installer. The only downside is that, at least for this version of IIS, they offer PHP 5.3.24, which is old.
This is certainly easier than setting up Linux, at least at this level it is. But I am sceptical, in the sense that I am certain it isn’t really that simple. But these are apples and oranges, so it really makes no difference.
And I sort of like that this gives one the ability to do nothing but what actually needs to be done. I mean, in Linux you can fiddle for the rest of your life and it changes little. So Linux by this impression is 1000 times more versatile, but that comes with downsides.
And then when this happens in Windows I don’t like it because it never gives you any explanation and the help it offers has no real use:
Found information that this requires .NET 3.5, and oddly it wasn’t installed along with this. So currently installing that.
In less than 24 hours it has written 240GB into a 20GB partition. That’s quite a hell of an impact on such a small space on an SSD, but I assume much of this is because I had to move large amounts of data back and forth.
But this is definitely something that must be monitored, because my daily backups could theoretically eat away that SSD quite fast. Especially since I am in the process of making a new backup system which would verify large amounts of previous backups every single day.
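A back-of-the-envelope endurance estimate for that write rate; the endurance rating below is a made-up placeholder, the real figure comes from the drive’s datasheet:

```python
# Rough SSD endurance estimate at the observed write rate.
writes_per_day_gb = 240   # observed daily writes, from above
rated_tbw = 70            # HYPOTHETICAL endurance rating in TB written

days = rated_tbw * 1000 / writes_per_day_gb
print(f"~{days:.0f} days at this write rate")
```

Under that assumption the drive would be exhausted in under a year, which is why the write rate is worth monitoring; a more realistic steady-state rate would stretch this considerably.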
Also the hit-ratio is extremely poor:
l2_hits 4 2496
l2_misses 4 5801535
So it might not even be smart to use L2ARC at all for this pool; the access pattern seems more random than ZFS can make use of.
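Computed from the arcstats above, the hit ratio really is negligible:

```python
# L2ARC hit ratio from the arcstats figures above.
l2_hits = 2496
l2_misses = 5801535

ratio = l2_hits / (l2_hits + l2_misses)
print(f"{ratio:.4%}")   # about 0.04% - effectively no benefit
```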
This works just fine, unlike the HDD tests previously. I have no ZIL nor L2ARC for that pool, but because the DDT is on SSD and therefore fast, the problem of the DDT being evicted from ARC doesn’t become such an issue.
DDT-sha256-zap-duplicate: 130595 entries, size 286 on disk, 141 in core
DDT-sha256-zap-unique: 841291 entries, size 301 on disk, 157 in core
dedup = 1.14, compress = 2.36, copies = 1.00, dedup * compress / copies = 2.68
It is a small 120GB SSD, so that additional 14% saving becomes handy.
Edit: and after all the images were copied, the deduplication went up quite a bit, along with the compression:
# zdb -D vm
DDT-sha256-zap-duplicate: 277827 entries, size 288 on disk, 141 in core
DDT-sha256-zap-unique: 1251538 entries, size 303 on disk, 158 in core
dedup = 1.36, compress = 2.49, copies = 1.00, dedup * compress / copies = 3.39
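The combined savings figure follows directly from the reported ratios:

```python
# Combined savings ratio from the zdb -D output above.
dedup, compress, copies = 1.36, 2.49, 1.00

combined = dedup * compress / copies
print(round(combined, 2))   # 3.39
```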
So it’s storing much more data than the whole drive is in size, essentially giving me a 170GB SSD for the price of a 120GB one. The server and setup used aren’t high-end and there is no need for superior performance, so the hit from deduplication combined with heavy compression doesn’t affect me much.
The additional things that I can do with that extra 50GB are warmly welcomed.