Disk failure

It did not take long for one of the cheap 300GB disks to fail.

  pool: pool2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Dec 25 13:17:30 2016
    183G scanned out of 612G at 18.3M/s, 6h39m to go
    45.9G resilvered, 29.98% done
config:

        NAME                                             STATE     READ WRITE CKSUM
        pool2                                            DEGRADED     0     0     0
          raidz1-0                                       DEGRADED     0     0     0
            replacing-0                                  OFFLINE      0     0     0
              luks-80ec16d0-3d1f-4ed5-a4f0-0ee564547280  OFFLINE     52   191     0
              luks-fd4c98cc-d1be-4812-ae33-3650fdc984c0  ONLINE       0     0     0  (resilvering)
            luks-e7868ceb-fe11-4e40-a5e3-b6e56129b380    ONLINE       0     0     0
            luks-31f81d2e-36eb-4213-b101-e17d13907df5    ONLINE       0     0     0
            luks-c265d6da-5c40-435b-9f95-c2512c5a8bc5    ONLINE       0     0     0

errors: No known data errors

I had to take a 300GB disk from another pool until I get a replacement.

zpool offline pool2 luks-80ec16d0-3d1f-4ed5-a4f0-0ee564547280
zpool replace -o ashift=9 pool2 luks-80ec16d0-3d1f-4ed5-a4f0-0ee564547280 luks-fd4c98cc-d1be-4812-ae33-3650fdc984c0
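
The resilver can be followed with zpool status; once it finishes, the old device should be detached from the replacing vdev automatically:

zpool status pool2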

I think I’ll wait for the missing hardware for the additional 8 bays and then migrate from the 4x300GB RAIDZ1 to something else, maybe a 6-disk RAIDZ2 with a hot spare. Better to use the disks that I have. Or maybe get a bunch of small SSDs from work.
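
If I end up going the RAIDZ2 route, the layout and the migration would look roughly like this (a sketch only: the new pool name, the placeholder disk names and the snapshot name are made up, and the right ashift depends on the actual disks):

zpool create -o ashift=12 pool3 raidz2 disk1 disk2 disk3 disk4 disk5 disk6 spare disk7
zfs snapshot -r pool2@migrate
zfs send -R pool2@migrate | zfs receive -F pool3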

Seagate Archive 8TB: SMR technology and ZFS

“Ordinary” spinning disks use PMR, but that technology is nearing its limits, and manufacturers have been developing replacement magnetic recording technologies.

HGST has its Helium drives, which utilize SMR technology in place of the more common PMR, and the Seagate Archive 8TB is also an SMR-based hard drive.

This of course was not mentioned on the local store’s web pages, but luckily I did some research into the drive.

It turns out these drives have firmware that takes care of all the intricacies of SMR, which is somewhat complicated, so there should be no problem using them like any other hard drive; but because of their differences they may behave differently.

That is a highly technical talk from HGST, along with some discussion with OpenZFS developers. I watched it yesterday before falling asleep and couldn’t really catch how SMR differs, but it seems that it is better at sequential writes than at random writes, which are problematic; reads, be they random or sequential, should be relatively fast.

But the bottleneck is random writes, for which I believe these drives have an internal “normal” PMR-type section that is used as a staging area, so that performance is not completely destroyed.

But since ZFS is a copy-on-write filesystem, random writes should not play that big a part. That is my understanding.

And these drives are so cheap, and meant more or less for cold data anyway, that I would probably go for the Seagate Archive 8TB myself; it is available from Germany for 233 €, which is pennies.

The cheapest 3TB drive that I can find is 88 €, so the price per gigabyte is practically the same, but with fewer drives, or with room for more 8TB drives and more storage capacity in a single machine.
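
The math, roughly:

233 € / 8000 GB ≈ 0.029 €/GB
 88 € / 3000 GB ≈ 0.029 €/GB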

HGST uses helium because of its lower density: less friction means less heat and more stable spinning platters, and hence a more reliable drive. I am not sure if Seagate uses helium, probably not, but they are still able to offer a 36-month (3-year) warranty.

[Image: seagate_archive_8tb_sata_main_4kwrite_avglatency]

So probably not for ordinary usage, but for cold storage it should be fine: place the data once, rarely if ever modify it, and read it once in a while.

And I found this great-looking paper:

[Attachment: fast15-paper-aghayev]

Cache type and size: The drives use a persistent disk cache of 20GiB and 25GiB on the 5TB and 8TB drives, respectively, with high random write speed until the cache is full. The effective cache size is a function of write size and queue depth.

This here illustrates the basic operating principle:

To decrease the bit size further, SMR reduces the track width while keeping the head size constant, resulting in a head that writes a path several tracks wide. Tracks are then overlapped like rows of shingles on a roof.

So the write head covers several tracks while writing a single track, and the tracks that would be overwritten as a side effect have to be copied somewhere else before the write.

When the drive gets full there are no empty track spaces left, and all new data must first copy tracks to safe places, which might in turn have to copy data to safe places, which might have to copy data to safe places [et cetera], before the new data can be stored.

And all this because the write head cannot be made any smaller.

What was said a paragraph earlier is put differently in the document as well, regarding the cascading relocation of data:

Modifying any of this data, however, would require reading and re-writing the data that would be damaged by that write, and data to be damaged by the re-write, etc. until the end of the surface is reached. This cascade of copying may be halted by inserting guard regions —tracks written at the full head width—so that the tracks before the guard region may be re-written without affecting any tracks following it, as shown in Figure 2.

Modernization of storage Part II (ownCloud)

So in the previous article I talked about my need to modernize my storage system and move to a cloud-based one; I briefly discussed ownCloud and deemed Pydio to be of poor quality.

Then I wanted to see whether the other alternative, ownCloud, was any better, and to my surprise its code base is much cleaner and it has a “deeper” structure.

And because ownCloud provides the same sync feature and offers clients for basically every platform, mobile ones included, it is definitely the better choice.

But more on this hopefully later at some point.

Modernizing personal data storage system

I am looking into moving to some sort of Web-based personal storage system, such as https://pydio.com/en, which looks extremely good on the surface; I haven’t had a chance to try it yet.

Another one that came up was ownCloud, but to be perfectly honest it doesn’t seem as finished and clean as the other one.

One thing all of these must do is use the underlying filesystem and store files in some sensible way. By that I mean that if anything ever goes wrong, there must be a simple way to migrate the data to another system or to ditch the management system altogether.

In other words, any system worth considering should simply store the files on the filesystem and act as a Web-based GUI on top of it, with some intelligence such as indexing, searching and other things of that nature.

Or if not, then the system must be absolutely foolproof, exactly like any ordinary filesystem pretty much is.

But I think I am sold on Pydio, because if you look at what they say, it seems pretty awesome:

[Image: pydio-features]

Just as I am writing this text away from home, storing these images in the Mega.nz cloud so that the files are also available at home, Pydio has this exact same feature, which makes it extremely powerful!

So you can create your own personal storage cloud and use it everywhere, securely.

Pydio also has demo available at https://demo.pyd.io/ (demo / demo)

But the source, at least for this driver, does not impress me at all:

https://github.com/pydio/pydio-core/blob/develop/core/src/plugins/access.fs/class.fsAccessDriver.php

Check the size of those functions. While it looks just fine on the surface, under the hood it is a mess. And I very much detest using a switch statement like that.

That’s a fast way to develop because it doesn’t require any thought about the structure of the code, but the result is very poor and quality suffers.

And after seeing that, I no longer trust this system as much.

The same style continues elsewhere: https://github.com/pydio/pydio-core/blob/develop/core/src/index.php

They have had time to paste the license at the beginning, but not enough to document the code properly. The question is also why they haven’t built this on some existing framework instead of rolling their own.

So those are some improvement points.

Studying CakePHP: Internal engine

After quite some time I am back to working with a pure MVC framework, this time CakePHP.

I am doing some hands-on studying and found this nice diagram describing the internal logic from request to response.

It looks straightforward, but it wouldn’t be so obvious if such an image weren’t available.

[Image: typical-cake-request]

There is still a lot to learn, because it isn’t intuitive how the framework should be used and what the structure of an application should be: what goes where, and what is the optimal way of doing things.

Revisiting the home data center architecture

If all goes well I will be adding one or two extremely powerful new servers in the coming months.

Those servers use 2.5″ disks, so the only question is how to implement a large-scale storage system. I have an old E6600-based server that would be perfectly fine if two 1Gbit connections were trunked together into a 2Gbit iSCSI connection.
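
On the Linux side that trunking would be a bonding setup, roughly along these lines (a sketch only; the interface names are placeholders, and 802.3ad/LACP assumes the switch ports are configured to match):

ip link add bond0 type bond mode 802.3ad
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up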

2TB in the 2.5″ form factor seems to be the most cost-effective, and prices for 3TB are beyond economical. So if one server could take 4 disks, a mirrored configuration would give 2TB of storage, with some faster SSD storage left over for L2ARC and SLOG.

The old DL360 G3 would be dedicated to working only as a firewall and traffic shaper, and routing and switching would be moved to dedicated managed gigabit switches.

Also, all servers currently boot from NFS, which has proven to work well but is problematic if the NFS server fails, since that has the potential to either lock up or bring down all the other servers. So NFS would be removed in favor of an SSD-based mirrored ZFS root.
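
Bootloader details aside, the pool behind such a root would be a plain two-way mirror, something like this (a sketch; the pool and device names are placeholders):

zpool create -o ashift=12 rpool mirror /dev/disk/by-id/ssd-1 /dev/disk/by-id/ssd-2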

One question mark is my current networking setup, which relies heavily on Linux and would need to be ported to the managed switches. It shouldn’t be a problem, though, since it is technically all VLAN-based, with some bridges carrying more specific rules; those would need to be addressed somehow.
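
For reference, the Linux side of that boils down to tagged interfaces like the following (the interface name and VLAN ID are made-up examples), which map directly onto tagged ports on a managed switch:

ip link add link eth0 name eth0.10 type vlan id 10
ip link set eth0.10 up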

Something like pfSense could also be considered. But for the firewall and router, if such a system is used, I would like to move from i386 to a 64-bit architecture, because currently there have been problems with not having enough memory. An HP ProLiant DL380 G5 might suit the purpose perfectly as a low-cost server.

Quad-port gigabit PCIe network cards seem to be quite cheap, so with three slots the box would act as a 12-port gigabit router. That would allow either the current Linux-based routing scheme or a transition to something like the BSD-based pfSense. BSD has a reputation for being a network-oriented system, and some studies have demonstrated that it performs extremely well as a router.

But one thing to remember with Linux/BSD-based routers is to make absolutely certain that driver support for the network cards is solid; otherwise the stack will fall apart. Dedicated routing hardware works so well because it has been built to be exactly one thing: a router and nothing more.

So if the new QEMU/KVM hypervisor were to set me back 400 €, disks perhaps 500 €, the router 300 €, one or two additional small switches yet another 200 € and a 1400VA UPS 250 €, then the price tag would be 1 650 €, which isn’t too bad.
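
Adding that up: 400 € + 500 € + 300 € + 200 € + 250 € = 1 650 €.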

That cost would hopefully buy me room for at least another 3 years, with 2TB of storage and the possibility to expand that storage to 14TB by using the router as an FC-based storage node, dropping 4 gigabit ports to accommodate the FC card.

Windows Server 2012 isn’t that bad after all (IIS and PHP)

Everything so far has gone simply by clicking buttons and choosing what to install. This is a complete 180° difference from Linux.

IIS was installed by choosing features, and PHP can be installed just as easily with a Microsoft-produced installer. The only downside is that, at least for this version of IIS, they offer PHP 5.3.24, which is old.

This is certainly easier than setting up Linux, at least at this level. I am sceptical, though, in the sense that I am certain it isn’t really like that. But these are apples and oranges, so it really makes no difference.

[Image: snap1204]

And I sort of like that this gives one the ability to do nothing but what actually needs to be done. I mean, in Linux you can fiddle for the rest of your life and it changes little. So by this impression Linux is 1000 times more versatile, but that versatility comes with downsides.

And then when something like this happens in Windows, I don’t like it, because it never gives you any explanation, and the help it offers has no real use:

[Image: snap1205]

I found information that this requires .NET 3.5, which oddly wasn’t installed along with the rest, so I am currently installing that.

[Image: snap1206]

And the odd problems with Windows Update are apparently solved with this: https://support.microsoft.com/en-us/kb/2734782

Or so they say:

[Image: snap1207]

I am seriously starting to dislike Windows.

http://blogs.technet.com/b/askcore/archive/2012/05/14/windows-8-and-net-framework-3-5.aspx

This is beginning to seem like an absolutely ridiculous situation. Why haven’t they fixed this? Their system isn’t working!

Finally, after surrendering and attaching the installation media, .NET 3.5 was installed.

And after I uninstalled the Web Platform Installer because it failed, it no longer lets me install it again. What a poor system this is.

One good thing came about:

[Image: snap1208]

And there is everything from CakePHP to Drupal and Joomla. Presumably you can drop these in for configured domains and hosts.

MySQL 5.5 didn’t install; after 8 hours it was still stuck on “Installing”. So there is definitely something wrong somewhere, either in Windows or in my setup.

And the second attempt fails like this:

[Image: snap1209]

It is becoming clear there is something deep within Windows that I am not currently getting. Or maybe it just sucks, but I find that hard to believe.

But this doesn’t matter too much, because I will be using SQL Server anyway.

FTP was easy to enable; it resides within IIS.

ZFS: L2ARC and SSD wear leveling

I noticed this in arcstats:

l2_write_bytes                  4    260231528448

In less than 24 hours it has written roughly 242GiB into a 20GB partition. That’s a hell of an impact on such a small area of an SSD, but I assume much of it is because I had to move large amounts of data back and forth.

But this is definitely something that must be monitored, because my daily backups could theoretically eat away at that SSD quite fast, especially since I am in the process of building a new backup system that would verify large amounts of previous backups every single day.
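
If it keeps up, the feed rate can also be throttled; on ZFS on Linux the L2ARC write rate is controlled by the l2arc_write_max module parameter (the value below, 8MiB per feed interval, is just an example):

echo 8388608 > /sys/module/zfs/parameters/l2arc_write_max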

Also, the hit ratio is extremely poor:

l2_hits                         4    2496
l2_misses                       4    5801535

So it might not even be smart to use L2ARC at all for this pool; that is a hit rate of roughly 0.04% (2496 hits against about 5.8 million misses). The workload seems more random than ZFS can make use of.
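
These counters come straight from the ZFS kstats, so they are easy to keep an eye on, and if the cache stays this useless the device can simply be dropped from the pool (the pool name and cache device below are placeholders):

grep -E '^l2_(hits|misses|write_bytes)' /proc/spl/kstat/zfs/arcstats
zpool remove tank <cache-device>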

233 Media_Wearout_Indicator 0x0032   000   000   000    Old_age   Always       -       655
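
That is SMART attribute 233 (Media_Wearout_Indicator) from the SSD; with smartmontools it can be checked with something like this (the device path is an example):

smartctl -A /dev/sda | grep Media_Wearout_Indicator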


ZFS: Deduplication on SSD pool

It works just fine, unlike the earlier HDD tests. I have no ZIL nor L2ARC for that pool, but because the DDT is on SSD and is therefore fast, the problem of DDT entries being evicted from the ARC doesn’t become such an issue.
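
For reference, dedup and compression are per-dataset ZFS properties, so turning them on looks roughly like this (the pool name is the one from the zdb output below; the compression algorithm is my assumption):

zfs set dedup=on vm
zfs set compression=gzip vm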

DDT-sha256-zap-duplicate: 130595 entries, size 286 on disk, 141 in core
DDT-sha256-zap-unique: 841291 entries, size 301 on disk, 157 in core

dedup = 1.14, compress = 2.36, copies = 1.00, dedup * compress / copies = 2.68

It is a small 120GB SSD, so that additional 14% saving comes in handy.

Saving old, practically identical CentOS images seems to deduplicate quite nicely.

Edit: after all the images were copied, the deduplication ratio went up quite a bit, along with the compression:

# zdb -D vm
DDT-sha256-zap-duplicate: 277827 entries, size 288 on disk, 141 in core
DDT-sha256-zap-unique: 1251538 entries, size 303 on disk, 158 in core

dedup = 1.36, compress = 2.49, copies = 1.00, dedup * compress / copies = 3.39

So it is storing much more data than the whole drive holds, essentially giving me a 170GB SSD for the price of a 120GB one. The server and setup used aren’t high-end and there is no need for superior performance, so the hit from deduplication combined with heavy compression doesn’t affect me much.

The additional things that I can do with that extra 50GB are warmly welcomed.