Building more backups

So I have quite a good plan for Linux server and Linux desktop backups, but virtual machines, and VMware especially, have been neglected. I even thought maybe I don’t need a backup there, but that is fundamentally a bad idea, and because the VMware host is the core of my infrastructure, it has to have one.

So Veeam was something that I had heard of, and I decided to give it a go.

[Screenshot: snap1470-2]

Not fast with the current setup, but that doesn’t matter much because it will still finish in a couple of hours. With compression it will probably take less than 20GiB for a good number of guests.

Also, my rented Hetzner backup server was delivered today, and judging by the upload speeds I have approximately 1000GiB of daily upload capacity, so the network will not become a bottleneck for quite a while. I expect to use somewhere around 20-35GiB daily.

The server has so much power that I can easily compress everything with gzip-9. Storage isn’t going to be a problem either, because I expect to use roughly 500GiB for daily backups, which leaves me with over 2500GiB of unused space.
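A minimal sketch of the kind of compress-and-ship step I have in mind; the paths and the hostname are made up:

# tar -cf - /srv/backups/daily | ssh backup@hetzner-box 'gzip -9 > /backups/daily.tar.gz'

The compression runs on the receiving end, which is where the spare CPU power is.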

And I must say that the other systems I have built are pretty damn good as well. This is a clear step forward towards a versatile backup plan.

Notes

So that I remember later.

# lvcreate -Z y -n thin0 -l100%FREE -T -r 0 vg0_sda5
# lvcreate -Z y -n thin0 -l100%FREE -T -r 0 vg0_sdb5
# lvcreate -n pool -r 0 -T -V 500G vg0_sda5/thin0
# lvcreate -n pool -r 0 -T -V 500G vg0_sdb5/thin0

And that is how a thin pool is created. I then create the filesystem on top of these thinly provisioned volumes, which enables a couple of tricks and makes the setup more versatile.
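For example, putting a filesystem on one of the thin volumes created above looks roughly like this; the mount point is made up:

# mkfs.ext4 /dev/vg0_sda5/pool
# mount /dev/vg0_sda5/pool /mnt/backup-sda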

Virtualized FreeNAS thoughts

I will quickly summarize some of the problems that have come with this setup.

The main problem is that because the VMs are stored on iSCSI which is itself virtualized, a clean shutdown without major hassle doesn’t really exist: VMware doesn’t allow me to detach the iSCSI storage while there are VMs registered from it.

I also believe that even if only the disks were on iSCSI and the VM configuration files resided elsewhere, there could still be a problem, but of that I am not sure.

But the alternative is to have the disks somewhere else, and even then there would be the same problem. So there is little that can be done about this, unless there is some way of detaching the storage without removing the VMs from inventory, because removing and re-adding them is way too much work.

On the FreeNAS side I get about 110MB/s read and write with occasional 140-150MB/s peaks, so there isn’t much penalty from using RDM disks.

Encryption and gzip-9 are extremely heavy even with the six-core Opteron 2431, so a second CPU will have to be added, and there is also the question of whether the two should be upgraded to some faster model.

I made some changes to the provisioning of space before migration and I am getting quite good compression ratios, essentially doubling the available space:

[Screenshot: snap1448]

The archive dataset uses the highest compression because it very rarely gets touched; in fact it should be stored somewhere else entirely, because those are old VM images which most likely won’t see the light of day again.

Then there is lz4 on iscsi/esxi1/live, which is the fast 15K storage used for databases, desktops and other systems that need fast disk access.

Then iscsi/esxi1/slower is on a different pool with 10K drives and uses gzip-6. This is for systems which generally don’t care about disk speed, like routers and very static Linux systems: systems which, after they have loaded themselves into RAM, hardly touch the disk except for logging, and that is not something that suffers from a slower disk and higher compression.
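The per-dataset compression is just a ZFS property, set roughly like this; the live and slower dataset names are as above, while the archive path is an assumption:

# zfs set compression=lz4 iscsi/esxi1/live
# zfs set compression=gzip-6 iscsi/esxi1/slower
# zfs set compression=gzip-9 iscsi/archive
# zfs get -r compression,compressratio iscsi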

All space used

Practically all free space has been allocated or used, and the 600-odd gigabytes which I managed to fit into that server were just enough for everything that I needed.

Some data has already been migrated and is waiting for the secondary storage node for the final migration; there is some 700GiB of cold or archived data that must be stored somewhere else. I will get a 3TB drive for that and make use of all the old smaller drives to get some sort of 4-5TB archival system running.

I believe I have the following disks to spare after I get another 3TB drive:

  • 3TB
  • 2TB
  • 1.5TB
  • 1TB
  • 1TB

The latter three are extremely old drives and will fail soon, but until then I will make full use of them in some sort of setup.

Maybe something like this:

  • raidz1-0
    • 1500GB
    • 1000GB
    • 1000GB
  • mirror-0
    • 3000GB
    • 2000GB

Then the leftover 1TB could be used for some temporary data.

CloudStack as a management server

So after cursing around hopelessly, somehow this happened unexpectedly:

[Screenshot: snap1276]

I was having problems with adding the host but then it suddenly worked.

And after hours and hours of small incremental steps towards a working first instance, I now have this:

[Screenshot: snap1278]

But there are so many things I know nothing about. Networking is one of those things.

https://github.com/apache/cloudstack-docs-install/blob/master/source/hypervisor/kvm.rst

https://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.7/qig.html

And here we have Windows Server 2008 R2 being installed:

[Screenshot: snap1279]

Performance-wise the DL365 G1 with RAID1 10KRPM HP SAS drives is excellent, as I get 160MB/s writes, and that is with encryption. So the Opteron 2356 performs quite well in that sense too.

I am beginning to like CloudStack:

[Screenshot: snap1280]

It has a quite clean firewall built in. Obviously not very configurable, but enough for basic usage.

And on the hypervisor I immediately started saving memory with KSM:

Shared   Sharing    Unshared  Volatile  Sharing:Shared  Unshared:Sharing  Saved
32,467   1,029,836  438,788   293,152   31.710000:1     0.420000:1        4,022M
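On a KVM hypervisor KSM is switched on through sysfs (ksmtuned normally takes care of this, so the following is purely illustrative):

# echo 1 > /sys/kernel/mm/ksm/run
# cat /sys/kernel/mm/ksm/pages_sharing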

And since swap is compressed I get some benefits there as well:

# zfs get compressratio dpool/swap
NAME        PROPERTY       VALUE  SOURCE
dpool/swap  compressratio  3.06x  -
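The swap lives on a zvol with compression enabled; roughly how such a volume might be created, with the size and the extra properties being assumptions on my part:

# zfs create -V 8G -b 4096 -o compression=lz4 -o sync=always \
    -o primarycache=metadata dpool/swap
# mkswap /dev/zvol/dpool/swap
# swapon /dev/zvol/dpool/swap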

Current problems

So the problem now is that because CloudStack will not allow over-subscription of thinly provisioned primary storage (more or less), the 68 available gigabytes won’t be enough for everything I want, so two 300GB SAS disks must be purchased ASAP to rectify the situation.

I am also doubting my use of two servers, because having one just for ESXi management is stupid. It can be used for other things, but needing a dedicated machine just to run ESXi sounds like bad design. Then again, I am not certain whether CloudStack is stable, and quite frankly, interesting enough to rely on it alone. ESXi is still a proper enterprise system.

One alternative to the 300GB disks is to use four 73GB disks in RAIDZ1, which would be a sub-optimal solution but might work. This would give me a little over 200GiB of storage, which would be enough.

Disk change

So I made the changes I discussed above and I am getting excellent write performance:

               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
dpool       25.1G   247G      0    807    357  15.2M
dpool       25.1G   247G      0  1.24K      0   157M
dpool       25.1G   247G      0  1.29K      0   164M
dpool       25.1G   247G      0  1.61K    511   204M
dpool       25.1G   247G      0  1.30K    511   165M
dpool       25.1G   247G      0  1.80K      0   229M
dpool       25.2G   247G      0  3.27K      0   117M
dpool       25.2G   247G      0  12.3K      0  50.6M
dpool       25.2G   247G      0  12.2K      0  66.8M
dpool       25.2G   247G      0  9.29K      0  33.3M
dpool       25.2G   247G      1  9.59K   1022  30.2M

There is some issue between ZFS and QEMU or QCOW2 that makes it impossible to use raw ZFS volumes with certain QEMU settings, and because I didn’t want to go there, I just put ext4 on top of a thinly provisioned ZFS block device. It should not hurt performance at all and could even improve it.
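Roughly what that stack looks like; the names, size and mount point are made up:

# zfs create -s -V 200G dpool/vmstore
# mkfs.ext4 /dev/zvol/dpool/vmstore
# mount /dev/zvol/dpool/vmstore /var/lib/libvirt/images

The -s flag makes the zvol sparse, i.e. thinly provisioned.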

And these speeds are with gzip-9 compression and individual LUKS encryption on each of the four disks, and I am just about CPU bound. So I am extremely impressed with this AMD Opteron 2356, and it isn’t even the fastest Opteron I have available.

[Screenshot: snap1281]

A lot of free space now.

Great score!

Scored an HP ProLiant DL385 G6 for 100 €. Easily worth 200 €. It only appears to have a problem with one of the Broadcom chips: it cannot PXE boot from those ports, nor is Linux able to configure them. lspci shows the two NICs, but they do not appear in the system. That is a small problem, though, because with one additional riser card I can have six PCIe expansion cards, and the other two NICs do work, so installation will be easy.

Also included were two 160GB Hitachi SATA disks and two 36GB Seagate SAS disks. Considering that I paid 100 € for the DL365 G1, this was a much, much better deal.

It has a single six-core AMD Opteron 2431, and according to AMD that should be 50% faster than the previous-generation (23xx) quad-core, which the DL365 G1 has. So ESXi will be moving to this server.

I am also hopeful, based on what I have read, that these six-core Opterons have an AMD IOMMU, so I can do true passthrough for all the disks and bypass the ESXi storage layer.

And thumbs up to the HP engineer who designed the very nice soft ramp-up for those fans; it sounds professional when the fans don’t go full-on immediately but ramp up smoothly. Same when they settle down.

But again, as with the one 146GB SAS disk I ordered earlier, this too came with disks which contain data. Apparently people in general aren’t too concerned about security. Of course these disks will be backed up for further forensic study.

How to set it up?

With 8 bays I can now use two disks for ESXi in hardware RAID0, two disks for the ZFS ZIL, and then configure the remaining four as I wish. L2ARC probably doesn’t make sense, so striped mirrors might be a nice option.
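For reference, attaching a mirrored ZIL (SLOG) to an existing pool is a one-liner; the pool and device names here are made up:

# zpool add data0 log mirror c2t7d0 c2t8d0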

I should also be able to move the existing 16GB of memory to this machine, since it is PC2-6400P and this server luckily supports those speeds. And with 16 slots I can keep using those sticks and add more, bigger ones. A fully loaded memory configuration would be 192GB, which is more than the disk space I currently have available for this machine. With 192GB it would be possible to do pretty much everything.

The server now has a DVD-RW drive, but I might want to replace it with a tape drive and start taking backups to tape. Tapes are relatively cheap, easy to handle and, when stored correctly, can last for 30 years.

The broken NIC issue

It seems the problem is with Broadcom firmware: http://www.wooditwork.com/2014/04/25/warning-hp-g2-g7-server-nics-killed-firmware-update-hp-spp-2014-02/

So it might simply be because of this. It would fit well because the NIC is complaining it cannot start. And my server has

  • HP NC382i DP Multifunction Gigabit Server Adapter

Which can be affected.

Problems

I have no cache module on the Smart Array P400, so it is not possible to create as many logical drives as I need. So I might go and get myself a P800 with battery and cache.

Upgrades

Following upgrades are likely to be made:

  • Secondary power supply 40€
  • 64GB PC2-5300P 50€
  • 512MB BBWC + Battery 30€

Then later on also CPU upgrade to 12 cores total:

  • AMD Opteron 2431 ~40€
  • Heatsink 20€
  • Two fans for second CPU 30€

(40 + 50 + 30 + 40 + 20 + 30) € comes to 210 €, so more or less fully equipped and redundant the total price for this DL385 G6 would be 310 €, which is quite cheap. The same from eBay would cost twice that.

Another issue is dust. A couple of days ago I had a disk fault on one server, and when I replaced the disk, the old one was covered in whatever floats around in the air.

Infrastructure in the making

Things are messy with the new infrastructure being brought to life.

Got the new HP ProLiant DL365 G5 sitting on the table, a bunch of new Opterons coming in the mail, and I am negotiating a good deal on a 15K 2.5″ 146GB SAS disk.

It seems VMware won’t allow me to use USB as storage space, so I had to set aside two of the disks for local storage.

Using rest of the disks like this: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2046370

Performance

The disks are now in a striped mirror configuration, and without encryption the peak write performance is about 80MB/s, which is pretty good. With encryption it drops to about 25MB/s.

Hopefully the four additional cores will rectify that. But whether they do or not, this is the price we have to pay for secure storage and a compact system.

But I will be trying different configurations to get better performance. RAIDZ1 is something to consider if it is fast enough: for example three drives in RAIDZ1 and one as a spare. That is 144GB, and since performance suffers as the pool approaches 70% full, in practice it would give about 100GB for all the important things, which with compression could very well rise back to 144GB of effective usable space.

Talking about disk space requirements, they aren’t that demanding:

  • Firewall 5GB
  • Router1 5GB
  • Router2 5GB
  • Web Server 30GB
  • E-Mail Server 30GB
  • DHCP Server 5GB
  • DNS Server 5GB

That comes to 85GB, and with 30% compression it would use just 60GB, leaving quite a bit of headroom to keep performance in check.

I also have that one 15K 146GB disk coming in so it could be used to replace one of the 10K drives.

Testing

I have previously done things without testing and only later realized that something wasn’t easy, possible or a good idea. So this time I just pulled one physical disk (a single disk in RAID0) out, and Solaris didn’t like it. I might have been able to shut it down cleanly, but I didn’t try that. Then I shut the hypervisor off for the night, and now I will see what the pool looks like and whether I can replace the bad drive, resilver and scrub.
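For reference, the generic replace-and-resilver flow would look roughly like this, assuming the device names from the status output below:

# zpool status data0             # find the faulted device
# zpool replace data0 c2t2d0     # swap in the replacement disk
# zpool scrub data0              # verify the pool once resilvering is done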

Seems to work just fine, and it is quite a bit more thoroughly thought out than anything on Linux (just saying):

# fmadm repaired zfs://pool=90885cc145609703/vdev=f483d79088288cd4/pool_name=data0/vdev_name=id1,sd@n6000c295d6bfe41de7098e7e23d6060a/a
fmadm: recorded repair to zfs://pool=90885cc145609703/vdev=f483d79088288cd4/pool_name=data0/vdev_name=id1,sd@n6000c295d6bfe41de7098e7e23d6060a/a
root@zfs:~# zpool status data0
  pool: data0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Wed Jan  6 15:01:19 2016
    1.60G scanned
    80.1M resilvered at 73.3M/s, 22.42% done, 17s to go
config:

        NAME        STATE     READ WRITE CKSUM
        data0       DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  DEGRADED     0     0     0  (resilvering)
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0

errors: No known data errors

The scrub runs at about 150MB/s, so perhaps there isn’t too much of a penalty from this setup:

# zpool status data0
  pool: data0
 state: ONLINE
  scan: scrub in progress since Wed Jan  6 15:02:53 2016
    2.89G scanned out of 3.10G at 156M/s, 1s to go
    0 repaired, 93.40% done
config:

        NAME        STATE     READ WRITE CKSUM
        data0       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0

errors: No known data errors

Installing iSCSI requirements:

# pkg install group/feature/storage-server
           Packages to install:  32
            Services to change:   1
       Create boot environment:  No
Create backup boot environment: Yes

DOWNLOAD                                PKGS         FILES    XFER (MB)   SPEED
Completed                              32/32     5620/5620  121.1/121.1  974k/s

PHASE                                          ITEMS
Installing new actions                     7180/7180
Updating package state database                 Done 
Updating package cache                           0/0 
Updating image state                            Done 
Creating fast lookup database                   Done 
Updating package cache                           1/1

https://docs.oracle.com/cd/E23824_01/html/821-1459/fnnop.html
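The COMSTAR setup itself boils down to a handful of commands, roughly like the following; the zvol path and size are examples rather than my exact configuration:

# svcadm enable -r svc:/network/iscsi/target:default
# zfs create -p -V 100G data0/esxi/LUN0
# stmfadm create-lu /dev/zvol/rdsk/data0/esxi/LUN0
# stmfadm add-view <lu-guid>     # GUID printed by create-lu
# itadm create-target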

# svcs -l iscsi/target
fmri         svc:/network/iscsi/target:default
name         iscsi target
enabled      true
state        online
next_state   none
state_time   January  6, 2016 04:14:47 PM EET
logfile      /var/svc/log/network-iscsi-target:default.log
restarter    svc:/system/svc/restarter:default
manifest     /lib/svc/manifest/network/iscsi/iscsi-target.xml
dependency   require_any/error svc:/milestone/network (online)
dependency   require_all/none svc:/system/stmf:default (online)
# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS 
iqn.1986-03.com.sun:02:cdc07458-cafc-490a-98fe-8fed3f415960  online   0        
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               default

Performance is appalling: 3MB/s write. From Solaris through ZFS to the physical disks I get 40MB/s, so the problem is somewhere between VMware and the Solaris iSCSI target. 3MB/s is not acceptable. I would be willing to accept even 20MB/s for the sake of data safety, but 3MB/s is like writing to an old USB stick.

As a side note, what I like about Solaris is this:

ipadm create-addr -T address-type -a address/prefixlen addrobj

Everything is treated like an object. What I mean by that is that things are configured as well-defined classes and objects, and tools are available to create and modify these objects. It makes management very robust.
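For example, a static address is created roughly like this; the interface name and the address are made up:

# ipadm create-addr -T static -a 192.168.10.5/24 net0/v4
# ipadm show-addr net0/v4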

https://docs.oracle.com/cd/E26502_01/html/E29008/iscsi-4.html

http://thegreyblog.blogspot.fi/2010/02/setting-up-solaris-comstar-and.html

https://docs.oracle.com/cd/E23824_01/html/821-1459/fnnop.html

https://docs.oracle.com/cd/E23824_01/html/821-1458/gjwiq.html

It seems that I am CPU bound so with 4 extra cores the speed should go up. 10MB/s is currently pretty much the norm.

Solaris iSCSI target

I had an issue where one of my LUNs wasn’t visible, and because another one was, I quickly realized that it was the encrypted one that was missing.

What I had to do was to import the LU and then it became visible.

root@zfs:~# stmfadm import-lu /dev/zvol/rdsk/data0/encrypted/esxi/LUN0 
Logical unit imported: 600144F07E7943000000568D3BEE0001
root@zfs:~# stmfadm online-lu 600144F07E7943000000568D3BEE0001
root@zfs:~# 

After that I had to remove from inventory all the guests using that datastore, re-create the datastore, and then import the guests back into inventory. The process is so laborious that shutting down the server is a bit of a pain.

pfSense is nice

It just is. So many options, extra packages, and everything in the GUI. I can easily define interface-based rules to match exactly what is needed.

[Screenshot: snap1252]

We shall see how it will handle two real physical connections and a bunch of virtual connections on top of those.

ZeroShell

Testing ZeroShell as a router. Looks good:

[Screenshot: snap1254]

I have cables running everywhere, and I feel lucky that I bought a 48-port switch back in the day.

I had thought I would use VMware as an additional L2 switch and simply add a large number of NICs, but VMware has a limit on the maximum number of NICs, so I will probably have to use internal 10GbE VLAN trunks instead. In VMware I can assign VLAN IDs, and in the router I can set up a virtual interface for each VLAN, as sketched below.
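On a Linux-based router the per-VLAN interfaces would look roughly like this; the trunk interface, VLAN ID and address are made up:

# ip link add link eth0 name eth0.10 type vlan id 10
# ip addr add 192.168.10.1/24 dev eth0.10
# ip link set dev eth0.10 up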

And by the looks of the ZeroShell firewall management window, it should be quite capable:

[Screenshot: snap1255]

Security on higher OSI levels

One key aspect of keeping the infrastructure safe is figuring out how to secure the higher levels, because web traffic, for instance, passes practically untouched through all the firewalls when it originates from a desktop inside the supposedly secure network.

I intend to use Squid and blacklists of addresses and services that I do not want to be in contact with, such as Google and Facebook. It was, by the way, impossible to find articles about blocking Google; I suspect people cannot even imagine wanting to do that, but it is actually rather simple.

Google has a small number (a dozen or so) of large networks in which all their services live, so blocking these would do the trick. And because Google is quite open as a company, I do not believe they would do anything behind your back, outside their advertised address spaces.
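A minimal squid.conf sketch of the idea; the destination networks below are documentation-range placeholders rather than Google’s real ranges, and the internal network is made up:

acl localnet src 10.0.0.0/8
acl blocked_nets dst 203.0.113.0/24 198.51.100.0/24   # placeholders, substitute the published ranges
acl blocked_domains dstdomain .facebook.com
http_access deny blocked_nets
http_access deny blocked_domains
http_access allow localnet
http_access deny all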

Other dangerous things include the thousands and thousands of advertising systems which are regularly used to spread malware.

VMware iSCSI storage on virtualized Linux

That’s a mouthful, and I am not even done yet.

But I got basic iSCSI running and VMware now recognizes it.

[Screenshot: snap1246]

This will be used to test the raw iSCSI performance.

The new iSCSI storage is available to be taken into use:

[Screenshot: snap1247]

And now to the fun part: installing an operating system onto this iSCSI target, which runs on a virtual machine on the same host that is using it as storage. So Linux is essentially acting as a storage driver.

The first thing I notice is this:

[Screenshot: snap1248]

The target is sending a small amount of network traffic to the initiator, while the installation process is practically stalled. It has not proceeded the way it would have if it weren’t on iSCSI.

VMware also reports some oddity on the guest:

[Screenshot: snap1249]

CPU usage shouldn’t look like that.

I decided to abort the installation and try the disk on an existing machine, but got something I have never seen before:

[Screenshot: snap1250]

But I get something:

[Screenshot: snap1251]

32MB/s isn’t too bad, considering this is an underpowered server; production will have more juice and will be using raw disks instead of file-based storage like this one, where the data from a guest goes to VMware, then through iSCSI to the target, and from there back to VMware and onto file-based storage on a local disk. So there is additional complexity.

How to configure the new server

Continuing from the previous post on the topic.

Idea

The server has six SAS disks and I am in favor of ZFS, mostly for the same reason that I am in favor of ECC memory; but I want to install VMware, and VMware doesn’t support ZFS, so I came up with an alternative plan.

The DL365 supports an internal USB stick and can boot from it, so I figured I could install VMware on it, make the stick large enough to also hold one Linux instance, give all the physical disks to that instance, use Linux to run ZFS, and then share the ZFS back to VMware as an NFS datastore.

I think that is quite a brilliant idea if it works to any degree. It should be fast because the traffic would be internal, all within the hypervisor and the VMware kernel, so the link speed should be on the order of 10Gbit.
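A minimal sketch of what the sharing side might look like; the pool, dataset, datastore name and address are all made up, and this is only an illustration of the idea:

# zfs create tank/vmstore
# zfs set sharenfs=on tank/vmstore
# showmount -e localhost

On the ESXi side the export would then be mounted roughly like this:

# esxcli storage nfs add -H 192.168.0.10 -s /tank/vmstore -v zfs-datastore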

[Attachment: V04_BlueArc-WP-Seven-Myths-NFS]

Of course this setup would have some overhead but with physical disks it should not be much.

Edit: Found the following comment:

ZFS is well suited to storing your data safely. It doesn’t perform so well when you do things like throw tons of sync writes at it like ESXi + NFS does without significant hardware. iSCSI + ZFS with VMs can have its own problems with fragmentation(which NFS can have also) and will often need an L2ARC to handle the high iops you will want. iSCSI is spared the penalties of the sync writes, but isn’t spared of high system requirements like lots of RAM and L2ARC. ZFS has no defrag tools, so you need the pool to be able to manage its fragmentation for its free space. Our VM expert recommends pools that are at most 60% full for maximum pool performance regardless of fragmentation.

So this still needs some consideration.

NFS is good for file-level access: if you need to store a million small 4K files, then NFS will write each one as such. Knowing about the individual files is exactly where ZFS can benefit a lot from its variable block size, doing many things much more efficiently. But if you create one huge file and start “randomly” overwriting it over NFS, ZFS will have no idea what is inside it and will access it all in fixed record-size chunks.

Perhaps iSCSI instead:

[Attachments: iSCSI_design_deploy, iSCSI_Multipathing_in_Ubuntu_Server]

Disk configuration

I figured I would go with RAIDZ2 because it is the smartest way to go and the best compromise between speed, space and reliability. RAIDZ1 would technically work too, but it would create a risky situation while one disk is rebuilding, plus one disk would have to remain as a spare, so that would essentially be one disk wasted. And six disks of double parity vs. five disks of single parity both give 288GB. RAIDZ2 is slower, but not by a large enough margin to rule it out.

Also, because I do not intend to keep a spare 2.5″ SAS disk lying around waiting for disk problems, it would be dangerous to have only single parity: if one disk failed, the whole pool would be at risk for as long as it took me to get a replacement.

Or perhaps go with striped mirrors for the best performance and reliability. It would “waste” half of the disks, but the difference between RAIDZ2’s 288GB and the striped mirrors’ 215GB is small enough that taking the faster option doesn’t cost much.

Or three-way mirrors, which are sort of my favorite: with six disks that is a stripe of two three-way mirrors. It would only be 146GB, which is not much, but the read performance would be phenomenal and the write speed should be roughly double that of a single disk.

I think the stripe of two three-way mirrors is the way to go.
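A minimal sketch of that layout, with made-up device names:

# zpool create data0 mirror c2t1d0 c2t2d0 c2t3d0 mirror c2t4d0 c2t5d0 c2t6d0
# zpool status data0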

Hardening the network

I will also be placing a firewall in front of my network. The idea I had some time ago was that a secure network should have multiple firewalls from different vendors, because having only one firewall, or building the whole infrastructure on the same architecture, creates a security problem.

So the outer firewall would be BSD-based and act as a very general firewall, and behind it there would be an Internet router and an internal router, both with zone-specific rules.

So if the BSD box gets compromised, the attack could not proceed further using the same attack vector. It would require compromising both BSD and Linux systems, which would slow the attack down tremendously.

Currently there is only one router which does it all, and that creates a very messy configuration and makes it difficult to manage all the different rules and aspects. Splitting that one server into multiple virtualized routers would give a clean separation of responsibilities.

If I need to route Internet traffic, it happens on the Internet router; if I need to route internal networks, it happens on the internal router.