Cataloging the tape content

I am going about this with a couple of simple Linux commands.

First to save information about all of the files:

find . -ls >backup2.content.8.txt

This stores the listing for file 8 of backup number 2. The same process is repeated for every file on the tape (or logical backup, more precisely).

Then the md5 checksum of every file on the tape:

find . -type f -exec md5sum {} \; >backup2.content.8.md5

Then both of these will be stored on the archival-grade CD-R discussed earlier and kept with the tapes at all times.

I will probably fill most of the CD-R with repeated copies of this and the other metadata, so that if one part of the media gets damaged, the other parts will still have this information.
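The two catalog commands above could be wrapped into a pair of small helpers, one that writes the listing and the checksums and one that re-checks the files against the stored checksums later. This is only a minimal sketch: the function names catalog_dir and verify_catalog are made up for illustration, and it assumes GNU find and md5sum.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of any tool.

# catalog_dir DIR PREFIX
# Writes PREFIX.txt (find -ls listing) and PREFIX.md5 (one checksum per file).
# Keep PREFIX outside DIR so the catalog does not list itself.
catalog_dir() {
    dir=$1
    prefix=$2
    ( cd "$dir" && find . -ls ) >"$prefix.txt" || return 1
    ( cd "$dir" && find . -type f -exec md5sum {} + ) >"$prefix.md5"
}

# verify_catalog DIR MD5FILE
# Re-computes the checksums of the files under DIR and compares them with
# MD5FILE. Returns non-zero if any file is missing or differs.
verify_catalog() {
    dir=$1
    md5file=$2
    ( cd "$dir" && md5sum -c - >/dev/null ) <"$md5file"
}
```

After a restore, something like verify_catalog /restored/data backup2.content.8.md5 would then confirm that what came off the tape matches what went onto it.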

Tapes for security

Hadn’t thought of it before, but not having the data online is a HUGE added security bonus, because I could argue that 99,999% of all data-related risks come from data being connected to the internet. It is rare that someone would want the data so badly that they would actually go to the trouble, and take the risk, of physically getting it.

I have previously used encryption to manage this problem, and sealed the data that I don’t need at any particular moment, but that still isn’t as good as having the data offline.

Security through obscurity also works better with tapes: I could probably leave the tapes unencrypted and still have some confidence that whoever happened to get hold of them, by whatever means, would have some trouble getting the data off the tapes.

Speed vs. compression ratio compromises

Had to make some sacrifices with the compression algorithm and the level of compression, because the network failed at the end of the copy, and I do not want to wait another 15 hours to have one of the two full backups done.

So I switched from bzip2 to gzip, from pbzip2 to pigz, and instead of level 9, I am now using level 6, which should give roughly the same compression ratio as the higher levels, but without sacrificing the speed too much.

The biggest slowdown comes from the small files that I need to copy, while the larger ones move to the tape at around 67MB/s. I am CPU bound even with pigz -6, so that is the speed bottleneck, not the tape.

Other bottlenecks include aespipe, which isn’t multi-threaded. But it is still a lot faster than GnuPG, which I had to switch away from because it was way too slow.

Encryption wouldn’t be a problem if I had AES-NI, but my hardware isn’t recent enough, so this is something that just has to be dealt with.

So I am now running something like this on one machine:

tar --exclude=/abc --selinux --acls --xattrs -pcvf - /data | pigz -6 | nc 1.2.3.4 4444

And something like this on the machine responsible for the tape:

nc -l 4444 | aespipe -P /passphrase -K /key -e AES256 | pv -t -e -r -b -s 600G -p -a | dd of=/dev/nst0 bs=1616940

Also, because you cannot know how much data is already on a tape, pv is very handy for giving you a rough estimate at the end of the process.

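Because the stream is already compressed and encrypted, the byte count from pv should stay close to the actual tape usage even with hardware compression turned on, close enough to estimate the capacity accurately.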

Accurate space usage information

Got the following output from pv as input data ran out:

562GiB 6:04:01 [26.4MiB/s] [26.4MiB/s] [====================================================================================================================================================>            ] 93%

So that was 562GiB or 604GB, and the tape should have approximately 196GB left.
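The conversion behind those numbers can be sketched with a little awk. pv reports binary GiB while tape capacity is quoted in decimal GB, and pv rounds what it displays, so the result can differ by a gigabyte or so from the figures quoted above. The function names and the 800GB default are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: function names are illustrative.

# gib_to_gb GIB
# Converts binary gibibytes (as shown by pv) to decimal gigabytes.
gib_to_gb() {
    LC_ALL=C awk -v gib="$1" 'BEGIN { printf "%.1f\n", gib * 1073741824 / 1e9 }'
}

# tape_left_gb WRITTEN_GIB [NATIVE_GB]
# Estimates the decimal gigabytes left on a tape of NATIVE_GB (default 800).
tape_left_gb() {
    LC_ALL=C awk -v gib="$1" -v cap="${2:-800}" \
        'BEGIN { printf "%.1f\n", cap - gib * 1073741824 / 1e9 }'
}

gib_to_gb 562      # prints 603.4
tape_left_gb 562   # prints 196.6
```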

Also bought a couple of the older LTO-3 tapes, as I got them for a cheap price of about 2,1 euro cents per gigabyte, about the same as these 800GB tapes. I will not pay full price for tapes but will instead get new old stock tapes whenever they come up for sale. People have these lying around with no use for them, so they sell them for pennies apiece.

Accuracy of the estimate

The tape should have had 196GB left, but it took 205GB instead. So either the capacity really wasn’t 800GB but 807-809GB, or the hardware compression was enabled and was able to squeeze that 7-9GB out of encrypted data, which I cannot believe. But the method is pretty accurate, and I was very, very close to backing up all that I wanted to back up onto this tape.

First tape backups

tape-backup-linux

This gives me only about 28MB/s write speeds because I am limited by the CPU on the other machine, the one I send the backups from over SSH.

Opted for bzip2 -9 because this backup will not be accessed for many years. xz would have fit maybe a few gigabytes more onto the 800GB tape than bzip2 -9, but it would have taken an order of magnitude longer, so it doesn’t pay off.

This should take about 8 hours for one tape and I need to have these in two copies.

And when storing the keys for a tape with a life span of 30 years, the data retention of the key storage medium becomes worth considering too:

flash-memory-data-retention_hpca15
0015_nandflash_io_20141211

For example, if you take a look at this graph, you can see how it doesn’t matter how long your tape will last: if the key sits on flash memory that loses its contents after only a few years, you will not be able to read the data on the tape.

serious-problem

So the key storage device would have to be SLC, which would then last as long as the tape itself. Of course you could refresh the data on the thumb drive and have enough copies so that the data could be re-constructed despite the corruptions, but that is an additional hassle.

One of these:

hercules-p_ufd_g4_slc_spec

But still only 10-year data retention. But because things like this are usually over-engineered these might keep the data alive a lot longer. 10 year specification could be under certain test conditions, so that too should be checked.

https://www.amtron.com/USB_flash_disk.htm

First tape drive tests!

Partial success after rescanning the bus:

[  504.487417] scsi 3:0:3:0: Sequential-Access HP       Ultrium 4-SCSI   W51D PQ: 0 ANSI: 5
[  504.487443] scsi target3:0:3: Beginning Domain Validation
[  504.494441] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494507] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494570] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494633] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494696] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494758] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494822] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494885] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.494948] scsi 3:0:3:0: mptspi: ioc1: IDP:ON
[  504.500745] scsi target3:0:3: Domain Validation skipping write tests
[  504.500750] scsi target3:0:3: Ending Domain Validation
[  504.500821] scsi target3:0:3: FAST-160 WIDE SCSI 320.0 MB/s DT IU RTI PCOMP (6.25 ns, offset 64)
[  504.502881] scsi 3:0:3:0: alua: supports implicit TPGS
[  504.504309] scsi 3:0:3:0: alua: port group 00 rel port ffffffff
[  504.504964] scsi 3:0:3:0: alua: port group 00 state A preferred supports tolusnA
[  504.504969] scsi 3:0:3:0: alua: Attached
[  504.506534] scsi 3:0:3:0: Attached scsi generic sg1 type 1
[  504.534442] st: Version 20101219, fixed bufsize 32768, s/g segs 256
[  504.535363] st 3:0:3:0: Attached scsi tape st0
[  504.535370] st 3:0:3:0: st0: try direct i/o: yes (alignment 512 B)
[  504.548919] osst :I: Tape driver with OnStream support version 0.99.4

The drive was detected correctly. In 3:0:3:0 (host:channel:target:LUN) I believe the second 3 is the SCSI target ID of the drive, which was set to 3.

Also getting the status:

# mt -f /dev/st0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x46 (LTO-4).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN

Beginning of Tape, Online and:

GMT_IM_REP_EN(x): Immediate report mode. This bit is set if there are no guarantees that the data has been physically written to the tape when the write call returns.
It is set zero only when the driver does not buffer data and the drive is set not to buffer data.

Which I suppose isn’t a problem.

But hardware encryption does not seem to work:

# ../../../bin/stenc -f /dev/st0 -e on
Enter key in hex format:
Re-enter key in hex format:
Provided key length is 128 bits.
Key checksum is 550.
Set encryption using this key? [y/n]: y
Turning on encryption on device '/dev/nst0'...
Sense Code:              Illegal Request (0x05)
 ASC:                    0x26
 ASCQ:                   0x00
 Additional data:        0x00000000000000000000000000000000
Error: Turning encryption on for '/dev/nst0' failed!

Which I sort of expected, as the drive doesn’t have the Encryption LED on the front panel. But I can do software encryption.

The drive is making winding type of noises so we will see what will come out.

tar -cvf - /directory | pbzip2 -9 | aespipe -e AES256 -P keys.txt | dd of=/dev/st0

AES256 encrypted bzip2 compressed tar archive.

Not getting I/O errors or anything, so I suppose the data must be going somewhere.

Works

It definitely works. Wrote two separate tar archives on a tape so I am practically ready to do some serious backing up. But I am not quite comfortable doing multi-tape backups.
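With more than one archive on a tape, reading a particular one back means positioning the tape at the right file first. A minimal sketch of that, assuming the mt rewind and fsf operations and a non-rewinding device such as /dev/nst0. The function name read_tape_file is made up for illustration, and the block size default is simply the one used elsewhere in these notes.

```shell
#!/bin/sh
# Sketch only: the function name and default block size are illustrative.

# read_tape_file DEVICE N [BLOCKSIZE]
# Rewinds DEVICE, skips forward over N filemarks (N=0 is the first archive)
# and copies that tape file to stdout.
read_tape_file() {
    dev=$1
    n=$2
    bs=${3:-1616940}
    mt -f "$dev" rewind || return 1
    if [ "$n" -gt 0 ]; then
        mt -f "$dev" fsf "$n" || return 1
    fi
    # The read block size must be at least as large as the blocks on the tape.
    dd if="$dev" bs="$bs" 2>/dev/null
}
```

The raw stream would then be piped through the reverse of whatever wrote it, for example decryption, decompression and tar -tvf - to list the second archive after read_tape_file /dev/nst0 1.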

# sg_logs /dev/nst0 -p 12
    HP        Ultrium 4-SCSI    W51D
Sequential access device page (ssc-3)
  Data bytes received with WRITE commands: 2 GB
  Data bytes written to media by WRITE commands: 2 GB
  Data bytes read from media by READ commands: 2 GB
  Data bytes transferred by READ commands: 2 GB
  Cleaning action not required (or completed)

Very old tape drive apparently:

# sg_logs /dev/nst0 -p 20
    HP        Ultrium 4-SCSI    W51D
Device statistics page (ssc-3 and adc)
  Lifetime media loads: 1214
  Lifetime cleaning operations: 1
  Lifetime power on hours: 49012
  Lifetime media motion (head) hours: 7370
  Lifetime metres of tape processed: 50738775

I don’t even understand how 50 000 km of tape is even possible. But if that is even half true, then that says something about the validity of tape as a backup solution.
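A bit of arithmetic makes the figure easier to believe. The sketch below takes the metres and head hours from the log above. The 820 m tape length and the 56 end-to-end passes needed to fill one LTO-4 cartridge are nominal figures assumed here, not values from the log, and the function name is made up.

```shell
#!/bin/sh
# Sketch only: tape_stats is an illustrative name.

# tape_stats METRES MOTION_HOURS [TAPE_LENGTH_M] [PASSES_PER_FULL_TAPE]
# The last two default to assumed nominal LTO-4 figures (820 m, 56 passes).
tape_stats() {
    LC_ALL=C awk -v m="$1" -v h="$2" -v len="${3:-820}" -v passes="${4:-56}" 'BEGIN {
        printf "total distance: %.0f km\n", m / 1000
        printf "average tape speed: %.2f m/s\n", m / (h * 3600)
        printf "full-tape equivalents: %.0f\n", m / (len * passes)
    }'
}

tape_stats 50738775 7370
# total distance: 50739 km
# average tape speed: 1.91 m/s
# full-tape equivalents: 1105
```

Under those assumptions, filling a single cartridge already moves about 46 km of tape past the heads, so the lifetime figure corresponds to roughly a thousand full tapes.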

Not sure why I was getting extremely poor performance, but either changing from aespipe to openssl, or from 256k block size to 1,616,940 byte block size made a huge difference, and now this is what I am getting for writing and reading:

1913264976 bytes (1.9 GB) copied, 47.6988 s, 40.1 MB/s
1913264976 bytes (1.9 GB) copied, 41.9134 s, 45.6 MB/s

Which are pretty damn good speeds and will let me back up 800GB in about 6 hours.
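The time estimate is just capacity divided by throughput. A small sketch, using decimal units as dd reports them. The function name hours_to_fill and the 800GB default are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: hours_to_fill is an illustrative name.

# hours_to_fill MB_PER_S [CAPACITY_GB]
# Hours needed to write CAPACITY_GB (default 800) at a sustained MB_PER_S.
hours_to_fill() {
    LC_ALL=C awk -v mbps="$1" -v gb="${2:-800}" \
        'BEGIN { printf "%.1f\n", gb * 1000 / mbps / 3600 }'
}

hours_to_fill 40.1   # prints 5.5
hours_to_fill 45.6   # prints 4.9
```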

And this seems so good that during the weekend I will probably try to see what I need to back up. I still don’t know how I should handle the 800GB per tape limit, because I do not want to have backups span multiple tapes. Unless they are actually easy to do and reliable.

Speed with large files seems to be closer to 100MB/s than the 45MB/s reported above.

But one problem is verification, and how to do it. Not absolutely necessary but better than blindly trusting the tape or the drive.
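One possible approach to verification is to checksum the stream on its way to the tape and then checksum what the drive reads back. The sketch below is not something that was run on this setup: the function names are made up, it assumes GNU dd (for iflag=fullblock) plus mktemp and mkfifo, and the target can be a regular file for a dry run.

```shell
#!/bin/sh
# Sketch only: function names and the default block size are illustrative.

# write_with_checksum TARGET SUMFILE [BLOCKSIZE]
# Copies stdin to TARGET (a tape device or, for testing, a regular file) and
# stores the MD5 of the bytes that were sent in SUMFILE.
write_with_checksum() {
    target=$1
    sumfile=$2
    bs=${3:-1616940}
    fifo=$(mktemp -u) || return 1
    mkfifo "$fifo" || return 1
    # Checksum a second copy of the stream in the background.
    md5sum <"$fifo" | cut -d' ' -f1 >"$sumfile" &
    # iflag=fullblock (GNU dd) keeps pipe reads from producing short blocks.
    tee "$fifo" | dd of="$target" bs="$bs" iflag=fullblock 2>/dev/null
    status=$?
    wait
    rm -f "$fifo"
    return "$status"
}

# read_checksum SOURCE [BLOCKSIZE]
# Reads SOURCE to end of file (the next filemark on a tape) and prints its MD5.
read_checksum() {
    dd if="$1" bs="${2:-1616940}" 2>/dev/null | md5sum | cut -d' ' -f1
}
```

The backup pipeline would end in write_with_checksum /dev/nst0 some.md5 instead of a bare dd, and after repositioning the tape to the start of that file, read_checksum /dev/nst0 should print the same sum. This costs one extra full read of the tape per backup.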

Brand new LTO-4 tapes for pennies

Five native 800GB data cartridges and one cleaning cartridge for 85,90 €.

serveimage-7

The cleaning cartridge alone is worth 60 € or thereabouts. And the data cartridges are about 30 €. And these are all brand new so that’s worth 200 € easily.

serveimage-6

But these 5 tapes will let me archive 1600GB of uncompressed data (1840GB @ 1.15x) for at least 5 or 10 years without too big a risk of data loss. Once that time has passed, LTO-5 or LTO-6 drives will have become cheap enough to transfer the data onto new tapes. LTO-5 and LTO-6 drives can both still read LTO-4 tapes, so there is little risk of not being able to retrieve the data even after 10 or 15 years.

Hard drives are such a poor technology for backing up data because they require so much maintenance. They take a lot of space and a lot of power. And there is a huge risk of human error. With tapes all of those are eliminated, and you can leave the tapes as they are for extended periods of time and only care about proper storage and stable environmental conditions.

Never going back to hard drives for cold data.

And here’s more about the technology:

lto_data_tape_seminar_2012

And once up and running, I need to check out this for managing the encryption: https://sourceforge.net/projects/stenc/ or perhaps Amanda or Bacula. Because I know nothing about how to manage these things.

Attachments

Gaining on infrastructure

Gaining more infrastructure in form of one of these for 145 €:

serveimage (4)

And one of these for 16 €:

s-l1600 (3)

So these are obviously a SAS controller and an external LTO-4 tape drive. I finally brought myself to spend the money on the drive, because this one was cheap. It is a 1700 € drive new in Finland. I hope it works. It was advertised as being in great condition, so if it works and stays in good condition for my once-a-month, two-tape usage, then it was an excellent bargain.

Because tapes are just so much better for backups. You put them in the drive, write them, take them out, and move them to a safe location. Then you bring the old tapes back, write them, and the cycle continues.

The price is a little bit higher than that of a hard drive, but hard drives don’t have any of the tape advantages.

If it works, I will probably archive ALL of the historical data to tape in two copies, at two different locations, and then I don’t need to worry about all that data anymore. Data has a tendency to grow in a cancerous manner: the more you move old data around, and the more you do and test, the more old data you create.

So each 800GB tape is more than enough to store quite a bunch of tests, for 30 € a pop.

Updates

Not 100% confident about the person who sold me the tape drive: when I asked, he said he had shipped it, yet he didn’t mark it as shipped.

But at least the card is working to this extent:

48:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E PCI-Express Fusion-MPT SAS (rev 02)

Can be controlled with lsiutils, which you can download from here and is also attached to this post.

Main menu, select an option:  [1-99 or e/p/w or 0 to quit]

 1.  Identify firmware, BIOS, and/or FCode
 2.  Download firmware (update the FLASH)
 4.  Download/erase BIOS and/or FCode (update the FLASH)
 8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
 e   Enable expert mode in menus
 p   Enable paged mode
 w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 1

Current active firmware version is 01195400 (1.25.84)
Firmware image's version is MPTFW-01.25.84.00-IE
  LSI Logic
x86 BIOS image's version is MPTBIOS-6.22.00.00 (2008.04.10)
FCode image's version is MPT IBM SAS FCode Version 1.00.47 (2007.08.06)

Some more info here.

Attachments