Modern 2TB Western Digital Red vs. old U320 SCSI 15K RPM 36GB

Which one do you think is faster in IO? You guessed right, the old U320 SCSI disk. It's not even a contest between these two.

[graph: read IO latency, Western Digital Red]

Read IO average 9.1ms for the Western Digital.

[graph: read IO latency, U320 SCSI]

And 1.8ms for the U320, which is running hardware RAID0, though, so perhaps this is not a fair comparison. But it goes to show that old hardware used right beats new hardware used for something it was not meant for.
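For what it is worth, here is a minimal Python sketch of how such a random-read latency figure can be approximated; the device path, read size and sample count are placeholders, the page cache can flatter the numbers since O_DIRECT is not used, and a proper tool such as fio gives far more rigorous results:

```python
# Rough random-read latency probe (run as root against an idle disk).
# /dev/sdb, the 4 KiB read size and the sample count are placeholder values.
import os
import random
import time

DEVICE = "/dev/sdb"   # adjust to the disk under test
READ_SIZE = 4096      # 4 KiB per random read
SAMPLES = 1000

fd = os.open(DEVICE, os.O_RDONLY)
size = os.lseek(fd, 0, os.SEEK_END)   # device size in bytes

latencies = []
for _ in range(SAMPLES):
    # Pick a random 4 KiB-aligned offset somewhere on the device.
    offset = random.randrange(0, size - READ_SIZE) // READ_SIZE * READ_SIZE
    os.lseek(fd, offset, os.SEEK_SET)
    start = time.perf_counter()
    os.read(fd, READ_SIZE)
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds

os.close(fd)
print(f"average read latency: {sum(latencies) / len(latencies):.1f} ms")
```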

So, as I discussed earlier regarding the possibility of RAID-Z2 with 15K SAS disks — those would really kick ass in that setup.

One paper I found is below. I haven't checked it yet but it seems good.

Whitepaper_CloudByte_Measuring-Storage-Performance

Or to have a three-way mirror with 15K SAS disks. The read performance would be absolutely stunning. Then perhaps an SSD as a sort of write cache to boost the write speeds! Now that would be an awesome setup.

SSD and TRIM on RAID controllers

Someone posted a comment on one of the posts and noted that the RAID controller should be TRIM-aware to get optimal performance.

After doing some quick research and reading up on what TRIM actually is, this may in fact be the case. Although on Intel forums there are discussions saying it doesn't affect performance that much, or practically at all.

I believe this would also not be an issue in a situation where RAID is not used, i.e. the controller is used only as a source of additional SAS or SATA ports.

In such a case I would argue the controller would not interfere with the data going between the disks and the source of that data, but simply do some sort of conversion and de-/encapsulation of data.

That is, when the controller uses PCIe and receives data it would decapsulate the data and pass it to the disk in question. That would be my theory.

When using RAID this could not be so, because the controller must apply some intelligence and, in a RAID0 situation for example, keep track of where to send what.

It would however be interesting to test, but I believe this is how it works, or at least this is how I would design and implement it.

But perhaps that is a naive view of a modern hardware RAID controller card, and the reality is that these cards do more than pass the data right through.

If they do caching even in a non-RAID configuration then I guess there is a possibility they somehow rework the data.

The following document could probably provide enough information to draw some general conclusions at least.

Enabling TRIM Support in SSD RAIDs

http://www.informatik.uni-rostock.de/fileadmin/ava/pubs/fulltext/raid-trim-tr.pdf

And after reading it, it seems my understanding is correct, but the paper didn't reveal whether, when controllers are used in a non-RAID configuration, the controller simply passes the commands through.

And TRIM is definitely very important functionality when it comes to SSD write speeds, because NAND cannot be modified in place like an HDD; the cells need to be erased before anything can be written.

So when something is written for the very first time it is super fast even if there is no TRIM, because no cells need to be erased.

But when all the cells eventually get written, the speeds without TRIM start to suffer, because when these cells are written to again the SSD must erase them at that very moment; there is no background garbage collection doing it ahead of time as issued by TRIM commands from the operating system.

And the slowness of this process comes from the fact that in some cases a cell must be erased while part of it still contains valid data; this data must first be moved to a temporary location, after which the cell can be erased and the previously relocated data written back.

So if this process is not done in the background on a regular basis (driven by TRIM), the write speeds will start to suffer once all the cells have been written at least once.
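To make that concrete, here is a deliberately simplified toy model in Python; the block counts and cost numbers are made up, and real drives work on pages within erase blocks, but it shows why the first fill is cheap and why untrimmed rewrites pay an erase penalty on every write:

```python
# Toy model of why writes slow down once every cell has been written.
# The block counts and "costs" are invented for illustration only.

TOTAL_BLOCKS = 1000
WRITE_COST = 1     # cost of programming an already-erased block
ERASE_COST = 10    # cost of relocating live data + erasing on the spot

erased = set(range(TOTAL_BLOCKS))   # factory-fresh: everything pre-erased

def write(block, trimmed_blocks):
    """Return the cost of writing one block."""
    if block in erased:
        erased.discard(block)
        return WRITE_COST                  # fast path: just program the cells
    if block in trimmed_blocks:
        # TRIM told the drive the old data is stale, so background garbage
        # collection has already erased the block ahead of time.
        return WRITE_COST
    # No TRIM: the drive must relocate live data and erase right now.
    return ERASE_COST + WRITE_COST

# First fill: every block is pre-erased, so writes are cheap.
first_fill = sum(write(b, trimmed_blocks=set()) for b in range(TOTAL_BLOCKS))

# Rewrite without TRIM: every write pays the erase penalty.
rewrite_no_trim = sum(write(b, trimmed_blocks=set()) for b in range(TOTAL_BLOCKS))

# Rewrite with TRIM: the OS marked the blocks stale, GC erased them already.
rewrite_trim = sum(write(b, trimmed_blocks=set(range(TOTAL_BLOCKS)))
                   for b in range(TOTAL_BLOCKS))

print(first_fill, rewrite_no_trim, rewrite_trim)   # 1000 11000 1000
```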

RAID Cards

According to a Puget Systems article, the earlier-mentioned LSI 9240-4i pales in comparison with the Intel RS2BL080 or Intel RS2BL040, both of which cost over 150 € used.


Intel RS2BL040


Intel RS2BL080

Datasheet on Intel website.

I must say the number of these cards is unbelievable and most of them seem very, very good by their specifications.


Dell Perc H310 Adapter 8-Port Internal 6Gb/s SAS+SATA RAID Controller

6Gb/s and PCIe x8 for 67 € and 8 € P&P from China. Brand new.

So how can you choose? Since this one is made for Dell PowerEdge servers it should probably perform reasonably well.

And when taking into account the possibility of these cheaper cards not performing very well under a high number of IO requests, the direct PCIe FLASH memory devices start to sound appealing again.


OCZ Revodrive 3

 

The Best Solution

I have clearly been thinking about this too much, but I think I have finally reached some sort of conclusion on the best possible performance, and it comes at a low price with a large window for expansion.

My worry was that I have PCIe v1.x, which would mean that those cheap x1 cards would run at only 250MB/s, which would become a bottleneck for an SSD.

And now that I actually looked at it that would have in fact been the case.

The Intel 5000X chipset which I have is PCIe v1.1, which is 250MB/s per lane, just like the original PCIe.

So that rules out any x1 card, and quite a number of other cards as well. In fact, x8 or x16 cards are the only ones that would provide wide enough bandwidth, say, if I wanted to run more than one SSD.

So the only viable options for me are either the expensive NAND FLASH PCIe devices or, what I think is a much better option, the professional battery-backed SAS/SATA RAID controllers.

Exactly like I initially figured.

These usually come as x8, so that would give me 2GB/s full duplex, which would be enough for 4 SSDs fully saturated. A situation very unlikely to happen.
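As a sanity check of that bandwidth math, here is a rough Python calculation; the ~250 MB/s per PCIe 1.x lane and ~500 MB/s per fully saturated SATA SSD are rounded assumptions:

```python
# Back-of-the-envelope PCIe 1.1 bandwidth check. The per-lane and per-SSD
# figures are rounded assumptions, not measured values.
PCIE11_PER_LANE_MBPS = 250   # PCIe 1.x: roughly 250 MB/s per lane per direction
SSD_MBPS = 500               # a fully saturated SATA SSD, roughly

for lanes in (1, 4, 8, 16):
    slot_bw = lanes * PCIE11_PER_LANE_MBPS
    ssds = slot_bw // SSD_MBPS
    print(f"x{lanes:<2} slot: {slot_bw:>5} MB/s per direction, "
          f"enough for about {ssds} saturated SSD(s)")
```

Which lands exactly where the text does: an x8 slot gives about 2GB/s per direction, or roughly four saturated SSDs.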

I may even sacrifice the x16 for this since I can’t think of any other use for it.


ASR-31205 ADAPTEC 12 PORT SAS SATA PCIe RAID CONTROLLER

Which would give me 12 SAS/SATA ports. More than enough for anything I can think of.

The 2U rack-mount case can't even accommodate that many extra disks, even SSDs as small as they are, so that's the end of that problem.

But still, it could easily fit at least 6 if I needed to, and the 3-5 watts of power for those could be split off the two SATA power connectors present.

And it has support in Linux.

Or the ASR-2405, which at times is available for less than $30. It lacks a battery, but I could live without one.


RAID-Controller Adaptec 2405 PCI-Express 4x SAS/SATA

I could have either of these. And 4 ports would be more than enough, and would save some money which would buy me more SSD.

But this one would be the best, because it has 6Gb/s per port and 8-lane PCIe, so it could keep up with the SSD speed at the port level and have the data transferred onward at great speed.

There is a 4-lane model, but it would be just enough for two fast SSDs and might well become the bottleneck.


Adaptec RAID ASR-6805 2271200-R 512MB 6Gb/s SATA/SAS 8i-Port Controller Card

Sadly this one costs about 150 €. So it's either the 8-lane 3Gb/s card for 40 € with 375 MBps per SSD, or 150 € for the full potential of any SSD.

Adaptec product page.

Pure SATA vs PCIe SSD/FLASH

That is another question.

Quite a few companies seem to produce PCIe devices that host FLASH memory similar to what SSD drives are made of.

Need to look into those as well.

The best solution, I believe, would be to buy fast PCIe FLASH in quantities of hundreds of gigabytes and then buy HDDs as large as possible (enterprise grade).

But since this would cost thousands of euros, the next best thing is to buy a single HDD, the largest one can afford, and stick next to it the best SSD one can afford.

Then when money comes in, buy another HDD and a simple PCIe SSD card, stick the SSD on it, and if it works then fine; if it doesn't, then get the Sonnet and another SSD to double the space, speed and IOPS.

The cheap ones are like this:

[image: cheap PCIe SSD adapter card]

And there is very little hardware on them. But it might just do the job, barely. Or perhaps poorly, but one would only lose 30 € and in the best case save 250 € by not needing to buy the real hardware.

Looking at the hardware differences there isn't really that much to talk about:

[image: Sonnet Tempo SSD card]

Because it almost looks like there is the exact same chip as on this cheap one, at a fifth the cost of the Sonnet:

[image: cheap PCIe SSD adapter card]

Perhaps just a little less filtering and noise reduction or something. The Sonnet has the option to add another SSD with the included expansion module.

Then there apparently is also another Sonnet which has eSATA for yet more additional drives.

But I believe I will get the cheap one first to test it out; if it works then that will be fine, since it is only for caching, and if it doesn't work then I will sell it on and buy a real one.

And when everything fails I get this one:


1600 GB just over $7000.

OCZ Z-Drive R4 C Series PCI-Express SSD CM88 1.6TB ZD4CM88-FH-1.6T


 

ZFS L2ARC


OCZ RevoDrive 80GB PCIe

Nice hardware, but sadly quite slow by modern standards: 75 000 IOPS. A modern SSD can achieve similar performance. [Edit: as mentioned later on, the figures SSD manufacturers provide are misleading, so the question remains whether this device can provide a constant 75 000 IOPS, in which case it is much, much better in that regard.]

See Wikipedia

But it gave me an idea.

[image: Sonnet Tempo SSD card]

This one can house two SSDs. It would leave my two on-board SATA ports completely free and the server could still have SSDs.

That would be the absolute best solution. It would mean I could have two Western Digital Re 3TB drives plus two SSDs, which would give me double the IOPS versus one and hence extreme performance on my pool.

While doing research on what would be the best SSD IOPS-wise, I found out that manufacturers exaggerate their IOPS figures. According to this, the figures they give can only be achieved on a new drive for very short periods of time.

Some manufacturers seem to provide sustained figures, which use some sort of standardized, agreed-upon way of measuring the real IOPS, while others don't.

OCZ gladly gives these figures for their Vector 150 drives, but sadly 12 000 IOPS of steady-state random writes isn't that good at all.

And while this is quite an entry-level SSD and there are better ones, that is of little use until the others start to provide these measurement figures as well.

And one cannot trust the tests done by reviewers unless they understand this and run their tests for extended periods of time! Because otherwise we may get skewed results.
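As a sketch of what running a test for an extended period could look like, the following Python loop keeps issuing random 4 KiB synchronous writes into a large test file and prints the IOPS for each one-minute window; the path, sizes and window length are placeholders, and because it goes through a filesystem it only approximates raw device behaviour. On a drive that has reached steady state you would expect the later windows to be clearly slower than the first ones; a tool like fio does this far more rigorously.

```python
# Naive sustained random-write probe: writes 4 KiB blocks to random offsets
# within a large test file and reports IOPS per window. Path, file size and
# window length are placeholder values.
import os
import random
import time

PATH = "/mnt/ssd/testfile"       # file on the SSD under test
FILE_SIZE = 8 * 1024**3          # 8 GiB test area
BLOCK = 4096
WINDOW_SECONDS = 60
WINDOWS = 30                     # run for half an hour

buf = os.urandom(BLOCK)          # one random buffer, reused for simplicity
fd = os.open(PATH, os.O_RDWR | os.O_CREAT | os.O_SYNC)
os.ftruncate(fd, FILE_SIZE)

for window in range(WINDOWS):
    ops = 0
    end = time.time() + WINDOW_SECONDS
    while time.time() < end:
        offset = random.randrange(0, FILE_SIZE // BLOCK) * BLOCK
        os.pwrite(fd, buf, offset)
        ops += 1
    print(f"window {window + 1:2d}: {ops / WINDOW_SECONDS:8.0f} IOPS")

os.close(fd)
```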

But the best plan still would be to get as much RAM as possible, then of course get as much disk as possible, and finally have fast FLASH storage for anything that spills over the RAM.

And with ZFS and FLASH cache I could perhaps even use consumer grade SATA to save money and still have reliability and performance.

Let us calculate for the fun of it.

Western Digital Red 3 TB, 2 pieces for 250 €
Sonnet Tempo SSD Pro, 250 € delivered

And it seems, comparing the OCZ Vertex 460 120GB and the OCZ Vector 150 120GB, that they use similar technology in both, since the steady-state random write figures are exactly the same.

Looking at other parameters we go with the Vector 150.

OCZ Vector 150 240GB, 150 €

Which would give 6TB of storage with 240GB of L2ARC for 650 €.

Compared to the original plan, which was to buy a 3TB enterprise-quality SATA drive and combine it with a 250GB SSD: that would cost 180 € for the Re4 3TB and 107 € for a Samsung 840 EVO 250GB, for a total of 287 €.

Now that’s a difference!

Good, bad? That certainly is a very good question. But the 650 euro one would perhaps provide much greater performance.

2.3 times the price for 2.0 times the capacity. The extra 0.3 should then be covered by the fact that there would be an empty slot left for another SSD, and by the performance increase.

So I think the 650 euro deal is the better one.

And with 4TB WD Desktop Mainstream drives one would get 8TB of storage for an additional 100 euros.

So 750 euros for an 8TB pool with a 240GB cache. That's a shitload of money.

But still, that is only 9.3 euro cents per gigabyte of high-performance, reliable storage. 9.3 cents per gigabyte 10 years ago was considered cheap, and that was for a hard drive alone.

Raw enterprise storage with the Re4 3TB would be 6 cents. So there is a definite margin there.
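A quick check of those ratios in Python, using the prices quoted above (decimal gigabytes assumed, so the cents-per-gigabyte figure may round slightly differently):

```python
# Price/capacity arithmetic for the plans above (prices in euros, capacities
# in decimal terabytes/gigabytes as quoted in the text).
new_plan_eur, new_plan_tb = 650, 6   # 2 x WD Red 3TB + Sonnet + Vector 150 240GB
old_plan_eur, old_plan_tb = 287, 3   # WD Re4 3TB + Samsung 840 EVO 250GB

print(f"price ratio:    {new_plan_eur / old_plan_eur:.1f}x")   # 2.3x
print(f"capacity ratio: {new_plan_tb / old_plan_tb:.1f}x")     # 2.0x

# Cents per gigabyte for the 8TB pool with cache, and for raw Re4 storage.
pool_eur, pool_gb = 750, 8000
re4_eur, re4_gb = 180, 3000
print(f"pool:    {100 * pool_eur / pool_gb:.1f} cents/GB")
print(f"raw Re4: {100 * re4_eur / re4_gb:.1f} cents/GB")
```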

But comparing these (raw vs. a real setup) is quite useless. One can buy raw storage and that's it, but one can't then just get the performance out of it.

Upgrade the disks to 5TB, add another 240GB of cache for a total of 480GB (or go even to 1TB) and you have a 10TB pool with up to 1TB of cache.

What would that cost?

WD Red 5TB, 2 pieces for 410 €
OCZ Vector 150 480GB, 2 pieces for 530 €
Sonnet Tempo SSD Pro, 250 € delivered

That would be 1190 € for a 10TB high-performance, high-reliability storage pool.

In enterprise or business money that is peanuts.

Edit: I made a mistake where I sacrificed reliability to save money, since I figured I wouldn't need faster disks because I have SSDs for caching. But I still need reliable disks to achieve that reliability.

So the prices will go up by perhaps 20%, since reliability still requires Re4 enterprise-level disks for the raw storage. The bonus from this accident is an increase in performance.

So now we would have an 8TB pool with 1TB of cache: 480 € for 2 pieces of Re4 4TB, 530 € for 2 pieces of OCZ Vector 150 480GB and 250 € for the Sonnet Tempo SSD Pro, for a total of 1260 €.

Which of course is some 70 euros more for 2 TB less, but with 20% higher MTBF, the performance increase and a two-year longer warranty.

WD Red

http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-771442.pdf

WD Re4

http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701338.pdf

The Re4 also seems to have an order of magnitude lower rate of non-recoverable read errors, so go figure.

Disks made for surveillance use I would never touch, because those are built for video streams, which aren't too sensitive to bit flips and other errors that can, in the context of video, be considered minor.

3Ware 9650SE-12ML PCI E x8 SATA II (3.0Gb/s)

This one looks even better (!) because it has a battery.


But it doesn't look as hardware-ish, whether that is a good or a bad thing. Fewer components mean less heat, less energy and less to fail; but more software, and it brings to mind those cheap Chinese Realtek and VIA cards which I do not like at all.

Not in my machine.

But I need to do more research on this one, since the battery is an obvious plus.

Most likely it has excellent support in Linux, since there is support for the 2.4 and 2.6 kernels and most likely for 3.x as well. Meaning the card has had a long time to become stable and extensively supported.

Adaptec AAR-2610SA/64MB Dell XD084 RAID Controller

I was thinking how 250 GB SSD is still a bit small and could perhaps house only half a dozen virtual machines.

And how VelociRaptor would still be fast and provide more space.

Then I realized I can have them both, even together with the WD Re, although I only have two bays and two SATA ports.

So I did some eBaying and found this nice piece of hardware:


http://www.ebay.com/itm/Adaptec-AAR-2610SA-64MB-DELL5-Dell-0XD084-PCI-X-SATA-RAID-Controller-Card-/131262848357


If it works on non-Dell hardware, that is.

But it seems my DL140 G3 has PCIe x16 and x8 available at the moment, and PCI-X only with a different riser card, so I need to find a PCIe card.

And I have no idea if these work on Linux; booting off of them might be too much to ask for, and wouldn’t be necessary either.

Now if I can only find a PCIe 6-channel SATA controller, it could accommodate 6 of those 250 gigabyte SSD drives, giving me 6 x 500MBps of bandwidth and 400kIO/s.

And this PCIe card from Adaptec goes a bit overboard with 16 ports:

[image: Adaptec 16-port SAS/SATA RAID controller]

No way would it be possible to max out those 16 drives. But what a ride it would provide. This card with 16 SSDs taped inside, all around the case, would be a sight to see and hear.

Even PCIe x16 would be suffering, and that card seems more like x8 anyway.

[image: PCIe slot sizes for comparison]

But it would be very interesting to see the performance and where it would fall short. I suspect CPU power, unless it was threaded well.

And even then, the DDR2-667 memory could not handle that much.

The other server

I have three servers, one of which is running 24 hours a day. That's my main server, which mainly does routing and all the basic infrastructure. It is used to test all sorts of things, mainly network related. Lately of course IPv6 and SixXS.

And that SixXS IPv6 address space they kindly gave me is, by the way, so big that it can accommodate 65,536 subnets, each roughly 18 million trillion addresses wide.
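Assuming that allocation is a /48 split into standard /64 subnets, which is what those figures imply, the math works out like this:

```python
# Subnet math for a /48 IPv6 allocation split into /64 subnets
# (the /48 is inferred from the figures quoted above).
prefix = 48
subnet = 64

subnets = 2 ** (subnet - prefix)        # number of /64 networks in a /48
addresses_each = 2 ** (128 - subnet)    # addresses per /64

print(f"{subnets:,} subnets")                       # 65,536
print(f"{addresses_each:,} addresses per subnet")   # 18,446,744,073,709,551,616
```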

But the other server I was going to talk about is my HP DL140 G3, which is currently being equipped. I intend to replace the current dual-core low-end CPU with two quad-core high-end ones.

In addition to this it will first have 16 GiB of memory, and later, if needed, another 16 GiB.

The disks are going to be enterprise level. It has two bays, so one is going to house a 3 TB WD Re and the other either a generation 5 or generation 6 WD VelociRaptor 10K.

The Re is the expensive one, but it has over double the MTBF of any consumer-grade drive. The VelociRaptor of course is legendary and can compete with an SSD neck and neck with no trouble at all. That one is going to be either 300, 500 or maybe 600 GB, depending.

That is quite an expensive investment, at around $500 perhaps, but combined with 8 cores and the memory I should be able to run a very generous number of virtual machines of any sort.

And the enterprise level of these disks will guarantee they will last for a long time.

The rationale for having the one 3 TB drive is to use it for backups and long-term data, whereas the faster (in access time) VelociRaptor will house the working data such as virtual machine images.

And the CPU count is extremely important to me, since it is hard (practically impossible) to find SATA drives with hardware encryption. Sadly the CPUs I have do not have the x86 instruction set extensions for the Advanced Encryption Standard (AES-NI).

But given that the CPUs will be relatively modern, high-end Xeons, I am expecting to get enough throughput that the disks will be the bottleneck, not the processing power.

And if one core must be sacrificed for the encryption then so be it; there will still be 7 left for actual processing.