I was doing my usual reading of The Register, specifically this article, and something popped into my mind that I wanted to write a brief note about.
I came across the SPC-1 results for the Huawei OceanStor Dorado 5100 before I saw the article and didn’t think a whole lot about it.
I got curious when I read the news article though, so I did some quick math – the Dorado 5100 is powered by 96 x 200GB SSDs and 96GB of cache in a dual-controller active-active configuration, putting out an impressive 600,000 IOPS with the lowest latency (by far) that I have seen. They also had a somewhat reasonable unused storage ratio of 32.35% (I would have liked to have seen much better given the performance of the box, but I’ll take what I can get).
But the numbers aren’t too surprising – I mean SSDs are really fast, right? What got me curious though is the number of IOPS coming out of each SSD to the front end, which in this case comes to 6,250 IOPS/SSD. Compared to some of the fastest disk-based systems this is about 25x faster per disk than spinning rust. There is no indication that I can see, at least, of what specific sort of SSD technology they are using (other than SLC). But 6,250 per disk seems like a far cry from the tens of thousands of IOPS many SSDs claim to be able to do.
I’m not trying to say it’s bad or anything but I found the stat curious.
I went ahead and looked at another all-SSD solution, the IBM V7000. This time 18 x 200GB SSDs provide roughly 120,000 IOPS, also with really good latency, with 16GB of data cache between the pair of controllers. Once again the numbers come to roughly 6,600 IOPS per SSD. IBM ran at an even better unused storage ratio of just under 15%, which is hard to beat.
Texas Memory Systems (recently acquired by IBM) posted results for their RamSan-630 about a year ago, with 20 x 640GB SSDs pushing out roughly 400,000 IOPS with pretty good latency. This time however the numbers change – around 20,000 IOPS per SSD here, and as far as I can tell there is no RAM cache either. The TMS system came in at a 20% unused storage ratio.
While there are no official results, HP did announce not long ago an ‘all SSD’ variant of the P10000 (just realized it is kind of strange to have two sub-models (V400 and V800, which were the original 3PAR models) of the larger P10000 model), which they said would get the same 450,000 IOPS on 512 x SSDs. The difference here is pretty stark, with each SSD theoretically putting out only 878 IOPS (so roughly 3.5x faster than spinning rust).
I know that originally, at least, 3PAR chose the slower STEC Mach8IOPS SSD primarily due to cost (it was something like 60% cheaper). STEC’s own website shows the same SSD getting 10,000 IOPS (on a read test, whereas the disk they compared it to seemed to give around 250 IOPS). Still, you can tap out the 8 controllers with almost 1/4th the number of disks supported with these SSDs. I don’t know whether or not the current generation of systems uses the same SSD.
I’ll be the first to admit an all-SSD P10000 doesn’t make a lot of sense to me, though it’s nice that customers have that option if that’s what they want (I never understood why all-SSD was not available before; that didn’t make sense either). HP says it is 70% less expensive than an all-disk variant, though they are not specific about whether they are using 100GB SSDs (I assume they are) vs 200GB SSDs.
Both TMS and Huawei advertise their respective systems as being “1 million IOPS” – I suppose if you took one of each and striped them together that’s about what you’d get! It sort of reminds me of a slide presentation I got from Hitachi right before their AMS2000 series launched; one of the slides showed the number of IOPS from cache (they did not have a number for IOPS from disk at the time), which didn’t seem like a terribly useful statistic.
So here you have individual SSDs providing anywhere from 900 to 20,000 IOPS per disk on the same test…
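To make that spread concrete, here is a quick back-of-the-envelope sketch in Python that just divides the published front-end IOPS by the SSD count for each of the results mentioned above. The numbers are the ones quoted in this post (and the ~250 IOPS spindle figure is the same ballpark comparison disk mentioned earlier), nothing more official than that.

```python
# Rough per-SSD front-end IOPS from the figures quoted in this post:
# (front-end IOPS, number of SSDs in the tested/announced configuration)
systems = {
    "Huawei Dorado 5100":  (600_000, 96),
    "IBM V7000 (all-SSD)": (120_000, 18),
    "TMS RamSan-630":      (400_000, 20),
    "HP P10000 (all-SSD)": (450_000, 512),
}

SPINNING_RUST_IOPS = 250  # ballpark for a fast 15K RPM spindle, for comparison

for name, (iops, ssd_count) in systems.items():
    per_ssd = iops / ssd_count
    print(f"{name:22s} {per_ssd:8,.0f} IOPS/SSD "
          f"(~{per_ssd / SPINNING_RUST_IOPS:.0f}x a 15K spindle)")
```

Running it gives roughly 6,250 / 6,667 / 20,000 / 879 IOPS per SSD respectively, which is where the 900 to 20,000 range comes from.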
I’d really love to see SPC-1 results for the likes of Pure Storage, Nimble Storage, Nimbus Storage, and perhaps even someone like Tintri, just to see how they measure up on a common playing field with a non-trivial utilization rate. Especially with claims like this from Nimbus saying they can do 20M IOPS/rack – does that mean at 10% of usable capacity, or greater than 50%? I really can’t imagine what sort of workload would need that kind of I/O, but there are probably one or two folks out there who can leverage it.
We now take you back to your regularly scheduled programming..
I was wondering how long it would take you to weigh in on this. I’ll echo your interest in seeing what the other all-flash players are getting. I’ll have to ping a few friends to see if I can get some info on the down low, as true performance-based specs are hard at times to come by. As a vExpert I got a good presentation on Tintri today; the concept is still compelling to me for a pure VM play. Many customers I’ve spoken to really like the system and have had production units with uptime of over a year (not impressive in the greater scheme of things, but a testament for a start-up).
I now take pretty much all IOPS claims with a big grain of salt because most are generated with a workload that no one actually runs. I want to see real-world performance analytics data, not the output from Iometer.
Comment by gchapman — August 21, 2012 @ 9:14 pm
I have a friend who went over to Tintri as a sales rep a while back, and I had a pretty good talk with him on their tech. It does sound interesting, though I’m still sort of confused how they do “VMware optimized storage” – I suppose it’s mainly a marketing thing. I know they have better insight per VM, but it sort of gives the impression that it is special for VMware, when VMware of course is just a container that “real apps” run on top of. So if they are VMware optimized, are they optimized for every workload out there? It’s sort of confusing. He did say they were not trying to break out of their niche and had no plans to support generic NFS directly to guest VMs or physical servers, and he sees the system being installed in parallel to a more traditional storage system for that reason rather than outright replacing it. It’ll be interesting to see which approach is the most successful – using flash as a tier/cache, or 100% flash with fancy compression/dedupe etc.
Comment by Nate — August 22, 2012 @ 10:29 am
[…] the test is far from perfect, but in my opinion at least it's far better than the alternatives, like people running 100% read tests with IOMeter to show they can get 1 million […]
Pingback by 3PAR 7400 SSD SPC-1 « TechOpsGuys.com — May 23, 2013 @ 11:08 am