32TB solid state drives (SSDs) are already sampling, and 50TB and 100TB drives are expected next year. The new 3D NAND process, coupled with in-chip die stacking, is increasing the capacity per chip at a tearaway pace. New form factors are being tried out.
As predicted, we are seeing servers with 12 or more M.2 slots, which are only an inch or so wide, while Intel is offering a “ruler” drive that’s essentially a foot-long M.2 with 32TB capacity projected for the near future. Their matching appliance design has 32 “ruler” slots, which would give a petabyte in 1U. Huawei has its own version of a petabyte unit, with a novel way of stacking drives in the appliance.
What do you do with such capacity? It’s not going to be as cheap as hard disk drives (HDD) until late next year, but when the reduction in appliance count is considered, the numbers actually favor the bulk SSD. With HDD stuck at around 10TB capacity in a 3.5-inch form factor, we are looking at 10 times the box count, and almost a full rack of HDD boxes for a petabyte. Those nine extra, physically larger appliances aren’t cheap!
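The box-count comparison can be checked with a few lines of arithmetic. This is only a rough sketch: the 32-slot, 32TB “ruler” appliance and the 10TB HDD come from the figures above, while the function name `boxes_needed` and the figure of 10 drives per HDD box are assumptions, the latter chosen simply because it reproduces the “10 times the box count” estimate.

```python
import math

def boxes_needed(target_tb: float, drive_tb: float, drives_per_box: int) -> int:
    """Appliances needed to reach target_tb of raw capacity."""
    drives = math.ceil(target_tb / drive_tb)
    return math.ceil(drives / drives_per_box)

PETABYTE_TB = 1000  # decimal petabyte, in terabytes

# 32 "ruler" slots of 32TB each, as described in the article.
ssd_boxes = boxes_needed(PETABYTE_TB, drive_tb=32, drives_per_box=32)

# 10TB HDDs; 10 drives per box is an assumed figure, not from the article.
hdd_boxes = boxes_needed(PETABYTE_TB, drive_tb=10, drives_per_box=10)

print(f"SSD boxes: {ssd_boxes}, HDD boxes: {hdd_boxes}")
```

Denser HDD enclosures would narrow the gap, so the drives-per-box figure is the number to adjust when comparing real products.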
So what are those huge SSDs for? It’s beginning to look like the drives will have NVM Express (NVMe) interfaces and, since all that capacity is achieved via internal parallelism, they’ll have top-end performance. That puts them firmly in primary drive space, except, of course, that flash and Optane non-volatile dual in-line memory modules (NVDIMMs) are poised to make a land-grab for that territory.
So the huge drives, and not so huge ones too, will fit more the role of shared capacity storage. In that space, where data is being down-tiered from the fastest NVDIMMs, compression will be a major factor in operations. There are two reasons for this: effective capacity is increased by a factor of 3X or more, and effective network bandwidth gets the same 3X-plus boost, so transfers are correspondingly shorter.
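Both effects come from the same ratio, as a small calculation shows. This is a sketch under simple assumptions: compression and decompression cost nothing, the link is the only bottleneck, and the 1.25 GB/s link speed (roughly 10 Gigabit Ethernet before protocol overhead) and the function names are illustrative choices rather than figures from the article.

```python
def effective_capacity_tb(raw_tb: float, ratio: float) -> float:
    """Logical data that fits in raw_tb when compressed by `ratio`."""
    return raw_tb * ratio

def transfer_seconds(logical_gb: float, link_gb_per_s: float, ratio: float = 1.0) -> float:
    """Time to move logical_gb of data when it travels compressed by `ratio`."""
    return (logical_gb / ratio) / link_gb_per_s

RATIO = 3.0   # the 3X compression figure used in the text
LINK = 1.25   # GB/s, assumed link speed

print(effective_capacity_tb(32, RATIO))          # one 32TB drive, compressed
print(transfer_seconds(300, LINK))               # 300 GB sent uncompressed
print(transfer_seconds(300, LINK, ratio=RATIO))  # same data sent compressed
```

In practice the gain depends on how compressible the data is and on whether the endpoints can compress at line rate.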
The last couple of years have seen all-flash arrays acting as front-ends to traditional networked storage, using the slow HDD-based gear as bulk, compressed capacity. With this gear already on site in many cases, this was a very inexpensive way to boost capacity. That legacy gear, however, costs a lot to operate, takes up a good deal of space and uses power. There are associated software licenses and, most importantly, a need to keep trained-up admins specifically for that gear.
The overall cost proposition of 1U, 1 petabyte boxes with low maintenance is very compelling compared with physically big array farms. It’s becoming clear that archiving space will migrate to QLC flash drives, which may increase raw capacity further. Add deduplication to the storage flow to search out duplicate objects, couple it with deeper compression, and 10X data reduction may become the norm for non-media data. That’s perhaps 10 to 15 effective petabytes in that 1U box.
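To make the combination of object deduplication and compression concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not how any particular appliance works: the function name `reduce_objects`, the choice of SHA-256 and zlib, and the sample data are all assumptions, and the ratio it reports on toy data says nothing about real workloads.

```python
import hashlib
import zlib

def reduce_objects(objects: list[bytes]) -> tuple[dict[str, bytes], float]:
    """Deduplicate by content hash, then compress each unique object.

    Returns the store (hash -> compressed bytes) and the overall
    reduction ratio (logical bytes / stored bytes).
    """
    store: dict[str, bytes] = {}
    logical = 0
    for obj in objects:
        logical += len(obj)
        digest = hashlib.sha256(obj).hexdigest()
        if digest not in store:  # only the first copy of an object is stored
            store[digest] = zlib.compress(obj, level=9)
    stored = sum(len(blob) for blob in store.values())
    ratio = logical / stored if stored else 1.0
    return store, ratio

# Hypothetical sample data: one repetitive record stored three times,
# plus one object that appears only once.
record = b"timestamp=0 status=OK sensor=42\n" * 200
objects = [record, record, record, b"unique payload " * 100]

store, ratio = reduce_objects(objects)
print(f"unique objects: {len(store)}, reduction ratio: {ratio:.1f}x")
```

Multiplying the raw capacity of the box by a ratio measured this way on representative data gives the effective capacity.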
I suspect that we’ll see a move to smaller appliances rather than putting all the petabytes under one roof. This fits in with object store clusters better, where the minimum node count is usually four. We still may get 1 PB/U, but it will be in the form of four boxes fitted side-by-side in 1U. The result should be a better match of network bandwidth to drive performance.
There are hints, though, that this is just a point on a journey. On the one hand, we have hyper-converged boxes that run virtual instances on the same appliance as the drives, while on the other, drive vendors are aggressively looking at NVMe over Ethernet as a way to make individual drives directly accessible to all servers in a cluster. Huawei announced the first NVMe over Ethernet drives just this month. In the hyper-converged case, the concept of a storage appliance is replaced by a shareable virtual compute resource with storage capacity attached. In the direct-connect model, there is no controller, just a box of Ethernet drives. Both models will coexist, since the drive-only boxes will allow storage to scale independently of compute.
All of this comes at a cost. No-one will want high-performance hard drives or HDD storage appliances. That market could head south very quickly. Bulk storage HDD appliances may stick around to some mystical end-of-life, but the reality is that they too will become boat anchors quickly, due to operational costs. This will hurt suppliers of HDD parts and drives, though some are already well-hedged into SSDs. Like any major transition, it will take time, but not a long time. We won’t be saying “Disk is dead!” for long. Just remember how fast the floppy disk went into the sunset!
On a techy note, Serial Attached SCSI (SAS) and Serial Advanced Technology Attachment (SATA) will be history quite quickly, too. NVMe runs rings around them and doesn’t add much extra cost. The M.2 form factor is appearing all over the gaming desktop market, while just starting to show in servers, so 3.5- and 2.5-inch drive form factors are also heading for the bullpen. The tremendous boost in performance that even a couple of drives bring may lead to a substantial retrofit or replacement of older gear, though fewer servers will be needed for the same workload.
Finally, those huge drives give us a place to put Big Data. Applying Parkinson’s Law liberally, we should see a major shift in average size of server farms over the next few years as we move to perpetual storage models, continuous snapshots and the storage of big data streams. Storage is fun again!