In a move towards artificial intelligence (AI) workloads, HPE has upgraded its Alletra MP storage arrays to connect double the number of servers and provide 4x more capacity in the same rack space.
A year after its initial launch, Alletra MP now has four controller nodes per 2U chassis instead of two, each with an eight-, 16- or 32-core AMD Epyc processor. The data nodes have also shrunk to 1U, each holding 10 x 61.44 TB SSDs, for a maximum capacity of 1.2 PB in 2U. Previously, Alletra MP data nodes contained 20 SSDs of 7.68 TB or 15.36 TB (up to 300 TB per node).
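The density claim follows directly from those figures. A quick back-of-the-envelope check, using only the numbers quoted above:

```python
# Capacity math implied by the figures above (numbers from the article).
ssds_per_node = 10
ssd_tb = 61.44                        # TB per SSD in the new data nodes

node_tb = ssds_per_node * ssd_tb      # 614.4 TB per 1U data node
two_u_pb = 2 * node_tb / 1000         # two 1U nodes in 2U -> ~1.2 PB

old_node_tb = 20 * 15.36              # previous 2U node with the largest SSDs

print(f"New 1U node: {node_tb:.1f} TB, i.e. {two_u_pb:.2f} PB in 2U")
print(f"Old 2U node: {old_node_tb:.1f} TB -> "
      f"{2 * node_tb / old_node_tb:.0f}x more capacity in 2U")
```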
This increase in nodes means Alletra MP can now connect to 2x more servers and give them 4x more capacity for the same data center rack space and similar energy consumption, according to HPE.
“We call this next generation Alletra MP for AI,” said Olivier Tant, HPE Alletra MP NAS expert.
“Indeed, we believe it is ideally suited to replace GPFS- or BeeGFS-based storage solutions, which are complex to deploy but often used for AI workloads. We also believe that our arrays are more efficient than those of DDN in HPC or Isilon in multimedia workloads.”
The SAN block access version works like the NAS file access version, with 100 Gbps RoCE switches that allow any controller node to access any data node.
“The huge advantage of our solution over the competition is that all nodes in the cluster communicate with each other,” Tant said. “In other words, our competitors are limited to, for example, 16 storage nodes, of which three are consumed by redundant erasure-coding data. That represents 15% to 20% of capacity. We can deploy a cluster of 140 nodes of which three are used for redundancy via erasure coding. We only lose about 2% of capacity, and that’s a real economic benefit.”
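The arithmetic behind that comparison is simple: the overhead is the share of the cluster's nodes devoted to erasure-coding redundancy. A quick sketch using the node counts Tant cites:

```python
# Erasure-coding overhead: parity nodes as a fraction of the cluster.
def parity_overhead(total_nodes: int, parity_nodes: int) -> float:
    """Fraction of raw capacity consumed by erasure-coding redundancy."""
    return parity_nodes / total_nodes

# Competitor example cited: 3 redundancy nodes out of 16 (~19% of capacity).
print(f"16-node cluster:  {parity_overhead(16, 3):.1%}")

# Alletra MP example cited: 3 redundancy nodes out of 140 (~2% of capacity).
print(f"140-node cluster: {parity_overhead(140, 3):.1%}")
```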
The secret sauce: 100 Gbps RoCE switches between nodes
“Our solution is also more efficient because, paradoxically, we do not use a cache at the controller level,” said Michel Parent, HPE Alletra MP NAS expert. “With 100 Gbps NVMe/RoCE connectivity across all elements of the array, caching becomes counterproductive.”
“Cache doesn’t speed anything up, and actually slows down the array with incessant copying and verifying operations,” he added. According to Parent, no other storage array on the market uses NVMe/RoCE at speeds as high as 100 Gbps per port.
Hosts use Ethernet or InfiniBand (compatible with Nvidia GPUDirect) to access the controller node closest to them. During writes, this node performs erasure coding and distributes the resulting data across the other SSD nodes. From the perspective of network hosts, all controller nodes expose the same file volumes and block LUNs.
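HPE does not detail the write path beyond this description, but the flow it implies – a controller erasure-codes incoming data and spreads the chunks across data nodes – can be sketched roughly as below. The chunk counts, single XOR parity chunk, and node names are illustrative simplifications, not HPE's implementation:

```python
# Illustrative write path: stripe data into k chunks plus one parity chunk
# and place each chunk on a different data node. Real arrays use
# Reed-Solomon codes with multiple parity chunks; XOR keeps the toy short.
from typing import List

def split(data: bytes, k: int) -> List[bytes]:
    """Split data into k roughly equal chunks."""
    size = -(-len(data) // k)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(k)]

def xor_parity(chunks: List[bytes]) -> bytes:
    """Compute a single XOR parity chunk over the data chunks."""
    parity = bytearray(max(len(c) for c in chunks))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

def write(data: bytes, data_nodes: List[str], k: int) -> dict:
    """Erasure-code a write and assign one chunk per data node."""
    chunks = split(data, k)
    chunks.append(xor_parity(chunks))  # k data chunks + 1 parity chunk
    return dict(zip(data_nodes, chunks))

placement = write(b"output of an AI data pipeline stage",
                  ["node-a", "node-b", "node-c", "node-d"], k=3)
for node, chunk in placement.items():
    print(node, len(chunk), "bytes")
```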
In NAS mode – in which Alletra MP uses the Vast Data file access system – there is a cache composed of fast Kioxia storage-class memory (SCM) flash. This buffer serves as a workspace to deduplicate and compress file data.
“Our data reduction system is one of the best performing, according to different benchmarks,” Tant said. “All duplicates in the data are eliminated. Then an algorithm finds the blocks that are most similar and compresses them, and it’s very efficient.”
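Vast Data's actual algorithm is proprietary, but the two stages Tant describes – exact deduplication, then compressing similar blocks together so shared byte patterns are encoded once – can be illustrated with a toy sketch. The content hashing and the crude "similarity" key below are assumptions for illustration only:

```python
# Toy two-stage reduction: exact dedup, then group similar blocks and
# compress each group as one stream. Illustrative only, not Vast's method.
import hashlib
import zlib
from collections import defaultdict

def reduce_blocks(blocks: list) -> int:
    # Stage 1: exact deduplication via content hashing.
    unique = {hashlib.sha256(b).digest(): b for b in blocks}

    # Stage 2: group "similar" blocks (here, crudely, by their first bytes)
    # so the compressor can reuse patterns shared within each group.
    groups = defaultdict(list)
    for block in unique.values():
        groups[block[:8]].append(block)

    return sum(len(zlib.compress(b"".join(g))) for g in groups.values())

blocks = [b"log line: temperature=21.5 ok" * 10,
          b"log line: temperature=21.5 ok" * 10,   # exact duplicate
          b"log line: temperature=22.1 ok" * 10]   # similar, not identical
print(reduce_blocks(blocks), "bytes after reduction vs",
      sum(len(b) for b in blocks), "raw")
```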
The only parts of files shared between multiple nodes are those that result from erasure coding. Where possible, a file is read from the SSD node that contains the whole thing.
More precisely, during a read, the controller transmits the request to a first SSD node, chosen via the most available switch. Each data node holds an index of all content in the cluster. If the node does not contain the data to be read, it forwards the request to the node that does.
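That read flow – any node can be asked, and a miss costs at most one forward thanks to the cluster-wide index – can be sketched as below. The class names and index structure are illustrative assumptions, not HPE's design:

```python
# Illustrative read path: a data node that does not hold the requested
# data consults its cluster-wide index and forwards to the node that does.

class DataNode:
    def __init__(self, name: str, cluster_index: dict):
        self.name = name
        self.local = {}               # content stored on this node
        self.index = cluster_index    # shared map: key -> owning node

    def read(self, key: str) -> bytes:
        if key in self.local:         # fast path: data is local
            return self.local[key]
        owner = self.index[key]       # miss: one forward, never a search
        return owner.read(key)

# Build a two-node cluster sharing one index.
index: dict = {}
a, b = DataNode("a", index), DataNode("b", index)
b.local["file-42"] = b"payload"
index["file-42"] = b

# The controller may hit either node; node "a" forwards to "b".
print(a.read("file-42"))
```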
In the SAN version, the mechanism is similar, except that it works at the block level rather than the file level.
With such an architecture, which relies more on the speed of the switches than on that of the controllers, a request can easily be moved from one node to another if the first does not respond quickly enough on its Ethernet port.
One array for multiple storage types
The NVMe SSDs can reconstruct a file quickly from blocks of data because each 100 Gbps link in Alletra MP is as fast as, or faster than, the network connection between the array and the application server. In competing arrays that do not use switches between the controllers and dedicated SSD nodes, it is usual to try to optimize for particular use cases instead.
“I am convinced of the economic advantage of Alletra MP over its competitors,” said Tant. “In an AI project, a company normally needs to set up a data pipeline. This means that a storage array with high write performance collects the output of those workloads. Then you copy its contents to a storage array with high read performance to train the machine learning model. Then you store the resulting model in a hybrid array for use.
“With Alletra MP, you only have one array that’s as fast for writes as it is for ML training and for model usage,” he said.