
SPONSORED FEATURE: When it comes to artificial intelligence, it seems nothing succeeds like excess.
As AI models become bigger and more capable, hyperscalers, cloud service providers, and enterprises are pouring cash into building out the storage and compute infrastructure needed to support them.
The first half of 2024 saw AI infrastructure investment hit $31.8 billion, according to IDC. The research company expects full-year spending to exceed $100 billion in 2028, as AI becomes pervasive in enterprises through greater use of discrete AI applications within their broader application landscapes. Once AI-enabled applications and related IT and business services are factored in, total worldwide spending is forecast to reach $632 billion in 2028.
But while surging investment is one thing, reaping the full potential of AI in empowering engineers, overhauling and optimizing operations, and improving return on investment is a whole different ball game. For enterprises looking to truly achieve these objectives, data management throughout the AI pipeline is likely to prove critical.
The problem is that traditional storage and data management offerings, whether on-prem or in the cloud, are already under strain given the crushing demands of AI. Capacity is part of the issue. AI models, and the data needed to train them, have grown steadily bigger. Google's BERT had around 100 million parameters when it launched in 2018, for example; GPT-4 was estimated to have over a trillion at last count.
At the other end of the pipeline, inference – often carried out at real-time speeds – makes latency and throughput equally critical. There are many other challenges. AI requires a multiplicity of data types and stores, spanning structured, semi-structured, and unstructured data. This in turn requires the full range of underlying storage infrastructure: block, file, and object. These datastores are unlikely to all be in one place.
In addition to the sheer complexity involved in capturing all the information required, the breadth and distribution of data sources can also create a major management problem. How do organizations and their AI teams ensure they have visibility both across their entire data estate and throughout their entire AI pipeline? How do they ensure that this data is being handled securely? And this is all further complicated by the need for multiple tools and associated skill sets.
When Legacy Means Lags
The introduction of newer and increasingly specialized AI models doesn’t remove these fundamental issues. When the Chinese AI engine DeepSeek erupted onto the broader market earlier this year, the huge investments hyperscalers have been making in their AI infrastructure were called into question.
Even so, building LLMs that don’t need the same amount of compute power doesn’t solve the fundamental data problem. Rather, it potentially makes it even more challenging. The introduction of models trained on a fraction of the infrastructure will likely lower the barrier to entry for enterprises and other organizations to leverage AI, potentially making it more feasible to run AI within their own infrastructure or datacenters.
Sven Oehme, chief technology officer at DataDirect Networks, explains: “If the computational part gets cheaper, it means more people participate, and many more models are trained. With more people and more models, the challenge of preparing and deploying data to support this surge becomes even more critical.”
That’s not just a challenge for legacy on-prem systems. The cloud-based platforms data scientists have relied on for a decade or more are often not up to the job of servicing today’s AI demands either. Again, it’s not just a question of raw performance or capacity. Rather it’s their ability to manage data intelligently and securely.
Oehme cites the example of metadata, which, if managed correctly, means “You can reduce the amount of data you need to look at by first narrowing down the data that is actually interesting.”
An autonomous or connected vehicle will be constantly capturing images – of stop signs, for example. And in the event of an accident, and the subsequent need to update or verify the underlying model, the ability to analyze the associated metadata – time of day, speed of travel, direction – becomes paramount.
“When they upload this picture into their datacenter, they would like to attach all that metadata to this object,” he says. That’s not a theoretical example. DDN works with multiple automotive suppliers creating autonomous capabilities.
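As a rough illustration of that narrowing – a minimal sketch in Python using a hypothetical in-memory catalog, not any DDN API – querying rich metadata first lets a pipeline fetch only the handful of image objects that matter, instead of scanning an entire archive of raw frames:

```python
from dataclasses import dataclass

# Hypothetical catalog entry: rich metadata stored alongside a reference
# to the raw image object. Field names and values are illustrative only.
@dataclass
class FrameRecord:
    object_key: str      # where the raw image bytes live
    captured_at: str     # ISO 8601 timestamp
    speed_kmh: float
    heading: str
    label: str           # e.g. "stop_sign"

catalog = [
    FrameRecord("frames/a1.jpg", "2025-01-15T08:02:11Z", 42.0, "NE", "stop_sign"),
    FrameRecord("frames/a2.jpg", "2025-01-15T08:02:12Z", 41.5, "NE", "stop_sign"),
    FrameRecord("frames/b7.jpg", "2025-01-15T09:30:05Z", 12.0, "SW", "traffic_light"),
]

def frames_of_interest(records, label, min_speed_kmh):
    """Narrow the dataset by metadata before touching any image bytes."""
    return [
        rec.object_key
        for rec in records
        if rec.label == label and rec.speed_kmh >= min_speed_kmh
    ]

# Only these keys need to be pulled from storage for re-training or verification.
print(frames_of_interest(catalog, label="stop_sign", min_speed_kmh=30.0))
```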
It quickly becomes apparent that AI success depends not just on the amount of data to which an organization has access. The “richness of the data that is stored inside the system” and the ability to “integrate all these pipelines or workflows together, where from the creation of the data to the consumption of the data, there is full governance” all come into play.
However, many organizations must currently juggle multiple databases, event systems, and notification services to manage this. That can be expensive, complex, and time-consuming, and will inevitably create latency issues. Even cloud giant AWS has had to develop a separate product – S3 Metadata – to tackle the metadata problem.
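For comparison, a plain object store such as Amazon S3 does let clients attach user-defined metadata at upload time, as the boto3 sketch below shows (the bucket name, key, and values are made up). But that metadata can only be read back object by object; there is no built-in way to query across millions of objects by those values, which is the gap a separate metadata index has to fill.

```python
import boto3

s3 = boto3.client("s3")

# Attach user-defined metadata to a single object at upload time.
# Bucket, key, file, and metadata values here are illustrative only.
with open("cam0.jpg", "rb") as body:
    s3.put_object(
        Bucket="example-telemetry",
        Key="frames/2025-01-15/cam0.jpg",
        Body=body,
        Metadata={"speed-kmh": "42", "heading": "NE", "label": "stop-sign"},
    )

# Reading it back requires a HEAD request on that specific object; asking
# "which objects have label=stop-sign?" across a bucket needs separate indexing.
meta = s3.head_object(
    Bucket="example-telemetry",
    Key="frames/2025-01-15/cam0.jpg",
)["Metadata"]
print(meta)  # {'speed-kmh': '42', 'heading': 'NE', 'label': 'stop-sign'}
```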
Data Needs Intelligence, Too
What’s needed, says DDN, is a platform that delivers not just the required hardware performance, but also the ability to manage data intelligently and securely at scale. And it needs to be accessible, whether via the cloud or on-prem, which means it has to offer multi-tenancy.
This is precisely where DDN’s Data Intelligence Platform comes in. The platform consists of two elements. DDN’s Infinia 2.0 is a software-defined storage platform, which gives users a unified view across an organization’s disparate collections of data. EXAScaler is its highly scalable file system, which is optimized for high-performance, big data and AI workloads.
As Oehme explains, Infinia is “a data platform that also happens to speak many storage protocols, including those for structured data.” That’s a critical distinction he says, “Because what Infinia allows you to do is store data, but not just normal data files and objects. It allows me to store a huge amount of metadata combined with unstructured data in the same view.”
Data and metadata are stored in a massively scalable key-value store in Infinia, he says: “It’s exactly the same data and metadata in two different ways. And so therefore we’re not doing this layering approach that people have done in the past.”
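To make the general pattern concrete – and this is purely an illustrative sketch, not Infinia’s actual on-disk format or API – a single key-value namespace can hold both an object’s bytes and its metadata fields, so the same record can be served back through an object-style read or a structured query without copying data between layers:

```python
# Illustrative only: a toy key-value namespace holding object data and
# metadata side by side, so one store can serve two access patterns.
kv = {
    "frames/a1.jpg/data": b"<jpeg bytes>",
    "frames/a1.jpg/meta/captured_at": "2025-01-15T08:02:11Z",
    "frames/a1.jpg/meta/speed_kmh": "42",
    "frames/a1.jpg/meta/label": "stop_sign",
}

def get_object(key):
    """Object-protocol view: return the raw bytes."""
    return kv[f"{key}/data"]

def get_metadata(key):
    """Structured view: return the metadata fields as a row-like dict."""
    prefix = f"{key}/meta/"
    return {k[len(prefix):]: v for k, v in kv.items() if k.startswith(prefix)}

print(get_object("frames/a1.jpg")[:6])   # raw data, served as an object
print(get_metadata("frames/a1.jpg"))     # same record, structured view
```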
This can result in far more efficient data pipelines and operations, both by eradicating the multiple silos that have mushroomed across organizations, and by removing the need for data scientists and other specialists to learn and maintain multiple data analysis and management tools.
Because they’re designed to be multi-tenant from the outset, both EXAScaler and Infinia 2.0 are able to scale from enterprise applications through cloud service providers to hyperscalers.
The results are clear: multiple TB/sec of bandwidth with sub-millisecond latency, delivering a 100 times performance advantage over AWS S3, according to DDN’s comparisons. When it comes to access times for model training and inference, DDN’s platform shows a 25X speed boost, says the company.
As for on-premises deployments, Infinia 2.0 supports massive density, with 100 PB in a single rack, and can deliver up to a 75 percent reduction in power, cooling, and datacenter footprint, with 99.999 percent uptime. That’s an important capability as access to power and real estate emerges as much of a constraint on AI development and deployment as access to skills and data.
DDN partners closely with chip maker Nvidia. It’s closely aligned not only with the GPU giant’s hardware architecture, scaling to support over 100,000 GPUs in a single deployment, but also with its software stack, with tight integration into Nvidia NIM microservices for inference as well as the Nvidia NeMo framework and CUDA. And Nvidia is itself a DDN customer.
AI technology is progressing at a breakneck pace, with model developers competing fiercely for users’ attention. However, it is data – and the ability to manage it – that will ultimately dictate whether organizations can realize the promise of AI, whether we’re talking hyperscalers, cloud service providers, or the enterprises that use their services.
The potential is clear, says Oehme. “If you have a very good, very curious engineer, they will become even better with AI.” But that depends on the data infrastructure getting better first.
Sponsored by DDN.