Artificial intelligence is changing the way businesses store and access their data. That’s because traditional data storage systems were designed to handle simple commands from a handful of users at once, whereas today, AI systems with millions of agents need to continuously access and process large amounts of data in parallel. Traditional data storage systems now have layers of complexity, which slows AI systems down because data must pass through multiple tiers before reaching the graphics processing units (GPUs) that are the brain cells of AI.
Cloudian, co-founded by Michael Tso ’93, SM ’93 and Hiroshi Ohta, helps storage keep pace with the AI revolution. The company has developed a scalable storage system for businesses that helps data flow seamlessly between storage and AI models. The system reduces complexity by applying parallel computing to data storage, consolidating AI functions and data onto a single parallel-processing platform that stores, retrieves, and processes scalable datasets, with direct, high-speed transfers between storage and GPUs and CPUs.
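The article doesn’t publish Cloudian’s internals, but the core idea — many readers pulling objects concurrently rather than queuing through a single path — can be sketched in a few lines of Python. Everything here (the fetch_object helper and the shard keys) is hypothetical and purely illustrative.

```python
# Illustrative sketch of parallel data access: many objects fetched
# concurrently instead of one at a time through a single tier.
# fetch_object() is a hypothetical stand-in for a storage-layer read.
from concurrent.futures import ThreadPoolExecutor

def fetch_object(key: str) -> bytes:
    # Placeholder for a real read from a distributed object store.
    return f"payload for {key}".encode()

keys = [f"dataset/shard-{i:04d}" for i in range(1000)]

# Issue reads in parallel so storage, not the client, becomes the
# limiting factor -- the access pattern that AI agents and training create.
with ThreadPoolExecutor(max_workers=64) as pool:
    payloads = list(pool.map(fetch_object, keys))

print(len(payloads), "objects fetched")
```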
Cloudian’s integrated storage-computing platform simplifies the process of building commercial-scale AI tools and gives businesses a storage foundation that can keep pace with the rise of AI.
“One of the things people miss about AI is that it’s all about the data,” Tso says. “You can’t get a 10 percent improvement in AI performance with 10 percent more data or even 10 times more data — you need 1,000 times more data. Being able to store that data in a way that’s easy to manage, and in such a way that you can embed computations into it so you can run operations while the data is coming in without moving the data — that’s where this industry is going.”
From MIT to industry
As an undergraduate at MIT in the 1990s, Tso was introduced by Professor William Dally to parallel computing — a type of computation in which many calculations occur simultaneously. Tso also worked on parallel computing with Associate Professor Greg Papadopoulos.
“It was an amazing time because most schools had one supercomputing project going on — MIT had four,” Tso recalls.
As a graduate student, Tso worked with MIT senior research scientist David Clark, a computing pioneer who contributed to the internet’s early architecture, particularly the transmission control protocol (TCP) that delivers data between systems.
“As a graduate student at MIT, I worked on disconnected and intermittent networking operations for large-scale distributed systems,” Tso says. “It’s funny — 30 years later, that’s what I’m still doing today.”
Following his graduation, Tso worked at Intel’s Architecture Lab, where he invented data synchronization algorithms used by BlackBerry. He also created specifications for Nokia that ignited the ringtone download industry. He then joined Inktomi, a startup co-founded by Eric Brewer SM ’92, PhD ’94 that pioneered search and web content distribution technologies.
In 2001, Tso started Gemini Mobile Technologies with Joseph Norton ’93, SM ’93 and others. The company went on to build the world’s largest mobile messaging systems to handle the massive data growth from camera phones. Then, in the late 2000s, cloud computing became a powerful way for businesses to rent virtual servers as they grew their operations. Tso noticed the amount of data being collected was growing far faster than networking speeds, so he decided to pivot the company.
“Data is being created in a lot of different places, and that data has its own gravity: It’s going to cost you time and money to move it,” Tso explains. “That means the end state is a distributed cloud that reaches out to edge devices and servers. You have to bring the cloud to the data, not the data to the cloud.”
Tso officially launched Cloudian out of Gemini Mobile Technologies in 2012, with a new emphasis on helping customers with scalable, distributed, cloud-compatible data storage.
“What we didn’t see when we first started the company was that AI was going to be the ultimate use case for data at the edge,” Tso says.
Although Tso’s research at MIT began more than two decades ago, he sees strong connections between what he worked on and the industry today.
“It’s like my whole life is playing back because David Clark and I were dealing with disconnected and intermittently connected networks, which are part of every edge use case today, and Professor Dally was working on very fast, scalable interconnects,” Tso says, noting that Dally is now the senior vice president and chief scientist at the leading AI company NVIDIA. “Now, if you look at the modern NVIDIA chip architecture and the way they do interchip communication, it’s got Dally’s work all over it. With Professor Papadopoulos, I worked on how to speed up application software with parallel computing hardware without having to rewrite the applications, and that’s exactly the problem we are trying to solve with NVIDIA. Coincidentally, all the stuff I was doing at MIT is playing out.”
Today, Cloudian’s platform uses an object storage architecture in which all types of data — documents, videos, sensor data — are stored as unique objects with metadata. Object storage can manage massive datasets in a flat file structure, making it ideal for unstructured data and AI systems, but it traditionally hasn’t been able to send data directly to AI models without the data first being copied into a computer’s memory, creating latency and energy bottlenecks for businesses.
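Object stores of this kind typically expose an S3-compatible API. As a minimal sketch of the model the paragraph describes — objects plus metadata in a flat namespace — here is what a write and read look like through boto3; the endpoint URL, credentials, bucket, and keys are all hypothetical placeholders.

```python
# Minimal sketch: writing and reading an object with metadata through an
# S3-compatible API. Endpoint, credentials, and names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.example.com",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Every item -- a document, a video, a sensor reading -- is stored as an
# object in a flat namespace, with descriptive metadata attached to it.
s3.put_object(
    Bucket="sensor-data",
    Key="factory-7/robot-42/2025-07-01.json",
    Body=b'{"temperature": 71.3, "vibration": 0.02}',
    Metadata={"site": "factory-7", "device": "robot-42"},
)

# Retrieval returns both the payload and its metadata in a single call.
obj = s3.get_object(Bucket="sensor-data", Key="factory-7/robot-42/2025-07-01.json")
print(obj["Metadata"])   # {'site': 'factory-7', 'device': 'robot-42'}
print(obj["Body"].read())
```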
In July, Cloudian announced that it has extended its object storage system with a vector database that stores data in a form immediately usable by AI models. As data is ingested, Cloudian computes, in real time, the vector form of that data to power AI tools like recommender engines, search, and AI assistants. Cloudian also announced a partnership with NVIDIA that allows its storage system to work directly with the AI company’s GPUs. Cloudian says the new system enables even faster AI operations and reduces computing costs.
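The article doesn’t detail Cloudian’s implementation, but the idea of vectorizing data at ingest can be illustrated in a few lines. In this sketch, embed() is a toy stand-in for a real embedding model, and the in-memory index is hypothetical; the point is that vectors are computed at write time, so similarity search later needs no separate copy-out step.

```python
# Illustrative sketch of embed-on-ingest: each object is vectorized as it
# arrives. embed() is a placeholder for a real embedding model; the dict
# "index" stands in for a vector database colocated with the object store.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: a pseudo-random unit vector derived from the input."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

index: dict[str, np.ndarray] = {}

def ingest(key: str, text: str) -> None:
    # Compute the vector form at write time, alongside the object itself.
    index[key] = embed(text)

def search(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    q = embed(query)
    # Cosine similarity reduces to a dot product on unit vectors.
    scores = {key: float(q @ vec) for key, vec in index.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

ingest("doc/1", "robot arm maintenance schedule")
ingest("doc/2", "quarterly financial report")
print(search("when should the robot be serviced?"))
```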
“NVIDIA contacted us about a year and a half ago because GPUs are useful only with data that keeps them busy,” Tso says. “Now people are realizing it’s easier to move the AI to the data than it is to move huge datasets. Our storage systems embed a lot of AI functions, so we’re able to pre- and post-process data for AI near where we collect and store the data.”
AI-first storage
Cloudian helps about 1,000 companies around the world get more value out of their data, including large manufacturers, financial service providers, health care organizations, and government agencies.
Cloudian’s storage platform helps one large automaker, for instance, use AI to determine when each of its manufacturing robots needs to be serviced. Cloudian is also working with the National Library of Medicine to store research articles and patents, and with the National Cancer Database to store DNA sequences of tumors — rich datasets that AI models could process to help researchers develop new treatments or gain new insights.
“GPUs have been an incredible enabler,” Tso says. “Moore’s Law doubles the amount of compute every two years, but GPUs are able to parallelize operations on chips, so you can network GPUs together and shatter Moore’s Law. That scale is pushing AI to new levels of intelligence, but the only way to make GPUs work hard is to feed them data at the same speed that they compute — and the only way to do that is to get rid of all the layers between them and your data.”