Solving the AI Training Data Challenge with Decart AI and Backblaze


Depending on which LLM you ask, we live in a world with somewhere between 25k and 80k AI startups. It’s a growing, highly competitive market where small startups with a big idea can find themselves toe-to-toe with the goliaths of tech—fighting for money, chips, talent, even raw electrical power. 

How does any company differentiate itself in an explosive burst of technological change, one that requires a lot of investment in talent and infrastructure, where even the richest tech platforms on the planet don’t always succeed? Today we’re sharing the story of Decart, an AI startup that used Backblaze B2 Cloud Storage to support a successful launch with an impressive new model that provides an order of magnitude improvement in both training and inference for the largest generative models.

Backblaze is an amazing solution for AI training data. We looked at a number of options and Backblaze is seriously the best.

—Dean Leitersdorf, Co-Founder and CEO, Decart

First, the news

Decart is an AI research lab that came out of stealth on October 31 with an incredible new model:

While this might look like Minecraft, every pixel you see here and all of the gameplay is being generated by Decart’s Oasis model. It’s like Minecraft in every way you’d expect, except that the entire experience is being generated by AI and you can creatively prompt the model to build beyond the confines of the game. The mind-blowing part? Decart says Oasis can perform more than 10 times more efficiently than competitors such as OpenAI’s Sora, which hasn’t been publicly released.

Don’t let the game distract you though—the Minecraft simulation is just an expression of the power of their model. According to the Decart team, this isn’t even version 1.0 of what their approach is capable of generating—more like version 0.01. Given the broad coverage they’ve already received for their launch, we’re excited to see what’s next.

How to break out in the AI market

For Decart, the strategy to pull ahead of the crowd was simple: Disrupt the market on inference speed to deliver game-changing models, and do that by building the highest-performance multi-cloud model training infrastructure possible. Then, iterate on that innovation.

We crafted state of the art infrastructure that allows us to train models that other people simply can’t train.

—Dean Leitersdorf, Co-Founder and CEO, Decart

Before we met Dean and the team at Decart, most of the hard work had been done: the multi-cloud AI stack for training was dialed in and the models were being put through their paces. They just had one simple, but big, problem holding them back:

The price and the logistics of moving and storing training data were going to limit their growth.

They were burning through free data storage credits from a traditional cloud provider and had data spread across a range of other cloud providers and GPU clusters. Their training data needed to scale from hundreds of thousands of hours of video to hundreds of millions of hours, and they needed a storage solution that could handle that scale in three key areas:

  1. Reliably high performance: Decart needed to know that the moment they got time on a cluster, they could move data in as fast as possible.
  2. GPU interoperability: They needed to be sure that whatever storage platform they chose would work well with a multi-cluster training approach. Being able to shop jobs between different GPU clouds and disperse training was essential for Dean’s team.
  3. Efficiency: Every dollar an AI startup spends on anything other than training time is a competitive disadvantage, so keeping storage costs low, without surprise fees for data retention or download, was key.

Decart discovered Backblaze while researching storage alternatives. After a quick call and two fast months of testing Backblaze in a wide variety of usage patterns, it was clear to the team that they had found the storage foundation they needed. 

We chose Backblaze because everything works. It’s super stable, and we had zero problems. That’s number one.

—Dean Leitersdorf, Co-Founder and CEO, Decart

When it came time to start moving data from Backblaze to GPU clusters, they had no problem transferring petabyte-scale datasets. The only minor challenge was ensuring that the compute provider’s pipe could take the volume of data streaming in.
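
Whether a compute provider’s pipe can absorb a dataset is, to a first approximation, a matter of dividing dataset size by sustained bandwidth. The sketch below is a back-of-the-envelope estimate only: the function name, the 1 PB dataset size, the 100 Gbit/s link, and the efficiency factor are all illustrative assumptions, not figures from Decart or Backblaze.

```python
def transfer_hours(dataset_bytes: float,
                   link_bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Estimate the hours needed to move a dataset over a network link.

    efficiency is the assumed fraction of the nominal bandwidth that is
    actually sustained (0 < efficiency <= 1). All inputs are hypothetical.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (dataset_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / 3600


# Illustrative numbers only: 1 PB (10**15 bytes) over a 100 Gbit/s link.
one_petabyte = 1e15
ideal = transfer_hours(one_petabyte, 100e9)
derated = transfer_hours(one_petabyte, 100e9, efficiency=0.7)
print(f"{ideal:.1f} h at full line rate, {derated:.1f} h at 70% efficiency")
```

Even under these optimistic assumptions, a petabyte takes the better part of a day to land, which is why the receiving side’s bandwidth matters as much as the storage side’s.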

Here’s where things ended up working for Decart:

  • Performance: They were blown away by the performance they achieved with Backblaze (more to come on that later).
  • Price: With pricing at one-fifth the cost of traditional cloud providers, Backblaze unlocked a significant amount of budget.
  • Free egress: The true game changer. Decart, for a number of reasons, trains their models on multiple GPU clusters at the same time. With Backblaze, they can egress their full dataset to up to three training sites every month at no additional cost.
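
The free egress point reduces to simple arithmetic: if the monthly allowance covers up to three full copies of the dataset, only copies beyond the third count as extra. Here is a minimal sketch, assuming the allowance is exactly three times the stored volume as described above. The function name and dataset sizes are hypothetical, and actual billing terms should be taken from Backblaze’s pricing documentation rather than from this example.

```python
def billable_egress_tb(stored_tb: float,
                       training_sites: int,
                       free_multiple: float = 3.0) -> float:
    """Return the egress volume (TB) that exceeds the free monthly allowance.

    Assumes each training site downloads one full copy of the dataset per
    month and that free egress is capped at free_multiple x stored volume.
    """
    if stored_tb < 0 or training_sites < 0:
        raise ValueError("stored_tb and training_sites must be non-negative")
    total_egress_tb = stored_tb * training_sites
    free_allowance_tb = stored_tb * free_multiple
    return max(0.0, total_egress_tb - free_allowance_tb)


# Hypothetical 1,000 TB dataset sent to three, then four, training sites.
print(billable_egress_tb(1000, 3))  # within the allowance
print(billable_egress_tb(1000, 4))  # one full copy over the allowance
```

Under this assumption, the number of training sites can grow to three without storage egress appearing in the budget at all.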

B2 Cloud Storage was literally the only technical thing we used in training these models that didn’t crash the first time we tried it. We’re in an industry where everything fails, but Backblaze didn’t.

—Dean Leitersdorf, Co-Founder and CEO, Decart

Looking forward

With performance, flexibility, and affordability squared away in their data storage approach, the Decart team is now in a position to move beyond this impressive first model and build whatever comes next. With the fundamentals working reliably, the two teams are now working together to find further efficiency gains and optimizations and to stand up the best possible infrastructure for training AI models.

About Stephanie Doyle

Stephanie is the Associate Editor & Writer at Backblaze. She specializes in taking complex topics and writing relatable, engaging, and user-friendly content. You can most often find her reading in public places, and can connect with her on LinkedIn.