MinIO launched AIStor nearly a year ago to provide enterprises with an ultra-scalable object store for AI use cases. Today, it expanded AIStor into the world of big data analytics by adding support for Apache Iceberg. As MinIO executives explain, the addition gives customers important new capabilities.
Apache Iceberg has become the de facto standard for open table formats in the big data community. The software emerged from Netflix and Apple as a result of data inconsistencies and other issues experienced by users of Apache Hive, the SQL-based query engine that emerged in the Hadoop era. Iceberg fixed the problems through support for ACID transactions, among other techniques.
When Databricks bought Iceberg-backer Tabular back in 2024, it was a watershed moment for the big data community. It meant that customers no longer feared lock-in and could take their Iceberg tables anywhere and essentially query them with any query engine, such as Apache Spark, Trino, Starburst, Dremio, and Apache Flink, among others.
As one of the most popular S3-compatible object stores, MinIO also benefits from Iceberg’s emergence as the de facto standard. Some customers need to keep their tabular data on-prem, and MinIO gives them the capability to do it in a scalable fashion.
Not only that, but providing a unified repository for objects and tables means MinIO customers can run big data analytics as well as AI on all their data, says MinIO Vice President of Marketing Jason Nadeau.
“This is a game changer,” Nadeau said. “For sure you need to have tables if you’re going to do data warehousing. And that’s what people generally have done historically. But if you want to do the really cool stuff with AI in particular, that type of AI needs access to all your data, and it’s been siloed all over the place. That’s the hard part. So bringing tables and objects together into a single platform makes the discovery, the use of all that enterprise AI data basically now possible. So that’s the big enabler.”
While you can go some distance with a federated approach, in practice it doesn’t work when the data is in far-flung locations. Iceberg support helps MinIO and its customers by enabling them to eliminate data silos and consolidate data.
“Lots of folks talk about trying to have a data fabric that’s distributed, federated, stuff all over the place. But when do you actually go to access it when you need it, things don’t work. APIs time out, stuff is throttled,” Nadeau says. “[The data] has got to be consolidated into one place. That’s the only way to truly make it work.”
While MinIO customers could have stored tabular data in Iceberg tables (which typically keep their data in column-oriented Parquet files) before today’s announcement, the integration wasn’t ideal. AB Periasamy, the co-CEO of MinIO, explains why.
“The challenge is that most on-prem implementations make it harder than it needs to be, requiring separate catalog databases and extra layers of infrastructure that add cost and operational risk,” Periasamy says in a press release. “By building Iceberg directly into AIStor, we take away that complexity and give enterprises a simple, scalable foundation for AI. This not only lowers costs and speeds progress, but also ensures AI can reach its full potential because all data is AI data.”
While other Iceberg implementations require a separate metadata catalog, such as Apache Polaris, AIStor’s Iceberg implementation does not. Instead, it stores the metadata in the object store itself, relying on the same deterministic hashing algorithm that it uses to spread objects across the cluster.
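The general idea of deterministic placement can be illustrated with a minimal sketch. It is not AIStor's actual algorithm: the use of SHA-256 with a modulo, the node names, and the metadata key below are all assumptions chosen for illustration. The point is only that any reader or writer can compute where a piece of metadata lives from its key alone, with no lookup in a separate catalog database.

```python
import hashlib


def place(key: str, nodes: list[str]) -> str:
    """Pick a node for an object key using only the key's hash.

    Illustrative assumption: SHA-256 plus modulo. A real object store
    may use a different hash function and placement scheme.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical cluster members and a hypothetical Iceberg metadata key.
NODES = ["node-0", "node-1", "node-2", "node-3"]
METADATA_KEY = "warehouse/sales/orders/metadata/v1.metadata.json"

# A writer and a reader compute the location independently.
writer_view = place(METADATA_KEY, NODES)
reader_view = place(METADATA_KEY, NODES)

print(writer_view == reader_view)
```

Because the mapping depends only on the key and the node list, both sides agree on the location without consulting any external service. Note that a plain modulo scheme like this one reshuffles most keys when the node list changes, which is one reason production systems tend to use more elaborate placement.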
The post Why MinIO Added Support for Iceberg Tables appeared first on BigDATAwire.