Vectors: Coming to a Database Near You

As customers come to grips with the requirements of building and running generative AI applications, they’re finding there’s one important ingredient that makes it all work: a vector database. That’s the number one factor driving adoption of this special type of database.

While the sky-high hype around GenAI seems to be wearing off a bit, there is still massive interest in the nascent technology.

For instance, a recent Boston Consulting Group survey found that IT leaders are projecting a 30% increase in spending on GenAI and other forms of machine learning in the coming year, while a KPMG survey from March concluded that 97% of business leaders plan to invest in GenAI over the next 12 months.

The momentum behind GenAI is helping to power interest in vector databases, too. Vector databases have been the most popular category of database for the past 13 months, according to the database trackers at DB-Engines.

The vector database trend shows no sign of letting up. Gartner predicted a year ago that 30% of companies will use vector databases with foundational models by 2026, up from just 2% in 2022.

The database industry is responding to this increase in demand by ramping up production of vector capabilities, both in stand-alone vector databases and in multimodel databases that support vectors among other data types.

While there are tradeoffs between the two types of vector databases, the multimodel path appears to be growing quite fast. A new study from Forrester found that, by 2026, 75% of traditional databases, including relational and NoSQL, will incorporate vector capabilities into their offerings.

Source: DB-Engines.com

“Some organizations prefer these databases because they offer broader integration of both vector and non-vector data, enable hybrid search, and leverage existing database infrastructure,” writes lead Forrester Analyst Noel Yuhanna in the report, titled “Vector Databases Explode On The Scene.” “Also, some multimodel databases are now providing vector capabilities at no extra cost as part of existing licenses, further enhancing their appeal to enterprises.”

There are several factors that go into a customer’s decision to use a multimodel database or a native vector database. If the application requires “exceptional performance and … low-latency access to vector data,” then a vector database may be in order, according to Forrester.

Differences in use cases may also lead a customer to choose one over another. Traditional databases excel at powering applications, reporting, and business intelligence, whereas native vector databases are designed for GenAI, search, and retrieval augmented generation (RAG) applications.

A customer with lots of high-dimensional, complex data may also do better with a native vector database. Forrester notes that native vector databases also do better with unstructured data (text, documents, images, video, audio), indexing complex data, and integrating with machine learning tools.

Traditional databases have several benefits of their own, however. They are designed to support transactions, which isn’t really a concept in a native vector database, according to Forrester. They also generally have better support for third-party tooling. If you want to access the data with SQL, a traditional database is your best bet; native vector databases are mostly accessed via APIs. Multimodel databases fall somewhere in between when it comes to benefits and drawbacks.

Source: Forrester July 2024 report titled “Vector Databases Explode On The Scene”

“Unlike traditional databases, which are optimized for exact matches on structured data, vector databases excel in performing advanced similarity searches on complex, high-dimensional data,” Yuhanna and company write in the report. “For example, a vector database can quickly find all images in a database that are visually similar to a given image by comparing their respective vectors within seconds. The unique advantage of vector databases lies in their ability to support specialized vector indexes, facilitating rapid processing of requests and delivering the high performance required for querying complex data.”
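To make the idea of "comparing their respective vectors" concrete, here is a minimal sketch of a similarity search in plain Python. It is not taken from the Forrester report: the file names, the three-dimensional embeddings, and the helper functions are all hypothetical, and it scans every vector by brute force, which is precisely the cost that specialized vector indexes are meant to avoid.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, embeddings, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical image embeddings. Real embeddings typically have hundreds
# or thousands of dimensions and come from a trained model.
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "coast.jpg": [0.8, 0.2, 0.1],
    "city.jpg": [0.1, 0.9, 0.3],
    "forest.jpg": [0.0, 0.3, 0.9],
}

query_vector = [1.0, 0.0, 0.0]  # assumed embedding of the query image
top_matches = most_similar(query_vector, image_embeddings, k=2)
print(top_matches)
```

Because every query touches every stored vector, this approach slows down linearly as the collection grows, which is why the choice of index matters at scale.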

How native vector databases enable customers to store, index, and search across vector embeddings is particularly important, according to Forrester. Native vector databases feature advanced indexing and hashing techniques, “including K-dimensional trees, hierarchical navigable small world (HNSW) graphs, locality-sensitive hashing (LSH), Facebook AI similarity search (Faiss), and graph-based indexes,” the analysts write.
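As an illustration of one technique on that list, the sketch below shows a simple random-hyperplane form of locality-sensitive hashing. It is an assumption-laden toy rather than how any particular product implements LSH: the function names, the choice of eight hyperplanes, and the sample vectors are all hypothetical. The idea is that vectors pointing in similar directions tend to land in the same bucket, so a query only needs to be compared against the vectors in its own bucket.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; each one contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def lsh_signature(vector, hyperplanes):
    """Record which side of each hyperplane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

def build_index(embeddings, hyperplanes):
    """Group vector names into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for name, vector in embeddings.items():
        buckets[lsh_signature(vector, hyperplanes)].append(name)
    return buckets

def candidates(query, buckets, hyperplanes):
    """Return only the names that share the query's bucket."""
    return buckets.get(lsh_signature(query, hyperplanes), [])

# Hypothetical vectors: "a_scaled" points the same way as "a",
# while "opposite" points in the reverse direction.
embeddings = {
    "a": [0.9, 0.1, 0.0],
    "a_scaled": [1.8, 0.2, 0.0],
    "opposite": [-0.9, -0.1, 0.0],
}

hyperplanes = make_hyperplanes(num_planes=8, dim=3)
buckets = build_index(embeddings, hyperplanes)
matches = candidates([0.9, 0.1, 0.0], buckets, hyperplanes)
print(matches)
```

The trade-off is that the result is approximate: two similar vectors can occasionally fall on different sides of a hyperplane and be missed, so practical systems usually query several hash tables or combine hashing with an exact re-ranking step.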

Some of the most common use cases for vector databases include RAG, image similarity search, recommendation engine optimization, customer experience personalization, anomaly detection, search engines, and fraud detection. Forrester recommends either a native vector database or a multimodel database, depending on the requirements of each customer’s specific use case.

“Opt for a native vector database if you require low-latency access to large volumes (tens of terabytes) of vector data exclusively,” the company writes. “However, if your applications demand the integration of vector and non-vector data, go with a multimodel database with vector data capabilities.”

While scalability and performance come up again and again in the native-vs.-multimodel conversation, there are questions about just how effective any of the vector databases are at the high end.

“Forrester’s conversations with clients suggest most vector databases haven’t yet demonstrated high-end scalability and performance, particularly when handling billions of vectors or when dealing with hundreds of terabytes of data,” the company writes. “For optimal performance, ensure that vectors use optimized indexes and fine-tuned search algorithms and that they leverage GPUs and scale-out architectures where applicable.”

The post Vectors: Coming to a Database Near You appeared first on Datanami.