Organizations that are using TimescaleDB to store and query their time-series data may be interested to know that they can use the database to store and query vectors for GenAI applications, too.
Timescale is best known for developing an open source time-series database. The New York City company added extensions to Postgres to make time-series data a first-class data type for IoT-type applications, including gaming.
With today’s launch of Timescale Vector, the company is now entering the market for vector databases, which is flourishing as a result of the massive interest in generative AI applications built atop large language models.
Vector databases serve as a sort of long-term memory for LLMs, such as OpenAI’s GPT-4 and Llama from Meta. By storing and indexing the mathematical representations of pieces of text trained by the LLM, dubbed vector embeddings, the vector database can more quickly match the GenAI application’s user input at run time to the most pertinent piece of training data encountered by the LLM.
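To make the lookup concrete, here is a minimal Python sketch of the matching step a vector database performs. It is an illustration only, not Timescale's or pgvector's code: the document IDs and three-dimensional vectors are made up (real embeddings have hundreds or thousands of dimensions), and the helper names `cosine_similarity` and `nearest` are hypothetical. It scans every stored vector, which is the exact search that an approximate index is designed to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, k=1):
    """Return the ids of the k stored embeddings most similar to the query.

    This is a brute-force scan over every entry (hypothetical helper,
    for illustration only).
    """
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up toy embeddings keyed by document id.
store = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_install": [0.1, 0.9, 0.1],
    "doc_support": [0.0, 0.2, 0.9],
}

# Made-up embedding of the user's input at run time.
query = [0.8, 0.2, 0.1]

print(nearest(query, store, k=2))
```

The query vector points in nearly the same direction as the "doc_pricing" vector, so that document ranks first.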
In TimescaleDB’s case, the company adopted pgvector, the open source vector library for Postgres. In addition to incorporating pgvector, the company bolstered its vector capability by using an Approximate Nearest Neighbor (ANN) algorithm, which it claims gives it much better performance than both plain vanilla pgvector and dedicated vector databases.
“We’ve built the additional support for these type of vector lookups that could enable people to build LLM models on top of it to answer … questions in a way that is much more performant, faster, and has better accuracy than other stuff that’s in the market,” says Michael Freedman, the CTO and co-founder of Timescale.
In a lengthy blog post today, the company shared some internal benchmark figures that it says prove its ANN index delivers better, faster performance on a dataset of 1 million OpenAI embeddings than competing vector databases.
The company claims it delivered 243% faster search speed at 99% recall than the vector database from Weaviate. It also claimed that it achieved about 39% faster search speed than pgvector’s hierarchical navigable small world (HNSW) algorithm and 363% faster search speed than pg_embedding.
“Timescale Vector optimizes hybrid time-based vector search, leveraging the automatic time-based partitioning and indexing of Timescale’s hypertables to efficiently find recent embeddings, constrain vector search by a time range or document age, and store and retrieve LLM response and chat history with ease,” the company writes in the blog.
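The idea of constraining a vector search by time range can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not Timescale Vector's API: the rows, the reference date, and the `search_recent` helper are all hypothetical, and a plain list filter stands in for the time-based partitioning the company describes.

```python
import math
from datetime import datetime, timedelta


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_recent(query, rows, since, k=1):
    """Keep only rows created at or after `since`, then rank by similarity.

    Hypothetical helper: the list comprehension plays the role that
    time-based partitioning would play inside a database.
    """
    candidates = [row for row in rows if row["created_at"] >= since]
    candidates.sort(
        key=lambda row: cosine_similarity(query, row["embedding"]),
        reverse=True,
    )
    return [row["id"] for row in candidates[:k]]


# Arbitrary reference date and made-up two-dimensional embeddings.
now = datetime(2024, 1, 1)
rows = [
    {"id": "old_note", "created_at": now - timedelta(days=90), "embedding": [1.0, 0.0]},
    {"id": "recent_note", "created_at": now - timedelta(days=2), "embedding": [0.7, 0.7]},
    {"id": "recent_other", "created_at": now - timedelta(days=1), "embedding": [0.0, 1.0]},
]
query = [1.0, 0.0]

# Restrict the search to the last seven days.
print(search_recent(query, rows, since=now - timedelta(days=7), k=1))
```

In this toy data "old_note" is the closest match overall, but it falls outside the seven-day window, so the search returns "recent_note" instead.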
In an interview with Datanami, Freedman also singled out Pinecone, which develops a dedicated vector database, as a new competitor. The problem with dedicated vector databases, Freedman says, is that they only store vector embeddings.
“But often you might have other relational data that you want to use in your question,” he says. “So if you’re building applications on Pinecone, you might need to deploy Pinecone and Postgres and something else, and then bring all that data together at query time and answer questions. If you’re using Timescale, it all sits together in one database, and you could actually build a lot of applications with a much simpler, operationally simpler stack.”
While TimescaleDB is best known as a time-series database, the company has since moved away from that niche and now considers itself to be a general database provider. It can not only store time-series and event data for IoT and gaming applications, but thanks to its Postgres core, it can store any relational data.
“We call ourselves Postgres ++,” Freedman says. “We’re Postgres ‘and.’ We’re not Postgres ‘or.’”
Having that underlying Postgres compatibility gives Timescale the capability to store the data for any organization that is already using Postgres. That’s a considerable market, considering that Postgres is the world’s most popular database. And that has translated into a considerable amount of success for the open source offering, which counts tens of millions of users, Freedman says. The managed database service that Timescale offers in the cloud has about 1,000 paying customers, he says.
“They’re like, ‘Oh, I already use Postgres. I should just be using you for all of [my workloads],’” Freedman says. “As long as they want a relational database like Postgres, we can become a great go-to for Postgres.”
Timescale has been supporting vector workloads for a few months under a preview program, and it’s officially announcing general availability today. The company has attracted several early adopters for its vector capability, including PolyPerception, a European provider of recycling solutions.
“The simplicity and scalability of Timescale Vector’s integrated approach to use Postgres as a time-series and vector database allows a startup like us to bring an AI product to market much faster,” PolyPerception CEO Nicolas Bream says in the Timescale blog. “Choosing TimescaleDB was one of the best technical decisions we made, and we are excited to use Timescale Vector.”
Another early adopter, Blueway Software, is also finding the database a good fit for its GenAI development. “Using Timescale Vector allows us to easily combine PostgreSQL’s classic database features with storage of vector embeddings for Retrieval Augmented Generation (RAG),” says Alexis de Saint Jean, the company’s Innovation Director. “Timescale’s easy-to-use cloud platform and good support keep our team focused on imagining solutions to solve customer pains, not on building infrastructure.”
You can learn more at www.timescale.com.
The post TimescaleDB Is a Vector Database Now, Too appeared first on Datanami.