I started seeing
It is an open-source, serverless, in-process multimodal vector database written in Rust and designed to be cloud-native.
It’s not to be confused with the Lance columnar format (which I was doing at first). Lance is also written in Rust and is a more appropriate comparison to Parquet, but it is the foundation that LanceDB is built on. It is also
So, what we have is a SQL-compatible vector database that supports vectors, images, text, and videos, along with full-text search. It is also said to be very fast, but you’ll need to do your own tests to see how it performs on your own stack.
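To make "SQL-compatible vector search" a bit more concrete, here is a minimal pure-Python sketch of the idea: apply a metadata filter (the SQL `WHERE` part), then rank the surviving rows by distance to a query vector. This is not LanceDB code. The rows, field names, and the `filtered_search` helper are hypothetical, and it uses a brute-force scan with Euclidean distance. As far as I know, LanceDB exposes the same idea through a `where(...)` filter on its search queries, but check the docs for the exact syntax.

```python
import math

# Hypothetical rows: a vector plus scalar metadata, like a table row.
# The vectors are made up for illustration, not produced by a real model.
ROWS = [
    {"item": "red sneaker", "price": 60.0, "vector": [0.90, 0.10, 0.00]},
    {"item": "blue sneaker", "price": 120.0, "vector": [0.75, 0.20, 0.10]},
    {"item": "leather boot", "price": 150.0, "vector": [0.60, 0.10, 0.70]},
    {"item": "sun hat", "price": 25.0, "vector": [0.00, 0.90, 0.20]},
]


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(rows, query, predicate, k=2):
    """Keep rows that match a metadata predicate (the WHERE clause),
    then return the names of the k rows closest to the query vector."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: l2_distance(r["vector"], query))
    return [r["item"] for r in candidates[:k]]


query = [0.85, 0.15, 0.05]  # pretend this is the embedding of "a sneaker"
print(filtered_search(ROWS, query, lambda r: True))          # no filter
print(filtered_search(ROWS, query, lambda r: r["price"] < 100))  # price < 100
```

Without a filter, the two sneakers come back first. With `price < 100`, the blue sneaker is excluded, so the second result falls back to the much less similar sun hat, which is exactly the trade-off you get when combining filters with similarity search.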
Two letters: AI. A vector database often sits at the core of the data layer behind applications built on Large Language Models (LLMs). LanceDB has gone the extra mile to provide a GitHub repository with a few vector recipes that you can find
So, you could store embeddings from a machine learning model in it, for example, to search for images using written descriptions.
The challenge here is the “
This leads to “Embeddings,” which are high-dimensional floating-point vector representations of a query or the data. You can embed anything using an appropriate embedding model or function.
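The position of an embedding in the vector space carries semantic meaning, depending on the type of model and the training used to produce it. LanceDB supports “explicit” and “implicit” vectorization methods: with explicit vectorization, you run the embedding function yourself before inserting or querying data; with implicit vectorization, the embedding function is attached to the table and applied automatically when data is added and when you query.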
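Here is a rough sketch of the difference between the two approaches, assuming the behavior described above. `toy_embed`, `ExplicitTable`, and `ImplicitTable` are hypothetical illustrations, not LanceDB's API, and `toy_embed` is a meaningless stand-in for a real embedding model. The point is only who is responsible for calling the embedding function.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: vowel counts,
    normalized to unit length. It carries no real semantics."""
    counts = [float(text.lower().count(c)) for c in "aeiou"]
    norm = sum(c * c for c in counts) ** 0.5 or 1.0  # avoid dividing by zero
    return [c / norm for c in counts]


class ExplicitTable:
    """Explicit: the caller computes vectors and must remember to embed
    queries with the same function."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def add(self, text: str, vector: Vector) -> None:
        self.rows.append({"text": text, "vector": vector})


class ImplicitTable:
    """Implicit: the table keeps its embedding function and applies it on
    both ingestion and query, so the caller only handles raw text."""

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self.embed = embed
        self.rows: List[Dict] = []

    def add(self, text: str) -> None:
        self.rows.append({"text": text, "vector": self.embed(text)})

    def query_vector(self, text: str) -> Vector:
        return self.embed(text)


explicit = ExplicitTable()
explicit.add("hello world", toy_embed("hello world"))  # caller embeds

implicit = ImplicitTable(toy_embed)
implicit.add("hello world")  # table embeds for you

print(explicit.rows[0]["vector"] == implicit.rows[0]["vector"])
```

Both tables end up storing the same vector. The implicit version simply removes the chance of embedding your data with one model and your queries with another.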
At this stage, we’re getting into some deep water concerning how this all works, and it is beyond the scope of what I’m trying to convey in this blog. I’ll share an image from the LanceDB docs that illustrates how similar entries cluster within a vector system.
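As a text companion to that picture, here is a tiny brute-force nearest-neighbour search over made-up 2-D vectors, using cosine similarity. The words and vectors are hypothetical; real embedding models produce hundreds or thousands of dimensions, and for large tables LanceDB can build approximate indexes (such as IVF-PQ) instead of scanning every row like this sketch does.

```python
import math

# Hypothetical 2-D "embeddings": animals point one way, vehicles another.
EMBEDDINGS = {
    "cat": [0.90, 0.10],
    "kitten": [0.85, 0.20],
    "dog": [0.80, 0.30],
    "car": [0.10, 0.95],
    "truck": [0.20, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(word, k=2):
    """Brute-force k nearest neighbours of `word`, most similar first."""
    query = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    others.sort(key=lambda w: cosine_similarity(query, EMBEDDINGS[w]), reverse=True)
    return others[:k]


print(nearest("cat"))
print(nearest("truck", k=1))
```

"cat" lands next to "kitten" and "dog", while "truck" lands next to "car": the same clustering effect the diagram shows, just in two dimensions.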
The LanceDB ecosystem provides all the latest and most commonly used tools for this space to make it as convenient as possible to get started.
You can use Python and JavaScript to load your data into LanceDB, with support for popular Python machine learning and columnar data packages. A native TypeScript SDK is available to allow for vector search as part of serverless functions. What isn’t listed here is the new
That’s not all, though. To put it in a list, we also see:
There is a great, recent blog, “
Obviously, LanceDB isn’t a general-purpose database; it has a very specific use case, and from what I can tell, it’s a solid solution for that use case. It is an extremely fast vector database built specifically for AI applications.
I can envision some really useful scenarios for building your own LLM-powered assistant around product blogs and documentation, letting users ask specific questions and get more tailored answers than they would by reading through tons of blogs and docs.
This could be the beginning of a tide shift in how we provide information to the public.
Finally, LanceDB Cloud is coming, and at the time of this writing (October 2023), you can sign up to be notified about it at this
You can read the other “What the heck” articles at these links: