CocoIndex now officially supports Qdrant! This integration combines a high-performance Rust 🦀 stack with real-time ETL into a vector store:
- CocoIndex is an open-source ETL framework that makes data AI-ready, with real-time incremental processing for high performance and low latency on source updates. https://github.com/cocoindex-io/cocoindex/
- Qdrant is a leading open-source vector database designed to handle high-dimensional vectors for high-performance, massive-scale AI applications. https://github.com/qdrant/qdrant
Exporting data to a Qdrant collection is simple.
The spec takes the following fields:
- collection_name (type: str, required): The name of the collection to export the data to.
- grpc_url (type: str, optional): The gRPC URL of the Qdrant instance. Defaults to http://localhost:6334/.
- api_key (type: str, optional): API key to authenticate requests with.
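As a minimal sketch of filling in these fields without hard-coding credentials, the helper below reads the optional values from environment variables and falls back to the documented default URL. The variable names QDRANT_GRPC_URL and QDRANT_API_KEY and the helper name qdrant_spec_kwargs are assumptions made for this illustration; they are not part of CocoIndex or Qdrant.

```python
import os

# Default taken from the field list above.
DEFAULT_GRPC_URL = "http://localhost:6334/"


def qdrant_spec_kwargs(collection_name: str, env=os.environ) -> dict:
    """Build keyword arguments for the Qdrant spec from environment variables.

    QDRANT_GRPC_URL and QDRANT_API_KEY are hypothetical variable names
    chosen for this sketch.
    """
    kwargs = {
        "collection_name": collection_name,
        "grpc_url": env.get("QDRANT_GRPC_URL", DEFAULT_GRPC_URL),
    }
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        # api_key is optional, so it is omitted when no key is configured.
        kwargs["api_key"] = api_key
    return kwargs
```

The resulting dictionary could then be unpacked into the spec, for example cocoindex.storages.Qdrant(**qdrant_spec_kwargs("cocoindex")).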
Before exporting, you must create a collection with a vector name that matches the vector field name in CocoIndex, and set setup_by_user=True during export.

    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Qdrant(
            collection_name="cocoindex",
            grpc_url="https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6334/",
            api_key="<your-api-key-here>",
        ),
        primary_key_fields=["id_field"],
        setup_by_user=True,
    )

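The collection has to exist before the export runs. One way to create it is through Qdrant's REST API, which listens on port 6333 by default (the gRPC port used by the export is 6334). The sketch below builds the PUT request for a collection with a single named vector, using only the Python standard library. The vector name text_embedding matches the field collected in the example flow in this post; the size of 384 and the Cosine distance are assumptions that depend on your embedding model, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_create_collection_request(
    collection_name: str,
    vector_name: str,
    vector_size: int,
    distance: str = "Cosine",
    rest_url: str = "http://localhost:6333",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a PUT /collections/{name} request declaring one named vector."""
    body = {
        "vectors": {
            # The key must equal the vector field name collected in CocoIndex.
            vector_name: {"size": vector_size, "distance": distance},
        }
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Qdrant reads the API key from the "api-key" header.
        headers["api-key"] = api_key
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/collections/{collection_name}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


def create_collection(request: urllib.request.Request) -> str:
    """Send the request to a running Qdrant instance and return the raw reply."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a local Qdrant instance running, calling create_collection(build_create_collection_request("cocoindex", "text_embedding", 384)) would create the collection that the export targets. Set vector_size to the dimension your embedding model actually produces.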
🚀 Get started with the example code, in fewer than 50 lines of Python:
https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant

    import cocoindex

    # text_to_embedding is not defined in this excerpt; see the linked example for its definition.

    @cocoindex.flow_def(name="TextEmbedding")
    def text_embedding_flow(
        flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
    ):
        """
        Define an example flow that embeds text into a vector database.
        """
        # Read the source documents from a local directory.
        data_scope["documents"] = flow_builder.add_source(
            cocoindex.sources.LocalFile(path="markdown_files")
        )

        doc_embeddings = data_scope.add_collector()

        with data_scope["documents"].row() as doc:
            # Split each document into overlapping chunks.
            doc["chunks"] = doc["content"].transform(
                cocoindex.functions.SplitRecursively(),
                language="markdown",
                chunk_size=2000,
                chunk_overlap=500,
            )

            with doc["chunks"].row() as chunk:
                chunk["embedding"] = text_to_embedding(chunk["text"])
                doc_embeddings.collect(
                    id=cocoindex.GeneratedField.UUID,
                    filename=doc["filename"],
                    location=chunk["location"],
                    text=chunk["text"],
                    # 'text_embedding' is the name of the vector we've created the Qdrant collection with.
                    text_embedding=chunk["embedding"],
                )

        # Export the collected rows to the existing Qdrant collection.
        doc_embeddings.export(
            "doc_embeddings",
            cocoindex.storages.Qdrant(
                collection_name="cocoindex", grpc_url="http://localhost:6334/"
            ),
            primary_key_fields=["id"],
            setup_by_user=True,
        )

We are constantly improving CocoIndex and adding new examples and blog posts. Please drop a star on our GitHub repo https://github.com/cocoindex-io/cocoindex for the latest updates!