Build Your Own Semantic Search Engine in Under 50 Lines—No Joke

CocoIndex now officially supports Qdrant! This integration combines a high-performance Rust 🦀 stack with real-time ETL into a vector store:

CocoIndex is an open-source ETL framework that makes data AI-ready, with real-time incremental processing for high performance and low latency on source updates. https://github.com/cocoindex-io/cocoindex/
Qdrant is a leading open-source vector database designed to handle high-dimensional vectors for high-performance, massive-scale AI applications. https://github.com/qdrant/qdrant
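To make "incremental processing" concrete, here is a minimal sketch of the general idea, not CocoIndex's actual implementation: keep a content fingerprint per source and, on each run, reprocess only sources whose fingerprint changed. The names fingerprint and incremental_update are hypothetical, and the SHA-256 hash is an assumed choice.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(content: str) -> str:
    """Return a stable content hash used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(
    sources: Dict[str, str],
    state: Dict[str, str],
    process: Callable[[str, str], None],
) -> List[str]:
    """Reprocess only new or changed sources and return the keys that were processed.

    `state` maps each source key to the fingerprint seen on the previous run
    and is updated in place. Keys that disappeared from `sources` are dropped
    from `state` (a real pipeline would also delete their exported rows).
    """
    processed = []
    for key, content in sources.items():
        fp = fingerprint(content)
        if state.get(key) != fp:  # new or modified source
            process(key, content)
            state[key] = fp
            processed.append(key)
    for key in list(state):
        if key not in sources:  # source was deleted
            del state[key]
    return processed


state: Dict[str, str] = {}
docs = {"a.md": "# Hello", "b.md": "# World"}
first = incremental_update(docs, state, lambda k, c: None)   # processes both files
docs["b.md"] = "# World, edited"
second = incremental_update(docs, state, lambda k, c: None)  # processes only b.md
```

Only the edited file is touched on the second run, which is where the latency savings on source updates come from.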

Exporting data to a Qdrant collection is simple with the cocoindex.storages.Qdrant storage spec.

The spec takes the following fields:

collection_name (type: str, required): The name of the collection to export the data to.
grpc_url (type: str, optional): The gRPC URL of the Qdrant instance. Defaults to http://localhost:6334/.
api_key (type: str, optional): The API key used to authenticate requests.
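As a sketch of how these three fields fit together, the hypothetical QdrantExportSpec dataclass below mirrors them (it is not a CocoIndex class): collection_name is required, grpc_url falls back to the documented default, and api_key is only passed along when set, which is convenient when switching between a local instance and Qdrant Cloud.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class QdrantExportSpec:
    """Hypothetical mirror of the documented spec fields (not CocoIndex's own class)."""

    collection_name: str                      # required
    grpc_url: str = "http://localhost:6334/"  # documented default
    api_key: Optional[str] = None             # optional, e.g. for Qdrant Cloud

    def __post_init__(self) -> None:
        if not self.collection_name:
            raise ValueError("collection_name is required")

    def as_kwargs(self) -> Dict[str, str]:
        """Keyword arguments for the storage spec, omitting unset optional fields."""
        kwargs = {"collection_name": self.collection_name, "grpc_url": self.grpc_url}
        if self.api_key is not None:
            kwargs["api_key"] = self.api_key
        return kwargs
```

Under that assumption, a call such as cocoindex.storages.Qdrant(**QdrantExportSpec("cocoindex").as_kwargs()) would target the default local instance.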

Before exporting, you must create the collection yourself, with a named vector whose name matches the vector field name in CocoIndex, and then set setup_by_user=True in the export call.
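One way to create such a collection is through Qdrant's REST API, which listens on port 6333 by default (gRPC, used by CocoIndex, is on 6334). The sketch below only builds the request with the Python standard library; the helper name is hypothetical, the vector name "text_embedding" is assumed to match the field collected in CocoIndex, and the size 384 assumes a 384-dimensional model such as sentence-transformers/all-MiniLM-L6-v2. Use your own model's dimension.

```python
import json
import urllib.request
from typing import Dict, Optional


def create_collection_request(
    collection_name: str,
    vector_name: str,
    dimension: int,
    rest_url: str = "http://localhost:6333",
    distance: str = "Cosine",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a PUT /collections/{name} request defining one named vector.

    `vector_name` must match the vector field name in CocoIndex and
    `dimension` must match the embedding model's output size.
    """
    body = {"vectors": {vector_name: {"size": dimension, "distance": distance}}}
    headers: Dict[str, str] = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["api-key"] = api_key  # Qdrant Cloud authentication header
    return urllib.request.Request(
        url=f"{rest_url.rstrip('/')}/collections/{collection_name}",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="PUT",
    )


req = create_collection_request("cocoindex", "text_embedding", 384)
# To send it to a running Qdrant instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```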

doc_embeddings.export(
    "doc_embeddings",
    cocoindex.storages.Qdrant(
        collection_name="cocoindex",
        grpc_url="https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6334/",
        api_key="<your-api-key-here>",
    ),
    primary_key_fields=["id_field"],
    setup_by_user=True,
)

🚀 Get started with the complete example, in fewer than 50 lines of Python:

https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant

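import cocoindex


def text_to_embedding(text: cocoindex.DataSlice) -> cocoindex.DataSlice:
    """
    Embed text with a SentenceTransformer model (as in the linked example).
    Any model works as long as the Qdrant vector size matches its dimension.
    """
    return text.transform(
        cocoindex.functions.SentenceTransformerEmbed(
            model="sentence-transformers/all-MiniLM-L6-v2"
        )
    )


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
):
    """
    Define an example flow that embeds Markdown files into a Qdrant collection.
    """
    # Read every file under ./markdown_files as a source document.
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files")
    )

    # Collector that gathers one row per chunk for export.
    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        # Split each document into overlapping, Markdown-aware chunks.
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )

        with doc["chunks"].row() as chunk:
            chunk["embedding"] = text_to_embedding(chunk["text"])
            doc_embeddings.collect(
                id=cocoindex.GeneratedField.UUID,
                filename=doc["filename"],
                location=chunk["location"],
                text=chunk["text"],
                # 'text_embedding' must match the vector name the Qdrant collection was created with.
                text_embedding=chunk["embedding"],
            )

    # Export to a pre-created local Qdrant collection (hence setup_by_user=True).
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Qdrant(
            collection_name="cocoindex", grpc_url="http://localhost:6334/"
        ),
        primary_key_fields=["id"],
        setup_by_user=True,
    )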
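To illustrate what chunk_size=2000 and chunk_overlap=500 mean, here is a deliberately simplified sketch. SplitRecursively splits along Markdown structure, whereas this hypothetical fixed_size_chunks helper uses plain fixed-size windows and assumes sizes are counted in characters. Consecutive chunks share 500 characters, so each window starts 1500 characters after the previous one, and text near a boundary appears intact in at least one chunk.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so each window
    starts (chunk_size - chunk_overlap) characters after the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks
```

For a 5000-character document this yields three chunks covering characters 0 to 2000, 1500 to 3500, and 3000 to 5000.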

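The flow above builds the index; searching it means embedding the query with the same model and asking Qdrant for the points whose "text_embedding" vectors are closest. Qdrant does this with approximate nearest-neighbor indexing, so the brute-force sketch below only illustrates the scoring, assuming the collection uses Cosine distance. The cosine and top_k helpers and the toy 2-D vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    points: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (text, vector) point against the query and keep the best k."""
    scored = [(text, cosine(query, vec)) for text, vec in points]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


points = [("cats", [1.0, 0.0]), ("dogs", [0.8, 0.6]), ("cars", [0.0, 1.0])]
results = top_k([1.0, 0.0], points, k=2)
```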
We are constantly improving CocoIndex and adding new examples and blog posts. Please star our GitHub repo https://github.com/cocoindex-io/cocoindex to follow the latest updates!
