Build Your Own Semantic Search Engine in Under 50 Lines—No Joke

by LJ, April 21st, 2025

Too Long; Didn't Read

CocoIndex is an open-source ETL framework that makes data AI-ready, with real-time incremental processing for performance and low latency on source updates. Qdrant is a leading open-source vector database designed for high-performance handling of high-dimensional vectors. You can get started with fewer than 50 lines of Python.

CocoIndex now officially supports Qdrant! This integration combines a high-performance Rust 🦀 stack with real-time ETL into a vector store.



Exporting data to a Qdrant collection is simple.


The spec takes the following fields:

  • collection_name (type: str, required): The name of the collection to export the data to.
  • grpc_url (type: str, optional): The gRPC URL of the Qdrant instance. Defaults to http://localhost:6334/.
  • api_key (type: str, optional): The API key to authenticate requests with.


Before exporting, you must create a collection with a vector name that matches the vector field name in CocoIndex, and set setup_by_user=True during export.
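
The collection prerequisite above can be sketched against Qdrant's REST API using only the Python standard library. This is a minimal illustration under stated assumptions, not part of the CocoIndex integration: the helper names collection_payload and create_collection are hypothetical, the vector name text_embedding and collection name cocoindex mirror the example later in this post, and the vector size of 384 with cosine distance is an assumption that must match the embedding model you actually use. Qdrant serves REST on port 6333 by default, while the grpc_url that CocoIndex takes points at the gRPC port 6334.

```python
import json
import urllib.request
from typing import Optional


def collection_payload(vector_name: str, size: int, distance: str = "Cosine") -> dict:
    """Build the body of a create-collection request with a single named vector."""
    if size <= 0:
        raise ValueError("vector size must be positive")
    # The key under "vectors" is the vector name that the CocoIndex field must match.
    return {"vectors": {vector_name: {"size": size, "distance": distance}}}


def create_collection(
    base_url: str,
    collection_name: str,
    payload: dict,
    api_key: Optional[str] = None,
) -> dict:
    """Send PUT /collections/{name} to a running Qdrant instance (REST port)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key
    request = urllib.request.Request(
        url=f"{base_url.rstrip('/')}/collections/{collection_name}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Assumed values: 384 dimensions and cosine distance; adjust to your embedding model.
payload = collection_payload("text_embedding", size=384)
```

With a local Qdrant instance running, calling create_collection("http://localhost:6333", "cocoindex", payload) would create the collection before the first export. The official qdrant-client package offers the same operation if you prefer a typed client.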


doc_embeddings.export(
    "doc_embeddings",
    cocoindex.storages.Qdrant(
        collection_name="cocoindex",
        grpc_url="https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6334/",
        api_key="<your-api-key-here>",
    ),
    primary_key_fields=["id_field"],
    setup_by_user=True,
)


🚀 Get started with example code in fewer than 50 lines of Python:

https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
):
    """
    Define an example flow that embeds text into a vector database.
    """
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files")
    )

    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )

        with doc["chunks"].row() as chunk:
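            # Note: text_to_embedding is not defined in this excerpt; it must be defined elsewhere (see the linked example).
            chunk["embedding"] = text_to_embedding(chunk["text"])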
            doc_embeddings.collect(
                id=cocoindex.GeneratedField.UUID,
                filename=doc["filename"],
                location=chunk["location"],
                text=chunk["text"],
                # 'text_embedding' is the name of the vector we've created the Qdrant collection with.
                text_embedding=chunk["embedding"],
            )

    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Qdrant(
            collection_name="cocoindex", grpc_url="http://localhost:6334/"
        ),
        primary_key_fields=["id"],
        setup_by_user=True,
    )
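
To get a feel for what chunk_size and chunk_overlap control in the flow above, here is a rough sketch of a fixed-size sliding window. It is not how SplitRecursively works internally (that function is aware of the document's structure, hence the language argument). The function name split_with_overlap is hypothetical, and sizes are counted in characters here as a simplifying assumption.

```python
from typing import List, Tuple


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[Tuple[int, str]]:
    """Return (start_offset, chunk) pairs; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# Small demonstration with a window of 4 and an overlap of 2.
demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With the settings from the flow (chunk_size=2000, chunk_overlap=500), each window in this sketch would advance by 1500 characters, so text near a chunk boundary appears in two chunks and stays searchable with its surrounding context.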


We are constantly improving and adding new examples and blog posts. Please drop a star on our GitHub repo at https://github.com/cocoindex-io/cocoindex for the latest updates!
