AI Chatbot Helps Manage Telegram Communities Like a Pro
by @slavasobolev

by Iaroslav Sobolev, 12 min read, 2025/01/09
Too Long; Didn't Read

The Telegram chatbot finds answers to questions by extracting information from the chat's message history. It searches for relevant answers by looking for the messages closest to the question. The bot then summarizes the search results with the help of an LLM and returns the final answer to the user, together with links to the relevant messages.

Communities, chats, and forums are an endless source of information on a multitude of topics. Slack often replaces technical documentation, and Telegram and Discord communities help with gaming, startups, crypto, and travel questions. Although this first-hand information is valuable, it is usually highly unstructured, which makes it hard to search through. In this article, we explore the complexities of implementing a Telegram bot that finds answers to questions by extracting information from the chat's message history.


Here are the challenges that await us:

  • Find relevant messages. The answer may be scattered across a conversation between several people, or sit behind a link to an external resource.

  • Ignore off-topic content. Chats contain a lot of spam and off-topic messages, which we need to learn to identify and filter out.

  • Prioritize. Information becomes outdated. How do we know which answer is still correct today?


The basic chatbot user flow we are going to implement:

  1. The user asks the bot a question
  2. The bot finds the closest answers in the message history
  3. The bot summarizes the search results with the help of an LLM
  4. The bot returns the final answer to the user, with links to the relevant messages


Let's walk through the major stages of this user flow and highlight the main challenges we will face.

Data preparation

To prepare the message history for search, we need to create embeddings of these messages, that is, vectorized text representations. When working with a wiki article or a PDF document, we would split the text into paragraphs and compute a sentence embedding for each one.


However, we have to take into account peculiarities that are typical of chats but not of well-structured text:


  • Several consecutive short messages from a single user. In such cases, it is worth combining the messages into larger text blocks
  • Some messages are very long and cover several different topics
  • Meaningless messages and spam, which we should filter out
  • A user can reply without tagging the original message. A question and its answer can be separated in the chat history by many other messages
  • A user can answer with a link to an external resource (for example, an article or a document)


Next, we need to choose an embedding model. There are many different models for building embeddings, and several factors should be considered when choosing the right one.


  • Embedding dimension. The higher it is, the more nuances the model can capture from the data. The search will be more accurate, but it will require more memory and compute.
  • The dataset on which the embedding model was trained. This determines, for example, how well the model supports the language you need.
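
To make the memory side of that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes each vector component is stored as a 4-byte float and ignores index and metadata overhead; the chunk count of one million is a made-up figure for illustration, not a measurement from a real chat.

```typescript
// Rough storage estimate for raw embeddings, assuming float32 components
// (4 bytes each). Index structures and metadata are NOT included.
function embeddingStorageBytes(vectorCount: number, dimension: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return vectorCount * dimension * BYTES_PER_FLOAT32;
}

function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// Hypothetical index of one million chunks at two common dimensions.
const at1536 = embeddingStorageBytes(1000000, 1536);
const at3072 = embeddingStorageBytes(1000000, 3072);
console.log(`1536 dims: ${toMiB(at1536).toFixed(0)} MiB`);
console.log(`3072 dims: ${toMiB(at3072).toFixed(0)} MiB`);
```

Doubling the dimension doubles the raw storage, and distance computations grow in the same proportion, so it can be worth checking whether a smaller dimension is accurate enough for your chat.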


To improve the quality of the search results, we can categorize messages by topic. For example, in a chat dedicated to frontend development, users may discuss topics such as CSS, tooling, React, Vue, etc. You can use an LLM (more expensive) or classic topic-modeling methods from libraries like BERTopic to classify messages by topic.


We will also need a vector database to store the embeddings and meta-information (links to the original posts, categories, dates). Many vector stores, such as FAISS, Milvus, or Pinecone, exist for this purpose. A regular PostgreSQL with the pgvector extension will also work.
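
As an illustration of what such a store does conceptually, the sketch below runs a brute-force nearest-neighbour search in memory using Euclidean (L2) distance. The `StoredChunk` shape and the function names are assumptions made for this example; a real vector database would use an index instead of sorting every row.

```typescript
// Hypothetical record: a text chunk, its embedding, and a link to the source post.
interface StoredChunk {
  text: string;
  embedding: number[];
  link: string;
}

// Euclidean (L2) distance between two vectors of equal length.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Returns the k chunks closest to the query embedding, nearest first.
// Sorts a copy, so the input array is left untouched.
function nearestChunks(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        euclideanDistance(query, x.embedding) - euclideanDistance(query, y.embedding),
    )
    .slice(0, k);
}
```

This linear scan is fine for a few thousand chunks; beyond that, the approximate indexes offered by dedicated stores or by pgvector become the practical choice.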

Processing the user's question

To answer a user's question, we need to convert the question into a searchable form: compute the question's embedding and also determine its intent.


A semantic search on the question may return similar questions from the chat history rather than the answers to them.


To improve this, we can use HyDE (Hypothetical Document Embeddings), a popular optimization technique. The idea is to generate a hypothetical answer to the question using an LLM and then compute the embedding of that answer. In some cases, this approach makes it possible to find relevant messages more accurately and efficiently, because the search runs among answers rather than among questions.
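
A minimal sketch of the HyDE idea follows. The two injected functions, `generateAnswer` and `embed`, are hypothetical stand-ins for an LLM call and an embedding call; their names, signatures, and the prompt wording are assumptions for this example rather than part of any specific SDK.

```typescript
// Hypothetical dependencies: plug in your own LLM and embedding clients.
type GenerateAnswer = (prompt: string) => Promise<string>;
type Embed = (text: string) => Promise<number[]>;

// HyDE: embed a generated, hypothetical answer instead of the raw question,
// so that the vector search lands closer to answer-like messages.
async function hydeQueryEmbedding(
  question: string,
  generateAnswer: GenerateAnswer,
  embed: Embed,
): Promise<number[]> {
  const hypotheticalAnswer = await generateAnswer(
    `Write a short, plausible answer to this chat question: ${question}`,
  );
  return embed(hypotheticalAnswer);
}
```

The generated answer does not have to be correct, because it is only used as a search probe. The cost is one extra LLM call per question, so it is worth measuring whether it actually improves retrieval for your chat.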


Finding the most relevant messages

Once we have the question embedding, we can search the database for the closest messages. An LLM has a limited context window, so we may not be able to include all the search results if there are too many. The question then becomes how to prioritize the answers. There are several approaches:


  • Recency score. Over time, information becomes outdated. To prioritize newer messages, you can compute a recency score with the simple formula 1 / (today - date_of_message + 1)


  • Metadata filtering (this requires identifying the topic of both the question and the posts). It helps to narrow down the search, leaving only the posts relevant to the topic you are looking for


  • Full-text search. Classic full-text search, which is well supported by all popular databases, can sometimes be helpful.


  • Reranking. Once we have found candidate answers, we can sort them by how close they are to the question and keep only the most relevant ones. Reranking requires a CrossEncoder model, or we can use a reranking API, for example, the one from Cohere.
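
The recency formula from the list above can be sketched as follows. The sketch assumes the age is measured in whole days. Blending recency with a similarity score is one possible way to use it, not something the formula prescribes; the `similarity` field (higher is better, roughly in the range 0 to 1) and the default weight of 0.3 are illustrative assumptions.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// 1 / (today - date_of_message + 1), with the difference taken in whole days.
// Messages dated in the future are treated as if they were sent today.
function recencyScore(messageDate: Date, today: Date = new Date()): number {
  const ageInDays = Math.max(
    0,
    Math.floor((today.getTime() - messageDate.getTime()) / MS_PER_DAY),
  );
  return 1 / (ageInDays + 1);
}

// Hypothetical shape of a search hit.
interface ScoredChunk {
  text: string;
  date: Date;
  similarity: number; // assumed: higher means closer to the question
}

// Sorts hits by a weighted blend of similarity and recency, best first.
// recencyWeight is an arbitrary illustrative default.
function rankByRelevanceAndRecency(
  chunks: ScoredChunk[],
  today: Date,
  recencyWeight = 0.3,
): ScoredChunk[] {
  const score = (c: ScoredChunk): number =>
    (1 - recencyWeight) * c.similarity + recencyWeight * recencyScore(c.date, today);
  return [...chunks].sort((a, b) => score(b) - score(a));
}
```

Because the score falls off as 1 / (age + 1), a message from yesterday already scores half of one from today, which may be too aggressive for slow-moving topics. Measuring age in weeks or months instead is a simple adjustment.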


Generating the final response

After searching and sorting in the previous step, we can keep the 50-100 most relevant posts that fit into the LLM context.


The next step is to build a clear and concise prompt for the LLM from the user's original query and the search results. The prompt should tell the LLM how to answer the question and give it the user's query and the context, that is, the relevant messages we found. For this, it is important to consider the following aspects:


  • The system prompt is a set of instructions that tells the model how it should process the information. For example, you can tell the LLM to look for the answer only in the provided data.


  • Context length is the maximum length of the messages we can use as input. We can count the number of tokens with the tokenizer that corresponds to the model we use. For example, OpenAI uses Tiktoken.


  • Model hyperparameters. For example, temperature controls how creative the model is in its responses.


  • Choice of model. It is not always worth overpaying for the largest and most powerful model. It makes sense to run several tests with different models and compare their results. In some cases, less resource-intensive models will do the job if the task does not require high accuracy.
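
To illustrate the context-length point, here is a small sketch that keeps the highest-ranked chunks until a token budget is used up. The token counter is injected; the default `approximateTokens` (about four characters per token) is only a crude assumption for English text and should be replaced with the tokenizer that matches your model.

```typescript
type CountTokens = (text: string) => number;

// Crude stand-in for a real tokenizer: roughly 4 characters per token.
// This is an assumption for illustration only.
const approximateTokens: CountTokens = (text) => Math.ceil(text.length / 4);

// Takes chunks in ranked order and keeps them until adding the next one
// would exceed the budget. Lower-ranked chunks after that point are dropped.
function fitToContext(
  rankedChunks: string[],
  maxTokens: number,
  countTokens: CountTokens = approximateTokens,
): string[] {
  const selected: string[] = [];
  let used = 0;
  for (const chunk of rankedChunks) {
    const cost = countTokens(chunk);
    if (used + cost > maxTokens) {
      break;
    }
    selected.push(chunk);
    used += cost;
  }
  return selected;
}
```

Remember to reserve part of the context window for the system prompt, the question itself, and the model's answer, so `maxTokens` should be noticeably smaller than the model's full limit.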


Implementation

Now let's try to implement these steps with NodeJS. Here is the tech stack I am going to use:


  • NodeJS and TypeScript
  • Grammy - a Telegram bot framework
  • PostgreSQL - the primary storage for all our data
  • pgvector - a PostgreSQL extension for storing the embeddings of texts and messages
  • OpenAI API - LLM and embedding models
  • Mikro-ORM - to simplify database interactions


Let's skip the basic steps of installing dependencies and setting up the Telegram bot, and move straight to the most important features. Here is the database schema that we will need later:


    import {
      ArrayType,
      BaseEntity,
      DateTimeType,
      Entity,
      ManyToOne,
      PrimaryKey,
      Property,
      TextType,
    } from '@mikro-orm/core';
    import { VectorType } from 'pgvector/mikro-orm';

    // A Telegram chat indexed by the bot.
    @Entity({ tableName: 'groups' })
    export class Group extends BaseEntity {
      @PrimaryKey()
      id!: number;

      @Property({ type: 'bigint' })
      channelId!: number;

      @Property({ type: 'text', nullable: true })
      title?: string;

      @Property({ type: 'json' })
      attributes!: Record<string, unknown>;
    }

    // A single chat message as received from Telegram.
    @Entity({ tableName: 'messages' })
    export class Message extends BaseEntity {
      @PrimaryKey()
      id!: number;

      @Property({ type: 'bigint' })
      messageId!: number;

      @Property({ type: TextType })
      text!: string;

      @Property({ type: DateTimeType })
      date!: Date;

      @ManyToOne(() => Group, { onDelete: 'cascade' })
      group!: Group;

      @Property({ type: 'string', nullable: true })
      fromUserName?: string;

      @Property({ type: 'bigint', nullable: true })
      replyToMessageId?: number;

      @Property({ type: 'bigint', nullable: true })
      threadId?: number;

      @Property({ type: 'json' })
      attributes!: {
        raw: Record<any, any>;
      };
    }

    // A searchable block of text built from one or more messages.
    @Entity({ tableName: 'content_chunks' })
    export class ContentChunk extends BaseEntity {
      @PrimaryKey()
      id!: number;

      @ManyToOne(() => Group, { onDelete: 'cascade' })
      group!: Group;

      @Property({ type: TextType })
      text!: string;

      // pgvector column; the length must match the embedding dimension.
      @Property({ type: VectorType, length: 1536, nullable: true })
      embeddings?: number[];

      @Property({ type: 'int' })
      tokens!: number;

      // Ids of the messages this chunk was built from.
      @Property({ type: new ArrayType<number>((i: string) => +i), nullable: true })
      messageIds?: number[];

      // Filled in at query time only; not stored in the table.
      @Property({ persist: false, nullable: true })
      distance?: number;
    }


Split user dialogs into chunks

Splitting long dialogs between multiple users into chunks is not a trivial task.


Unfortunately, default approaches such as RecursiveCharacterTextSplitter, available in the Langchain library, do not account for all the peculiarities specific to chats. However, in the case of Telegram, we can take advantage of Telegram threads, which group related messages together with the replies users sent to them.


Every time a new batch of messages arrives from the chat room, our bot needs to perform a few steps:


  • Filter out short messages using a list of stop words (e.g. 'hello', 'bye', etc.)
  • Merge messages from one user if they were sent consecutively within a short period of time
  • Group all messages that belong to the same thread
  • Merge the resulting message groups into larger text blocks, and then split these text blocks into chunks using RecursiveCharacterTextSplitter
  • Compute the embeddings for each chunk
  • Persist the text chunks in the database along with their embeddings and links to the original messages


    import { EntityDTO } from '@mikro-orm/core';
    import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
    import { differenceInMinutes } from 'date-fns';

    class ChatContentSplitter {
      constructor(
        private readonly splitter: RecursiveCharacterTextSplitter,
        private readonly longMessageLength = 200,
      ) {}

      public async split(messages: EntityDTO<Message>[]): Promise<ContentChunk[]> {
        const filtered = this.filterMessage(messages);
        const merged = this.mergeMessageSeries(filtered);
        const threads = this.toThreads(merged);
        return this.threadsToChunks(threads);
      }

      // Groups messages by Telegram thread. A message without a thread
      // becomes a group of its own.
      toThreads(messages: EntityDTO<Message>[]): EntityDTO<Message>[][] {
        const threads = new Map<number, EntityDTO<Message>[]>();
        const orphans: EntityDTO<Message>[][] = [];

        for (const message of messages) {
          if (message.threadId) {
            let thread = threads.get(message.threadId);
            if (!thread) {
              thread = [];
              threads.set(message.threadId, thread);
            }
            thread.push(message);
          } else {
            orphans.push([message]);
          }
        }

        return [...threads.values(), ...orphans];
      }

      // Joins each thread into one text block and splits it into chunks.
      private async threadsToChunks(
        threads: EntityDTO<Message>[][],
      ): Promise<ContentChunk[]> {
        const result: ContentChunk[] = [];

        for (const thread of threads) {
          const content = thread.map((m) => this.dtoToString(m)).join('\n');
          const texts = await this.splitter.splitText(content);
          const messageIds = thread.map((m) => m.id);
          // Assumes ContentChunk has a (text, messageIds) constructor,
          // which the schema listing does not show.
          const chunks = texts.map((text) => new ContentChunk(text, messageIds));
          result.push(...chunks);
        }

        return result;
      }

      // Merges consecutive short messages sent by the same user within
      // 10 minutes of the first message in the series.
      mergeMessageSeries(messages: EntityDTO<Message>[]): EntityDTO<Message>[] {
        if (messages.length === 0) {
          return [];
        }

        const result: EntityDTO<Message>[] = [];
        let current = { ...messages[0] };

        for (const message of messages.slice(1)) {
          const short = message.text.length < this.longMessageLength;
          const sameUser = current.fromUserName === message.fromUserName;
          const subsequent = differenceInMinutes(message.date, current.date) < 10;

          if (sameUser && subsequent && short) {
            current.text += `\n${message.text}`;
          } else {
            result.push(current);
            current = { ...message };
          }
        }

        // Do not lose the last message (or merged series).
        result.push(current);
        return result;
      }

      // ....
    }
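
The `filterMessage` step is not shown in the listing above. Purely as an illustration of the first step in the list (dropping short messages made of stop words), one possible shape is sketched below. The stop-word list, the 20-character threshold, and the function names are all assumptions for this example, not the author's implementation.

```typescript
// Hypothetical stop-word list; a real one would be tuned to the chat's language.
const STOP_WORDS = new Set(['hello', 'hi', 'bye', 'thanks', 'ok']);

// A message counts as noise when it is empty, or when it is short and
// consists only of stop words once punctuation is removed.
function isNoise(text: string, shortLength = 20): boolean {
  const normalized = text
    .toLowerCase()
    .replace(/[.,!?;:()"']/g, ' ')
    .trim();
  if (normalized.length === 0) {
    return true;
  }
  if (normalized.length >= shortLength) {
    return false;
  }
  return normalized.split(/\s+/).every((word) => STOP_WORDS.has(word));
}

// Keeps only messages that are not noise. Works on any object with a text field.
function filterMessages<T extends { text: string }>(messages: T[]): T[] {
  return messages.filter((m) => !isNoise(m.text));
}
```

A short message that contains at least one word outside the list, such as "hi, CSS?", is kept, which errs on the side of not discarding potential questions.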


Chunk embeddings

Next, we need to compute the embeddings for each of the chunks. For this, we can use the OpenAI model text-embedding-3-large


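    public async getEmbeddings(chunks: ContentChunk[]): Promise<void> {
      // Send the chunks in batches of 100 inputs per request.
      const batchSize = 100;

      for (let i = 0; i < chunks.length; i += batchSize) {
        const batch = chunks.slice(i, i + batchSize);
        const res = await this.openai.embeddings.create({
          input: batch.map((c) => c.text),
          model: 'text-embedding-3-large',
          // The model returns 3072 dimensions by default; request 1536
          // to match the vector column length in the schema.
          dimensions: 1536,
          encoding_format: 'float',
        });

        // Each result carries the index of the input it belongs to.
        for (const item of res.data) {
          batch[item.index].embeddings = item.embedding;
        }
      }

      await this.orm.em.flush();
    }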



Answering user questions

To answer a user's question, we first compute the embedding of the question and then find the most relevant messages in the chat history


    import { l2Distance } from 'pgvector/mikro-orm';

    public async similaritySearch(embedding: number[], groupId: number): Promise<ContentChunk[]> {
      return this.orm.em.qb(ContentChunk)
        .where({
          embeddings: { $ne: null },
          group: this.orm.em.getReference(Group, groupId),
        })
        // Closest chunks first (smallest L2 distance to the query embedding).
        .orderBy({ [l2Distance('embeddings', embedding)]: 'ASC' })
        .limit(100)
        .getResultList();
    }



Then we rerank the search results using Cohere's reranking model


    public async rerank(query: string, chunks: ContentChunk[]): Promise<ContentChunk[]> {
      const { results } = await cohere.v2.rerank({
        documents: chunks.map((c) => c.text),
        query,
        model: 'rerank-v3.5',
      });

      // `results` is ordered by relevance; each entry's `index` points
      // back into the `documents` array we sent.
      return results.map(({ index }) => chunks[index]);
    }



Next, we ask the LLM to answer the user's question by summarizing the search results. A simplified version of processing a search query looks like this:


    public async search(query: string, group: Group) {
      // Embed the question with the same model and dimension as the chunks.
      const { data } = await this.openai.embeddings.create({
        input: query,
        model: 'text-embedding-3-large',
        dimensions: 1536,
      });
      const queryEmbedding = data[0].embedding;

      const chunks = await this.chunkService.similaritySearch(queryEmbedding, group.id);
      const reranked = await this.cohereService.rerank(query, chunks);

      // `systemPrompt` is assumed to be defined elsewhere.
      const completion = await this.openai.chat.completions.create({
        model: 'gpt-4-turbo',
        temperature: 0,
        messages: [
          { role: 'system', content: systemPrompt },
          { role: 'user', content: this.userPromptTemplate(query, reranked) },
        ],
      });

      return completion.choices[0].message;
    }

    // naive prompt
    public userPromptTemplate(query: string, chunks: ContentChunk[]) {
      const history = chunks
        .map((c) => c.text)
        .join('\n----------------------------\n');

      return `
        Answer the user's question: ${query}
        By summarizing the following content:
        ${history}
        Keep your answer direct and concise. Provide references to the corresponding messages.
      `;
    }



Further improvements

Even after all these optimizations, the LLM-generated answers of the bot may still feel imperfect and incomplete. What else could be improved?


  • For user posts that contain links, we can also parse the content of the linked web pages and PDF documents.

  • Query routing: directing user queries to the most appropriate data source, model, or index based on the intent and context of the query, in order to optimize accuracy, efficiency, and cost.

  • We can add resources relevant to the chat room's topic to the search index. At work, this could be documentation from Confluence; for visa chats, consulate websites with the rules, etc.

  • RAG evaluation: we need to set up a pipeline to evaluate the quality of our bot's responses
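
As a starting point for such an evaluation pipeline, the sketch below measures only the retrieval half: for a small hand-labelled set of questions, it checks how often at least one known relevant message shows up among the top k results. The `EvalCase` shape and the injected `retrieve` function are assumptions for this example. Judging the quality of the generated answer text would need a separate step.

```typescript
// Hypothetical labelled example: a question and the ids of messages
// that a human marked as containing the answer.
interface EvalCase {
  question: string;
  relevantIds: number[];
}

// Hypothetical retriever: returns message ids ranked best-first.
type Retrieve = (question: string) => Promise<number[]>;

// Fraction of questions for which at least one relevant message
// appears among the top k retrieved results.
async function hitRateAtK(
  cases: EvalCase[],
  retrieve: Retrieve,
  k: number,
): Promise<number> {
  if (cases.length === 0) {
    return 0;
  }
  let hits = 0;
  for (const evalCase of cases) {
    const topK = (await retrieve(evalCase.question)).slice(0, k);
    if (topK.some((id) => evalCase.relevantIds.includes(id))) {
      hits++;
    }
  }
  return hits / cases.length;
}
```

Tracking this number while changing the chunking, the embedding model, or the reranker gives a quick signal of whether a change helped retrieval before any LLM output is inspected.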