TnT-LLM: Text Mining at Scale With Large Language Models

by Language Models (dot tech), April 17th, 2025


Abstract and 1 Introduction

2 Related Work

3 Method and 3.1 Phase 1: Taxonomy Generation

3.2 Phase 2: LLM-Augmented Text Classification

4 Evaluation Suite and 4.1 Phase 1 Evaluation Strategies

4.2 Phase 2 Evaluation Strategies

5 Experiments and 5.1 Data

5.2 Taxonomy Generation

5.3 LLM-Augmented Text Classification

5.4 Summary of Findings and Suggestions

6 Discussion and Future Work, and References

A. Taxonomies

B. Additional Results

C. Implementation Details

D. Prompt Templates

ABSTRACT

Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.

1 INTRODUCTION

Text mining is the process of extracting useful information and insights from a large collection of textual data [10, 27]. Two central and interrelated tasks in text mining are taxonomy generation, which involves finding and organizing a set of structured, canonical labels that describe aspects of the corpus, and text classification, or the labeling of instances in the corpus using said taxonomy.


Figure 1: An illustration of the existing human-in-the-loop and unsupervised text clustering approaches as well as the proposed LLM-powered end-to-end label taxonomy generation and text classification framework (TnT-LLM).


Many use cases of interest to practitioners can be framed as the sequential application of these two tasks, especially when the label space is not well-defined or when exploring a new corpus: for example, sentiment analysis consists of devising a sentiment taxonomy (e.g., “happy”, “sad”) and classifying text content (e.g., social media posts, product reviews) with labels in this taxonomy. Likewise, intent detection consists of defining a set of intents (e.g., “book a flight”, “buy a product”) and classifying text content (e.g., chatbot transcripts, search queries) with the intent labels.
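Both use cases share the same shape: a taxonomy (a set of labels) and a classifier that maps each text to one of them. A minimal, hypothetical sketch of that shape (the labels and keyword lists here are ours, purely for illustration, and the keyword matcher merely stands in for a learned model):

```python
# Toy stand-in: a taxonomy is a set of labels, and classification
# maps each text to one of them.
sentiment_taxonomy = {
    "happy": ["love", "great", "excellent"],
    "sad": ["hate", "awful", "terrible"],
}

def classify(text: str, taxonomy: dict, default: str = "neutral") -> str:
    """Keyword matcher standing in for a trained classifier."""
    lowered = text.lower()
    for label, keywords in taxonomy.items():
        if any(kw in lowered for kw in keywords):
            return label
    return default

print(classify("I love this product!", sentiment_taxonomy))  # happy
```

Swapping the taxonomy for a set of intents (e.g., “book a flight”) turns the same skeleton into intent detection; the hard parts in practice are producing a good taxonomy and training a real classifier, which is exactly what TnT-LLM automates.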


An established approach to these two tasks is to first employ domain experts to curate a label taxonomy [4, 20, 31], then gather human annotations on a small sample of the corpus using the handcrafted taxonomy in order to train a machine learning model for text classification. While such human-in-the-loop approaches offer high interpretability, they face significant scalability challenges: they demand domain expertise and careful consideration of the granularity, coverage, and consistency of the labels [4], and manual annotation is time-consuming and costly, not to mention prone to errors and biases [28]. Moreover, the process must be repeated for each downstream use-case (e.g., sentiment analysis, intent detection, etc.). Another line of work aims to solve these issues of scale via machine learning techniques like text clustering, topic modeling, and keyphrase mining. Such approaches flip the ordering of taxonomy generation and classification by first organizing the corpus sample into clusters in an unsupervised or semi-supervised fashion, then deriving the label taxonomy thereafter by describing the learned clusters. Such approaches scale better with the corpus size and number of use-cases, but describing text clusters in an interpretable and consistent way has proved challenging, so much so that it has been likened to “reading tea leaves” [5].


To address these challenges, in this paper we propose TnT-LLM, a novel framework that combines the interpretability of manual approaches with the scale of automatic text clustering and topic modeling. TnT-LLM is an end-to-end two-phase framework for joint Taxonomy Generation and Text Classification that relies on the unique strengths of instruction-following Large Language Models (LLMs) in both phases. First, in the taxonomy generation phase, we devise a zero-shot multi-stage reasoning approach that prompts an LLM to produce and refine a label taxonomy iteratively with respect to the corpus for a given use-case (e.g., intent detection). Second, in the text classification phase, we adopt LLMs as data augmentors to scale up the creation of training data, which in turn is used to train lightweight classifiers capable of large-scale labeling. This framework is adaptable and modular, and can be customized to different use-cases, text corpora, LLMs, and classifiers, while requiring little human intervention or input. In summary, our main contributions are as follows:


• We introduce TnT-LLM, an end-to-end two-phase framework to automate and scale the process of taxonomy generation and text classification with representative and interpretable labels.


• We present a series of quantifiable and traceable evaluation strategies to validate each stage of this framework, including deterministic automatic metrics, human evaluation metrics, and LLM-based evaluations.


• We use TnT-LLM to analyze conversations from Bing Copilot (formerly Bing Chat), a web-scale, multilingual, and open-domain conversational agent. Our results show that the proposed framework can produce more accurate and relevant label taxonomies compared to state-of-the-art text clustering approaches. We also demonstrate that the lightweight label classifiers trained on LLM annotations can achieve comparable (and sometimes better) performance than directly using LLMs as classifiers, but with much higher scalability and model transparency. Through quantitative and qualitative analysis, we provide insights and recommendations for applying LLMs to large-scale text mining.
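The two phases might be sketched as follows. This is our own minimal sketch under stated assumptions, not the paper's implementation: `llm` is any callable that answers a prompt, the prompt strings and helper names are hypothetical, and a trivial majority-label model stands in for the lightweight classifier.

```python
from collections import Counter

def tnt_llm(corpus, llm, refine_rounds=2, sample_size=100):
    """Sketch of the two TnT-LLM phases (our simplification).

    Phase 1: prompt the LLM to propose a taxonomy, then refine it
    iteratively against the corpus (zero-shot, multi-stage).
    Phase 2: use the LLM as a data labeler on a corpus sample, then
    train a lightweight classifier on those pseudo labels for scale.
    """
    sample = corpus[:sample_size]

    # Phase 1: taxonomy generation and iterative refinement
    taxonomy = llm(f"Propose category labels for these texts: {sample}")
    for _ in range(refine_rounds):
        taxonomy = llm(f"Refine these labels against the corpus: {taxonomy}")

    # Phase 2: LLM-augmented pseudo labeling + lightweight classifier
    pseudo_labels = [llm(f"Assign one label from {taxonomy} to: {t}") for t in sample]

    def train_lightweight(texts, labels):
        # Toy stand-in: always predict the majority pseudo label. A real
        # pipeline would train e.g. logistic regression on text embeddings.
        majority = Counter(labels).most_common(1)[0][0]
        return lambda text: majority

    return taxonomy, train_lightweight(sample, pseudo_labels)
```

The key design point is that the expensive LLM is called only on a bounded sample; the returned classifier is cheap enough to deploy over the full corpus.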


This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Mengting Wan, Microsoft Corporation;

(2) Tara Safavi (Corresponding author), Microsoft Corporation;

(3) Sujay Kumar Jauhar, Microsoft Corporation;

(4) Yujin Kim, Microsoft Corporation;

(5) Scott Counts, Microsoft Corporation;

(6) Jennifer Neville, Microsoft Corporation;

(7) Siddharth Suri, Microsoft Corporation;

(8) Chirag Shah, University of Washington (work done while at Microsoft);

(9) Ryen W. White, Microsoft Corporation;

(10) Longqi Yang, Microsoft Corporation;

(11) Reid Andersen, Microsoft Corporation;

(12) Georg Buscher, Microsoft Corporation;

(13) Dhruv Joshi, Microsoft Corporation;

(14) Nagu Rangan, Microsoft Corporation.

