Abstract and 1. Introduction
2. Data
3. Measuring Media Slant and 3.1. Text pre-processing and featurization
3.2. Classifying transcripts by TV source
3.3. Text similarity between newspapers and TV stations and 3.4. Topic model
4. Econometric Framework
4.1. Instrumental variables specification
4.2. Instrument first stage and validity
5. Results
6. Mechanisms and Heterogeneity
6.1. Local vs. national or international news content
6.2. Cable news media slant polarizes local newspapers
Online Appendices
A. Data Appendix
A.2. Alternative county matching of newspapers and A.3. Filtering of the article snippets
A.4. Included prime-time TV shows and A.5. Summary statistics
B. Methods Appendix, B.1. Text pre-processing and B.2. Bigrams most predictive for FNC or CNN/MSNBC
B.3. Human validation of NLP model
B.6. Topics from the newspaper-based LDA model
C. Results Appendix
C.1. First stage results and C.2. Instrument exogeneity
C.3. Placebo: Content similarity in 1995/96
C.8. Robustness: Historical circulation weights and C.9. Robustness: Relative circulation weights
C.12. Mechanisms: Language features and topics
C.13. Mechanisms: Descriptive evidence on demand side
C.14. Mechanisms: Slant contagion and polarization
This section describes how we construct the language measures used as outcomes in our regression analysis. We aim to capture the textual similarity between (i) the newspaper article snippets and (ii) the TV show transcripts. To that end, we implement a supervised machine-learning approach that predicts whether a newspaper article’s content resembles that of a particular TV station (FNC or CNN/MSNBC).[4]
First, we pre-process the newspaper articles and TV transcripts, stem all words, and form bigrams (two-word phrases); see Appendix B.1 for details.
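To make this step concrete, here is a minimal sketch of the pre-processing (our own illustration, not the authors' exact pipeline from Appendix B.1; the NLTK tokenizer/stemmer and the `to_bigrams` helper are our assumptions):

```python
# Minimal pre-processing sketch: lowercase, tokenize, stem, form bigrams.
# NLTK is an assumption; run nltk.download("punkt") once before tokenizing.
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

stemmer = SnowballStemmer("english")

def to_bigrams(text):
    """Return the stemmed two-word phrases (bigrams) of a document."""
    tokens = [stemmer.stem(t) for t in word_tokenize(text.lower()) if t.isalpha()]
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]

print(to_bigrams("The senators debated the border security bill."))
# illustrative output: ['the_senat', 'senat_debat', 'debat_the', ...]
```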
We then apply a minimum-frequency threshold to the resulting bigrams. This threshold excludes infrequent bigrams that are highly distinctive for a given channel but carry little substantive political or topical information. The procedure produces a vocabulary V of 65,000 bigrams. Supervised learning models using n-grams are rarely sensitive to the specific pre-processing and featurization choices (e.g., Denny and Spirling, 2018).
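A hedged sketch of how such a featurized source classifier could look (we assume a scikit-learn logistic regression as the regularized linear model; the frequency cutoff, toy documents, and variable names below are illustrative, not the paper's actual values):

```python
# Sketch: bigram counts -> regularized linear classifier -> predicted
# probability that a text's source is FNC rather than CNN/MSNBC.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy pre-tokenized bigram documents (in practice: TV transcript snippets).
tv_docs = [["tax_cut", "border_secur"], ["climat_chang", "health_care"]]
tv_labels = [1, 0]  # 1 = FNC, 0 = CNN/MSNBC

vectorizer = CountVectorizer(
    analyzer=lambda doc: doc,  # documents are already bigram lists
    min_df=1,                  # illustrative; the paper uses a frequency threshold
    max_features=65_000,       # cap the vocabulary V at 65,000 bigrams
)
X_tv = vectorizer.fit_transform(tv_docs)

clf = LogisticRegression(penalty="l2", max_iter=1000).fit(X_tv, tv_labels)

# Score newspaper snippets: P(source = FNC | bigrams) as a similarity measure.
news_docs = [["tax_cut", "spend_bill"]]
fnc_score = clf.predict_proba(vectorizer.transform(news_docs))[:, 1]
```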
This paper is available on arXiv under a CC 4.0 license.
[4] The approach is related to that of Gentzkow et al. (2019b), who also use a regularized linear model with n-gram inputs. Our approach differs because our scientific objective differs: Gentzkow et al. (2019b) measure the level of polarization between groups in language, whereas we form a predicted probability of a document’s source in order to score influence in a second corpus. Other related methods are those of Peterson and Spirling (2018) and Osnabrügge et al. (2021).
[5] We have fewer snippets from FNC than from CNN/MSNBC. Thus, we randomly under-sample the snippets from the CNN/MSNBC corpus to match the number of snippets from FNC.
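For illustration, such random under-sampling might look as follows (a sketch with toy data; numpy and the variable names are our assumptions):

```python
# Randomly under-sample CNN/MSNBC snippets to match the FNC snippet count.
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (our choice)

fnc_snippets = ["fnc snippet 1", "fnc snippet 2"]                      # toy data
cnn_msnbc_snippets = ["cm snippet 1", "cm snippet 2", "cm snippet 3"]  # toy data

keep = rng.choice(len(cnn_msnbc_snippets), size=len(fnc_snippets), replace=False)
balanced_cnn_msnbc = [cnn_msnbc_snippets[i] for i in keep]
```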
Authors:
(1) Philine Widmer, ETH Zürich and [email protected];
(2) Sergio Galletta, ETH Zürich and [email protected];
(3) Elliott Ash, ETH Zürich and [email protected].