From Script to Summary: A Smarter Way to Condense Movies

Written by scripting | Published 2025/04/09
Tech Story Tags: movie-script-summarization | scene-saliency-detection | abstractive-summarization | long-document-summarization | nlp-for-screenplays | large-language-models | summarization-dataset | select-and-summ-model

TL;DR: This paper introduces a new dataset of 100 movie scripts with human-annotated salient scenes and proposes a two-stage model, SELECT & SUMM, which first identifies key scenes and then generates summaries using only those scenes. The approach outperforms prior models in accuracy and efficiency, making movie script summarization more scalable and informative.

Authors:

(1) Rohit Saxena, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh;

(2) Frank Keller, Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh.

Table of Links

Part 1

Part 2

Part 3

Part 4

Part 5

Part 6

A. Further Implementation Details

All experiments were performed on an A100 GPU with 80GB of memory. Full fine-tuning took approximately 22 hours for the LED model and 30 hours for the Pegasus-X model. The LED-based models have 161M parameters, all of which were fine-tuned. Our scene saliency model has 60.2M parameters, for a total of 221.2M parameters. Pegasus-X has 568M parameters, but its performance is lower than that of LED.

For evaluation, we used Benjamin Heinzerling’s implementation of Rouge[5] and BERTScore with the microsoft/deberta-xlarge-mnli model.
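The pyrouge package referenced above wraps the original Perl ROUGE toolkit, so it is not reproduced here. Purely for illustration, a stripped-down ROUGE-1 F1 (unigram overlap, ignoring stemming and stopword handling, which the full toolkit applies) might look like:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 4))
```

BERTScore replaces this exact-match counting with token-level similarity in the embedding space of a pretrained model (here, microsoft/deberta-xlarge-mnli).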

B. Scene Encoder Experiment

We compared the performance of RoBERTa with that of BART (Lewis et al., 2020) and LED (encoder only) as the base model for computing scene embeddings for salient scene classification. For each model, we used the large variant and extracted the encoder's last hidden state as the scene embedding. Table 8 reports scene saliency classification results with the different base models. Among them, RoBERTa's embeddings performed marginally better while also having fewer parameters.
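Turning the encoder's last hidden state (one vector per token) into a single scene embedding requires a pooling step. One common choice, shown here as an assumption rather than the authors' exact method, is masked mean pooling over the non-padding tokens:

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions.

    hidden: (seq_len, dim) last hidden state of the encoder
    mask:   (seq_len,) 1 for real tokens, 0 for padding
    """
    mask = mask[:, None].astype(hidden.dtype)
    return (hidden * mask).sum(axis=0) / mask.sum()

# Toy example: 3 token vectors of dim 2, the last one is padding
hidden = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
mask = np.array([1, 1, 0])
print(mean_pool(hidden, mask))  # [2. 3.]
```

The padded position contributes nothing to the average, so scenes of different lengths map to comparable fixed-size vectors.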

C. Classifier Robustness

To study the robustness of the scene saliency classifier, we performed k-fold cross-validation with k = 5. Table 9 reports the mean and standard deviation across all folds. The low standard deviation shows that the classifier's performance is robust across folds.
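The cross-validation protocol above can be sketched as follows; the fold scores in the example are hypothetical, not the paper's Table 9 numbers:

```python
import numpy as np

def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) splits for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Aggregate per-fold scores as mean ± standard deviation
scores = [0.81, 0.79, 0.82, 0.80, 0.78]  # hypothetical fold F1 scores
print(f"{np.mean(scores):.3f} ± {np.std(scores):.3f}")  # 0.800 ± 0.014
```

Each example appears in exactly one validation fold, so the five scores are computed on disjoint held-out sets.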

D. Statistics for Summarization Results

All ROUGE scores reported in the paper are mean F1 scores obtained with bootstrap resampling using 1,000 samples. To assess the significance of the results, we report 95% confidence intervals for our model and the closest baseline in Table 10.
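A percentile bootstrap for the mean of per-example scores can be sketched as below; the score values are synthetic placeholders, not results from the paper:

```python
import numpy as np

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Mean and 95% percentile-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores)
    # Resample with replacement and record the mean of each resample
    means = np.array([
        rng.choice(scores, size=len(scores), replace=True).mean()
        for _ in range(n_resamples)
    ])
    lo, hi = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return scores.mean(), lo, hi

# Synthetic per-example ROUGE F1 scores for illustration
scores = np.clip(np.random.default_rng(1).normal(0.35, 0.08, 200), 0, 1)
mean, lo, hi = bootstrap_ci(scores)
print(f"mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Two systems can then be compared by checking whether their confidence intervals overlap, which is the kind of significance evidence Table 10 provides.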

E. Samples of Movie Summaries

This paper is available on arxiv under CC BY 4.0 DEED license.


[5] https://github.com/bheinzerling/pyrouge


Published by HackerNoon on 2025/04/09