Authors:
(1) Piotr Mirowski and Kory W. Mathewson, DeepMind, United Kingdom (both authors contributed equally to this research);
(2) Jaylen Pittman, Stanford University, USA (work done while at DeepMind);
(3) Richard Evans, DeepMind, United Kingdom.
5 PARTICIPANT INTERVIEWS
Throughout our interviews with the 15 participants (anonymised as p1, p2, etc.), we collected qualitative feedback on co-writing with Dramatron. In this section, we summarize this feedback into seven themes. Each is presented alongside supporting quotes from participant interviews.
(1) Positive comments about Dramatron focused on: hierarchical generation that lets the writer work on the narrative arc, the possibility either to co-author interactively or to simply let the system generate, and the potential of the output script to serve as source material for the human writer (Section 5.1).
(2) Participants identified inspiration, world building, and content generation as potential writing applications for Dramatron, and saw it as a possible tool for literary analysis (Section 5.2).
(3) Participants noticed various biases embedded in the language model (Section 5.3).
(4) Some writers were interested in the involuntary glitch aesthetic and failure modes of Dramatron, such as repetition and dialogue loops (Section 5.4).
(5) Unsurprisingly, participants noticed logical gaps in storytelling and a lack of common sense, nuance, and subtext, manifest in the characters' lack of motivation (Section 5.5).
(6) Structural criticism focused on the need to come up with a log line, as well as on the inconsistencies between consecutive scenes due to parallel dialogue generation (Section 5.6).
(7) Participants were engaged with the tool and eager to provide suggestions for improvement (Section 5.7).
5.1 Positive Comments about Dramatron
5.1.1 Praise for the interactive hierarchical generation in Dramatron. All participants but p4 and p5 (who preferred a nonlinear writing workflow) were enthusiastic about the interactive hierarchical generation. "Once I see this, I know the shape of the series. I know the way that the story unfolds. I can see the narrative more clearly [...] I like this approach of making it a log line and then packing the detail inside it. You are planting a seed of an idea and it is putting meat on the bones" (p13). "All of it is quite consistent, symbolically consistent and coherent and relates to the state of affairs of the state of the play [...] There is lots of emotion and content about relationships in some of the generations" (p8). "In terms of the interactive co-authorship process, I think it is great [...]" (p9). "What I like about the hierarchy is that you can do as much human-ing as you want at any level" (p2). "In working with the machine I can see the content a little more clearly. As there is specificity, character arcs, then I can see how the story comes together [...] This [hierarchical generation] really felt so much cleaner than the process [GPT-2 or GPT-3 with flat prompting] I was using" (p15). "Let's try more! God, you could just waste your time doing this" (p3). Participants p1, p6 and p3 further noted how such hierarchical generation helped with dialogue: "there is good content from any generation" (p1) and (referring to one of the generations) "You got some big profound discussions in it. I am impressed with that one" (p3).
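The hierarchical workflow participants describe (a log line expanded top-down into title, characters, plot beats, and finally dialogue) can be sketched as a chain of prompted generations. The following is a minimal illustration only, not Dramatron's actual implementation: `generate` is a hypothetical placeholder for a language-model call, and the prompt formats are invented for the example.

```python
# Minimal sketch of hierarchical story generation. `generate` is a
# hypothetical stand-in for a large language model call.
def generate(prompt: str) -> str:
    # Placeholder: a real system would call an LLM API here.
    head = prompt[:40].replace("\n", " | ")
    return f"<completion for: {head}...>"

def hierarchical_script(log_line: str) -> dict:
    """Expand a one-sentence log line top-down into a full script draft."""
    title = generate(f"Log line: {log_line}\nTitle:")
    characters = generate(f"Log line: {log_line}\nCharacters:")
    # Plot beats are conditioned on the log line and characters, so the
    # writer can edit the narrative arc before any dialogue exists.
    beats = generate(f"Log line: {log_line}\nCharacters: {characters}\nPlot beats:")
    scenes = [generate(f"Log line: {log_line}\nCharacters: {characters}\n"
                       f"Beat: {beat}\nDialogue:")
              for beat in beats.split("\n") if beat.strip()]
    return {"title": title, "characters": characters,
            "beats": beats, "scenes": scenes}
```

Because each level is a separate generation step, the writer can intervene ("do as much human-ing as you want") at any level before the next one is expanded.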
5.1.2 Ease of use of Dramatron's UI and seed-based generation. Participant p13 liked the user experience of interactive, step-by-step generation of title, characters and plot, whereas p10 thought that "interaction seemed simpler when the whole script was generated ahead of time rather than editing it". Participant p1 tried and discussed three different modes of script generation: 1) interactive co-authorship, 2) modifying the output from one fully automated generation, and 3) curating and modifying outputs from 3-4 generations. The benefits of running multiple generations included having "lots of material", allowing them to "pull good ideas", "cherry-picking", and "more interpretations and artistic freedom", though this "requires more massaging on my end" and "word crafting to make it flow" (p1). Participant p1 developed a workflow for co-generating a script that included editing lists of characters and editing the log line to add more "characters that we know about", giving the characters status and names, and adding them to the plot's beats. When crafting the log line, p1 wanted to imply high stakes and "stay with humanoid characters: non-human characters take us to the Theatre of the Absurd, to the Surreal, to Magical Realism", and they wanted log lines that situated the story in realism "to meet the audience's expectations" and "set things at a specific location".
5.1.3 About the potential for the script to be staged after editing. Several participants (p6, p9, p11, p13, p15) highlighted the potential for the script to be staged after editing: "a rough draft, would need to work a lot with it [but] it could be helpful and staged, definitely" (p6), "It gets me thinking about how you can make a full show with a single idea" (p11) and "You know, with a bit of editing, I could take that to Netflix: just need to finesse it a little bit" (p9). Participant p1 staged several scripts generated with Dramatron (see Section 5.9).
5.2 Potential Uses of the System
5.2.1 Inspiration for the Writer. All participants found Dramatron useful for getting inspiration: "this is perfect for writers' block" (p13), "I can see it being very helpful, if you are stuck" (p4, p5), "more in depth than the writers' unblocking prompts website" (p3). Dramatron was described as a tool that indirectly stimulates the playwright's creativity: "I like what happens in my brain when I read some outputs of the model. I got an idea for the rest of the story" (p6), "It is about me discovering what will translate from what it gives me" (p10), or that directly gives actionable suggestions: "Here is a concept; it puts meat on the bones, and then you trim the fat by going back and forth" (p13). Glitches and language model limitations can be subverted for inspiration, in particular when the script is performed: "mistakes are gifts that we can leave for the improvisers" (p1).
5.2.2 Generation of Alternative Choices and World Building. More than merely providing a creative spark for the main story, the model can be employed to populate the universe of the story: "If I was going to use this to write a script, I'd use it to generate characters to see if it generated things I hadn't thought about. Or relationships I hadn't thought about" (p15). Participants also saw Dramatron as a tool for exploration: "I would go with the suggestion that is further away from what I would have suggested because I already know what is in my head and I want to know what the machine would do" (p12).
5.2.3 Using the System for Learning and Analysis. By prompting the system, writers could indirectly search the language model for literary styles and elements: "Even if I were not writing, it does a wonderful job of collecting what is in the literature" (p10) or even hypothetically search within their own output: "I would be very interested in feeding everything I ever wrote and then getting it to generate script in my voice and style" (p4, p5). Learning could also happen by analysing how to improve Dramatron's outputs: "For me, as a playwright, the interesting thing about working with this technology is thinking about how I would edit it. For instance: What would this look like on stage?" (p8).
5.2.4 Content Generation. Beyond inspiration, several participants were interested in the co-writing potential of Dramatron, and thought it could provide them with material. "One of the big sticking points of playwriting is getting words on the page. This helps with that step" (p8). "I would use this tool to fix (screenwriting) projects that might be dead" (p14). "This is a rich tool for basically everything. I have done devised creation. There are methods that you can use to generate text, where you pull songs, scripts, or news articles, then chop and paste them down. This reminds me of Dadaist text generation" (p11). "Practically, it might impact the economics of writing if longer running series could be augmented by such writing systems. It might be useful on long-running series, where you have a writers room" (p4, p5).
5.2.5 Potential of AI as a Tool for TV Screenwriting. Some participants suggested this tool could be employed in a TV writers' room, to help with writing formulaic scripts. "If you were able to make an AI to synopsize scripts effectively, you would be valuable to the studio" (p14). "It is like having a very good dramaturge" (p10). "AI can come up with 5 scripts in 5 minutes" (p9). "Which part of the process is this tool relevant for? Formulaic TV series" (p4, p5). "I wouldn't use it for writing a straight play" (p11).
5.3 Stereotypes
5.3.1 The system outputs are too literal and predictable. Some participants found the character "relationships so tight and prescriptive" (p4, p5); if a character has "a noble endeavour, it will be stated in the dialogue" (p4, p5), and that characters were given "silly" and "on the nose, pun names" (p2). Similarly, the title generation "does what it says on the tin" (p15), and "can be overly descriptive sometimes: the director could make decisions" (p8). One participant commented, "this is a thing that my students would do" (p8). There were some positive aspects to such a predictable system: "interpersonal relationships created here are classic tropes that keep the audience interested" (p3) and "there is interest in generating outputs from the system for content that already exists: actual titles are fun to compare against" (p14).
5.3.2 The system outputs can be problematic, stereotypical, and biased. Participant p9 wondered "What cultures and languages [do] the books come [from]?", whereas many participants noticed gender biases and ageism in the system outputs. "I am less sexist than the computer" (p3). "The protagonists are both male characters, and all of the supporting characters are female" (p4, p5). "The female lead is defined by their relationship to the other characters: it is a typical thing in plays that the women characters don't have a lot of information about them" (p11). "She is always upset and doesn't have wants (like the male characters) [...] Actually lots of the content [...] is misogynistic and patriarchal" (p8). This problem raised the issue of coping strategies or cultural appropriation: "if we gave GPT-2 some character names, it could come up with bigoted characters: [we] went with more made up names, not gender-specific, not ethnicity-specific" (p13) and "there is an ethical question about using AI for a group of theatre makers: the AI throws us a topic, or relation that is unrelated to our lived experience and we are compelled to 'Yes, and' the offers" (p4, p5). We discuss the ethical issues raised by participants in greater detail in Section 7.3.
5.4 Glitches
5.4.1 Participants embrace unexpected outputs from the system. Participant p6 laughed at the "poetic and absurd" suggestions. "It is really interesting to see what it comes up with" (p8), "levels of absurdity that are tickling my fancy" (p10), "I wouldn't have thought of that but it is quite funny" (p11). "This is something that a human author probably would not stand for, it is uniquely created [...] I want ideas that a human couldn't possibly have" (p12).
5.4.2 The system often enters generation loops. All participants noticed how the system could enter generation loops: "I would probably cut a lot of it" (p6) or "a whole scene about a boiler being broken: yeah" (p8). They sometimes found positive aspects to such loops: "It is a silly conversation. It is a little repetitive. I like it." (p6), "repetition leaves room for subtext" (p12); others enjoyed the glitches (p4, p5) or even drew parallels with existing work (p3).
5.5 Fundamental Limitations of the Language Model and of Dramatron
5.5.1 Lack of consistency and of long-term coherence. "Keeping dialogue character-based and consistent is most important [...] There is still some difficulty in getting it to stay on track with the context." (p15). "I want the characters to be more consistent within themselves" (p12). "There is a bit of confusion in the logic, gaps in logic [...] It looks like postmodern theatre [...] But in terms of [a play with a given] genre, that has a plot to follow, it is getting confusing" (p11). Participant p7 "wants to add some stitching between the beats to make them narratively make sense".
5.5.2 Lack of common sense and embodiment. Participant p8 observed that "There are things that it is hard to show on stage, such as a cat. The system doesn't have an awareness of what is stageable and not stageable" and p9 noted that when "interfacing with a story telling AI, the input space is constrained".
5.5.3 Lack of nuance and subtext. Participant p3 observed: "that's a good example of how computers do not understand nuance, the way we see language and can understand it even if it is not super specific". "A lot of information, a bit too verbalised, there should be more subtext" (p6). "With dialogue in plays, you have to ask yourself two questions: 1) Do people actually speak like that? 2) Are actors attracted to these lines and are these appealing lines to play?" (p7). "Playwriting is about realistic dialogue... all of the things around subtext. [...] Show, not tell: here we are just telling. Just like in improv: 'do not mention the thing'. The element in the log line became the central bit in the generation, and that was repetitive" (p8). Participant p14 concluded that "AI will never write Casablanca, or A Wonderful Life. It might be able to write genre boxed storytelling".
5.5.4 Lack of a motivation for the characters. "The stories do not finish. The character journeys are not complete. There is perhaps something missing in the character background [...] Where is the emotional motivation, stuff that might exist in the backstory and not exist in the script?" (p14). "On the first go-through, you are looking for the goal of the protagonist, and impediment for that drive. What is my character doing, and what do they want? If this was given to an actor they are going to struggle with the first thing to do, which is to find the needs and the wants of the character and then to personalise it" (p9). "My students do this: a character comes into play and says right what they want." (p8). "The conflict should be something inside the character" (p6). "Why do people not say what they mean? It is because we have societal understanding, but sometimes get lost in translation" (p3).
5.6 Structural Problems of Dramatron
5.6.1 Difficulty caused by the need to come up with the log line to condition all the generation. For participant p12, it was difficult to come up with a log line, and the process seemed precious. "Coming up with the first prompt takes a little bit of back and forth" (p11). "Packing the action into the log line: this is a panic moment for the writer, because they want to add everything meaningful into the script. [...] It is all about the witty premise. The system that you have right now is somewhat about wit. There is a need for the log line to hold some kind of wit" (p13). "Does [the log line] have to have a character name?" (p4, p5). "The log line is not a closed synopsis. It is less descriptive and more prescriptive. The art of log lines is about how short you can make it so that [the producers] read the rest of your material" (p14).
5.6.2 Structural criticism of log line-based conditioning of the whole generation. "Generally the way that I work, I am clear what I want to say about the world, what I think about the world. The vehicles, or the characters, or the arc is not clear. This looks like a collection of scenes that logically follow one to the next. But, the core idea of the thing to say [is missing]" (p4, p5). "If I could program something to write a script, I wouldn't start with a log line. You can also consider starting with a character and an obstacle in the way of that character" (p9).
5.6.3 Negative consequence of Dramatron's design choice: parallel dialogue generation. "From the scene beats, it has no idea of what the previous dialogue contained. Then to see the dialogue not be consistent is jarring" (p1). "I wonder if there is a problem in importing the previous beat into the scene [...] Paying attention to the consistency in the beats, helps with the consistency of the dialogue generated" (p12). Upon learning that scene dialogue was generated in parallel for each scene, participant p2 commented: "If it didn't read its last scene, how can you get the last scene into the next generation? Generation of these scripts could be significantly benefited from attending to the previous scene's dialogue".
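The design choice being criticised, and the sequential alternative p2 suggests, can be contrasted in a short sketch. This is an illustrative toy only: `generate` is a hypothetical placeholder for a language-model call, and the prompt strings and the 200-character context window are invented assumptions, not Dramatron's design.

```python
# Hedged sketch contrasting parallel dialogue generation (each scene sees
# only its own beat) with a sequential variant that feeds each scene's
# ending into the next prompt. `generate` is a hypothetical LLM stand-in.
def generate(prompt: str) -> str:
    # Placeholder: echoes the prompt so conditioning is visible in tests.
    return f"[dialogue | {prompt.replace(chr(10), ' / ')}]"

def dialogue_parallel(beats):
    # Scenes are independent: fast, but no scene can reference another.
    return [generate(f"Beat: {b}") for b in beats]

def dialogue_sequential(beats):
    # Each scene also sees the tail of the previous scene's dialogue,
    # trading generation speed for cross-scene consistency.
    scenes, previous = [], ""
    for b in beats:
        scene = generate(f"Previous scene ending: {previous}\nBeat: {b}")
        scenes.append(scene)
        previous = scene[-200:]  # carry a short window forward
    return scenes
```

In the parallel version no scene's prompt ever contains another scene's text, which is exactly why participants found consecutive scenes inconsistent; the sequential version restores that link at the cost of serialising the generation.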
5.7 Suggested Improvements to Dramatron
Modeling characters and their relationships was a recurrent theme: "can we make the system relationship-driven?" (p12), "where does status belong in character building?" (p12), "could we generate the stem of a character and then complete it?" (p15). Participant p12 suggested: "as an author, I would build a social graph of the characters' relations". Answering the question "How do you get the system to know where the scene should start and end?" (p15), three participants (p8, p13, p15) suggested fitting a narrative arc within each scene.
Several participants wanted to be able to query and dialogue with the writing model: "Have you engaged [the AI system] by trying to give it notes?" (p2), to allow it to learn about the world: "How does world building happen? Maybe the model needs to know the Ws of Stella Adler [(Who? What? Where? Why? How? etc.)] Can you get the system to answer these questions?" (p9), or to allow rewriting and reformulation: "can we ask the system to re-write with a style or context?" (p8). As p10 reiterated, iterative rewriting was a desired workflow: "I am less interested in shaping [the narrative], rather than seeing what it is saying, and refining it to see what it says, and then refining it again. A playwright has to see the play spoken before making cuts."
Finally, p4 and p5 astutely observed that "there has been a push away from systems of Western dramaturgy, so in terms of making this most useful for the future, it might be helpful to consider how it might be used within the context of other contemporary writing", suggesting alternative narrative structures and elements, "as the AI is not bound by the same rules that we are. So, telling it to be bound by those human rules feels limiting of the capabilities".
5.8 Incremental Tool Improvement
As detailed in Section 5.7, the participants were engaged and provided constructive feedback about Dramatron. As one of the participants in the study remarked: "the system is so adaptable, it can change with our feedback and tweaks". This understanding of the system's modifiability empowered those who interacted with it to suggest changes more freely, knowing that they could be incorporated. In this way, the system benefited and evolved over the course of the participant study.
Over the course of the interviews, we incorporated what feedback we could by making small, incremental changes to the prompt prefix sets of Dramatron. Table 1 summarizes the changes made as a direct result of participants' feedback. This sort of participatory design and development is critical for creative tools, as feedback from users can be directly incorporated to improve the system for the next interaction. This is made possible by the modular design of the system, the lightweight prompt-based interactions, and the flexibility afforded by Dramatron. This participation also inspires participants to explore related, connected, creative ideas. For example, Fig. 4 (LEFT) shows concept art for a narrative test of virtual actors interpreting a co-written script.
5.9 Staging and Evaluating Productions of Scripts Co-written by Dramatron
Creative writing for theatre is fundamentally interactive: not just between collaborating storytellers, but between storytellers and the audience. For this reason, we evaluated how scripts co-written with Dramatron could be produced on the theatre stage. In this section, we describe staging details and report evaluative reflections from both the creative team and two professional theatre reviewers.
Five scripts co-written with Dramatron were staged in public performances in August 2022 at North America's largest theatre festival, the 2022 Edmonton International Fringe Theatre Festival. The show, titled Plays By Bots, ran for 7 performances over two weeks (see an image from the production in Fig. 4). In each show, a different cast would act out one of the plays from the co-writing experiments. The plays span different genres, styles, characters, and storylines. The scripts were brought to life by a cast of 4-6 experienced improvisers and actors. The first half of each script was given to each of the cast members in a sealed envelope. Only when the show began were they allowed to open the script, and they then commenced the performance by reading it live in front of the audience. Once the script ran out, the actors improvised the ending, based on the context and story set out by the script[5]. During each performance, the director and co-writer (participant p1 from above) introduced the project to the audience and explained that they co-wrote and edited the script using Dramatron.
There were two reviews written about the production of Plays By Bots at the festival. One review noted that the performance "proves that artificial intelligence can in fact write a hit Fringe play". The reviewer also noted that the success of the performance was due to both the Dramatron system and the human actors, especially one performer who "mastered Dramatron's voice and seamlessly took it off-script for the remainder of the show, much to the delight of the howling audience". The second review was also positive. With a hint of incredulity, the reviewer complimented the abilities of Dramatron. The reviewer noted the style of Dramatron, and how it served the performance: "if there's a certain flatness in the dialogue, which runs to declarations, that in itself is amusing since it turned out to be perfectly suited to the deadpan comic talents of [the] improvisers," and "the human actors continue to capture the playwright bot's tone". The reviewer also expressed surprise at the ability of the system to create a play that hangs together and creates a world. They further noted that some lines from Dramatron were so funny that they were reprised later in the show once the human actors were improvising.
Discussions amongst the creative team complement the reviews and provide insights into how professional actors and improvisers found working with scripts co-written by Dramatron. Post-show discussions were facilitated and relayed to us by the director (p1 above). Four key themes emerged through these discussions, echoing the themes presented earlier in Section 5. First, the system has a distinct glitch style, and generated text can be repetitive yet fun to work with. Second, the team attributed agency to the system and had expectations of the system's capabilities. Third, as trained improvisational theatre performers, the actors were able to add a layer of interpretation to the co-written script, which helped add meaning to the text. Finally, the prevailing feedback from the creative team was that participating in the production was fun! Enthusiasm and reflections from the creative team echo the usefulness of co-written scripts for theatre production and collaboration; more reflections and supporting quotes are included in Appendix B.
This paper is available on arXiv under a CC 4.0 license.
[5] Video of performance shared upon acceptance.