People Actually Read Privacy Policies - Then This Happened Next

Hello World is a weekly newsletter—delivered every Saturday morning—that goes deep into our original reporting and the questions we put to big thinkers in the field. Browse the archive here.

Hi, I’m Aaron Sankin, a reporter at The Markup. I’m here to talk about how if you do something boring (reading the documents where corporations talk about what they can do with your data), you can then do something fun (get really mad online).

Over the past quarter-century, privacy policies (the lengthy, dense legal language you quickly scroll through before mindlessly hitting “agree”) have grown both longer and denser. A study released last year found that the average length of a privacy policy quadrupled between 1996 and 2021, and that the policies also became considerably more difficult to understand.

“Analyzing the content of privacy policies, we identify several concerning trends, including the increasing use of location data, increasing use of implicitly collected data, lack of meaningful choice, lack of effective notification of privacy policy changes, increasing data sharing with unnamed third parties, and lack of specific information about security and privacy measures,” wrote De Montfort University Associate Professor Isabel Wagner, who used machine learning to analyze some 50,000 website privacy policies for the study.

While machine learning can be a useful tool in understanding the universe of privacy policies, its presence inside of a privacy policy can set off a firestorm. Case in point: Zoom.

Earlier this week, Zoom, the popular web-conferencing service that became ubiquitous when pandemic lockdowns shifted many in-person meetings to in-little-boxes-on-laptop-screen meetings, came under sharp criticism from users and privacy advocates after an article from the technology news site Stack Diary highlighted a section of the company’s terms of service that said it could use data collected from its users to train artificial intelligence.

The user agreement stated that Zoom users gave the company “a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license” to use “customer content” for a list of purposes, including “machine learning, artificial intelligence, training, [and] testing.” This section did not state that users first had to give explicit consent for the company to do so.

A company secretly using someone’s data to train an artificial intelligence model is particularly contentious at the moment. The use of AI to replace flesh-and-blood actors and writers is a major sticking point of the ongoing strikes that have ground Hollywood to a halt. OpenAI, the company behind ChatGPT, has been hit with a wave of lawsuits accusing the firm of training its systems on the work of writers without their consent. Companies like Stack Overflow, Reddit, and whatever Elon Musk has decided Twitter is called today have also made aggressive moves to prevent AI companies from using their content to train models without themselves getting a piece of the action.

The online backlash against Zoom was fierce and immediate, with some organizations, like the news outlet Bellingcat, proclaiming their intention to no longer use Zoom for video-conferencing. Meredith Whittaker, president of the privacy-focused messaging app Signal, used the opportunity to advertise. “AHEM: @signalapp video calls work great, even in low bandwidth, and collect NO DATA ABOUT YOU OR WHO YOU’RE TALKING TO!! Another tangible, important way Signal’s real commitment to privacy is interrupting the voracious AI surveillance pipeline.”

Zoom, unsurprisingly, felt the need to respond.

Within hours of the story going viral on Monday, Zoom Chief Product Officer Smita Hashim published a blog post aimed at quelling fears about people getting their likeness and mannerisms uploaded into artificial intelligence models when they’re virtually wishing their grandma a happy birthday from thousands of miles away.

“As part of our commitment to transparency and user control, we are providing clarity on our approach to two essential aspects of our services: Zoom’s AI features and customer content sharing for product improvement purposes,” wrote Hashim. “Our goal is to enable Zoom account owners and administrators to have control over these features and decisions, and we’re here to shed light on how we do that and how that affects certain customer groups.”

Hashim wrote that Zoom updated its terms of service to give more context about the company’s data usage policies. While the paragraph about Zoom having “a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license” to use customer data for “machine learning, artificial intelligence, training, [and] testing” remained intact, a new sentence was added directly below: “Notwithstanding the above, Zoom will not use audio, video or chat Customer Content to train our artificial intelligence models without your consent.”

In the blog post, Hashim insisted that Zoom only employs user content to train AI for specific products, like a tool that automatically generates meeting summaries, and only after users explicitly opt in to use those products. “An example of a machine learning service for which we need license and usage rights is our automated scanning of webinar invites / reminders to make sure that we aren’t unwittingly being used to spam or defraud participants,” she wrote.

“The customer owns the underlying webinar invite, and we are licensed to provide the service on top of that content. For AI, we do not use audio, video, or chat content for training our models without customer consent.”

Zoom’s privacy policy, a document separate from its terms of service, only mentions artificial intelligence or machine learning in the context of providing “intelligent features and products, such as Zoom IQ or other tools to recommend chat, email or other content.”

To get a sense of what all of this means, I talked to Jesse Woo—a data engineer at The Markup who previously helped write institutional data use policies as a privacy lawyer.

Woo explained that, while he can see why the language in Zoom’s terms of service touched a nerve, the sentiment (that users allow the company to copy and use their content) is actually pretty standard in these sorts of user agreements. The problem is that Zoom’s policy was written in a way where each of the rights being handed over to the company is specifically enumerated, which can feel like a lot. But that’s also kind of just what happens when you use products or services in 2023. Sorry, welcome to the future!

As a point of contrast, Woo pointed to the privacy policy of the competing video-conferencing service Webex, which reads: “We will not monitor Content, except: (i) as needed to provide, support or improve the provision of the Services, (ii) investigate potential or suspected fraud, (iii) where instructed or permitted by you, or (iv) as otherwise required by law or to exercise or protect Our legal rights.”

That language feels a lot less scary, even though, as Woo noted, training AI models could likely be covered under a company taking steps to “support or improve the provision of the Services.”

The idea that people might get freaked out if the data they provide to a company for an obvious, straightforward purpose (like completing a video-conferencing call) is then used for another purpose (like training a machine learning algorithm) isn’t new. A report published by the Future of Privacy Forum all the way back in 2018 warned that “the need for large amounts of data during development as ‘training data’ creates consent concerns for individuals who might have agreed to provide personal data in a particular commercial or research context, without understanding or expecting it to be further used for new algorithmic design and development.”

For Woo, the big-picture takeaway is that, under the language of the original terms of service, Zoom could have used whatever user data it wanted for training AI without asking for consent and faced essentially no legal risk in the process.

“All the risk they have faced in this fiasco has been reputational, and the only recourse for users is to pick another video-conference service,” Woo explained. “If they had been smart, they would have used more circumspect, but still accurate, language while offering opt-out consent, which is sort of an illusion of choice for most people who don’t exercise their opt-outs.”

“They are currently bound by the restrictions they just put into their terms of service, but nothing stops them from changing it later,” he added.

Future malleability aside, there is something notable in a public outcry successfully getting a company to state on the record that it won’t do something creepy. This entire news cycle serves as a warning to others that training AI systems on customer data without getting consent could make many of those customers pretty peeved.

Zoom’s terms of service have mentioned the company’s policy on artificial intelligence since March, but the language only attracted widespread attention over the past week. That lag suggests that people may not be reading the increasingly long, increasingly dense legal treatises where companies detail what they’re doing with your data.

Luckily, Woo, along with Markup Investigative Data Journalist Jon Keegan, recently published a handy guide showing how to read any privacy policy and quickly identify the important/creepy/enraging parts.

Happy reading,

Aaron Sankin

Investigative Reporter

The Markup

Credits: Aaron Sankin

Photo by Agus Monteleone on Unsplash