Hello World is a weekly newsletter, delivered every Saturday morning, that goes deep into our original reporting and the questions we put to big thinkers in the field. Browse the archive here.
Hi, I’m Aaron Sankin, a reporter at The Markup. I’m here to talk about how if you do something boring (reading the documents where corporations talk about what they can do with your data), you can then do something fun (get really mad online).
Over the past quarter-century, privacy policies (the lengthy, dense legal language you quickly scroll through before mindlessly hitting “agree”) have grown both longer and denser. A study released last year found that not only did the average length of a privacy policy quadruple between 1996 and 2021, but policies also became considerably more difficult to understand.
“Analyzing the content of privacy policies, we identify several concerning trends, including the increasing use of location data, increasing use of implicitly collected data, lack of meaningful choice, lack of effective notification of privacy policy changes, increasing data sharing with unnamed third parties, and lack of specific information about security and privacy measures,” wrote De Montfort University Associate Professor Isabel Wagner, who used machine learning to analyze some 50,000 website privacy policies for the study.
While machine learning can be a useful tool in understanding the universe of privacy policies, its presence inside of a privacy policy can set off a firestorm. Case in point: Zoom.
Earlier this week, Zoom, the popular web-conferencing service that became ubiquitous when pandemic lockdowns shifted many in-person meetings to in-little-boxes-on-laptop-screen meetings, was the subject of sharp criticism from users and privacy advocates when an article from the technology news site Stack Diary highlighted a section of the company’s terms of service that said it could use data collected from its users to train artificial intelligence.
The user agreement stated that Zoom users gave the company “a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license” to use “customer content” for a list of purposes, including “machine learning, artificial intelligence, training, [and] testing.” This section did not state that users first had to give explicit consent for the company to do so.
A company secretly using someone’s data to train an artificial intelligence model is particularly contentious at the moment. The use of AI to replace flesh-and-blood actors and writers is a major sticking point of the ongoing strikes that have ground Hollywood to a halt. OpenAI, the company behind ChatGPT, has been hit with a wave of lawsuits accusing the firm of training its systems on the work of writers without their consent. Companies like Stack Overflow, Reddit, and whatever Elon Musk has decided Twitter is called today have also made aggressive moves to prevent AI companies from using their content to train models without themselves getting a piece of the action.
The online backlash against Zoom was fierce and immediate, with some organizations, like the news outlet Bellingcat, proclaiming their intention to no longer use Zoom for video-conferencing. Meredith Whittaker, president of the privacy-focused messaging app Signal, used the opportunity to advertise. “AHEM: @signalapp video calls work great, even in low bandwidth, and collect NO DATA ABOUT YOU OR WHO YOU’RE TAKING TO!! Another tangible, important way Signal’s real commitment to privacy is interrupting the voracious AI surveillance pipeline.”
Zoom, unsurprisingly, felt the need to respond.
Within hours of the story going viral on Monday, Zoom Chief Product Officer Smita Hashim published a blog post aimed at quelling fears about people getting their likeness and mannerisms uploaded into artificial intelligence models when they’re virtually wishing their grandma a happy birthday from thousands of miles away.
“As part of our commitment to transparency and user control, we are providing clarity on our approach to two essential aspects of our services: Zoom’s AI features and customer content sharing for product improvement purposes,” wrote Hashim. “Our goal is to enable Zoom account owners and administrators to have control over these features and decisions, and we’re here to shed light on how we do that and how that affects certain customer groups.”
Hashim wrote that Zoom updated its terms of service to give more context about the company’s data usage policies. While the paragraph about Zoom having “a perpetual, worldwide, non-exclusive, royalty-free, sublicensable, and transferable license” to use customer data for “machine learning, artificial intelligence, training, [and] testing” remained intact, a new sentence was added directly below: “Notwithstanding the above, Zoom will not use audio, video or chat Customer Content to train our artificial intelligence models without your consent.”
In the blog post, Hashim insisted that Zoom only employs user content to train AI for specific products, like a tool that automatically generates meeting summaries, and only after users explicitly opt in to use those products. “An example of a machine learning service for which we need license and usage rights is our automated scanning of webinar invites / reminders to make sure that we aren’t unwittingly being used to spam or defraud participants,” she wrote.
“The customer owns the underlying webinar invite, and we are licensed to provide the service on top of that content. For AI, we do not use audio, video, or chat content for training our models without customer consent.”
Zoom’s privacy policy, a document separate from its terms of service, only mentions artificial intelligence or machine learning in the context of providing “intelligent features and products, such as Zoom IQ or other tools to recommend chat, email or other content.”
To get a sense of what all of this means, I talked to Jesse Woo, a data engineer at The Markup who previously helped write institutional data use policies as a privacy lawyer.
Woo explained that, while he can see why the language in Zoom’s terms of service touched a nerve, the sentiment (that users allow the company to copy and use their content) is actually pretty standard in these sorts of user agreements. The problem is that Zoom’s policy was written in a way where each of the rights being handed over to the company is specifically enumerated, which can feel like a lot. But that’s also kind of just what happens when you use products or services in 2023. Sorry, welcome to the future!
As a point of contrast, Woo pointed to the privacy policy of the competing video-conferencing service Webex, which reads: “We will not monitor Content, except: (i) as needed to provide, support or improve the provision of the Services, (ii) investigate potential or suspected fraud, (iii) where instructed or permitted by you, or (iv) as otherwise required by law or to exercise or protect Our legal rights.”
That language feels a lot less scary, even though, as Woo noted, training AI models could likely be covered under a company taking steps to “support or improve the provision of the Services.”
The idea that people might get freaked out if the data they provide to a company for an obvious, straightforward purpose (like completing a video-conferencing call) is then used for another purpose (like training a machine learning algorithm) isn’t new. A report published by the Future of Privacy Forum all the way back in 2018 warned that, “the need for large amounts of data during development as ‘training data’ creates consent concerns for individuals who might have agreed to provide personal data in a particular commercial or research context, without understanding or expecting it to be further used for new algorithmic design and development.”
For Woo, the big-picture takeaway is that, under the language of the original terms of service, Zoom could have used whatever user data it wanted for training AI without asking for consent and faced essentially no legal risk in the process.
“All the risk they have faced in this fiasco has been reputational, and the only recourse for users is to pick another video-conference service,” Woo explained. “If they had been smart, they would have used more circumspect, but still accurate, language while offering opt-out consent, which is sort of an illusion of choice for most people who don’t exercise their opt-outs.”
“They are currently bound by the restrictions they just put into their terms of service, but nothing stops them from changing it later,” he added.
Future malleability aside, there is something notable in a public outcry successfully getting a company to state on the record that it won’t do something creepy. This entire news cycle serves as a warning to others that training AI systems on customer data without getting consent could make many of those customers pretty peeved.
Zoom’s terms of service have mentioned the company’s policy on artificial intelligence since March, but that language only attracted widespread attention over the past week. That lag suggests that people may not be reading the increasingly long, increasingly dense legal treatises where companies detail what they’re doing with your data.
Luckily, Woo, along with Markup Investigative Data Journalist Jon Keegan, recently published a handy guide showing how to read any privacy policy and quickly identify the important/creepy/enraging parts.
Happy reading,
Aaron Sankin
Investigative Reporter
The Markup
Credits: Aaron Sankin
Photo by Agus Monteleone on Unsplash