Understanding the Privacy Risks of Popular Search Engine Advertising Systems: Background
by @browserology

Too Long; Didn't Read

A new study finds that privacy-focused search engines fail to protect users’ privacy when clicking ads.

This paper is available on arXiv under the CC0 1.0 DEED license.

Authors:

(1) Salim Chouaki, LIX, CNRS, Inria, Ecole Polytechnique, Institut Polytechnique de Paris;

(2) Oana Goga, LIX, CNRS, Inria, Ecole Polytechnique, Institut Polytechnique de Paris;

(3) Hamed Haddadi, Imperial College London, Brave Software;

(4) Peter Snyder, Brave Software.

2 BACKGROUND

This section briefly discusses the policies of the main search engines alongside popular tracking approaches.

2.1 Private search engines

We study the two dominant search engines that rely on user tracking for personalized search results and advertisements, namely Google and Bing, and three of the most popular privacy-branded search engines that provide users with non-personalized results and ads: DuckDuckGo, StartPage, and Qwant [11, 29]. Private search engines can either build their own independent search indexes or use big tech search engines like Bing, Google, or Yahoo to provide search results. Both types of private search engines claim not to store users’ search histories and not to collect or share tracking and personal data. We now describe the advertising systems employed by the different private search engines and summarize the data-sharing policies outlined in their respective About pages.


DuckDuckGo is a standalone search engine that maintains and uses its own search index alongside other indexes, such as Bing’s, to provide search results [31]. DuckDuckGo relies on Microsoft’s advertising system but only serves ads based on the search results and not the behavioral profiles of users [30]:


"search ads on DuckDuckGo are based on the search results page you’re viewing instead of being based on you as a person"


When clicking an ad on DuckDuckGo, the user is redirected to the ad’s landing page through Microsoft Advertising’s platform. DuckDuckGo claims Microsoft does not store ad-click behaviors from DuckDuckGo for purposes other than accounting and does not associate ad-clicks with users’ profiles [18]:


"When you click on a Microsoft-provided ad that appears on DuckDuckGo, Microsoft Advertising does not associate your ad-click behavior with a user profile. It also does not store or share that information other than for accounting purposes."


This implies that Microsoft can, though currently chooses not to, link the ad-click to an existing Microsoft user profile. The privacy policy is signed by both DuckDuckGo and Microsoft.


Qwant is a standalone EU-based search engine that allows users to access online resources without being tracked or profiled [32]. Qwant relies on Microsoft’s advertising system to deliver ads on its search results pages. Although Qwant reports transmitting some information about search queries to Microsoft so that the latter can present pertinent advertisements, it remains unclear which specific information is shared. In addition, to detect fraud, Qwant uses a specialized service offered by Microsoft, which has access to the user’s IP address and the browser "User-Agent". Qwant states that this service does not have access to the search query, which is sent to another service that does not know the user’s IP address [32].


Unlike for DuckDuckGo, which also uses Microsoft advertising, we did not find any mention of ad-click information in Qwant’s privacy policy. The policy does not state whether Microsoft stores this data or for what purposes it is used.


StartPage is a meta-search engine that allows users to obtain non-personalized search results from Google’s search index while protecting their privacy. StartPage relies on Google AdSense to show ads to users. According to StartPage’s privacy policy, the search engine serves strictly non-personalized ads since it does not share any identifiable information with Google. Therefore, ads displayed on the search results page are solely based on the user’s search query [38].


Regarding ad-click behavior data, the privacy policy does not say whether Google tracks or profiles users based on this information. Nevertheless, StartPage emphasizes that by clicking on an ad, users leave the protection of StartPage’s privacy policies and become subject to the practices of the website they are redirected to [37]:


"By clicking on an ad, like any other external website you click on after performing a StartPage search, you leave the privacy protection of StartPage and are subject to those websites’ data collection policies."

2.2 Cross-site tracking

Cross-site tracking refers to the practice of following a user across multiple first-party websites and associating their browsing activities with a unique identifier. Web tracking practices require first-party websites (e.g., the content providers) to share data about a user’s activity with third parties (the trackers). Online tracking has traditionally been implemented through browser cookies. However, due to the increasing adoption of cookie-blocking browsers and extensions, and the push to adopt partitioned cookie storage in web browsers, more and more trackers have started to rely on navigational tracking techniques. We next discuss how these techniques work.


2.2.1 Cookie tracking. To enable cross-site cookie tracking, whenever a user visits a first-party website, the website makes a request to the third-party website (the tracker). This allows the tracker to set a cookie, which will identify the user and will be associated with the browsing activity of the user. For example, when the user visits a website A that makes a request to the tracker T, the tracker associates the cookie identifier of the user with the fact that the user visited website A (see Figure 1). Later, when the user visits website B, which also makes a request to the tracker T, the tracker will be able to associate the cookie identifier of the user with the fact that the user visited website B. Hence, the tracker will be able to know that the user visited websites A and B.


This was initially possible because browsers had a common cookie storage containing all cookies, and trackers could read their corresponding cookies regardless of which first-party website allowed the tracker cookie to be set (see Figure 1). However, several browsers, such as Safari, Firefox, and Brave, have implemented partitioned storage to prevent the use of cookies for cross-site tracking [33]. These browsers use a partitioned cookie storage with a hierarchical namespace in which a tracker accesses a different storage area on each website that loads it, preventing trackers from matching or assigning the same identifiers to users across multiple websites. Hence, cross-site tracking based on cookies can no longer be performed on these browsers. Chrome, the most used web browser, is in the process of testing partitioned cookie storage but does not use it by default [27, 28].
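The difference between flat and partitioned storage can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `Browser` class, the `.example` domains, and the random cookie IDs are hypothetical and do not model any real browser's implementation.

```python
import uuid


class Browser:
    """Toy browser storing third-party cookies in a flat or partitioned jar."""

    def __init__(self, partitioned):
        self.partitioned = partitioned
        self.jar = {}  # storage key -> cookie value

    def _key(self, first_party, tracker):
        # Flat storage keys cookies by the tracker's domain only; partitioned
        # storage also keys them by the top-level (first-party) site.
        return (first_party, tracker) if self.partitioned else tracker

    def visit(self, first_party, tracker, tracker_log):
        """Load `first_party`, which embeds a request to `tracker`."""
        key = self._key(first_party, tracker)
        if key not in self.jar:
            self.jar[key] = uuid.uuid4().hex  # tracker sets a new cookie ID
        # The tracker records which site this cookie ID was seen on.
        tracker_log.setdefault(self.jar[key], set()).add(first_party)


def sites_linked_by_tracker(partitioned):
    """Return the groups of sites the tracker can tie to a single identifier."""
    log = {}
    browser = Browser(partitioned)
    for site in ("a.example", "b.example"):  # hypothetical websites A and B
        browser.visit(site, "t.example", log)  # hypothetical tracker T
    return sorted(sorted(sites) for sites in log.values())


print(sites_linked_by_tracker(partitioned=False))
print(sites_linked_by_tracker(partitioned=True))
```

With flat storage the tracker sees one identifier on both sites, so the two visits fall into a single group. With partitioned storage it receives a different identifier on each site and cannot join them.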


2.2.2 Navigational tracking. Navigational tracking refers to tracking techniques that use one or more URL navigations to share user information across sites. Navigational tracking does not require third-party cookies and can be used to circumvent the protections against cross-site tracking that browsers implement through partitioned cookie storage.


Figure 1: Cookie tracking in flat vs. partitioned cookies storage.


Bounce tracking is a navigational tracking technique that redirects users through one or more redirectors when they navigate from one website to another. To allow this, a website A containing links to another website B does not link directly to the target B but instead links to an intermediary redirector R, the tracker (see Figure 2). When users click on a link on website A, they are taken to the redirector first, which then redirects them to the intended destination (website B) or to other intermediary redirectors. Either website A itself replaces the actual destination link (b.com) with a redirection link (r.com), or a redirector’s third-party script does so. In turn, the redirector can change the destination link again and send the user on to other redirectors.

Hence, from the link in the ad on website A, one cannot know all the redirectors users will pass through when they click on the ad. We call the redirection path the sequence of websites a user navigates through to get from A to B. Since, from the browser’s perspective, the redirector is the first-party domain, it can read or set cookies in its own partition [26]. In the following, we describe what data redirectors can infer depending on their behavior.


(1) If the redirector does not set a first-party cookie, it will only know that a user went from website A to website B and will not be able to link this to other user browsing activities.


(2) If the redirector sets a first-party cookie, it will be able to aggregate all the activity of the user that is redirected through it (either from website A or other websites that use it as a redirector), hence, it will allow cross-site tracking.


(3) If the redirector also sets third-party cookies on websites A and B, it will not be able to link the activity of the user on website A with the activity of the user on website B, or with the activity of the user that goes through its own site (through redirects), since they do not share the same user ID [33]. Hence, while bounce tracking allows cross-site tracking to a certain degree, it does not have the same coverage as traditional third-party cookie tracking.
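Cases (1) and (2) above can be sketched as a toy simulation. All names here (`Redirector`, `simulate`, the `.example` domains, and the integer user IDs) are hypothetical and chosen only for illustration; real redirectors operate over HTTP redirects and `Set-Cookie` headers.

```python
import itertools


class Redirector:
    """Toy redirector R sitting between an origin site and a destination."""

    def __init__(self, sets_cookie):
        self.sets_cookie = sets_cookie
        self._ids = itertools.count(1)
        self.log = []  # (cookie ID or None, origin, destination)

    def bounce(self, cookies, origin, destination):
        """Handle one navigation; `cookies` is R's own first-party cookie jar."""
        if self.sets_cookie and "uid" not in cookies:
            cookies["uid"] = next(self._ids)  # first visit: assign an ID
        self.log.append((cookies.get("uid"), origin, destination))
        return destination  # where the browser is sent next

    def profiles(self):
        """Group navigations by cookie ID; entries without an ID are unlinkable."""
        grouped = {}
        for uid, origin, destination in self.log:
            if uid is not None:
                grouped.setdefault(uid, []).append((origin, destination))
        return grouped


def simulate(sets_cookie):
    redirector = Redirector(sets_cookie)
    r_cookies = {}  # the browser's cookie partition for the redirector's domain
    redirector.bounce(r_cookies, "a.example", "b.example")
    redirector.bounce(r_cookies, "c.example", "d.example")
    return redirector.profiles()


print(simulate(sets_cookie=False))
print(simulate(sets_cookie=True))
```

Without a cookie, the redirector sees two isolated navigations and can build no profile. With a first-party cookie, both navigations are recorded under the same identifier.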


UID smuggling is a navigational tracking technique that modifies users’ navigation requests by adding information to the navigation URLs in the form of query parameters. In addition, similar to bounce tracking, UID smuggling may redirect the user to one or more third-party trackers before redirecting the user to the intended destination. Figure 3 describes this process. When a user clicks on a link on a website A, the originator page itself or a tracker on the page (through a script) decorates the URL by adding the originator’s user identifier (UID) as a query parameter. The user then passes through zero or more redirectors, which are invisible to the user. Each of these redirectors can get the UID from the query parameter and has permission to store it in a first-party cookie under the redirector’s domain. Finally, the user is sent to the destination website B, and the redirector may or may not forward to website B the UID it received from A. All the trackers on website B will be able to read the UID from the query parameter and know that it was the UID sent by the originator (through request headers).
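The link decoration described above can be sketched with Python's standard `urllib.parse` module. The query parameter names `dest` and `uid`, the `.example` domains, and the identifier `u-123` are assumptions made for illustration. Actual trackers use their own parameter names and redirect endpoints.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def decorate(url, **params):
    """Return `url` with extra query parameters appended."""
    parts = urlsplit(url)
    query = parts.query + ("&" if parts.query else "") + urlencode(params)
    return urlunsplit(parts._replace(query=query))


def read_param(url, name):
    """Return the first value of query parameter `name`, or None."""
    return parse_qs(urlsplit(url).query).get(name, [None])[0]


# Step 1: on website A, the link to B is rewritten to pass through the
# redirector and to carry A's identifier for the user (hypothetical values).
destination = "https://b.example/landing"
click_url = decorate("https://r.example/redirect", dest=destination, uid="u-123")

# Step 2: the redirector reads the UID, may store it in a first-party cookie,
# and forwards the user. Here it chooses to pass the UID on to website B.
redirector_cookie = {"uid": read_param(click_url, "uid")}
final_url = decorate(read_param(click_url, "dest"), uid=redirector_cookie["uid"])

# Step 3: any script running on website B can read the UID from the page URL.
print(final_url)
print(read_param(final_url, "uid"))
```

In this sketch the same identifier ends up known to A, stored by the redirector, and visible on B, which is what lets the three parties refer to the same user.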


UID smuggling is more powerful than bounce tracking. Trackers using UID smuggling regain the ability to share UIDs across websites with different domains and can circumvent the restrictions of partitioned cookie storage [33]. For example, they can link the user’s visits to website A with the user’s visits to website B and with the user’s activity that goes through their own site (through redirects), since these can all be linked to the same user ID. In addition, UID smuggling can help other trackers on website B (and website A) link users’ browsing activity across all the websites that received the UID as a query parameter.