You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Browser fingerprinting: Difference between revisions

imported>JJMC89
m (JJMC89 moved page Browser Fingerprinting to Browser fingerprinting without leaving a redirect)
 
imported>SCherukuwada
(INCOMPLETE: Added some background and introductory information.)
 

Latest revision as of 12:01, 22 September 2022

Introduction

Browser fingerprinting refers to a collection of methods for identifying a particular instance of a web browser. Ad networks and anti-abuse mechanisms, among others, use it to recognize multiple artifacts of online activity as originating from the same browser.

Why is this needed?

Web services with logged-in users can attribute all activity performed by a user to their respective user account. In an abuse-fighting scenario where a user performs an action that the platform discourages or prohibits, it is fairly straightforward for the service to block the account or send a targeted warning to the user in question. Similarly, if a logged-in user of, say, a video service watches a few videos in a given sequence, it is fairly straightforward to attribute those views to that user and recommend videos that fit their viewing pattern.

All of the above becomes more difficult if the service allows actions from users who are not logged in. One popular alternative is to set an HTTP cookie on the browser that performs an action without logging in, so that subsequent actions can be attributed to that cookie and, consequently, to a single user. However, the user of the browser can trivially purge these cookies, which thwarts any attempt to attribute actions to a single user. Ad networks have historically used cookies (third-party cookies, to be precise) to track the ads displayed to a given user across different sites as they browse the web. Because some prominent browser makers have decided to slowly phase out support for third-party cookies, alternative methods of identifying users have been gaining ground.
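The cookie-based approach described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a description of how any particular service does it; the cookie name `visitor_id`, the 30-day lifetime, and the function `identify_visitor` are assumptions made up for this example.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, chosen for this sketch


def identify_visitor(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (visitor_id, set_cookie_value).

    cookie_header is the raw value of the request's Cookie header.
    set_cookie_value is None when the browser already sent our cookie,
    otherwise it is the value to send back in a Set-Cookie header.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Returning browser: attribute this action to the existing identifier.
        return jar[COOKIE_NAME].value, None

    # New (or freshly purged) browser: mint a random identifier.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 30  # assumed 30-day lifetime
    return visitor_id, out[COOKIE_NAME].OutputString()
```

A browser that clears its cookies simply arrives with an empty `Cookie` header and receives a fresh identifier, which is why this approach is easy for users to defeat.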

What's a browser fingerprint?

Since the very early days of HTTP, browsers have identified themselves to HTTP servers through the User-Agent header. This is usually a string along the lines of "Mozilla/5.0 (Linux; Android 10; Google Pixel 4 Build/QD1A.190821.014.C2; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/78.0.3904.108 Mobile Safari/537.36", which reveals a certain amount of information about the browser and the device it is running on. While this string does vary across users, it is far from unique over a large number of users. However, other pieces of information about the browser can be obtained through JavaScript code. For instance, the list of installed fonts, the screen resolution, the list of input devices, some low-level details of how Canvas rendering works, and the supported sound capabilities can all be queried through JavaScript. While none of these pieces of information is individually unique, a combination of several of them can be sufficiently unique for most purposes, not unlike a fingerprint.
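To make the idea of combining attributes concrete, here is a minimal server-side sketch in Python. It assumes the attributes have already been collected in the browser and sent to the server as a dictionary; the attribute names and sample values are hypothetical, and hashing a canonical JSON encoding with SHA-256 is just one simple way to reduce them to a single identifier, not a method prescribed by this page.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Combine browser attributes into one hex identifier.

    The attributes are serialized with sorted keys so that the same set of
    values always yields the same string, then hashed with SHA-256.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
browser_a = {
    "user_agent": "Mozilla/5.0 (Linux; Android 10) Chrome/78.0.3904.108",
    "screen": [1080, 2280],
    "fonts": ["Noto Sans", "Roboto"],
    "audio_sample_rate": 48000,
}
# Same browser, one attribute differs (a different screen resolution).
browser_b = dict(browser_a, screen=[1440, 3040])

print(fingerprint(browser_a))
print(fingerprint(browser_a) == fingerprint(browser_b))
```

Because an exact hash changes completely when any single attribute changes (a browser update, a new font), a real system would likely need some tolerance for drift; this sketch does not attempt that.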

Caveats

It is important to note that, as of this writing, major browser vendors are looking to limit the accuracy that can be obtained through fingerprinting. Google has a number of concerted efforts to reduce browser fingerprinting based on entropy budgets. Mozilla's analysis of the topic is available at https://blog.mozilla.org/en/mozilla/news/google-privacy-budget-analysis/. Client hints are covered at https://web.dev/user-agent-client-hints/.

Entropy

Entropy is an important concept to understand because it influences how unique a given attribute of a user is. The word "entropy" has a number of different meanings, so it's important to clarify that we're interested in information entropy, also known as Shannon entropy.
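As a rough illustration of how Shannon entropy relates to uniqueness, the sketch below computes the entropy, in bits, of an attribute from a list of observed values, using the standard formula H = sum over values of p * log2(1/p). The function names and the sample screen resolutions are assumptions made up for this example; real measurements would come from a large population of browsers.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Entropy in bits of the empirical distribution of the observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    # p * log2(1/p) with p = count / total, summed over distinct values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def surprisal(value: Hashable, values: Sequence[Hashable]) -> float:
    """Bits of identifying information revealed by one specific value."""
    counts = Counter(values)
    return math.log2(len(values) / counts[value])


# Hypothetical observations of one attribute across eight browsers.
resolutions = ["1920x1080"] * 4 + ["1366x768"] * 2 + ["2560x1440", "1080x2280"]

print(shannon_entropy(resolutions))         # 1.75
print(surprisal("1920x1080", resolutions))  # 1.0
print(surprisal("1080x2280", resolutions))  # 3.0
```

In this made-up sample, the common resolution reveals only 1 bit about a browser, while the rare one reveals 3 bits. Attributes with higher entropy narrow down the set of candidate browsers faster.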

Resources