Removing ad trackers and cookies - the technical perspective
Sentry recently completed a multi-month project to remove all non-essential cookies and trackers from our public websites. For more context, see the blog posts that offer different perspectives on the project: one from our marketing team, another from our legal team, and a third that explains our privacy values and our ultimate motivation. Today, we are going to focus on the more technical side of this project: how we identified and removed cookies and trackers, the difficulties (both expected and unexpected), and what we wish we had known before we started.
Why Are We Doing This?
Anyone who’s used the internet has been presented with a popup informing them that the site they’re visiting uses cookies. For many of us, this popup is a mere annoyance, but less discussed is the fact that cookies introduce security risks. The individual cookies themselves might not represent a high security risk, but in aggregate, they create compounding liabilities and more surface area for mayhem. Third-party advertising cookies provide benefits, of course, but if hosting them comes at the price of reduced security (not to mention user privacy and annoyance), we have to ask whether the benefit of these cookies truly outweighs their cost. Different companies serving different markets will have their own answers to this question. But at Sentry, from the leadership on down, it was a clear no: removing advertising cookies and trackers aligned with our corporate privacy values. And for the security team, it’s quite straightforward: we want the Sentry experience to be as secure as possible, so this was a challenge we enthusiastically accepted.
Hosting third-party cookies typically means you also have third-party scripts, and from a security standpoint, those additional scripts carry their own risks. Scripts and cookies, it turns out, make sites vulnerable to all sorts of attacks, such as malicious script injection, cross-site scripting (XSS), and cross-site request forgery (CSRF), just to name a few. In a perfect world, every script added to your sites would go through a rigorous security review to ensure there are no vulnerabilities, but it’s impossible for most companies to actually review every single line of code, let alone every future update.
Removing even one third-party script on your website, and the cookies associated with it, means one less thing that the security team has to worry about.
What’s Actually on Our Website?
Before we started removing cookies, we had to figure out exactly which cookies were on our website. How difficult that is depends directly on the complexity of your site. For small sites with only a handful of pages, this isn’t terribly difficult. However, things get tricky when you have a site that was built years ago, spans multiple subdomains and thousands of pages, and is managed by multiple teams across an organization.
Challenge #1: What Cookies are on Our Website?
We started by inspecting the Chrome developer console. You can see the cookies on a page by going to the Application tab and looking for Cookies under the Storage section. Because many modern browsers block them by default, you’ll have to enable third-party cookies in your browser in order to see what’s there.
Unfortunately, following the steps above won’t display a comprehensive list of all of the cookies on your site. What you see in this console are the cookies on that single page. To identify all of the cookies across your entire website, you would need to go through every single page under every subdomain and inspect each one individually. A blog post from four years ago, for instance, may have an embedded YouTube video that drops cookies. In other instances, there may be cookies that won’t drop until an element on the page, such as an embedded video, is clicked.
Another important thing to note is that “cookies” is often used as a catch-all term for tracking technology. It is also possible to track users with items stored in the browser’s local storage and with tracking pixels. These are a lot harder to locate and identify than cookies, but they still count as tracking technology. Simply put, it’s important to look for all tracking technology, not just cookies.
It was at this point that we realized going through our site page by page was not scalable and that we would have to employ specialized tools.
The Good and the Bad of Cookie Scanners
A Google search for “cookie scanning” will return at least a dozen different cookie scanners. We tried a few, both free and paid, and while they all “worked,” none of them were perfect.
Most of the scanners we tried were pretty straightforward: you enter a URL, the tool scans it, and you’re given a list of the cookies that were found. This is much more efficient than going to the developer console on a browser, and a good way to get a quick idea of the cookies on your websites. But that’s also where their limits begin to show; they only give you a surface-level understanding of the cookies on your sites.
While scanners have their advantages, there were three main problems we encountered. First, they don’t scan every page on your site. Some of them only scan the single page at the URL you provide, while others will scan a few extra pages automatically. One of the paid scanners we tried would automatically crawl our site and look for all the pages under the domain, but it had a maximum number of pages it would scan.
The second problem was inconsistency. The results we received for a single page were often different depending on which scanner we used, and to make things worse, some of the scanners even have disclaimers pointing out that their results may not be 100% accurate. This eroded our trust in scanners altogether and made us question whether they were worth the effort, to say nothing of the expense.
Lastly, the scanners we reviewed only identified cookies, not trackers in local storage or tracking pixels. So even if we did have a reliable tool for scanning cookies, we’d still fall short of our goal of removing all the tracking technologies on our sites.
Taking a Step Back: What’s on Your Website?
After spending some time working with cookie scanners, we decided to delve into what actually drops cookies on web pages: namely, scripts. Instead of only looking for cookies, we broadened our attention to the many scripts on our site, especially third-party scripts, to check whether they were also dropping cookies and tracking users. To find out which scripts were running on our sites, we used Content Security Policy (CSP), a feature that most modern browsers support.
Mozilla gives a good explanation of CSP:
Content Security Policy (CSP) is an added layer of security that helps to detect and mitigate certain types of attacks, including Cross-Site Scripting (XSS) and data injection attacks. These attacks are used for everything from data theft to site defacement to malware distribution.
CSP is designed to be fully backward compatible. Browsers that don’t support it still work with servers that implement it, and vice versa: browsers that don’t support CSP ignore it, functioning as usual, defaulting to the standard same-origin policy for web content. If the site doesn’t offer the CSP header, browsers likewise use the standard same-origin policy.
When it comes to detecting scripts, CSP has a reporting feature that alerts website admins when items violate the policy, and if the CSP policy is set to only 'self' for everything, you will receive reports on everything on your site that doesn’t originate from the site URL, aka third-party scripts. Here’s an example of the CSP setting:
Content-Security-Policy: default-src 'self'; report-uri https://csp-report.example.com/
One of the major benefits of CSP is that you don’t need to click through every page on your site or use third-party scanning software to test them; every user who visits your site is essentially helping you collect the information. When we started this project, we were averaging around 200k page visits every week, and with that, we were able to put together a fairly comprehensive picture of which scripts were on our sites in a matter of days.
A few things to note here:
The report-uri/report-to value will have to be set in order for reports to be sent.
You will likely want to use Report-Only mode, otherwise your site will likely stop working (see the example header after these notes).
⚠️ Be aware that with Report-Only mode enabled, CSP won’t block malicious scripts from loading, which is what CSP is designed for, so we don’t suggest using Report-Only mode other than for testing and collecting the information you need.
You will receive reports on items that are served from different domains, which may include first-party scripts that are hosted on a different domain, e.g. your CDN.
You will NOT receive reports on third-party scripts that are loading from your site URL, such as self-hosted scripts or things behind your proxy. If you want to collect information on those, you can use default-src 'none', but from our experience that creates way too much noise.
The duration of time you should monitor will depend on the number of active users you have: the fewer visitors you have, the longer you’ll want to monitor in order to receive an accurate report.
CSP operates on the client side, which means browser extensions or custom scripts that a user has on their machine might trigger violations too. For websites that have a good number of visitors, these usually get diluted and become unnoticeable.
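For reference, here’s roughly what the report-only variant of the example above would look like; the reporting endpoint is a placeholder, and you’d point it at whatever collector you actually use:

Content-Security-Policy-Report-Only: default-src 'self'; report-uri https://csp-report.example.com/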
For the report destination, Sentry supports security policy reporting; if you set the report-uri to your Sentry project’s security reporting endpoint (derived from your project DSN), you will start receiving CSP reports.
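Each report arrives from the browser as a small JSON payload. The example below is an illustrative sketch of the standard csp-report format (all URLs are placeholders); this is what ends up being ingested on the reporting side:

{
  "csp-report": {
    "document-uri": "https://example.com/blog/some-post",
    "referrer": "",
    "violated-directive": "default-src 'self'",
    "effective-directive": "script-src",
    "original-policy": "default-src 'self'; report-uri https://csp-report.example.com/",
    "blocked-uri": "https://tracker.example.net/pixel.js",
    "status-code": 200
  }
}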
Sentry then collects the CSP reports and aggregates them, which helps teams glean valuable insights from the data. Identical CSP violations are grouped into a single issue, with the event data attached. If you click into an issue, you will also get detailed information on which directive the scripts in question violated, which pages the violations occurred on, how many times they happened, and how many users were impacted, along with a host of other details.
Using this method, you should be able to identify most, if not all, third-party scripts on your websites. And by analyzing this data, you will also be able to put together a good picture of the cookies and trackers that are being dropped by third-party scripts.
📢 Kudos to Alex Tarasov on our team, who put a lot of effort into updating Sentry’s security policy reporting and making it a much more mature and usable feature as we went through the cookieless project!
Cookie Removal
Figuring out which cookies were on our site was the first part of this project. Our next challenge was figuring out which cookies should be kept and which should be discarded. The fact that there are no right or wrong answers in this regard made this particular task even more challenging.
Challenge #2: What Should Go and What Can Stay?
Since everyone has different requirements and does things differently, we are not in the position to tell folks what they should or shouldn’t have on their websites. That said, there are a few things that we think are worth diving into.
While our primary motivation for this effort was our privacy-by-default values, we also think the ubiquitous cookie banner makes for a rotten user experience, so we decided to take a more draconian approach and get rid of anything that is not considered strictly necessary. We had a simple litmus test for whether a cookie should stay: if our website could function properly without it, then we didn’t need it. We also stayed in close communication with our legal team to make sure our approach complied with privacy laws in different jurisdictions, including European privacy laws such as the General Data Protection Regulation (GDPR) and the ePrivacy Directive, and United States privacy laws such as the California Consumer Privacy Act (CCPA).
⚠️ Close collaboration with the legal team was essential to our success; anyone pursuing a similar project should involve their legal team or outside counsel at the outset, and stay in sync for the duration.
There are three main ways we remediated the cookies:
Remove the cookie entirely. This is pretty straightforward; in these instances we removed the script or embedded code that dropped the cookie. This is how we handled most of the scripts we found.
Disable tracking in the tools. Some tools have a do-not-track feature that you can utilize; for example, in Vimeo you can set the dnt (do not track) flag to true and Vimeo will not drop cookies that track users. There are also some tools that have a disable-tracking option on their settings page. (See the embed examples after this list.)
⚠️ YouTube also has youtube-nocookie.com, but it only prevents cookies from being dropped when the page loads. Tracking cookies are still dropped when the user clicks on the video.
Use a Privacy-Centric Tool. Once GDPR came on the scene, more and more privacy-centric tools appeared that do not track users or sell user data. Using a privacy-centric tool as an alternative will always be a good option. Our marketing team covered why and how we replaced Google Analytics with a privacy-centric tool in their post about removing cookies.
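To make the embedded-player cases concrete, the difference usually comes down to the embed URL. The snippets below are illustrative only; VIDEO_ID is a placeholder, and you should verify the exact parameters against each vendor’s current documentation:

<!-- Vimeo: the dnt parameter asks the player not to drop tracking cookies -->
<iframe src="https://player.vimeo.com/video/VIDEO_ID?dnt=1" allowfullscreen></iframe>

<!-- YouTube: the privacy-enhanced domain avoids dropping cookies at page load -->
<iframe src="https://www.youtube-nocookie.com/embed/VIDEO_ID" allowfullscreen></iframe>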
Looking back, it may seem simple enough; however, it took us more than six months of working with a cross-functional team to go through all the cookies and review, remove, or configure each one in the correct way.
Monitoring and Enforcement
Once we removed the cookies we neither wanted nor needed, the next challenge was figuring out how to keep it that way. In an ideal world, everyone who maintains your sites would be fully aware of what cookies and trackers are and how they work, and would know not to drop them on your sites without approval. Furthermore, third-party scripts would always work as expected and never drop cookies when they’re not supposed to. However, we don’t live in a perfect world. We need ways to ensure things are kept nice and tidy, and in compliance with our own cookie policy, not to mention requirements under privacy laws.
We decided to tackle this challenge in two ways: by proactively blocking new items from being added, and by continuously monitoring for unapproved cookies. This redundancy means that even if one method fails, the other should be able to catch it.
Challenge #3: How to Prevent New Cookies from Dropping
There are many tools out there that can help you manage what’s on your sites, but we decided to go back to CSP as it’s simple to use and is natively supported by all modern browsers.
We won’t go into the details of how to configure CSP; there are a lot of great articles out there that explain how to do it. That said, if you were using CSP to identify what’s on your website as we did, you should now have a good idea of what your CSP should look like and what needs to be allow-listed. From there, you can set default-src 'none' to block anything that’s not allow-listed, and avoid using * (allow all) in any directive. Again, using report-only mode is suggested before actually applying the policy to production, as CSP can easily break your site if misconfigured.
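As a rough sketch only (the allow-listed domains below are hypothetical, not our actual policy), a locked-down policy might end up looking something like this:

Content-Security-Policy: default-src 'none'; script-src 'self' https://js.approved-vendor.example; style-src 'self'; img-src 'self' https://cdn.example.com; font-src 'self'; connect-src 'self'; report-uri https://csp-report.example.com/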
One thing to note is that access to your CSP needs to be protected as well; otherwise, it’s possible for someone to modify the CSP and allow-list their non-approved script without you knowing it. In our case, we keep our code on GitHub, and we tightened access by using code owners and branch protection rules. We added the security team as the owner of the CSP config files and require approval before merging. This way, we are notified of any changes to the CSP before they are applied.
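As an illustration, a CODEOWNERS entry along these lines makes GitHub require a review from the security team on any pull request touching the CSP config (the file path and team name here are made up):

# .github/CODEOWNERS
# Changes to the CSP config require approval from the security team
/config/csp/ @your-org/security-team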
With CSP locked down, even if someone accidentally adds a script to your site that drops cookies, the script will be blocked on the client side and no cookies or trackers will be dropped.
Challenge #4: Monitor for Unapproved Cookies
While we were working on this project, we noticed that not all third-party scripts are perfect. One might start dropping cookies out of nowhere, or keep dropping cookies despite claiming it wouldn’t. So you need to keep an eye out for unapproved items that slipped through the cracks, and also verify that approved vendors are not dropping unexpected cookies.
As a safety net, we decided to bring in a cookie scanner to help. As we already covered, we did not find a cookie scanner that perfectly fit our use case for initial diagnostics; but for the narrower task of catching rogues and stragglers, we wrote some custom scripts around the least-worst cookie scanner to fill the gaps.
The tool we brought in can simulate users browsing in a sandbox and keep track of what scripts and cookies are found. However, it does not have the capability to crawl a site; it only scans the single URL you provide. To make it fit our needs, we wrote a Python script that aggregates the pages from several of our sitemaps and sends them to the tool page by page for scanning via its API. It then retrieves the results and compares the cookies found against our list of known cookies. All the logs are sent to our SIEM, and if unknown cookies are found, an alert is triggered.
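We won’t share our exact script, but a minimal sketch of the flow might look like the following. The scanner API endpoint, its request/response shape, and the KNOWN_COOKIES allow-list are all assumptions standing in for your vendor’s actual API and your own approved-cookie inventory:

import xml.etree.ElementTree as ET
import requests

SITEMAPS = ["https://example.com/sitemap.xml", "https://blog.example.com/sitemap.xml"]
SCANNER_API = "https://cookie-scanner.example.com/api/v1/scan"  # hypothetical vendor endpoint
API_KEY = "YOUR_API_KEY"  # placeholder
KNOWN_COOKIES = {"session", "csrftoken"}  # placeholder allow-list of approved cookie names

def pages_from_sitemap(sitemap_url):
    """Fetch a sitemap and return every <loc> URL it lists."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(requests.get(sitemap_url, timeout=30).content)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns) if loc.text]

def scan_page(page_url):
    """Ask the (hypothetical) scanner API to visit one page and return the cookie names it saw."""
    resp = requests.post(
        SCANNER_API,
        json={"url": page_url},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=120,
    )
    resp.raise_for_status()
    return {cookie["name"] for cookie in resp.json().get("cookies", [])}

def main():
    pages = [page for sitemap in SITEMAPS for page in pages_from_sitemap(sitemap)]
    for page in pages:
        unknown = scan_page(page) - KNOWN_COOKIES
        # These log lines get shipped to the SIEM, which alerts whenever an unknown cookie appears.
        if unknown:
            print(f"unknown cookie(s) {sorted(unknown)} found on {page}")
        else:
            print(f"ok: no unexpected cookies on {page}")

if __name__ == "__main__":
    main()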
There are two limitations to cookie scanners, even with the help of the scripts we wrote. First, they don’t provide real-time monitoring; the scans run on a schedule rather than actively watching your sites. This means that if you rely on cookie scanners as your only defense, there will be a time gap between when a cookie is dropped and when it’s discovered.
The second limitation, related to the first, is how often the scans can run. From our experience, cookie scanners charge in one of two ways. The first type charges based on the number of sites you want to scan and scans them on a fixed cadence, such as monthly or weekly (these usually have a maximum number of pages they will scan as well). The other type charges based on the number of scans performed in a given period, like 10k page scans per month. Both pricing models limit how soon you can discover a cookie after it’s dropped. Ideally, the more frequent the better, but based on your budget and the number of pages you have, you will need to find a balance that fits your needs.
With the combination of CSP and cookie scanners, we felt confident that unapproved cookies and trackers would not be dropped accidentally, and that even if they were, we would be able to identify them in a reasonably timely manner.
What’s next?
The journey was challenging, but also fun for us. This is an ongoing project as we need to continuously monitor cookies, but the hardest part is over.
For us, there are a few items on our to-do list that will tie up some loose ends, like getting rid of Google Tag Manager and embedding the required scripts directly, and locking down our CSP further by avoiding things like unsafe-inline, all of which will lower the risk of accidentally dropping a cookie in the future.
Cookie Bounty!
And to take it one step further, we are launching a Cookie Bounty (like a bug bounty, but for cookies!). We invite everyone to help us keep up the fight: if you find cookies on any of our public websites, you can report them to us and we will reward you for the finding!
If you want to know more about how to submit a report, which cookies we consider essential, and all the other details about the cookie bounty, you can visit https://sentry.io/cookiebounty/.
Overall, this was not an easy project, and from what we can gather it’s not a project that many people have chosen to undertake. That said, we’re glad we did it and proud of ourselves for what we’ve achieved to this point, and hope that others who are interested in taking a similar approach to security can learn from us. We are also eager to see more companies join us on the no cookie crusade, so feel free to reach out to us and share your thoughts and experience!