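There is another way to track users without using cookies or Javascript. It is already used by countless websites, yet almost nobody knows about it. This page explains how it works and how to protect yourself.
This tracking method works without needing:
- Cookies
- Javascript
- LocalStorage/SessionStorage/GlobalStorage
- Flash, Java or other plugins
- Your IP address or user agent string
- Any of the methods used by Panopticlick
Instead, it uses another kind of storage that persists across browser restarts: caching.
Even if you disable cookies entirely, turn off Javascript and use a VPN, this technique can still track you.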
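Live demo: http://lucb1e.com/rp/cookielesscookies/
Go ahead, type something and save it. Then close your browser and open the page again. Is it still there?
Check your cookies. Is anything there? Of course not: it all lives in the checksum of a fake image, which almost no one is aware of. See the eye at the top right of the page? That is our tracker.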
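Technical notes (and bugs) specific to this demo
To demonstrate that this works without Javascript, I had to find a piece of information specific to you other than the ETag. The image loads after the page has loaded, but only the image request carries the ETag. So how can I show up-to-date information on the page? It turns out I cannot do that without dynamically updating the page, and that requires Javascript, which is exactly what I wanted to avoid.
This chicken-and-egg problem introduces a few bugs:
– Everything you see comes from your previous page load. Press F5 to see updated data.
– When you visit the page without an ETag (in incognito mode, for example), your session is emptied. Again, this only becomes visible after a reload.
I did not see a simple solution to these problems. Some things could be done, but nothing that other websites would use, and I wanted to keep the code as simple and as close to reality as possible.
Note that these bugs normally do not exist when you really want to track users, because then you do not intend to show them that they are being tracked.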
Source code
What is a project without source code? Oh right, Microsoft Windows.
https://github.com/lucb1e/cookielesscookies
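What can we do to stop it?
One thing I strongly recommend, any time you want to browse a page a little more securely, is to open a private browsing window and use https exclusively. Doing this single-handedly eliminates attacks like BREACH (the latest https hack), disables any tracking cookies you might have, and also eliminates the cache tracking demonstrated on this page. I use private browsing mode when I do online banking. In Firefox (and, I think, IE too) the shortcut is Ctrl+Shift+P; in Chrome it is Ctrl+Shift+N.
Beyond that, it depends on your level of paranoia.
I currently have no straightforward answer, since cache tracking is virtually undetectable, and also because caching itself is useful and saves people (including you) time and money. Website admins consume less bandwidth (and if you think about it, end users ultimately pay that bill), and your pages load faster, which makes a big difference on mobile devices if you do not have an unlimited 4G plan. It is even worse if you live in a rural area with a high-latency or low-bandwidth connection.
If you are very paranoid, it is best to disable caching altogether. That stops any such tracking, but I personally do not think it is worth the downsides.
The Firefox add-on Self-Destructing Cookies can empty your cache when you have not used your browser for a while. This might be a good alternative to disabling caching: you can only be tracked during your visit, and they can already do that anyway by looking at which IP address visited which pages, so that is no big deal. Any later visit will appear to come from a different user, provided all other tracking methods have been blocked as well.
I am not aware of any add-on that clears your cache periodically (say, once every 72 hours), but there might be one. This would be a good option for 99% of users, because it limits tracking while having a relatively small performance impact.
Update: I have heard that the Firefox add-on SecretAgent also overwrites ETags to prevent this kind of tracking. You can whitelist websites to re-enable caching there while still blocking tracking by other domains. It has been confirmed that this add-on stops the tracking. SecretAgent’s website.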
Cookieless cookies
There is another obscure way of tracking users without using cookies or even Javascript. It has already been used by numerous websites but few people know of it. This page explains how it works and how to protect yourself.
This tracking method works without needing to use:
- Cookies
- Javascript
- LocalStorage/SessionStorage/GlobalStorage
- Flash, Java or other plugins
- Your IP address or user agent string
- Any methods employed by Panopticlick
Instead it uses another type of storage that is persistent between browser restarts: caching.
Even if you have disabled cookies entirely, turned Javascript off and use a VPN service, this technique will still be able to track you.
Demonstration
As you read this, you have already been tagged. Sorry. The good news is that I don’t link your session id to any personally identifiable information. Here is everything I store about you right now:
Go ahead, type something and store it. Then close your browser and open this page again. Is it still there?
Check your cookies. Is anything there? Nope, it’s all in a fake image checksum that almost no one is aware of. Saw that eye at the top right of the page? That’s our tracker.
So how does this work?
This is a general overview:
The ETag shown in the image is a sort of checksum. When the image changes, the checksum changes. So when the browser has the image and knows the checksum, it can send it to the webserver for verification. The webserver then checks whether the image has changed. If it hasn’t, the image does not need to be retransmitted and lots of data is saved.
Attentive readers might have noticed already how you can use this to track people: the browser sends the information that it previously received (the ETag) back to the server. That sounds an awful lot like cookies, doesn’t it? The server can simply give each browser a unique ETag, and when the browser connects again the server can look that ETag up in its database.
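The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the demo’s actual code (that lives in the repository linked under "Source code"): the names `sessions` and `handle_image_request` are made up for this example, the store is a plain in-memory dictionary, and the image body itself is left out.

```python
import secrets

# Hypothetical in-memory store: ETag value -> what the server remembers about that browser.
sessions = {}

def handle_image_request(request_headers):
    """Return (status, response_headers) for a request to the tracking image."""
    etag = request_headers.get("If-None-Match")
    if etag in sessions:
        # Returning browser: the cached ETag identifies it, much like a cookie would.
        sessions[etag]["visits"] += 1
        return 304, {"ETag": etag}
    # Unknown browser (or emptied cache): hand out a fresh, unique ETag.
    etag = '"' + secrets.token_hex(8) + '"'
    sessions[etag] = {"visits": 1}
    # "no-cache" lets the browser store the image but makes it revalidate on every use,
    # so the ETag is sent back each time the page is loaded.
    return 200, {"ETag": etag, "Cache-Control": "private, no-cache"}

# First visit: no validator yet, so the server tags the browser.
first_status, first_headers = handle_image_request({})
# Second visit: the browser revalidates its cached image and is recognised.
second_status, _ = handle_image_request({"If-None-Match": first_headers["ETag"]})
print(first_status, second_status)
```

The first call answers 200 with a new tag and the second answers 304, at which point the server has linked both requests to the same browser without any cookie being set.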
Technical stuff (and bugs) specifically about this demo
To demonstrate how this works without having to use Javascript, I had to find a piece of information that’s relatively unique to you besides this ETag. The image is loaded after the page is loaded, but only the image request carries the ETag. How can I display up-to-date info on the page? It turns out I can’t really do that without dynamically updating the page, which requires Javascript, and avoiding Javascript was the whole point of the demo.
This chicken and egg problem introduces a few bugs:
– All information you see was from your previous pageload. Press F5 to see updated data.
– When you visit a page where you don’t have an ETag (like incognito mode), your session will be emptied. Again, this is only visible when you reload the page.
I did not see a simple solution to these issues. Sure, some things can be done, but nothing that other websites would use, and I wanted to keep the code as simple and as close to reality as possible.
Note that these bugs normally don’t exist when you really want to track someone because then you don’t intend to show users that they are being tracked.
Source code
What’s a project without source code? Oh right, Microsoft Windows.
https://github.com/lucb1e/cookielesscookies
What can we do to stop it?
One thing I would strongly recommend doing any time you visit a page where you want a little more security is opening a private navigation window and using https exclusively. Doing this single-handedly eliminates attacks like BREACH (the latest https hack), disables any and all tracking cookies that you might have, and also eliminates cache tracking issues like the one I’m demonstrating on this page. I use this private navigation mode when I do online banking. In Firefox (and I think MSIE too) it’s Ctrl+Shift+P, in Chrome it’s Ctrl+Shift+N.
Besides that, it depends on your level of paranoia.
I currently have no straightforward answer since cache tracking is virtually undetectable, but also because caching itself is useful and saves people (including you) time and money. Website admins will consume less bandwidth (and if you think about it, in the end users are the ones that will have to pay the bill), your pages will load faster, and especially on mobile devices it makes a big difference if you don’t have an unlimited 4G plan. It’s even worse when you have a high-latency or low-bandwidth connection because you live in a rural area.
If you’re very paranoid, it’s best to just disable caching altogether. This will stop any such tracking from happening, but I personally don’t believe it’s worth the downsides.
The Firefox add-on Self-Destructing Cookies has the ability to empty your cache when you’re not using your browser for a while. This might be an okay alternative to disabling caching; you can only be tracked during your visit, and they can already do that anyway by following which pages were visited by which IP address, so that’s no big deal. Any later visits will appear as from a different user, assuming all other tracking methods have already been prevented.
I’m not aware of any add-on that periodically removes your cache (e.g. once per 72 hours), but there might be. This would be another good alternative for 99% of the users because it has a relatively low performance impact while still limiting the tracking capabilities.
Update: I’ve heard the Firefox add-on SecretAgent also does ETag overwriting to prevent this kind of tracking method. You can whitelist websites to re-enable caching there while blocking tracking by other domains. It has been confirmed that this add-on stops the tracking. SecretAgent’s website.
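To make the idea of client-side protection concrete, here is a minimal Python sketch of a request filter that drops cache validators before a request is sent. It is an assumption-laden illustration, not taken from SecretAgent’s code; `VALIDATOR_HEADERS` and `strip_validators` are hypothetical names. It also removes If-Modified-Since, on the assumption that a server could abuse a per-user Last-Modified timestamp in the same way as an ETag.

```python
# Request headers that echo a server-chosen value back to the server (assumed list).
VALIDATOR_HEADERS = {"if-none-match", "if-modified-since"}

def strip_validators(request_headers):
    """Return a copy of the headers without cache validators that could act as an ID."""
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() not in VALIDATOR_HEADERS
    }

outgoing = {
    "Host": "example.org",
    "If-None-Match": '"a1b2c3d4"',
    "If-Modified-Since": "Fri, 12 May 2006 18:53:33 GMT",
}
print(strip_validators(outgoing))
```

The trade-off is the one described above: without validators every request is unconditional, so the server always sends the full response and the bandwidth savings of revalidation are lost for the affected sites.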
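Related material:
HTTP caching: ETag and Last-Modified
Basics
1) What is "Last-Modified"?
The first time a browser requests a URL, the server responds with status 200 and the requested resource as the body, along with a Last-Modified header that records when the file was last modified on the server. The format looks like this: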
Last-Modified: Fri, 12 May 2006 18:53:33 GMT
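When the client requests the same URL a second time, the HTTP protocol has the browser send an If-Modified-Since header, asking whether the file has been modified since that time:
If-Modified-Since: Fri, 12 May 2006 18:53:33 GMT
If the resource on the server has not changed, the server returns HTTP status 304 (Not Modified) with an empty body, which saves the data transfer. If the resource has changed, for example because the server-side code changed or the server was restarted, the server sends the resource again, much like the response to the first request. This ensures that the resource is not sent to the client redundantly, and that the client still gets the latest version when something changes on the server.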
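The Last-Modified exchange can be sketched in Python using only the standard library. This is a simplified illustration, not a complete HTTP implementation: `conditional_get` is a hypothetical helper, and it assumes the client sends a well-formed date ending in GMT.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a resource last changed at `last_modified` (UTC)."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds before comparing.
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers  # unchanged: the body is omitted
    return 200, headers  # first request, or the resource changed: send it in full

modified = datetime(2006, 5, 12, 18, 53, 33, tzinfo=timezone.utc)
print(conditional_get(modified))
print(conditional_get(modified, "Fri, 12 May 2006 18:53:33 GMT"))
```

The first call has no If-Modified-Since header and yields 200; the second repeats the timestamp from the example above and yields 304.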
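2) What is "ETag"?
The HTTP specification defines the ETag as the entity tag of the requested variant (see section 14.19). Put another way, an ETag is a token that can be associated with a web resource. A typical web resource is a web page, but it could also be a JSON or XML document. The server alone decides what the token is and what it means, and sends it to the client in an HTTP response header. The server returns it in this format:
ETag: "50b1c1d4f775c61:df3"
The client’s conditional request looks like this:
If-None-Match: W/"50b1c1d4f775c61:df3"
If the ETag has not changed, the server returns status 304 with no body, just as with Last-Modified. In my own testing, ETags were mostly useful for resumable downloads.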
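In the example above the response tag has no `W/` prefix while the request tag does. The prefix marks a weak validator, and If-None-Match is evaluated with weak comparison, which ignores the prefix, so the two still match. A minimal Python sketch of that check follows; `strip_weak` and `if_none_match` are hypothetical helpers, and splitting on commas is a simplification that assumes the tags themselves contain no commas.

```python
def strip_weak(tag):
    """Drop the optional W/ prefix that marks a weak validator."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def if_none_match(current_etag, header_value):
    """Return 304 if any tag listed in If-None-Match matches the current ETag, else 200."""
    if header_value.strip() == "*":
        return 304  # "*" matches any existing representation
    candidates = [strip_weak(t) for t in header_value.split(",")]
    return 304 if strip_weak(current_etag) in candidates else 200

print(if_none_match('"50b1c1d4f775c61:df3"', 'W/"50b1c1d4f775c61:df3"'))
print(if_none_match('"50b1c1d4f775c61:df3"', '"something-else"'))
```

The first call reports 304 because the tags agree once the prefix is ignored; the second reports 200, meaning the full resource would be sent.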
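How do Last-Modified and ETags help improve performance?
Smart developers use the Last-Modified and ETag headers together, which lets them take advantage of the cache in the client (a browser, for example). Because the server generates the Last-Modified/ETag validator first, it can use it later to determine whether the page has been modified. In essence, the client asks the server to validate its (the client’s) cache by sending the validator back.
The process is as follows:
1. The client requests a page (A).
2. The server returns page A and attaches a Last-Modified/ETag to it.
3. The client displays the page and caches it together with the Last-Modified/ETag.
4. The client requests page A again and sends the Last-Modified/ETag from the previous response back to the server.
5. The server checks the Last-Modified or ETag, determines that the page has not been modified since the client’s last request, and returns a 304 response with an empty body.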
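The five steps can be simulated end to end with two small Python classes. This is a toy model rather than a real HTTP stack: `Server` and `Client` are hypothetical names, the ETag is assumed to be a truncated SHA-256 hash of the page body, and only the ETag validator (not Last-Modified) is modelled.

```python
import hashlib

class Server:
    def __init__(self, pages):
        self.pages = pages  # path -> body bytes

    def get(self, path, if_none_match=None):
        body = self.pages[path]
        etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
        if if_none_match == etag:
            return 304, etag, b""  # step 5: unchanged, empty body
        return 200, etag, body     # step 2: full page plus its validator

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> (etag, body)

    def fetch(self, path):
        cached = self.cache.get(path)
        etag = cached[0] if cached else None
        # Steps 1 and 4: request the page, sending the cached validator if there is one.
        status, new_etag, body = self.server.get(path, etag)
        if status == 304:
            return status, cached[1]         # reuse the cached copy
        self.cache[path] = (new_etag, body)  # step 3: cache page and validator together
        return status, body

server = Server({"/a": b"hello"})
client = Client(server)
print(client.fetch("/a"))  # first request: full response
print(client.fetch("/a"))  # second request: revalidated from cache
```

Changing `server.pages["/a"]` between fetches changes the hash, so the next request no longer matches and the client receives and caches the new body.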
Advantages of ETag