Author: rices

Published: 2014-07-28

Post content:

【Preface】

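Websites and ad networks generally want a technique that can pinpoint each individual on the web, so that they can collect data about those individuals, analyze it, and push ads more precisely (targeted marketing) or run other targeted campaigns. Cookies are a very popular technique for this. When a user visits a website, the site can plant a long-lived piece of data containing a unique identifier (UUID) in the browser's cookies, and use it to link together everything the user does (which pages were viewed, which keywords were searched, what the user is interested in, which buttons were clicked, which features were used, which products were viewed, which were added to the cart, and so on).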
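As a rough illustration of the cookie approach, the sketch below builds the kind of long-lived cookie string a site might assign to document.cookie. The cookie name visitor_id, the ten-year lifetime, and both helper functions are assumptions made for this example, not code taken from any real tracker.

```javascript
// Hypothetical helper: build a random version-4 style identifier.
// Math.random() is used only to keep the sketch dependency-free.
function makeVisitorId() {
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
    var r = Math.floor(Math.random() * 16);
    var v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Build the cookie string a site might assign to document.cookie.
// The name "visitor_id" and the ten-year lifetime are assumptions.
function buildTrackingCookie(id) {
  var tenYearsInSeconds = 10 * 365 * 24 * 60 * 60;
  return 'visitor_id=' + id + '; Max-Age=' + tenYearsInSeconds + '; Path=/';
}

var cookie = buildTrackingCookie(makeVisitorId());
console.log(cookie);
// In a browser: document.cookie = cookie;
```

Every later request from the same browser would carry this identifier back to the site, which is what allows the separate actions to be linked.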

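As users have grown more concerned about privacy, cookies have become less and less welcome. Many security tools, and even browsers, now allow or encourage users to disable cookies; for example, most mainstream browsers offer a "private browsing" mode. As a result, it becomes hard for websites to track user behavior. Still, there are other ways for a site to track each visitor, such as Flash cookies, which can likewise be used to uniquely identify and track users.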

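The author recently noticed foreign media reports about a new online tracking tool that is very hard to shake off and is being used to follow visitors to popular websites ranging from the White House website to the pornography site YouPorn.com. On analysis, it turns out to be a relatively new visitor-tracking technique, "canvas fingerprinting"; for code, see Appendix 6. What sets this technique apart from cookies or Flash cookies is that you basically cannot block it.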

【How It Works】

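The author collected similar code from many well-known sites (see Appendix 4). All of these canvas fingerprinting scripts rely on one property of the HTML5 <canvas> element: when the same canvas drawing code runs on different machines and browsers, the resulting images differ slightly, while the result for a given machine and browser is stable and close to unique. Computing something as simple as an MD5 hash of the image is therefore enough to identify and track the user.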

A piece of JavaScript that creates a canvas element and draws on it:

var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');
var txt = 'http://security.tencent.com/';
ctx.textBaseline = "top";
ctx.font = "14px 'Arial'";
ctx.textBaseline = "alphabetic";
ctx.fillStyle = "#f60";
ctx.fillRect(125,1,62,20);
ctx.fillStyle = "#069";
ctx.fillText(txt, 2, 15);
ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
ctx.fillText(txt, 4, 17);

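To read back what was drawn, use the canvas.toDataURL() method, which returns the image as a base64-encoded string. A PNG file is divided into chunks, and each chunk ends with a 32-bit CRC checksum. The last chunk (IEND) occupies the final 12 bytes of the file, so the 4 bytes just before it are the CRC of the preceding chunk, normally the image data (IDAT) chunk. Extracting that CRC gives a value that can serve as the user's identifier: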

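// strip the data URL prefix, leaving only the base64 payload
var b64 = canvas.toDataURL().replace("data:image/png;base64,","");
var bin = atob(b64);
// bin2hex() is defined in the complete test code in the appendix
var crc = bin2hex(bin.slice(-16,-12));
console.log(crc);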
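To make the byte offsets concrete, here is a minimal sketch that assembles a PNG-shaped byte sequence by hand and reads the same 4 bytes. It needs no canvas: the IDAT payload is placeholder data rather than a real compressed image, and the helper names (crc32, makeChunk, toHex and so on) are chosen for this example only.

```javascript
// Bitwise CRC-32 (polynomial 0xEDB88320), the checksum PNG stores per chunk.
function crc32(bytes) {
  var crc = 0xFFFFFFFF;
  for (var i = 0; i < bytes.length; i++) {
    crc ^= bytes[i];
    for (var k = 0; k < 8; k++) {
      crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

function toBytes(str) {
  return str.split('').map(function (c) { return c.charCodeAt(0); });
}

function uint32be(n) {
  return [(n >>> 24) & 255, (n >>> 16) & 255, (n >>> 8) & 255, n & 255];
}

function toHex(bytes) {
  return bytes.map(function (b) { return b.toString(16).padStart(2, '0'); }).join('');
}

// One chunk = length (4 bytes) + type (4) + data + CRC over type and data (4).
function makeChunk(type, data) {
  var body = toBytes(type).concat(data);
  return uint32be(data.length).concat(body, uint32be(crc32(body)));
}

// Placeholder payload; a real IDAT chunk holds zlib-compressed pixel rows.
var idatData = [1, 2, 3, 4, 5];
var png = [137, 80, 78, 71, 13, 10, 26, 10].concat(
  makeChunk('IHDR', [0, 0, 0, 1, 0, 0, 0, 1, 8, 6, 0, 0, 0]),
  makeChunk('IDAT', idatData),
  makeChunk('IEND', [])
);

var fingerprint = toHex(png.slice(-16, -12)); // CRC of the chunk before IEND
var iendCrc = toHex(png.slice(-4));           // CRC of the empty IEND chunk
console.log(fingerprint, iendCrc);
```

Because IEND carries no data, its CRC is the same in every PNG (ae426082), so only the CRC just before it varies with the image content.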

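Testing in Chrome incognito mode:

On the same machine, Chrome yields the same CRC value in both normal and incognito mode, while different machines yield different values. The tracking potential is obvious.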

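At this point many readers will ask: why? Why does the same JavaScript code produce a distinct, nearly unique result in browsers on different devices? The reason is simple: the same HTML5 canvas drawing operations do not produce exactly the same image data on different operating systems and browsers. There are probably several causes:
1. At the image format level, different browsers use different image processing engines, different export options, different default compression levels, and so on.

2. At the pixel level, operating systems use different settings and algorithms for anti-aliasing and sub-pixel rendering.

Therefore, even for identical drawing operations, the resulting image data still differs at the hash level. Explaining this at the code level would require understanding the implementations of the major browsers and the rendering done by each operating system. The author has limited time and cannot provide that analysis in the short term. Readers are welcome to explore it themselves and share what they find :)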

【Afterword】

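HTML5 keeps evolving, and there is currently no good defense against user tracking based on this canvas behavior. For now the fix has to come from the browser vendors themselves: randomizing the canvas rendering mechanism might protect user privacy well and prevent tracking.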
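One way such randomization could work is sketched below, purely as an illustration: before pixels are exported, the lowest bit of randomly chosen colour channels is flipped, so that a checksum computed over the export changes between reads. The function names, the rate parameter, and the simple checksum are all assumptions for this example; no particular browser is claimed to work this way.

```javascript
// Hypothetical countermeasure sketch: flip the lowest bit of randomly chosen
// colour channels. `pixels` stands in for the RGBA byte array a canvas would
// expose; `rate` and `random` are assumed parameters.
function addPixelNoise(pixels, rate, random) {
  var out = pixels.slice();
  for (var i = 0; i < out.length; i++) {
    if (i % 4 === 3) continue;        // leave the alpha channel untouched
    if (random() < rate) out[i] ^= 1; // change the value by at most 1
  }
  return out;
}

// Simple checksum standing in for the MD5 or CRC a tracker would compute.
function checksum(bytes) {
  var h = 0;
  for (var i = 0; i < bytes.length; i++) {
    h = (Math.imul(h, 31) + bytes[i]) >>> 0;
  }
  return h;
}

var original = [255, 102, 0, 255, 0, 102, 153, 255]; // two RGBA pixels
var noisy = addPixelNoise(original, 0.5, Math.random);
console.log(checksum(original), checksum(noisy));
```

A change of one unit per channel is invisible to the eye, yet it is enough to alter any hash taken over the exported bytes.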

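The code and technical details in this article are for technical discussion only; do not use them for illegal purposes. If you want to study more user-tracking techniques, take a look at evercookie (Appendix 5), a well-known open-source project dedicated to visitor tracking. This sneaky little tool tracks site visitors in almost every way you can and cannot think of (Cookie, Flash, Silverlight, Web History, HTTP ETags, Web cache, window.name caching, userData storage, HTML5, and even Java exploits).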

【Appendix】

[1] http://cseweb.ucsd.edu/~hovav/dist/canvas.pdf

[2] https://securehomes.esat.kuleuven.be/~gacar/sticky/index.html

[3] https://panopticlick.eff.org/browser-uniqueness.pdf

[4] A partial list of URLs of canvas fingerprinting scripts:

http://ct1.addthis.com/static/r07/core130.js
http://i.ligatus.com/script/fingerprint.min.js
http://src.kitcode.net/fp2.js
http://admicro1.vcmedia.vn/fingerprint/figp.js
http://shorte.st/js/packed/smeadvert-intermediate-ad.js
http://stat.ringier.cz/js/fingerprint.min.js
http://cya2.net/js/STAT/89946.js
http://images.revtrax.com/RevTrax/js/fp/fp.min.jsp
http://rackcdn.com/mongoose.fp.js

[5] evercookie website: http://samy.pl/evercookie/

[6] fingerprintjs, a library that uses canvas fingerprinting: https://github.com/Valve/fingerprintjs

[7] https://www.browserleaks.com/canvas#how-does-it-work

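Complete test code. The author reported that it ran in Firefox but not in Chrome and attributed this to a missing atob function, although Chrome provides atob natively: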

<script>
var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');
var txt = 'http://security.tencent.com/';
ctx.textBaseline = "top";
ctx.font = "14px 'Arial'";
ctx.textBaseline = "alphabetic";
ctx.fillStyle = "#f60";
ctx.fillRect(125,1,62,20);
ctx.fillStyle = "#069";
ctx.fillText(txt, 2, 15);
ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
ctx.fillText(txt, 4, 17);
// code is simple but we must provide some conversion functions
// it is still less and faster than md5
function bin2hex(s) {
  //  discuss at: http://phpjs.org/functions/bin2hex/
  // original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
  // bugfixed by: Onno Marsman
  // bugfixed by: Linuxworld
  // improved by: ntoniazzi (http://phpjs.org/functions/bin2hex:361#comment_177616)
  //   example 1: bin2hex('Kev');
  //   returns 1: '4b6576'
  //   example 2: bin2hex(String.fromCharCode(0x00));
  //   returns 2: '00'

  var i, l, o = '',
    n;

  s += '';

  for (i = 0, l = s.length; i < l; i++) {
    n = s.charCodeAt(i)
      .toString(16);
    o += n.length < 2 ? '0' + n : n;
  }

  return o;
}

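if (typeof window.atob == "undefined") {
  // Assign to window.atob instead of declaring a function inside the if block:
  // some engines may hoist such a declaration, letting the empty stub replace
  // the native atob().
  window.atob = function (a) {
    // IE9 still lacks a native atob() function (base64-to-binary)
    // ... you must put some replacement here ...
  };
}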
var b64 = canvas.toDataURL().replace("data:image/png;base64,","");
var bin = atob(b64);
// the CRC32 is 4 bytes long and sits 16 to 12 bytes from the end of the file
var crc = bin2hex(bin.slice(-16,-12));
alert(crc);
console.log(crc);

</script>

HTML5 Canvas Fingerprinting

This is a simple Proof-of-Concept showing that Browser Fingerprinting is possible without any User-Agent identifiers.

The method is based on the fact that the same HTML5 Canvas element can produce different pixels in different web browsers, depending on the system on which it is executed.

This happens for several reasons. At the image format level, web browsers use different image processing engines, export options, and compression levels, so the final images may have different hashes even if they are pixel-perfect copies. At the pixmap level, operating systems use different algorithms and settings for anti-aliasing and sub-pixel rendering. We don’t know all the reasons, but we have already collected more than a thousand unique signatures.

Unlike other «browser detection» tricks, it depends on many OS features related to the graphics environment. Potentially it can be used to identify the video adapter, especially if you use WebGL profiling and not just the Canvas 2D context. Incidentally, different graphics card drivers can also sometimes affect regular font rendering.

This technique is useful for building a unique, trackable signature when it is combined with other common methods, e.g. in systems such as EFF’s Panopticlick or PET’s Fingerprinting.

It is much more difficult to obtain specific parameters of the system. The approach is signature-based, and collecting signatures is the main limiting factor that makes this hard at the moment. We can produce a Canvas/WebGL pixmap and associate its fingerprint with our own machine, but we cannot ask each visitor individually about their system and hardware. Certainly there are ways to get such information (malware, social engineering, mturk), but we do not consider them for now 🙂

First of all, there will be no ready-to-embed solution; this is only a demo. We are just going to share some snippets showing how it works under the hood.

Our signature DB is based on a simple association between «Canvas Fingerprint» and «HTTP User-Agent», so it is very prone to false positives because of faked headers, etc. If your fingerprint matches one that is stored in the database, the program will show you which User-Agents have the same signature. Otherwise, well, your browser seems unique to our modest DB, but we cannot sign you here in any way, because we do not collect signatures from this website.

Here is the JavaScript code that produces the pixels:

var canvas = document.createElement('canvas');
var ctx = canvas.getContext('2d');
// text with lowercase/uppercase/punctuation symbols
var txt = "BrowserLeaks,com <canvas> 1.0";
ctx.textBaseline = "top";
// the most common type
ctx.font = "14px 'Arial'";
ctx.textBaseline = "alphabetic";
ctx.fillStyle = "#f60";
ctx.fillRect(125,1,62,20);
// some tricks for color mixing
ctx.fillStyle = "#069";
ctx.fillText(txt, 2, 15);
ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
ctx.fillText(txt, 4, 17);
// more explanation? see the Further Reading below...

To create a signature from the canvas, we must export the pixels from the application’s memory using the toDataURL() method, which returns the base64-encoded string of the binary image file. Then we can simply create an MD5 hash of this string.

But for the PoC we came up with a slightly more interesting solution. As we know, PNG files are divided into chunks, and the last part of each chunk is a 32-bit CRC checksum calculated over the preceding bytes. So all we need is to extract the IDAT CRC, and that will be the browser fingerprint:

// code is simple but we must provide some conversion functions
// it is still less and faster than md5
function bin2hex (s) {
// ... bin-to-hex conversion code ...
}
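if (typeof window.atob == "undefined") {
  // Assign to window.atob instead of declaring a function inside the if block:
  // some engines may hoist such a declaration, letting the empty stub replace
  // the native atob().
  window.atob = function (a) {
    // IE9 still lacks a native atob() function (base64-to-binary)
    // ... you must put some replacement here ...
  };
}
// toDataURL is a method and must be called
var b64 = canvas.toDataURL().replace("data:image/png;base64,","");
var bin = atob(b64);
// the CRC32 is 4 bytes long and sits 16 to 12 bytes from the end of the file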
var crc = bin2hex(bin.slice(-16,-12));

Anonymous Browser Fingerprinting

What is fingerprinting?

Fingerprinting is a technique, outlined in research by the Electronic Frontier Foundation, for anonymously identifying a web browser with an accuracy of up to 94%.

The browser is queried for its user agent string, screen color depth, language, installed plugins with supported mime types, timezone offset, and other capabilities such as local storage and session storage. These values are then passed through a hashing function to produce a fingerprint that gives weak guarantees of uniqueness.

No cookies are stored to identify a browser.

It’s worth noting that mobile browsers are much more uniform, so fingerprinting should be used only as a supplementary identification mechanism there.

In this post I’m going to explain how it works in detail and give you real-life statistics accumulated over the period of 4 months of production usage.

Why

I was given an experimental task to implement fingerprinting for both anonymous and logged-in users of one of our web sites. We wanted to see if it was possible at all to rely on identifying someone this way without leaving cookies. The idea was to accumulate the fingerprints and associated preferences and then pre-filter the information on the front page based on what’s known about a user.

Implementation

So I got to work and started making a basic outline in my head. What is it that identifies a browser? I gathered it would be: user agent, browser language, screen color depth, installed plugins and their mime types, timezone offset, local storage, and session storage.

Initially I added the screen resolution as well, but a colleague advised that one can use multiple monitors with a single laptop, for example by connecting an external monitor when working in the office, so I removed it.

On my laptop browser the values are:

// Assuming jQuery in scope

navigator.userAgent
// "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36"

navigator.language
// "en-US"

var plugins = $.map(navigator.plugins, function(p){
   var mimeTypes = $.map(p, function(mimeType){
    return [mimeType.type, mimeType.suffixes].join('~');
   }).join(',');
  return [p.name, p.description, mimeTypes].join('::');
});


$.each(plugins, function(i, p){
  // truncate only for blog example
  if(p.length > 80){
    console.log(p.substring(0, 77) + '...');
  } else{
    console.log(p);
  }
});

/*
Shockwave Flash:Shockwave Flash 11.7 r700:application/x-shockwave-flash~swf,a... 
Chrome Remote Desktop Viewer:This plugin allows you to securely access other ... 
Widevine Content Decryption Module:Enables Widevine licenses for playback of ... 
Native Client::application/x-nacl~nexe 
Chrome PDF Viewer::application/pdf~pdf,application/x-google-chrome-print-prev... 
Google Talk Plugin Video Accelerator:Google Talk Plugin Video Accelerator ver... 
Google Talk Plugin:Version: 4.0.1.0:application/googletalk~googletalk 
Google Talk Plugin Video Renderer:Version: 4.0.1.0:application/o1d~o1d 
Shockwave Flash:Shockwave Flash 11.2 r202:application/x-shockwave-flash~swf,a...
*/

screen.colorDepth
// 24

new Date().getTimezoneOffset();
// -240

!!window.localStorage
// true

!!window.sessionStorage
// true

So I now knew everything my browser exposed, and I needed to produce the fingerprint itself. For that I wanted to use a fast, non-cryptographic hashing function, such as MurmurHash.

MurmurHash produces a 32-bit integer as a result and works really well. When compared to other popular hash functions, MurmurHash performed well in a random distribution of regular keys.

I picked this implementation and added it to the code.

The last step was to combine all browser’s capabilities into a long string and pass it through hashing.

The end result on my laptop was: 3723825959
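For readers who want to see these last two steps together, here is a minimal sketch. It uses a compact MurmurHash3 (x86, 32-bit) written for illustration, which is not necessarily the implementation the author picked; the '###' separator, the seed 31, and the capability values are assumptions, so the printed number is not expected to match the one above.

```javascript
// Compact MurmurHash3 (x86, 32-bit) over the low byte of each character.
function murmurhash3_32(key, seed) {
  var c1 = 0xcc9e2d51, c2 = 0x1b873593;
  var h = seed >>> 0;
  var len = key.length;
  var blocks = len - (len & 3);
  var i, k;

  // Body: mix four bytes at a time, read as a little-endian 32-bit word.
  for (i = 0; i < blocks; i += 4) {
    k = (key.charCodeAt(i) & 0xff) |
        ((key.charCodeAt(i + 1) & 0xff) << 8) |
        ((key.charCodeAt(i + 2) & 0xff) << 16) |
        ((key.charCodeAt(i + 3) & 0xff) << 24);
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    k = Math.imul(k, c2);
    h ^= k;
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: the remaining 1 to 3 bytes; the cases fall through on purpose.
  k = 0;
  switch (len & 3) {
    case 3: k ^= (key.charCodeAt(i + 2) & 0xff) << 16;
    case 2: k ^= (key.charCodeAt(i + 1) & 0xff) << 8;
    case 1: k ^= key.charCodeAt(i) & 0xff;
      k = Math.imul(k, c1);
      k = (k << 15) | (k >>> 17);
      k = Math.imul(k, c2);
      h ^= k;
  }

  // Finalization: force all bits of the state to avalanche.
  h ^= len;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Hypothetical capability values, modelled on the ones listed earlier.
var capabilities = [
  'Mozilla/5.0 (X11; Linux i686)', 'en-US', 24, -240, true, true,
  'Native Client::application/x-nacl~nexe'
];

// The separator and the seed are assumptions made for this sketch.
var fingerprint = murmurhash3_32(capabilities.join('###'), 31);
console.log(fingerprint);
```

The result is an unsigned 32-bit integer, the same kind of value as the number shown above.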

As a finishing touch, I wanted to get rid of jQuery, so I implemented the each and map methods and got a no-dependencies script.
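Replacements for those two helpers can be very small. The sketch below is one possible shape, not the author's actual code: the argument order (item, index) and the fake plugin object are assumptions chosen to keep the example runnable outside a browser.

```javascript
// Minimal stand-ins for jQuery's $.each and $.map over array-like objects
// (anything with a numeric length, such as navigator.plugins).
function each(list, iterator) {
  for (var i = 0; i < list.length; i++) {
    iterator(list[i], i);
  }
}

function map(list, iterator) {
  var results = [];
  each(list, function (item, index) {
    results.push(iterator(item, index));
  });
  return results;
}

// Hypothetical plugin-like data, shaped like a navigator.plugins entry:
// an array-like object whose indexed items are mime type descriptions.
var fakePlugins = [
  { name: 'Native Client', description: '', length: 1,
    0: { type: 'application/x-nacl', suffixes: 'nexe' } }
];

var described = map(fakePlugins, function (p) {
  var mimeTypes = map(p, function (m) {
    return [m.type, m.suffixes].join('~');
  }).join(',');
  return [p.name, p.description, mimeTypes].join('::');
});
console.log(described);
```

Because both helpers only rely on length and numeric indexes, they work on plain arrays and on array-like browser objects alike.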

How to improve accuracy?

The above research states that the identification accuracy is surprisingly high. But to improve it even further, Flash or Java integration is required to get a list of installed fonts, thus making each browser even more unique.

What about hash collisions?

My tests show that for random strings MurmurHash does produce collisions, but their number is negligible for my purposes: 5-7 collisions per ~200K capability strings.
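That figure is of the same order as a back-of-the-envelope birthday estimate. Assuming the hash spreads keys uniformly over its 32-bit output space, the expected number of colliding pairs among n keys is roughly n(n-1)/2 divided by 2^32. The function name below is chosen for this example.

```javascript
// Expected number of colliding pairs when n random keys are hashed
// uniformly into a space of 2^bits values (birthday approximation).
function expectedCollisions(n, bits) {
  var pairs = n * (n - 1) / 2;
  return pairs / Math.pow(2, bits);
}

console.log(expectedCollisions(200000, 32).toFixed(2)); // "4.66"
```

For 200,000 strings this gives a little under 5 expected collisions, so the observed 5-7 is about what a well-behaved 32-bit hash would produce.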

What about mobile browsers?

It’s simple: browser fingerprinting is not good with mobile browsers, unless you want to distinguish Android users from iPhone ones.

Results

After having had the fingerprinting on production for 4 months, I have some data to analyze. First of all, let me say that I’m not at liberty to tell the exact number of visitors to the web site, but I can say it is several millions a month, so we have some data to play with. All numbers below represent our usage and do not represent what you might have.

89% of fingerprints are unique

20% of our users have more than one fingerprint, i.e. several browsers or devices.

Very few users have a staggering number of fingerprints, for example 20-25. I don’t know if they have a lot of devices, use different browsers, or something else.

After viewing the results we removed the fingerprinting because of poor identification, especially with mobile devices. If your traffic mostly comes from desktops and you’re OK with 10-12% of false identifications you might want to try it.

Show me the code

code on github – the version I had in production

test your browser

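Please credit the source when reposting: jinglingshu's blog » A Website Tracking Technique to Replace Cookies: A First Look at "Canvas Fingerprinting"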