Google的FeedFetcher爬虫会将spreadsheet的=image(“link”)中的任意链接缓存。
例如:
如果我们将=image(“http://example.com/image.jpg”)输入到任意一个Google spreadsheet中,Google就会“派出”FeedFetcher爬虫去抓取这个图片并保存到缓存中以将其显示出来。
但是,我们可以为文件名附加上随机参数,使FeedFetcher多次抓取同一文件。也就是说,如果一个网站有一个10MB的文件,要是将以下列表输入到Google spreadsheet中,那么Google的爬虫就会抓取该文件1000次。
=image("http://targetname/file.pdf?r=0") =image("http://targetname/file.pdf?r=1") =image("http://targetname/file.pdf?r=2") =image("http://targetname/file.pdf?r=3") ... =image("http://targetname/file.pdf?r=1000")
附加上随机参数后,每个链接都被看作是不同的链接,因此Google爬虫会去抓取多次,使网站产生大量出站流量。所以任何人只需使用浏览器并打开一些标签,就可以向web服务器发动巨大流量HTTP GET洪水攻击。
但 是这种攻击使攻击者根本不需要有多大的带宽,只需要将“图像”地址输入进spreadsheet,Google就会从服务器上抓取这个10MB的数据,但 是因为地址指向一个PDF文件(非图像文件),攻击者从Google得到的反馈为N/A。很明显这种类型的流量可以被放大多倍,引起的后果很可能是灾难性 的。
只需要使用一台笔记本,打开几个web标签页,仅仅拷贝一些指向10MB文件的链接,Google去抓取同一文件的流量就超过了 700Mbps。而这种600-700Mbps的抓取流量大概只持续了30-45分钟,我就把服务器关闭了。如果没算错的话,45分钟内大概走了 240GB的流量。
我 和我的小伙伴被这么高的出站流量惊呆了。如果文件再大一点的话,我想其出站流量可以轻易达到Gpbs级,而且进站流量也能达到50-100Mbps。可以 想象如果多个攻击者同时用这种方法攻击某个网站的话,流量能有多少了。同时由于Google用会多个IP地址进行抓取,所以也很难阻止这种类型的GET洪 水攻击,而且很容易将攻击持续数个小时,因为这种攻击实在是太容易实施了。
发现这个bug后,我开始搜索由其产生的真实案例,还真发现了两例:
第一起攻击案例解释了博主如何不小心攻击了自己,结果收到了巨款流量账单。另一篇文章《利用Spreadsheet作为DDoS武器》描述了另一个类似攻击,但指出攻击者必须先抓取整个网站并用多个帐户将链接保存在spreadsheet中。
不过奇怪的是没有人尝试用附加随机请求变量的方法。尽管只是目标网站的同一个文件,但通过这种添加随机请求变量的方法是可以对同一文件请求成千上万次的,后果还是挺吓人的,而且实施过程很容易,任何人只需要动动手指头拷贝一些链接就可以做到。
我昨天将这个bug提交给了Google,今天得到了他们的反馈,表示这不属于安全漏洞,认为这是一个暴力拒绝服务攻击,不在bug奖金范围中。
也许他们事前就知道这个问题,并且认为这不是bug?
不过即使拿不到奖金,我仍希望他们会修复这个问题,由于实施门槛低,任何人都可以利用Google爬虫发动这种攻击。有一种简单的修复方法,就是Google只抓取没有请求参数的链接。希望Google早日修复这个bug,使站长免受其带来的威胁。
原文地址:http://chr13.com/2014/03/10/using-google-to-ddos-any-website/
Using Google to DDoS any website
Google uses its FeedFetcher crawler to cache anything that is put inside =image(“link”) in the spreadsheet.
For instance:
If we put =image(“http://example.com/image.jpg”) in one of the cells of Google spreadsheet, Google will send the FeedFetcher crawler to grab the image and cache it to display.
However, one can append random request parameter to the filename and tell FeedFetcher to crawl the same file multiple times. Say, for instance a website hosts a 10 mb file.pdf then pasting a list in the spreadsheet will cause Google’s crawler to fetch the same file 1000 times.
=image("http://targetname/file.pdf?r=0") =image("http://targetname/file.pdf?r=1") =image("http://targetname/file.pdf?r=2") =image("http://targetname/file.pdf?r=3") ... =image("http://targetname/file.pdf?r=1000")
Appending random parameter, each link is treated as different thus Google crawls it multiple times causing a loss of outbound traffic for the website owner. So anyone using a browser and opening just a few tabs on his PC can send huge HTTP GET flood to a web server.
Here, the attacker does not need a huge bandwidth at all. Attacker requests Google to put the image link in the spreadsheet, Google fetches 10 MB data from the server but since it’s a PDF(non-image file), the attacker gets N/A from Google. This type of traffic flow clearly allows amplification and can be disaster.
Using just a single laptop with multiple tabs open and merely copy pasting multiple links to a 10 mb file, Google managed to crawl the same file at 700+ mbps. This 600-700 mbps crawling by Google continued for 30-45 min. at which point I shut down the server. If I do the math correctly, I think it’s about 240 GB of traffic in 45 min.
I was surprised that I was getting such high outbound traffic. With only a little bit of more load, I think the outbound will reach Gbps and the inbound traffic will reach 50-100 mbps. I can only imagine the numbers when multiple attackers get into this. Google uses multiple IP addresses to crawl and although one can use User Agent to block these attacks, the victim will have to edit the server config and in many cases it might be too late if this attack goes un-noticed. The attack could be so easily prolonged for hours just because of its ease of use.
After I found this bug, I started Googling for any such incidents, and I found two:
The Google Attack explains how the blogger attacked himself and got a huge traffic bill.
Another article, Using Spreadsheet as a DDoS Weapon explains similar attack but points that an attacker must first crawl the entire target website and keep the links in spreadsheet using multiple accounts and as such.
I found it a bit weird that nobody tried appending random request variables. Even though the target website has 1 file, adding random request variables thousands of request can be sent to that website. It’s a bit scary actually. Anyone copy pasting few links in the browser tabs should not be able to do this.
I submitted this bug to Google yesterday and got a response today that this is not a security vulnerability. This is a brute force denial of service and will not be included in the bug bounty.
May be they knew about this before hand and don’t really think it’s a bug ?
I hope they do fix this issue though. It’s simply annoying that anyone could manipulate Google crawlers to do this. A simple fix will be just crawling the links without the request parameters so that we don’t have to suffer.
ps:The attack can be blocked by using the User Agent which is the same for all FeedFetcher crawler.
转载请注明:jinglingshu的博客 » 利用Google爬虫DDoS任意网站