阻止搜索引擎和恶意蜘蛛爬虫访问

大量的蜘蛛爬虫访问会消耗服务器性能开销,更有工具类爬虫对网站进行渗透访问,给网站安全造成威胁,本文分享这些爬虫的 User-Agent 以及阻止方法。

现在大部分网站都使用CDN进行加速,建议直接在CDN设置 User-Agent 黑名单

阿里云全站加速 DCDN 设置方法如图所示,在图中填入 User-Agent

*dotbot*|*Go-http-client*|*CensysInspect*|*okhttp*|*MegaIndex*|*MegaIndex.ru*|*BLEXBot*|*Qwantify*|*qwantify*|*semrush*|*Semrush*|*serpstatbot*|*hubspot*|*python*|*Bytespider*|*Go-http-client*|*Java*|*PhantomJS*|*SemrushBot*|*Scrapy*|*Webdup*|*AcoonBot*|*AhrefsBot*|*Ezooms*|*EdisterBot*|*EC2LinkFinder*|*jikespider*|*Purebot*|*MJ12bot*|*WangIDSpider*|*WBSearchBot*|*Wotbox*|*xbfMozilla*|*Yottaa*|*YandexBot*|*Jorgee*|*SWEBot*|*spbot*|*TurnitinBot-Agent*|*mail.RU*|*Perl*|*Python*|*Wget*|*Xenu*|*ZmEu*

Cloudflare设置方法如图,若使用表达式生成器手动一个个添加将耗费太多时间,直接编辑表达式填入如下表达式即可

(http.user_agent contains "Go-http-client") or (http.user_agent contains "CensysInspect") or (http.user_agent contains "okhttp") or (http.user_agent contains "MegaIndex") or (http.user_agent contains "MegaIndex.ru") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "Qwantify") or (http.user_agent contains "qwantify") or (http.user_agent contains "semrush") or (http.user_agent contains "Semrush") or (http.user_agent contains "serpstatbot") or (http.user_agent contains "hubspot") or (http.user_agent contains "python") or (http.user_agent contains "Bytespider") or (http.user_agent contains "Go-http-client") or (http.user_agent contains "Java") or (http.user_agent contains "PhantomJS") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "Scrapy") or (http.user_agent contains "Webdup") or (http.user_agent contains "AcoonBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Ezooms") or (http.user_agent contains "EdisterBot") or (http.user_agent contains "EC2LinkFinder") or (http.user_agent contains "jikespider") or (http.user_agent contains "Purebot") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "WangIDSpider") or (http.user_agent contains "WBSearchBot") or (http.user_agent contains "Wotbox") or (http.user_agent contains "xbfMozilla") or (http.user_agent contains "Yottaa") or (http.user_agent contains "YandexBot") or (http.user_agent contains "Jorgee") or (http.user_agent contains "SWEBot") or (http.user_agent contains "spbot") or (http.user_agent contains "TurnitinBot-Agent") or (http.user_agent contains "mail.RU") or (http.user_agent contains "perl") or (http.user_agent contains "Python") or (http.user_agent contains "Wget") or (http.user_agent contains "Xenu") or (http.user_agent contains "ZmEu")

在本机装有WAF的情况,例如宝塔WAF,直接导入即可,格式是一行一个UA

可使用AI工具处理格式

在Nginx配置代码如下:

    if ($http_user_agent ~ "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ) {
        return 444;
    }

Nginx配置网站同时支持多个PHP版本

很多框架系统都有插件应用市场(例如Discuz!),有些插件应用开发者由于各种原因不再对插件应用更新维护,导致该应用不支持PHP7、PHP8,但框架系统已经支持新版PHP。亦或是系统未支持新版PHP,但应用需要新版PHP才能运行。这种情况可以对Nginx进行配置实现同时支持多个PHP。

    location ~ [^/]\.php(/|$)
    {
        if ($request_uri ~*  "archives"){
          fastcgi_pass unix:/tmp/php-cgi-72.sock;
        }
      fastcgi_pass  unix:/tmp/php-cgi-56.sock;
      fastcgi_index index.php;
      include fastcgi.conf;
      include pathinfo.conf;
    }