阻止搜索引擎和恶意蜘蛛爬虫访问

大量的蜘蛛爬虫访问会消耗服务器性能开销,更有工具类爬虫对网站进行渗透访问,给网站安全造成威胁,本文分享这些爬虫的 User-Agent 以及阻止方法。

现在大部分网站都使用CDN进行加速,建议直接在CDN设置 User-Agent 黑名单

阿里云全站加速 DCDN 设置方法如图所示,在图中填入 User-Agent

*dotbot*|*Go-http-client*|*CensysInspect*|*okhttp*|*MegaIndex*|*MegaIndex.ru*|*BLEXBot*|*Qwantify*|*qwantify*|*semrush*|*Semrush*|*serpstatbot*|*hubspot*|*python*|*Bytespider*|*Go-http-client*|*Java*|*PhantomJS*|*SemrushBot*|*Scrapy*|*Webdup*|*AcoonBot*|*AhrefsBot*|*Ezooms*|*EdisterBot*|*EC2LinkFinder*|*jikespider*|*Purebot*|*MJ12bot*|*WangIDSpider*|*WBSearchBot*|*Wotbox*|*xbfMozilla*|*Yottaa*|*YandexBot*|*Jorgee*|*SWEBot*|*spbot*|*TurnitinBot-Agent*|*mail.RU*|*Perl*|*Python*|*Wget*|*Xenu*|*ZmEu*

Cloudflare设置方法如图,若使用表达式生成器手动一个个添加将耗费太多时间,直接编辑表达式填入如下表达式即可

(http.user_agent contains "Go-http-client") or (http.user_agent contains "CensysInspect") or (http.user_agent contains "okhttp") or (http.user_agent contains "MegaIndex") or (http.user_agent contains "MegaIndex.ru") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "Qwantify") or (http.user_agent contains "qwantify") or (http.user_agent contains "semrush") or (http.user_agent contains "Semrush") or (http.user_agent contains "serpstatbot") or (http.user_agent contains "hubspot") or (http.user_agent contains "python") or (http.user_agent contains "Bytespider") or (http.user_agent contains "Go-http-client") or (http.user_agent contains "Java") or (http.user_agent contains "PhantomJS") or (http.user_agent contains "SemrushBot") or (http.user_agent contains "Scrapy") or (http.user_agent contains "Webdup") or (http.user_agent contains "AcoonBot") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "Ezooms") or (http.user_agent contains "EdisterBot") or (http.user_agent contains "EC2LinkFinder") or (http.user_agent contains "jikespider") or (http.user_agent contains "Purebot") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "WangIDSpider") or (http.user_agent contains "WBSearchBot") or (http.user_agent contains "Wotbox") or (http.user_agent contains "xbfMozilla") or (http.user_agent contains "Yottaa") or (http.user_agent contains "YandexBot") or (http.user_agent contains "Jorgee") or (http.user_agent contains "SWEBot") or (http.user_agent contains "spbot") or (http.user_agent contains "TurnitinBot-Agent") or (http.user_agent contains "mail.RU") or (http.user_agent contains "perl") or (http.user_agent contains "Python") or (http.user_agent contains "Wget") or (http.user_agent contains "Xenu") or (http.user_agent contains "ZmEu")

在本机装有WAF的情况,例如宝塔WAF,直接导入即可,格式是一行一个UA

可使用AI工具处理格式

在Nginx配置代码如下:

    if ($http_user_agent ~ "MegaIndex|MegaIndex.ru|BLEXBot|Qwantify|qwantify|semrush|Semrush|serpstatbot|hubspot|python|Bytespider|Go-http-client|Java|PhantomJS|SemrushBot|Scrapy|Webdup|AcoonBot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Jorgee|SWEBot|spbot|TurnitinBot-Agent|mail.RU|perl|Python|Wget|Xenu|ZmEu|^$" ) {
        return 444;
    }

解决宝塔登录面板时卡在正在登录的问题

在登录宝塔面板时一直卡在“正在登录…” 进不去,尝试了重启面板、修复面板、清除缓存、清理系统垃圾、更换设备及网络等一系列操作都没解决问题。最后执行了(22) 显示面板错误日志,发现很多错误。

%title插图%num

本文使用的宝塔面板版本:8.0.1,系统:CentOS 7.9.2009 x86_64

宝塔面板每天会自动备份配置文件,保存在 /www/backup/panel/ 目录,最终通过恢复备份成功解决宝塔卡在正在登录的问题,以下是命令执行记录以供参考。

# bt stop
Stopping Bt-Tasks...    done
Stopping Bt-Panel...    done
# mv /www/server/panel/data/default.db /www/backup/default.db
# mkdir /www/backup/db/ -p
# unzip -o -d /www/backup/db /www/backup/panel/2023-08-04.zip
Archive:  /www/backup/panel/2023-08-04.zip
   creating: /www/backup/db/2023-08-04/
   creating: /www/backup/db/2023-08-04/data/
  inflating: /www/backup/db/2023-08-04/data/404.html  
  inflating: /www/backup/db/2023-08-04/data/default.db
# \cp -r -a /www/backup/db/2023-08-04/data/default.db /www/server/panel/data/default.db
# bt start
Starting Bt-Panel....   done
Starting Bt-Tasks...    done
 

为什么恢复4号而不是最近的?
最近5号到8号的备份文件大小一样,可能问题就从5号开始,所以优先恢复4号的。

%title插图%num

宝塔面板免费版7.0.3开启收费插件Nginx防火墙的方法

打开 软件管理 > Nginx > 设置 > 配置修改;
查找#include luawaf.conf;,去掉前面的 # 符号(“#”代表注释),保存并重启 Nginx即可开启。

可以试着访问 http://你的网址/?id=../etc/passwd,如果页面弹出拦截提示即表示开启成功!

一般来讲默认开启即可发挥拦截作用,如需修改规则可在/www/server/nginx/waf/config.lua自行修改。

RulePath = "/www/server/panel/vhost/wafconf/"   
--waf 详细规则存放目录(一般无需修改)
attacklog = "on"
--是否开启攻击日志记录(on 代表开启,off 代表关闭。下同)
logdir = "/www/wwwlogs/waf/"
--攻击日志文件存放目录(一般无需修改)
UrlDeny="on"
--是否开启恶意 url 拦截
Redirect="on"
--拦截后是否重定向
CookieMatch="off"
--是否开启恶意 Cookie 拦截
postMatch="off"
--是否开启 POST 攻击拦截
whiteModule="on"
--是否开启 url 白名单
black_fileExt={"php","jsp"}
--文件后缀名上传黑名单,如有多个则用英文逗号分隔。如:{"后缀名1","后缀名2","后缀名
3"……} ipWhitelist={"127.0.0.1"}

--白名单 IP,如有多个则用英文逗号分隔。如:{"127.0.0.1","127.0.0.2","127.0.0.3"……} 下同
ipBlocklist={"1.0.0.1"}
--黑名单 IP
CCDeny="off"
--是否开启 CC 攻击拦截
CCrate="300/60"
--CC 攻击拦截阈值,单位为秒。"300/60" 代表 60 秒内如果同一个 IP 访问了 300 次则拉黑