The requirement: analyze the nginx access log and, once a minute, report the top 10 user apps by number of REST API calls.

The log format defined in nginx:

'V1 [$time_iso8601] $remote_addr $host $request ' # Request
'$status $body_bytes_sent $bytes_sent $request_time ' # Response
'"$http_referer" "$http_user_agent" "$http_x_forwarded_for" ' # Client
'$tcpinfo_rtt,$tcpinfo_rttvar,$tcpinfo_snd_cwnd,$tcpinfo_rcv_space ' # NET
'$upstream_addr $upstream_status $upstream_response_time'; # Upstream

Sample log lines for the three kinds of response status (200, 40x, 50x):

207.247.141.184 a1.XXX.com - [2015-11-29T09:42:05+08:00] "POST /qudeo/xiying/devices HTTP/1.1" 503 740 "-" "XXX-SDK(Android) 2.1.3" "-" 0.316 - - -
122.46.19.22 a1.XXX.com - [2015-11-29T09:42:05+08:00] "POST /laodae/laodaeapp/messages HTTP/1.1" 200 346 "-" "-" "-" 0.010 0.002 10.652.58.178:8080 200
222.44.83.180 a1.XXX.com - [2015-11-29T09:42:05+08:00] "POST /wgdedi/zgeag/token HTTP/1.1" 400 75 "-" "XXX-SDK(Android) 2.1.1" "-" 0.019 0.019 40.193.79.4:8080 400

Here 200 is a normal response, 40x indicates a client-side problem, and 50x indicates a server-side problem.
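
As a minimal sketch of how a log line can be bucketed by status class, the function below reads log lines on stdin and prints 2xx, 4xx, 5xx, or other for each one. It assumes the layout of the sample lines above, where the status code is the 8th whitespace-separated field. The name status_class is hypothetical and this is not the author's script.

```shell
# Print the status class of each access-log line read from stdin.
# Assumption: the status code is the 8th whitespace-separated field,
# as in the sample lines; adjust the index for a different log layout.
status_class() {
    awk '{
        s = $8
        if (s ~ /^2[0-9][0-9]$/)      print "2xx"
        else if (s ~ /^4[0-9][0-9]$/) print "4xx"
        else if (s ~ /^5[0-9][0-9]$/) print "5xx"
        else                          print "other"
    }'
}
```

For example, piping the first sample line through status_class should print 5xx, because its 8th field is 503.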

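Approach: use regular expressions to match the API type of each request and write the records into three files by status class. The files are organized into one directory per day with one file per hour. Every minute the counts are computed, sorted into a top 10, and appended to the file.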
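
The counting step of this approach can be sketched roughly as follows. This is an illustration only, not the author's awk scripts: it assumes the sample layout (request path in field 6, status code in field 8) and treats the first two path segments as the org/app pair. The function names top_apps and report_path, and the file naming scheme, are hypothetical.

```shell
# top_apps CLASS [N]: read access-log lines on stdin and print the N
# (default 10) most frequently called org/app pairs whose status code
# belongs to CLASS (2, 4, or 5), one "count org/app" pair per line.
top_apps() {
    class=$1
    limit=${2:-10}
    awk -v class="$class" '
        $8 ~ ("^" class "[0-9][0-9]$") {
            path = $6
            sub(/\?.*/, "", path)          # drop any query string
            n = split(path, seg, "/")      # "/org/app/x" -> "", org, app, x
            if (n >= 3 && seg[2] != "" && seg[3] != "")
                count[seg[2] "/" seg[3]]++
        }
        END { for (app in count) print count[app], app }
    ' | sort -k1,1nr -k2,2 | head -n "$limit"
}

# report_path BASE CLASS: print the hourly report file for the current
# day and hour, e.g. BASE/2015-11-29/5xx_09.txt (assumed naming scheme).
report_path() {
    printf '%s/%s/%sxx_%s.txt\n' "$1" "$(date +%Y-%m-%d)" "$2" "$(date +%H)"
}
```

A per-minute run could then create the day directory with mkdir -p and append the output of top_apps 5 to the file named by report_path for the 50x class, and likewise for the other two classes.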

Code:

usergrid_api_regex.awk:

usergrid_access.awk:

regex.awk:

Finally, the statistics script api_top10.sh:

Add api_top10.sh to crontab so that it runs once a minute. igawk is used instead of awk because awk (gawk) 3.1.7 only supports the include statement (@include) through igawk.
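
A crontab entry that runs a script once a minute might look like the following. The installation path /opt/scripts/api_top10.sh is hypothetical, so substitute the real location of the script.

```crontab
# minute hour day-of-month month day-of-week command
* * * * * /opt/scripts/api_top10.sh
```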
