The requirement: analyze the nginx access log and report, once per minute, the top 10 user apps by REST API call volume.
The log format defined in nginx:
'V1 [$time_iso8601] $remote_addr $host $request '                       # Request
'$status $body_bytes_sent $bytes_sent $request_time '                   # Response
'"$http_referer" "$http_user_agent" "$http_x_forwarded_for" '           # Client
'$tcpinfo_rtt,$tcpinfo_rttvar,$tcpinfo_snd_cwnd,$tcpinfo_rcv_space '    # NET
'$upstream_addr $upstream_status $upstream_response_time';              # Upstream
Below are sample log lines for the three status classes (200, 40x, 50x):
207.247.141.184 a1.XXX.com - [2015-11-29T09:42:05+08:00] "POST /qudeo/xiying/devices HTTP/1.1" 503 740 "-" "XXX-SDK(Android) 2.1.3" "-" 0.316 - - -
122.46.19.22 a1.XXX.com - [2015-11-29T09:42:05+08:00] "POST /laodae/laodaeapp/messages HTTP/1.1" 200 346 "-" "-" "-" 0.010 0.002 10.652.58.178:8080 200
222.44.83.180 a1.XXX.com - [2015-11-29T09:42:05+08:00] "POST /wgdedi/zgeag/token HTTP/1.1" 400 75 "-" "XXX-SDK(Android) 2.1.1" "-" 0.019 0.019 40.193.79.4:8080 400
Here 200 is a normal response, 40x indicates a client-side problem, and 50x a server-side problem.
Approach: match the API type with regular expressions, then bucket the results into three files by status-code class. Files are organized into one directory per day with one file per hour; the statistics run once per minute, the top 10 are sorted out, and the result is appended to the current hour's file (the resulting layout is sketched below).
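With the base directory /x/x/x/api_account (the placeholder path used in the script below), the files for 2015-11-29 between 09:00 and 09:59 would be:

/x/x/x/api_account/2015/11/29/09-200   # top 10 among 200 responses, appended once per minute
/x/x/x/api_account/2015/11/29/09-400   # top 10 among 40x responses
/x/x/x/api_account/2015/11/29/09-500   # top 10 among 50x responses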
The code:
usergrid_api_regex.awk:
BEGIN {
    api_regex["devices"] = "^/([^\/]+)/([^\/]+)/devices$"
    api_regex["token"] = "^/([^\/]+)/([^\/]+)/token$"
    api_regex["devicelogs"] = "^/([^\/]+)/([^\/]+)/devicelogs[/]{0,1}$"
    api_regex["users"] = "^/([^\/]+)/([^\/]+)/users[/]{0,1}$"
    api_regex["users_"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+$"
    api_regex["users_status"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/status$"
    api_regex["users_password"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/password$"
    api_regex["users_offline_msg_count"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/offline_msg_count$"
    api_regex["users_joined_chatgroups"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/joined_chatgroups$"
    api_regex["users_blocks_users"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/blocks/users[/]{0,1}$"
    api_regex["users_blocks_users_"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/blocks/users/[^\/]+$"
    api_regex["users_contacts_users"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/contacts/users[/]{0,1}$"
    api_regex["users_contacts_users_"] = "^/([^\/]+)/([^\/]+)/users/[^\/]+/contacts/users/[^\/]+$"
    api_regex["messages"] = "^/([^\/]+)/([^\/]+)/messages$"
    api_regex["chatmessages"] = "^/([^\/]+)/([^\/]+)/chatmessages[/]{0,1}$"
    api_regex["offline_chatmessages"] = "^/([^\/]+)/([^\/]+)/offline_chatmessages$"
    api_regex["chatfiles"] = "^/([^\/]+)/([^\/]+)/chatfiles[/]{0,1}$"
    api_regex["chatfiles_"] = "^/([^\/]+)/([^\/]+)/chatfiles/[^\/]+$"
    api_regex["chatgroups"] = "^/([^\/]+)/([^\/]+)/chatgroups$"
    api_regex["chatgroups_"] = "^/([^\/]+)/([^\/]+)/chatgroups/[^\/]+$"
    api_regex["chatgroups_users"] = "^/([^\/]+)/([^\/]+)/chatgroups/[^\/]+/users$"
    api_regex["chatgroups_users_"] = "^/([^\/]+)/([^\/]+)/chatgroups/[^\/]+/users/[^\/]+$"
    api_regex["counters"] = "^/([^\/]+)/([^\/]+)/counters$"
    api_regex["credentials"] = "^/([^\/]+)/([^\/]+)/credentials$"
    api_regex["management_token"] = "^/management/token$"
    api_regex["management_organizations_applications"] = "^/management/organizations/([^\/]+)/applications/([^\/]+)$"
    api_regex["NONAPI"] = "^/$|^/status$|^/favicon.ico$"
}
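As a quick sanity check of what these patterns capture, here is a minimal sketch; the URI is taken from the 200 sample line above, and gawk's three-argument match() fills the third argument with the parenthesized groups, which is exactly what regex.awk below relies on:

gawk 'BEGIN {
    uri = "/laodae/laodaeapp/messages"
    # three-argument match() is a gawk extension:
    # res[1] and res[2] receive the first and second capture group
    if (match(uri, "^/([^/]+)/([^/]+)/messages$", res))
        print res[1], res[2]    # prints: laodae laodaeapp
}'

The first group is the org, the second the app; concatenated as org#app they identify the user app being counted.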
usergrid_access.awk:
{
    # skip lines that are missing fields
    if ( NF < 16 ) {
        next
    }
    method = substr($5, 2, 10)       # strip the leading quote from "POST
    timestamp = substr($4, 2, 19)    # strip the leading [ and keep YYYY-MM-DDTHH:MM:SS
    request_uri = $6
    split(request_uri, u, "?")       # drop the query string
    uri = u[1]
    status = $8
    request_time = $(NF-3)
}
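Run against the 200 sample line above, these assignments come out roughly as:

method       = "POST"
timestamp    = "2015-11-29T09:42:05"
uri          = "/laodae/laodaeapp/messages"
status       = "200"
request_time = "0.010"

Using $(NF-3) instead of a fixed field number keeps request_time correct even when a quoted user agent such as "XXX-SDK(Android) 2.1.3" is split into two fields.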
regex.awk:
@include usergrid_api_regex.awk
@include usergrid_access.awk
{
    org = "-"
    app = "-"
    api = "-"
    for (name in api_regex) {
        if ( uri ~ api_regex[name] ) {
            match(uri, api_regex[name], res)
            org = res[1]
            app = res[2]
            api = method"_"name
            break
        }
    }
    if ( api != "-" ) {
        #printf "%s#%s\t%s\t%d\n", org, app, api, status
        app_name = org"#"app
        count[timestamp, app_name, api, status] += 1
        total_request_time[timestamp, app_name, api, status] += request_time
    } else {
        #print
    }
}
END {
    for ( app_api_status in count ) {
        split(app_api_status, idx, SUBSEP)
        time = idx[1]
        app = idx[2]
        api = idx[3]
        status = idx[4]
        request_time = total_request_time[app_api_status]
        #if (status ~ 40){
        printf "%s %s %s %d %d %f\n", time, app, api, status, count[app_api_status], request_time
        #}
    }
}
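Per the printf in the END block, each output line of regex.awk carries six space-separated fields: timestamp, org#app, method_api, status, request count, and summed request time. For the sample above it would look something like this (the count and time totals here are made up for illustration):

2015-11-29T09:42:05 laodae#laodaeapp POST_messages 200 3 0.030000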
Finally, the stats script api_top10.sh:
#!/bin/bash
# written on 2015/11/28 by Sean Lv
# count rest api top 10 including return status of 200, 40x, 50x, counted by 1 min
YEAR=`date +%Y`
MON=`date +%m`
DAY=`date +%d`
HOU=`date +%H`
#MIN=`date +%M`
LOGFILE=/x/x/x/nginx/usergrid-access.log
DIR=/x/x/x/api_account/$YEAR/$MON/$DAY
if [ ! -d "$DIR" ];then
    mkdir -p $DIR
fi
S200=$DIR/$HOU-200
S400=$DIR/$HOU-400
S500=$DIR/$HOU-500
#[2015-11-28T22:03:43+08:00]
cd /data/shell/api_account/
tail -n70000 $LOGFILE | grep `date +%F"T"%H:%M -d "1 minute ago"` | igawk -f regex.awk | awk '{if ($4 ~ 20) {a[$5]=$0} else if ($4 ~ 40) {a4[$5]=$0} else if ($4 ~ 50) {a5[$5]=$0}}END{if (a[1]!="") {for (i in a){print a[i]|"sort -rn -k5 | head -n10>>'$S200'"}} if (a4[1]!="") {for (i in a4){print a4[i]|"sort -rn -k5 | head -n10 >>'$S400'"}} if (a5[1]!="") {for (i in a5){print a5[i]|"sort -rn -k5 | head -n10>>'$S500'"}}}'
The one-liner above is quite long, so here it is again split across lines with \ for readability; the effect is identical:
tail -n70000 $LOGFILE | grep `date +%F"T"%H:%M -d "1 minute ago"` |\
igawk -f regex.awk |awk '{if ($4 ~ 20) {a[$5]=$0} \
else if ($4 ~ 40) {a4[$5]=$0} \
else if ($4 ~ 50) {a5[$5]=$0}}\
END{if (a[1]!="") {\
for (i in a){print a[i]|"sort -rn -k5 | head -n10>>'$S200'"}} \
if (a4[1]!="") \
{for (i in a4){print a4[i]|"sort -rn -k5 | head -n10 >>'$S400'"}} \
if (a5[1]!="") \
{for (i in a5){print a5[i]|"sort -rn -k5 | head -n10>>'$S500'"}}}'
Put api_top10.sh into crontab so it runs once per minute. igawk is used in place of awk because gawk 3.1.7 only supports the @include statement through the igawk wrapper.
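A minimal crontab entry would look like this; the script path is an assumption, mirroring the cd in the script above:

# run every minute; /data/shell/api_account is a hypothetical location
* * * * * /data/shell/api_account/api_top10.sh >/dev/null 2>&1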