I am using this command to get some info about bots/spiders from my CentOS server's access.log file:
Code:
grep 'spider\|bot' access.log | sort -u -f >> bots.txt
The result looks like this (I know Pingdom is not a bad bot):
Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.158 - - [28/Mar/2015:02:09:56 +0200] "GET / HTTP/1.1" 200 24260 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.102 - - [28/Mar/2015:02:19:56 +0200] "GET / HTTP/1.1" 200 24277 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:13:21 +0200] "GET /user/74595-tery1/?tab=idm HTTP/1.1" 200 3905 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
Is there a command that can remove duplicate lines when both the IP and the user agent are the same?
To get something like:
Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.158 - - [28/Mar/2015:02:09:56 +0200] "GET / HTTP/1.1" 200 24260 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
Or, if that is not possible, to get only one line per user agent (even if different IPs exist for each user agent), like:
Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
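For reference, one way both variants might be done is with awk, using the double quote as the field separator. This is only a sketch: the function names dedupe_ip_ua and dedupe_ua are made up for illustration, and it assumes the log is in the combined format shown above, with no literal double quotes inside the request or referrer fields. Under that assumption the user agent is field 6 and the IP is the first word of field 1. It keeps whichever matching line comes first in the input, which may not be the same line as in the samples above.

```shell
# Hypothetical helper: keep the first line seen for each (IP, user agent) pair.
# With '"' as the separator, $1 is 'IP - - [date] ' and $6 is the user agent.
dedupe_ip_ua() {
    awk -F'"' '{ split($1, head, " "); key = head[1] SUBSEP $6 } !seen[key]++' "$@"
}

# Hypothetical helper: keep the first line seen for each user agent,
# regardless of which IP it came from.
dedupe_ua() {
    awk -F'"' '!seen[$6]++' "$@"
}
```

Either function could replace the sort step, for example: grep 'spider\|bot' access.log | dedupe_ip_ua >> bots.txt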