Easy Way To Find Bots/spiders And Block Them From Nginx Access.log

Hi

I am looking for an easy way to find bots or weird user agents in my huge log file so I can block them, but my access.log is very big and it is not practical to check each line by hand.

Is there any other way, or a useful grep command?

At the moment I have this:

Code:
grep 'spider\|bot' access.log | sort -u -f >> bots.txt

I am still trying to work out how to print just the spider/bot name and remove the duplicates.

I am also looking for ideas on what else to search for besides spider/bot, as I don't know what else is bad or can cause a huge load on my server.

Thanks
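
One possible way to print only the bot names with duplicates removed is sketched below. It assumes the log uses nginx's default "combined" format, so that the user agent is the sixth field when a line is split on double quotes. The helper name ua_counts and the extra search term crawl are illustrative choices, not part of the original command.

```shell
# Hypothetical helper: count requests per bot-like user agent, busiest first.
# Assumes nginx "combined" log format, where splitting a line on double
# quotes puts the user agent in field 6.
ua_counts() {
    awk -F'"' '{ print $6 }' "$1" |
        grep -i -E 'bot|spider|crawl' |
        sort | uniq -c | sort -rn
}

# Three sample lines so the sketch runs on its own.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:13:21 +0200] "GET /user/74595-tery1/?tab=idm HTTP/1.1" 200 3905 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
EOF

ua_counts "$sample_log"
```

Dropping the grep stage lists every user agent with its request count, which may help spot heavy clients that do not call themselves a bot or spider.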


Similar Content



How Can I Remove Duplicated Lines On A File?

Hi

I am using this command to get some info about bots/spiders from my CentOS server's access.log file:

Code:
grep 'spider\|bot' access.log | sort -u -f >> bots.txt

The result looks like this (I know Pingdom is not bad):

Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.158 - - [28/Mar/2015:02:09:56 +0200] "GET / HTTP/1.1" 200 24260 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.102 - - [28/Mar/2015:02:19:56 +0200] "GET / HTTP/1.1" 200 24277 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:13:21 +0200] "GET /user/74595-tery1/?tab=idm HTTP/1.1" 200 3905 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

Is there any command that can remove duplicate lines when the IP and the user agent are the same?

To get something like:

Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.158 - - [28/Mar/2015:02:09:56 +0200] "GET / HTTP/1.1" 200 24260 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

Or, if there is no way to do this, to keep only one line per user agent (even if it appears with different IPs), like:

Code:
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
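
A minimal sketch of both variants, assuming the combined log format shown above (IP as the first space-separated field, user agent as the sixth field when splitting on double quotes). The function names dedupe_ip_ua and dedupe_ua are made up for illustration. Both keep the first line seen for each key, in file order.

```shell
# Keep the first line for each (IP, user agent) pair.
dedupe_ip_ua() {
    awk '{ split($0, q, "\""); key = $1 SUBSEP q[6]; if (!seen[key]++) print }' "$1"
}

# Keep the first line for each user agent, whatever the IP.
dedupe_ua() {
    awk -F'"' '!seen[$6]++' "$1"
}

# The six sample lines from the post, so the sketch runs on its own.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
141.101.105.102 - - [28/Mar/2015:01:59:56 +0200] "GET / HTTP/1.1" 200 24194 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.158 - - [28/Mar/2015:02:09:56 +0200] "GET / HTTP/1.1" 200 24260 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
141.101.105.102 - - [28/Mar/2015:02:19:56 +0200] "GET / HTTP/1.1" 200 24277 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)"
108.162.215.53 - - [27/Mar/2015:23:13:21 +0200] "GET /user/74595-tery1/?tab=idm HTTP/1.1" 200 3905 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.53 - - [27/Mar/2015:23:11:59 +0200] "GET /user/275904-ktlk21/ HTTP/1.1" 200 3805 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
108.162.215.75 - - [27/Mar/2015:23:21:31 +0200] "GET /user/74595-tery1/?tab=topics HTTP/1.1" 200 13588 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
EOF

dedupe_ip_ua "$sample_log"
dedupe_ua "$sample_log"
```

Because awk remembers keys in memory instead of sorting, the original line order is preserved and no sort step is needed.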

Diffing The Line Numbers

hi guys

I am trying to find the "size" of a "block" of data in LARGE data files; the example below, test_data.txt, is very simplified. By "size" I mean the difference in line numbers between the start of one block and the start of the next, and the "size" is constant throughout the file. For example:

1234 6.600000 4321
1234 8.500000 4321
1234 1.800000 4321
1234 2.300000 4321
1234 8.500000 4321
1234 2.800000 4321

If I define a block as starting whenever 8.500000 appears in the second column, then in the example the block size would be 3, because 8.500000 occurs on the 2nd and 5th lines. Right now I am using

Code:
 grep -n "8.500000" test_data.txt | cut -f1 -d:

and/or

Code:
 awk '/8.500000/ {print FNR}' test_data.txt

Obviously I don't remember how to tag text as code.

By the way, the grep command is much, much faster.

Both of these commands print the entire list of line numbers (a long list for files larger than a gigabyte), which I then have to subtract from one another to come up with 3 in the example. Not that I'm opposed to doing math, but I would think awk or grep should be able to do this for me.

ideas?

tabby
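
Since the block size is constant, one way to let awk do the subtraction is to stop at the second match and print the difference. This is only a sketch: the function name block_size and its argument order are made up, and it compares the second column against the value instead of matching the pattern anywhere on the line.

```shell
# block_size VALUE FILE
# Print the distance in lines between the first two rows whose second
# column equals VALUE, then stop reading the file.
block_size() {
    awk -v val="$1" '$2 == val { if (prev) { print FNR - prev; exit } prev = FNR }' "$2"
}

# The sample data from the post.
test_data=$(mktemp)
cat > "$test_data" <<'EOF'
1234 6.600000 4321
1234 8.500000 4321
1234 1.800000 4321
1234 2.300000 4321
1234 8.500000 4321
1234 2.800000 4321
EOF

block_size 8.500000 "$test_data"
```

The exit after the second match means awk does not have to read the rest of a multi-gigabyte file.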

Quick GREP Question..

Hey guys,

Something is puzzling me!

I saw someone use the grep in the following way and I'm not sure I understand what it does, and if there's any benefit to using it this way.

Code:
grep X.X.X.X /var/log/log.log | grep -v query

I checked the man file which confirmed that -v is relating to matching non grouping lines (which I'm not sure I fully understand either!) but I don't see any difference in the output of the above command versus the same command without the | grep -v query bit..

Why would you pipe grep into grep unless you were searching for something specific within the search results?

Does query mean something else?
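
For what it's worth, -v inverts the match: the second grep prints only the lines that do not contain the pattern, and query here is just a literal search string, not a keyword. A small sketch with invented log lines (the IP addresses and the log contents are placeholders):

```shell
# Invented sample log; only the second line survives both filters.
sample_log=$(mktemp)
printf '%s\n' \
    '10.0.0.1 query example.com' \
    '10.0.0.1 update example.com' \
    '10.0.0.2 query example.org' > "$sample_log"

# First grep keeps lines containing the address (-F treats the dots
# literally); second grep drops those that contain "query".
result=$(grep -F '10.0.0.1' "$sample_log" | grep -v 'query')
echo "$result"
```

If none of the lines matching the address contain the word query, the output with and without the second grep is identical, which would explain seeing no difference.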

Using Xargs And Grep In Find Command

I've been using this a lot:

find <directory to start search at> -name "<files to search in>" -type f | xargs grep "<string to search for>"

e.g.

find /usr/include -name "*.h" -type f | xargs grep "#define UINT"

Now, what if I wanted to write the results to a file?
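
One straightforward option is to redirect the output of the last command in the pipeline. The sketch below runs the same find/xargs/grep pipeline against a throwaway directory; the directory, the header files and the name results.txt are all placeholders.

```shell
# Throwaway directory with two small header files.
workdir=$(mktemp -d)
printf '#define UINT unsigned int\n' > "$workdir/a.h"
printf 'int x;\n' > "$workdir/b.h"

# Same pipeline as above, with grep's stdout redirected to a file.
find "$workdir" -name "*.h" -type f | xargs grep "#define UINT" > "$workdir/results.txt"

cat "$workdir/results.txt"
```

Using >> instead of > appends to the file instead of overwriting it.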

How Can I Grep Variable?

I want to do an AND search with grep in a shell script,

but I am having trouble grepping a variable.


Code:
#!/bin/bash

if [ $# -eq 0 ]
then
    echo "Usage: phone searchfor [...searchfor]"
    echo "(You didn't tell me what you want to search for )"
else
    pass=0
    find=""

    for idx in $*
    do
        if [ -n "$idx" ]
        then
            if [ $pass -eq 0 ]
            then
                find=$(egrep "$idx" mydata)
                pass=1
            else
                find=$("$find" | grep "$idx")
                echo $find
            fi
        fi
    done

    if [ -z "$find" ]
    then
        echo "There is no such thing"
    else
        echo $find | awk -f display.awk
    fi
fi

There is one error, "command not found", at this line:

find=$("$find" | grep "$idx")

How can I grep the contents of a variable and store the result in a variable?
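
The error comes from "$find" being run as a command; its contents need to be fed to grep on standard input instead. A minimal sketch of the AND search, where the function name and_search and the sample phone data are invented for illustration:

```shell
# and_search FILE TERM [TERM...]
# Print the lines of FILE that match every TERM.
and_search() {
    data_file=$1
    shift
    matches=$(cat "$data_file")
    for term in "$@"; do
        # printf sends the variable's contents to grep's stdin; grep exits
        # non-zero when nothing matches, which is not an error here.
        matches=$(printf '%s\n' "$matches" | grep -- "$term" || true)
    done
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
    fi
}

# Invented sample data standing in for the mydata file.
phone_data=$(mktemp)
cat > "$phone_data" <<'EOF'
Alice Smith 555-1234
Bob Smith 555-9876
Alice Jones 555-4321
EOF

and_search "$phone_data" Alice Smith
```

Quoting the variable when printing it keeps the newlines between matching lines, which an unquoted echo $find would collapse into spaces.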

Please Interpret The Meaning Of This Command

Hi ,

Please explain what the below command means ..


Code:
if grep -c -i Y $INIFILE > /dev/null

I know what grep is for: finding a character or string in a file. But I could not understand the above form of the grep command.

I am new to Linux, so this might be a simple question, but please shed some light on it.


Edited

Also, please explain why they are redirecting /dev/null into a file in the command below

Code:
cat /dev/null > $DATA_DIR/$DATAFILE
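
A short sketch of what both commands do, using temporary files as stand-ins for $INIFILE and $DATA_DIR/$DATAFILE (the file contents are invented). The if tests grep's exit status, which is 0 when at least one line matches; the count printed by -c is thrown away by the redirect to /dev/null. The second command empties the file, or creates it empty if it does not exist.

```shell
ini_file=$(mktemp)     # stand-in for $INIFILE
data_file=$(mktemp)    # stand-in for $DATA_DIR/$DATAFILE
printf 'ENABLED=y\n' > "$ini_file"
printf 'old contents\n' > "$data_file"

# -i ignores case, so the lowercase y matches; only the exit status matters.
if grep -c -i Y "$ini_file" > /dev/null; then
    status="match"
else
    status="no match"
fi
echo "$status"

# cat prints nothing from /dev/null, so the redirect leaves an empty file.
cat /dev/null > "$data_file"
wc -c < "$data_file"
```

grep -q -i Y "$ini_file" is a more common way to write the same test, since -q suppresses the output without a redirect.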

What Ftp Server Is Running

On my RHEL 4 server, I want to know which FTP server is running, but I can't find it.

I tried "ps -ef |grep ftp" but got no output; "chkconfig --list |grep ftp" also shows nothing related to FTP; there is no FTP service in /etc/rc.d/init.d; and "ftp localhost" is not allowed.

When I connect with FileZilla it works, so an FTP server should be running. I then ran "ps -ef |grep ftp" again and it printed the following output. Can anyone advise which FTP server is running on the server? Thanks

Code:
edp 11027 11026  0 12:39 ?        00:00:00 tcsh -c /usr/libexec/openssh/sftp-server
edp 11037 11027  0 12:39 ?        00:00:00 /usr/libexec/openssh/sftp-server
user   11050  7747  0 12:48 pts/2    00:00:00 grep ftp

Exporting Log Data To A File That Matches Stdout

hey guys,

Let's say I want to find out which log files have NTP-related information in them. I use cat and grep to search through the files in /var/log and then export the result to a file. This is the command:

# cat /var/log/* | grep ntp > /home/log.txt

The file created by this command does not include the directories the log entries are a part of. Why not? For example, if you run the same command without redirecting to /home/log.txt, stdout shows which directory each log entry is in. I hope I'm making sense here. My question is: is there a clever way to export to a file so that the file is structured exactly like the stdout of the command below?

# cat /var/log/* | grep ntp
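
A possible explanation, offered as a guess: grep reading from a pipe has no file names to print, so the directory names seen on the terminal were probably cat's error messages for subdirectories of /var/log, which go to stderr and are not captured by >. If the goal is to record which file each match came from, one option is to let grep read the files itself. The sketch below uses a throwaway directory with invented log contents instead of /var/log.

```shell
# Throwaway stand-in for /var/log with two small files.
log_dir=$(mktemp -d)
printf 'ntpd started\nother line\n' > "$log_dir/messages"
printf 'cron job ran\n' > "$log_dir/cron"
out_file=$(mktemp)

# -H prefixes each match with its file name; 2>&1 also captures any
# error messages, so the file holds everything the terminal would show.
grep -H ntp "$log_dir"/* > "$out_file" 2>&1

cat "$out_file"
```

Adding -r makes grep descend into subdirectories instead of reporting them as errors.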

How Do I Select With Grep And Awk Only ONE Text Content In A Script

I have a nasty issue here.
For my machine I want to make a check script to see which wireless network I am connected to.

If I type iwconfig I get the output below.
Code:
 iwconfig
wlan0     IEEE 802.11bgn  ESSID:"APqwerty"
          Mode:Managed  Frequency:2.447 GHz  Access Point: 72:6B:D3:36:29:44
          Bit Rate=54 Mb/s   Tx-Power=20 dBm
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=55/70  Signal level=-55 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:7339  Invalid misc:83573   Missed beacon:0

lo        no wireless extensions.

eth0      no wireless extensions.

With a grep/awk combo I managed to narrow it down.
Code:
  iwconfig | grep ESSID |awk '{ /ESSID/; print $4 }'
lo        no wireless extensions.

eth0      no wireless extensions.

ESSID:"APqwerty"

I don't know why it also shows lo and eth0, but OK.

I tried several combinations of grep, awk and even cut.
I only want to capture the ESSID I am connected to, in this case APqwerty.
I know I am missing something but can't figure out what it is. Any advice?
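
The lo and eth0 lines most likely come from stderr, which the pipe does not carry, so they reach the terminal regardless of grep and awk. A sketch of one way to get only the network name, using sed to pull out the text between the quotes; the function name essid_from is made up, and it reads iwconfig-style text on stdin so it can be tried without a wireless card.

```shell
# Print only the ESSID value from iwconfig-style text on stdin.
essid_from() {
    sed -n 's/.*ESSID:"\([^"]*\)".*/\1/p'
}

# Real use would discard stderr first:
#   iwconfig 2>/dev/null | essid_from
printf '%s\n' 'wlan0     IEEE 802.11bgn  ESSID:"APqwerty"' | essid_from
```

The -n flag together with the trailing p prints only lines where the substitution succeeded, so no separate grep is needed.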

Getting Variable Integer Values From Grep

I want to write a Python script. In order to write it I need to figure out how to access the values associated with my signal level and bit rate.

If I use the following command

Code:
iwconfig | grep 'Signal level'

I get:

eth0 no wireless extensions.

lo no wireless extensions.

Link Quality=70/70 Signal level= -38 dBm


Obviously, I don't want Signal level. I want whatever it happens to be. In this case, it happens to be -38. Ditto Bit Rate...How do I grab -38 from the command line?
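
One way to grab just the number is a sed substitution that keeps only the digits (and an optional minus sign) after the label. The sketch below is an assumption-laden example: the function names signal_level and bit_rate are invented, the functions read iwconfig-style text on stdin, and bit_rate accepts either = or : after the label as a precaution.

```shell
# Print the signal level in dBm, e.g. -38, from text on stdin.
signal_level() {
    sed -n 's/.*Signal level=[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Print the bit rate number, e.g. 54, from text on stdin.
bit_rate() {
    sed -n 's/.*Bit Rate[=:][[:space:]]*\([0-9.][0-9.]*\).*/\1/p'
}

# Real use would be: iwconfig 2>/dev/null | signal_level
printf '%s\n' 'Link Quality=70/70 Signal level= -38 dBm' | signal_level
printf '%s\n' '          Bit Rate=54 Mb/s   Tx-Power=20 dBm' | bit_rate
```

Each function prints a bare number, which a Python script could read from the command's output and convert with int() or float().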