Java Mailing List Archive

http://www.java2.5341.com/

deducing web crawler behavior from access.log files

ps1c5o

2008-07-03

I don't know if this is the right place for this; if not, sorry.

Like the title says, I need to be able to deduce web crawler behavior from the
access log.
In particular, I need to understand what this means:

xx.xx.xx.x - - [12/Jun/2008:21:10:31 +0100] "GET /phpmyadmin/main.php HTTP/1.0" 404 1123 "-" "-"
xx.xx.x.xx - - [12/Jun/2008:21:10:31 +0100] "GET /phpMyAdmin/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:31 +0100] "GET /db/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /web/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /PMA/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:32 +0100] "GET /admin/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /dbadmin/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:33 +0100] "GET /PMA2006/main.php HTTP/1.0" 404 1123 "-" "-"
xxx.xxx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /pma2006/main.php HTTP/1.0" 404 1123 "-" "-"
xx.xx.xx.xx - - [12/Jun/2008:21:10:34 +0100] "GET /sqlmanager/main.php HTTP/1.0" 404 1123 "-" "-"


where I replaced the IPs with x's for privacy's sake.

This is just an extract. There are probably over 200 similar lines, where
the crawler tries to get a main.php file from hundreds of different paths,
most of which include a folder named phpmyadmin or similar.
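
To make the pattern easier to see across the whole log, something like the following might help. It is only a rough sketch: it assumes the log uses the usual Apache "combined" layout shown in the extract, with each entry on a single line in the actual file. The names parse_line and probe_counts are made up for illustration. It counts, per client IP, how many requests for a path ending in /main.php came back with a 404.

```python
import re
from collections import Counter

# Regex for the Apache "combined" layout seen in the extract (an assumption
# about the server's LogFormat; adjust if the real format differs).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def probe_counts(lines, suffix="/main.php"):
    """Count 404 responses per client IP for requested paths ending in `suffix`."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "404" and entry["path"].endswith(suffix):
            counts[entry["ip"]] += 1
    return counts
```

Passing an open file object for the access log to probe_counts and then calling most_common() on the result would list the clients responsible for most of these requests. An IP with dozens of 404s on different directories within a few seconds is probing for paths rather than following links.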

Is this an attempt to attack the machine? Why does it want the main.php file
so badly?

Thanks in advance.
--
Sent from the Nutch - User mailing list archive at Nabble.com.