Java Mailing List Archive

http://www.java2.5341.com/

nutch-user.lucene

db_gone/javascript/invalid URLs

Höchstötter Nadine

2008-10-09

Hi all,
I have a problem with JavaScript. I tried to crawl bild.de, and many of the links were never fetched. According to the stats, most of them have "Status: 3 (db_gone)". If you look at the URLs marked db_gone, you will see some weird things; I have listed a few of them below this email. I do not think this is only a JavaScript problem; it is probably also a URL normalization problem. Does anybody know how to deal with it?

Thanks,
Nadine

http://software.bild.de/js/6M/x-6N-6Q-6T

Status: 3 (db_gone)

http://software.bild.de/js/;l(6.1f(7,

Status: 3 (db_gone)

http://software.bild.de/js/</22>

Status: 3 (db_gone)

http://software.bild.de/js/</4t></29></22>

Status: 3 (db_gone)

http://software.bild.de/js/a.1i

Status: 3 (db_gone)

http://software.bild.de/js/},4o:q(){6(7)[6(7).4E(

Status: 3 (db_gone)

http://software.bild.de/ratgeber-karriere/jobs/allgemein

Status: 3 (db_gone)

http://software.bild.de/text/javascript

Status: 3 (db_gone)

http://software.bild.de/top.document.all.

Status: 3 (db_gone)

http://tv.bild.de/+escape(document.referrer)+

Status: 3 (db_gone)

http://tv.bild.de/_js/+escape(document.referrer)+

Status: 3 (db_gone)

http://tv.bild.de/_js/...

Status: 3 (db_gone)

http://tv.bild.de/_js/1.5.1.1

Status: 3 (db_gone)

http://tv.bild.de/_js/</tbody></+escape(document.referrer)+

Status: 3 (db_gone)

http://tv.bild.de/_js/</tbody></bild//CP//+escape(document.referrer)+

Status: 3 (db_gone)

http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//+escape(document.referrer)+

Status: 3 (db_gone)

http://tv.bild.de/_js/</tbody></bild//CP//bild//CP//entertainment/body/tv/tvprogramm/tvprogramm/home/+escape(document.referrer)+

Status: 3 (db_gone)
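
To triage a list like the one above, a small script can separate entries that are clearly leaked JavaScript or HTML from entries that at least look like real URLs. The following is only a rough sketch in Python, not part of Nutch: the function name looks_like_script_residue and the character set in SUSPICIOUS are assumptions chosen for illustration. Some of these characters (parentheses, commas, semicolons, plus signs) are technically legal in URL paths, so the check is a heuristic and not a validity test.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristic: characters that are common in JavaScript/HTML
# source but rare in real page URLs. Angle brackets, braces, square
# brackets and spaces are not allowed unescaped in a path at all; the
# others are legal but suspicious in this context.
SUSPICIOUS = re.compile(r"[<>{}()\[\];,+ ]")


def looks_like_script_residue(url: str) -> bool:
    """Return True if the URL path contains characters typical of code."""
    path = urlsplit(url).path
    return bool(SUSPICIOUS.search(path))


if __name__ == "__main__":
    # A few of the db_gone entries from the list above.
    samples = [
        "http://software.bild.de/js/</22>",
        "http://tv.bild.de/_js/+escape(document.referrer)+",
        "http://software.bild.de/js/a.1i",
        "http://software.bild.de/ratgeber-karriere/jobs/allgemein",
    ]
    for url in samples:
        label = "residue" if looks_like_script_residue(url) else "plausible"
        print(f"{label:9} {url}")
```

Entries such as http://software.bild.de/js/a.1i or http://software.bild.de/text/javascript pass this check even though they are almost certainly script fragments too, so a character filter alone will not catch everything; it only shows how much of the db_gone list comes from code being mistaken for links.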

