[tor-reports] GSoC: Weekly report for ahmia, week 33
Juha Nurmi
juha.nurmi at ahmia.fi
Sat Aug 16 18:36:05 UTC 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi all,
I have been tweaking the crawler's settings and scheduling it to crawl
continuously. I am also taking automated backups of the Solr server.
Furthermore, I found out that some sites serve an infinite number of
subdomains. For instance, The Pirate Bay is doing this:
rss.rss.rss.rss.rss...rss.rss.rss.uj3wazyk5u4hnvtk.onion
and as a result it is now messing up the search results:
https://ahmia.fi/search/?q=the+pirate+bay
The crawler rebuilds the index continuously, and it is no longer
allowed to follow this kind of problematic subdomain chain, so the
problem should resolve itself within a week.
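To illustrate the idea, here is a minimal sketch of a URL filter that
rejects such subdomain chains. It is not the crawler's actual rule: the
function name is_subdomain_chain and the limit MAX_SUBDOMAIN_LABELS are
assumptions made up for this example, and it assumes hosts of the form
<subdomains>.<address>.onion.

```python
from urllib.parse import urlsplit

# Hypothetical limit for this sketch; not a value from the ahmia crawler.
MAX_SUBDOMAIN_LABELS = 3


def is_subdomain_chain(url, max_labels=MAX_SUBDOMAIN_LABELS):
    """Return True if the host of `url` looks like an endless subdomain chain."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Drop the last two labels: the onion address and the "onion" suffix.
    subdomains = labels[:-2]
    # Too many subdomain levels is treated as a chain.
    if len(subdomains) > max_labels:
        return True
    # The same label repeated back to back (rss.rss) is treated as a chain.
    return any(a == b for a, b in zip(subdomains, subdomains[1:]))


if __name__ == "__main__":
    for candidate in (
        "http://uj3wazyk5u4hnvtk.onion/",
        "http://rss.rss.rss.uj3wazyk5u4hnvtk.onion/",
    ):
        print(candidate, is_subdomain_chain(candidate))
```

A crawler could skip any link for which this check returns True before
queueing it. URLs without a scheme have no parsed host here and are
simply not flagged.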
Greetings,
Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBAgAGBQJT76RoAAoJELGTs54GL8vAzlcH+gIH0q8XmgOAs2/tcbs11Qxt
+3wzLtB/UgmY6b3yGlWfYlYKJW2rnPsfERXmH850sJESZwhFuYNVqshSVVjHbohS
+bO311bXAslPYrJYKt0ME8MtHmPBR4nvPIH5JNRmsuLxH7TD5MthbbFvC/vWk5Pi
mbEYBcIm5jbPhaRuby0xbHO9q766uVx4iWafNTc5i11qqrdIZ1inJvET5MyUxZEL
rge++BsYNqV2M4Dk55cMHNe4bUtAPMMlfVBl7b9li3aPVtSH6uJL40a/DggeeF9d
DYBXEZ/CvPBStsh73R7V3KS9Ro78uU9Lxi2XkVA7SlBs+r4REQ2uSvPE5rBBU9o=
=oGrf
-----END PGP SIGNATURE-----