[tor-reports] GSoC: Weekly report for ahmia, week 33

Juha Nurmi juha.nurmi at ahmia.fi
Sat Aug 16 18:36:05 UTC 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

I have been tweaking the crawler's settings and scheduling it to crawl
continuously. I am also taking automated backups of the Solr server.
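The report does not say how the backups are taken. One common approach
with Solr is to call the ReplicationHandler's "backup" command over HTTP
from a scheduled job (e.g. cron). The sketch below assumes the
ReplicationHandler is enabled in solrconfig.xml; the base URL, the core
name "collection1" and the backup directory are hypothetical examples,
not ahmia's actual configuration.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def backup_url(solr_base: str, core: str, location: Optional[str] = None) -> str:
    """Build a ReplicationHandler backup request URL for one core.

    solr_base, core and location are caller-supplied (hypothetical values
    in the examples); 'location' is the directory Solr writes the snapshot to.
    """
    params = {"command": "backup"}
    if location:
        params["location"] = location
    return f"{solr_base.rstrip('/')}/{core}/replication?{urlencode(params)}"


def trigger_backup(solr_base: str, core: str,
                   location: Optional[str] = None,
                   timeout: float = 30.0) -> int:
    """Ask Solr to start a backup and return the HTTP status code.

    Solr runs the snapshot asynchronously, so a 200 only means the
    request was accepted, not that the backup has finished.
    """
    with urlopen(backup_url(solr_base, core, location), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical local instance; run this from cron for automated backups.
    print(trigger_backup("http://localhost:8983/solr", "collection1",
                         location="/var/backups/solr"))
```

Keeping URL construction separate from the HTTP call makes the request
easy to inspect or log before a scheduled job sends it.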

Furthermore, I found out that some sites serve an infinite number of
subdomains. For instance, The Pirate Bay is doing this:

rss.rss.rss.rss.rss...rss.rss.rss.uj3wazyk5u4hnvtk.onion

As a result, these subdomains are currently cluttering the search results:

https://ahmia.fi/search/?q=the+pirate+bay

Because the crawler rebuilds the index continuously and is no longer
allowed to follow these kinds of problematic subdomain chains, the
problem should resolve itself within a week.
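The report does not describe how the crawler decides that a subdomain
chain is problematic. A minimal sketch of one possible rule, assuming a
URL filter that rejects onion hosts with too many labels in front of the
onion address, or with the same label repeated back to back (as in
rss.rss.rss...). The function name is_subdomain_chain and the limit of
two subdomain labels are hypothetical choices for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical limit: how many labels may precede "<address>.onion".
MAX_SUBDOMAIN_LABELS = 2


def is_subdomain_chain(url: str, max_labels: int = MAX_SUBDOMAIN_LABELS) -> bool:
    """Return True if the URL's host looks like a runaway subdomain chain."""
    # urlsplit(...).hostname is already lowercased; it is None for odd URLs.
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    if len(labels) < 2 or labels[-1] != "onion":
        return False  # not an onion host; leave it to other crawl rules
    # Everything before "<address>.onion" counts as a subdomain label.
    sub_labels = labels[:-2]
    if len(sub_labels) > max_labels:
        return True
    # The same label repeated back to back (rss.rss.) is also a chain.
    return any(a == b for a, b in zip(sub_labels, sub_labels[1:]))


if __name__ == "__main__":
    for u in ("http://rss.rss.rss.uj3wazyk5u4hnvtk.onion/",
              "http://www.uj3wazyk5u4hnvtk.onion/",
              "http://uj3wazyk5u4hnvtk.onion/"):
        print(u, is_subdomain_chain(u))
```

A crawler would call such a check before queueing each discovered link,
so chained hosts never enter the index. Since the index is rebuilt
continuously, entries that are already indexed drop out on the next
full pass.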

Greetings,
Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJT76RoAAoJELGTs54GL8vAzlcH+gIH0q8XmgOAs2/tcbs11Qxt
+3wzLtB/UgmY6b3yGlWfYlYKJW2rnPsfERXmH850sJESZwhFuYNVqshSVVjHbohS
+bO311bXAslPYrJYKt0ME8MtHmPBR4nvPIH5JNRmsuLxH7TD5MthbbFvC/vWk5Pi
mbEYBcIm5jbPhaRuby0xbHO9q766uVx4iWafNTc5i11qqrdIZ1inJvET5MyUxZEL
rge++BsYNqV2M4Dk55cMHNe4bUtAPMMlfVBl7b9li3aPVtSH6uJL40a/DggeeF9d
DYBXEZ/CvPBStsh73R7V3KS9Ro78uU9Lxi2XkVA7SlBs+r4REQ2uSvPE5rBBU9o=
=oGrf
-----END PGP SIGNATURE-----

