[tor-bugs] #29697 [Internal Services]: archive.tpo is soon running out of space

Tor Bug Tracker & Wiki blackhole at torproject.org
Sun Jun 9 16:41:51 UTC 2019


#29697: archive.tpo is soon running out of space
-------------------------------+--------------------------
 Reporter:  boklm              |          Owner:  anarcat
     Type:  defect             |         Status:  assigned
 Priority:  Medium             |      Milestone:
Component:  Internal Services  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+--------------------------

Comment (by dcf):

 Replying to [comment:5 anarcat]:
 > If git-annex is too complicated, we can talk to IA directly. I would
 > recommend, however, against using their web-based upload interface which,
 > even they acknowledge, is terrible and barely useable. I packaged the
 > [https://tracker.debian.org/pkg/python-internetarchive internetarchive]
 > python client in Debian to work around that problem and it works much
 > better.
 >
 > Moving files to IA only shifts the problem, in my opinion: then we have
 > only a single copy, elsewhere and while we don't need to manage that space
 > anymore, we also don't manage backups and will never know if they drop
 > stuff on us (and they do, sometimes, either deliberately or by mistake). I
 > would propose that if stuff moves out of our "backed-up" infrastructure,
 > it should be stored in at least two administratively distinct locations.

 Recently I had the idea to archive some early flash proxy/pyobfsproxy
 browser bundles from circa 2013--some of them were only ever present under
 !https://people.torproject.org/~dcf/ and so what I have locally is a
 superset of what's at archive.torproject.org (for this specific group of
 packages). The problem I'm encountering with IA is the automatic malware
 scan--as soon as I upload a self-extracting Windows .exe package, the
 virus scan returns positive and automatically darks (hides) the entire
 item. Here are some attempted uploads that got darked:
  * https://archive.org/details/tor-flashproxy-browser-2.4.6-alpha-1
  * https://archive.org/details/tor-flashproxy-browser-2.4.6-alpha-2
  * https://archive.org/details/tor-flashproxy-pyobfsproxy-browser-2.4.7-alpha-1
 Here's a
 [https://www.virustotal.com/gui/file/358c6b2b96ad4d8137a835b987a6a0bc7ab85f5b8a010863e101a7f2a40c74f4/detection
 sample report] from the upload log. Notice some of the matches say
 "Not-a-virus" and are simply reporting the presence of tor, but it's
 enough to fail the IA check.
  * Kaspersky: Not-a-virus:NetTool.Win32.Tor.k
  * Qihoo-360: Win32/Virus.NetTool.c06
  * Microsoft: PUA:Win32/Presenoker
  * ZoneAlarm by Check Point: Not-a-virus:NetTool.Win32.Tor.k
 It seems that I can avoid the virus check by ordering the uploads: upload
 all files except the .exe, let them be virus scanned, then upload the
 .exe. The upload log says "item already had a curatenote indicating it
 had been checked, no need to update" and the item remains undarked. But
 this is no solution: besides relying on an apparent bug in the malware
 scanning system, it will only work until the next time someone runs a
 batch scan or something similar, and then the items will disappear. For
 the sake of example, here are items I managed to upload in that way:
  * https://archive.org/details/tor-flashproxy-pyobfsproxy-browser-2.4.7-test-1
  * https://archive.org/details/tor-pluggable-transports-browser-2.4.11-alpha-1
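
 To make the ordering concrete, here is a minimal Python sketch of the
 two-stage upload described above. It is only an illustration of the
 ordering, not a recommended or durable approach. The names
 split_upload_batches, staged_upload, upload_batch and wait_for_scan are
 hypothetical; upload_batch stands in for whatever actually sends files
 (for instance a thin wrapper around the internetarchive client), and
 wait_for_scan for whatever blocks until the first scan has finished.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Sequence, Tuple


def split_upload_batches(
    paths: Iterable[str], late_suffixes: Sequence[str] = (".exe",)
) -> Tuple[List[str], List[str]]:
    """Split paths into (upload first, upload last) by file suffix."""
    first: List[str] = []
    late: List[str] = []
    for path in paths:
        # Compare suffixes case-insensitively so "FOO.EXE" is also held back.
        if PurePosixPath(path).suffix.lower() in late_suffixes:
            late.append(path)
        else:
            first.append(path)
    return first, late


def staged_upload(
    identifier: str,
    paths: Iterable[str],
    upload_batch: Callable[[str, List[str]], None],
    wait_for_scan: Callable[[str], None],
) -> Tuple[List[str], List[str]]:
    """Upload non-.exe files, wait for the scan, then upload the .exe files.

    upload_batch and wait_for_scan are caller-supplied (hypothetical)
    callbacks; this function only enforces the ordering.
    """
    first, late = split_upload_batches(paths)
    if first:
        upload_batch(identifier, first)
    # Assumption: the caller knows how to tell that the scan has completed.
    wait_for_scan(identifier)
    if late:
        upload_batch(identifier, late)
    return first, late
```

 Passing the uploader in as a callback keeps the sketch independent of any
 particular client, and makes it possible to dry-run the ordering without
 touching IA at all.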
 TL;DR: archiving at IA will probably require talking to someone there and
 getting them to make a special collection for us with
 [https://archive.org/services/docs/api/metadata-schema/index.html#viruscheck viruscheck]
 disabled.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29697#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online