[tor-relays] clarification on what Utah State University exit relays store ("360 gigs of log files")
rdump at river.com
Mon Aug 24 05:21:43 UTC 2015
On 2015-08-13 01:40, Mike Perry wrote:
> As such, I still look forward to hearing from someone who has worked at
> an ISP/University/etc where this is actually practiced. What is in
> *those* logs?
At work I deal with two flow-recording practices and the resulting record retention.
The first is at the upstream ISP. Roughly 1 in 5,000 flows are sampled from
routers in NetFlow format for statistics gathering and general traffic
measurement. Within two weeks, those records are reduced to simpler tuples of
bandwidth used by member IP addresses and subnets, which are kept in a Cricket
RRD at decreasing resolution over time.
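To make the ISP pipeline concrete, here is a minimal sketch, assuming a simplified flow record of just (source IP, bytes); real NetFlow records carry many more fields. It samples about 1 in N flows, collapses the sample into per-IP byte estimates scaled up by N, and does an RRD-style consolidation (averaging consecutive samples) to show how older data ends up at coarser resolution. The function names and record shape are hypothetical, not how the ISP or Cricket actually implement it.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SAMPLE_RATE = 5000  # roughly 1 in 5k flows, per the description above

# Hypothetical simplified flow record: (src_ip, bytes).
Flow = Tuple[str, int]


def sample_flows(flows: Iterable[Flow], rate: int, rng: random.Random) -> List[Flow]:
    """Keep each flow independently with probability 1/rate."""
    return [f for f in flows if rng.randrange(rate) == 0]


def per_ip_bytes(sampled: Iterable[Flow], rate: int) -> Dict[str, int]:
    """Collapse sampled flows into ip -> estimated bytes, scaling each
    sampled flow up by the sampling rate to estimate the true total."""
    totals: Dict[str, int] = defaultdict(int)
    for ip, nbytes in sampled:
        totals[ip] += nbytes * rate
    return dict(totals)


def consolidate(series: List[float], step: int) -> List[float]:
    """RRD-style consolidation: average every `step` consecutive values,
    so older data is retained at a coarser resolution."""
    return [
        sum(series[i:i + step]) / len(series[i:i + step])
        for i in range(0, len(series), step)
    ]
```

Note that only the scaled per-IP tuples survive the two-week window in this sketch; the individual sampled flows are what gets discarded.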
The second is work's own security monitors. Full flows are generated and
recorded from the same raw network data provided to IDSes. Records are
retained for 1 year in practice.
The flow recorder is the open source Argus from Qosient.com. Argus flags a TCP
flow it has seen with a TIMEOUT state when that flow is still open at the time
the logs are rolled. Further traffic on the flow produces a new record, which is
often annealed with the previously seen record in post-processing.
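The annealing step can be sketched as follows, assuming a hypothetical record type keyed by the usual 5-tuple with a free-form state string. Argus's own client tools (e.g. racluster) handle this kind of aggregation in practice; this is only an illustration of the idea, not their logic. A record is folded into the earlier record for the same flow only when that earlier record was cut off with TIMEOUT.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical flow key: (src, sport, dst, dport, proto)
FlowKey = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class FlowRecord:
    key: FlowKey
    start: float  # seconds since epoch
    end: float
    nbytes: int
    state: str    # "TIMEOUT" if the flow was still open when the log rolled


def anneal(records: List[FlowRecord]) -> List[FlowRecord]:
    """Merge each record into the previous record for the same flow key
    when that previous record ended in a TIMEOUT state."""
    merged: List[FlowRecord] = []
    last_by_key: Dict[FlowKey, int] = {}  # flow key -> index into merged
    for rec in sorted(records, key=lambda r: r.start):
        idx = last_by_key.get(rec.key)
        if idx is not None and merged[idx].state == "TIMEOUT":
            prev = merged[idx]
            # Extend the earlier record and adopt the newer state.
            merged[idx] = replace(
                prev,
                end=max(prev.end, rec.end),
                nbytes=prev.nbytes + rec.nbytes,
                state=rec.state,
            )
        else:
            last_by_key[rec.key] = len(merged)
            merged.append(rec)
    return merged
```

A record split across a log roll (TIMEOUT, then a continuation) comes back as one record, while a later, unrelated connection on the same 5-tuple stays separate.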
In work's case, Argus is configured not to record flows from problem hosts
(high-volume noise sources) or privacy-sensitive hosts (Tor nodes and others).
Not all institutions will have that kind of configuration.
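As an illustration of the exclusion policy, here is a minimal sketch that drops any flow touching a listed host before it is stored. The exclusion list uses documentation address ranges and is hypothetical; the post does not say how work's Argus configuration expresses this, and doing it at capture time inside Argus is not what this Python code does.

```python
import ipaddress
from typing import List, Sequence, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical exclusion list: noisy hosts and privacy-sensitive hosts
# such as local Tor relays (documentation ranges used as placeholders).
EXCLUDED: List[Network] = [
    ipaddress.ip_network(n) for n in ("192.0.2.10/32", "198.51.100.0/28")
]

# Hypothetical flow shape: (src_ip, dst_ip, bytes)
Flow = Tuple[str, str, int]


def is_excluded(addr: str, excluded: Sequence[Network] = EXCLUDED) -> bool:
    """True if addr falls inside any excluded network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in excluded)


def filter_flows(flows: Sequence[Flow],
                 excluded: Sequence[Network] = EXCLUDED) -> List[Flow]:
    """Drop any flow where either endpoint is on the exclusion list."""
    return [
        f for f in flows
        if not (is_excluded(f[0], excluded) or is_excluded(f[1], excluded))
    ]
```

Checking both endpoints matters: a Tor relay's traffic shows up as both source and destination depending on direction.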
At first blush, padding traffic seems likely to keep more TCP flows live, and
therefore exposed to sampling hits, in regimes like work's upstream ISP.
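One way to quantify that intuition, assuming sampling actually hits individual packets independently at rate 1/N (as in sampled NetFlow), is that a flow carrying n packets is recorded at least once with probability 1 - (1 - 1/N)^n. The sketch below is a back-of-the-envelope model under that assumption, not a description of the ISP's exact sampler.

```python
def p_flow_sampled(packets: int, rate: int = 5000) -> float:
    """Probability that at least one of `packets` packets is sampled,
    assuming each packet is sampled independently with probability 1/rate."""
    return 1.0 - (1.0 - 1.0 / rate) ** packets
```

Under this model, a connection that would otherwise go idle but keeps exchanging padding cells accumulates packets, so its chance of being sampled climbs: about 2% at 100 packets versus about 86% at 10,000.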
On the other hand, Argus may more easily confirm a TCP flow is still live in
the case of padding traffic, but in practice "live" is already assumed for the
lesser of a tcp.established default or a log roll, unless Argus saw a TCP