[tor-relays] clarification on what Utah State University exit relays store ("360 gigs of log files")

Richard Johnson rdump at river.com
Mon Aug 24 05:21:43 UTC 2015

On 2015-08-13 01:40, Mike Perry wrote:
> As such, I still look forward to hearing from someone who has worked at
> an ISP/University/etc where this is actually practiced. What is in
> *those* logs?

I deal with two flow-recording practices, and the records retention that 
results from them, at work.

The first is at the upstream ISP. Roughly 1 in 5k flows are sampled from 
routers for statistics gathering and general traffic measurement, in NetFlow 
format. Within two weeks, those records are used to generate simpler tuples of 
bandwidth used by member IP addresses and subnets, stored at decreasing 
resolution over time in a Cricket RRD.
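
As a rough illustration of that reduction step, the sketch below collapses 
sampled flow records into per-address byte estimates. The record fields 
(src, dst, bytes), the helper name, and the scale-up by the sampling rate are 
all assumptions made for the example; this is not the ISP's actual tooling 
or the NetFlow record layout.

```python
from collections import defaultdict

SAMPLING_RATE = 5000  # "1 in 5k", taken from the description above


def estimate_bytes_per_ip(sampled_flows, sampling_rate=SAMPLING_RATE):
    """Sum sampled byte counts per member address, scaled by the sampling rate.

    Hypothetical helper: only the member (source) address and a byte
    estimate survive; the peer address is not carried into the result.
    """
    totals = defaultdict(int)
    for flow in sampled_flows:
        totals[flow["src"]] += flow["bytes"] * sampling_rate
    return dict(totals)


# Made-up sample records using documentation address ranges.
sampled = [
    {"src": "192.0.2.10", "dst": "198.51.100.7", "bytes": 1200},
    {"src": "192.0.2.10", "dst": "203.0.113.5", "bytes": 800},
    {"src": "192.0.2.44", "dst": "198.51.100.7", "bytes": 500},
]
usage = estimate_bytes_per_ip(sampled)
```

In this sketch the reduced tuples no longer say who a member address was 
talking to, only how much it sent.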

The second is work's own security monitors. Full flows are generated and 
recorded from the same raw network data provided to IDSes. Records are 
retained for 1 year in practice.

The flow recorder is the open source Argus from Qosient.com. Argus will 
indicate a TIMEOUT situation when a TCP flow it has seen is still open at the 
time logs are rolled. Additional traffic on the flow will result in a new 
record, which is often annealed in post-processing with the previously seen flow.
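
The annealing step can be pictured roughly as follows. This is a minimal 
sketch, not Argus's own tooling or record format: the dictionary fields, the 
"CLOSED" state name, and the anneal() helper are hypothetical. The only 
idea borrowed from above is that a record marked TIMEOUT may be continued by 
a later record for the same flow.

```python
def anneal(records):
    """Merge a record into an earlier one with the same 5-tuple if that
    earlier record ended in TIMEOUT (flow still open at log roll)."""
    merged = []
    open_by_key = {}  # 5-tuple -> index in merged of a record left in TIMEOUT
    for rec in sorted(records, key=lambda r: r["start"]):
        key = (rec["proto"], rec["src"], rec["sport"], rec["dst"], rec["dport"])
        idx = open_by_key.get(key)
        if idx is not None:
            # Continuation of a flow that was still open at the last roll.
            prev = merged[idx]
            prev["end"] = rec["end"]
            prev["bytes"] += rec["bytes"]
            prev["state"] = rec["state"]
        else:
            merged.append(dict(rec))
            idx = len(merged) - 1
        if merged[idx]["state"] == "TIMEOUT":
            open_by_key[key] = idx
        else:
            open_by_key.pop(key, None)
    return merged


# Made-up records: one flow split across a log roll, plus an unrelated flow.
records = [
    {"proto": "tcp", "src": "192.0.2.10", "sport": 40000, "dst": "198.51.100.7",
     "dport": 443, "start": 0, "end": 300, "bytes": 1000, "state": "TIMEOUT"},
    {"proto": "tcp", "src": "192.0.2.10", "sport": 40000, "dst": "198.51.100.7",
     "dport": 443, "start": 300, "end": 450, "bytes": 500, "state": "CLOSED"},
    {"proto": "tcp", "src": "192.0.2.44", "sport": 40001, "dst": "203.0.113.5",
     "dport": 22, "start": 10, "end": 20, "bytes": 70, "state": "CLOSED"},
]
flows = anneal(records)
```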

In work's case Argus is configured to not record flows from problem hosts 
(high volume noise sources) or privacy-sensitive hosts (Tor nodes, others). 
Not all institutions will have that kind of configuration.
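
The effect of such an exclusion list can be sketched as below. This is not 
Argus configuration syntax; the EXCLUDED networks, the record fields, and 
should_record() are invented for illustration, using documentation address 
ranges.

```python
import ipaddress

# Hypothetical exclusion list: one noisy host and one privacy-sensitive subnet.
EXCLUDED = [ipaddress.ip_network(n) for n in ("192.0.2.53/32", "198.51.100.0/28")]


def should_record(flow, excluded=EXCLUDED):
    """Return False if either endpoint of the flow is on the exclusion list."""
    addrs = (ipaddress.ip_address(flow["src"]), ipaddress.ip_address(flow["dst"]))
    return not any(addr in net for addr in addrs for net in excluded)


kept = should_record({"src": "203.0.113.1", "dst": "203.0.113.9"})
dropped_src = should_record({"src": "192.0.2.53", "dst": "203.0.113.9"})
dropped_dst = should_record({"src": "203.0.113.1", "dst": "198.51.100.5"})
```

A flow is skipped when the listed host appears on either side of it, so 
traffic to an excluded host is no more recorded than traffic from it.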

At first blush, it seems padding traffic may keep more TCP flows live, and so 
exposed to sampling hits, in regimes like that of work's upstream ISP.
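
One way to put numbers on that intuition is sketched here, under a loudly 
stated assumption: that sampling picks each packet independently with 
probability 1 in N. The description above speaks of sampling flows, so this 
is only one possible reading, and p_sampled() is a hypothetical helper.

```python
def p_sampled(packets, n=5000):
    """Chance that at least one of `packets` packets is picked when each
    packet is sampled independently with probability 1/n (an assumption)."""
    return 1.0 - (1.0 - 1.0 / n) ** packets


# Padding adds packets to a flow, so the chance of a hit can only go up.
without_padding = p_sampled(1000)
with_padding = p_sampled(2000)
```

Under this assumption, a flow of about N packets has roughly a 63% chance 
of appearing in the sampled records at least once.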

On the other hand, padding traffic may let Argus confirm more easily that a 
TCP flow is still live. In practice, though, a flow is already assumed to be 
"live" for the lesser of a tcp.established default or a log roll, unless 
Argus saw a TCP teardown beforehand.

