[tor-relays] clarification on what Utah State University exit relays store ("360 gigs of log files")

Sharif Olorin sio at tesser.org
Thu Aug 13 12:35:43 UTC 2015


Mike,

> At what resolution is this type of netflow data typically captured?

For raw capture, timestamps are typically second-resolution. The
resolution post-aggregation is a different question. Keep in mind that
netflow is just the most common example; many networks don't use Cisco
netflow, but have something that meets the same requirements while
storing somewhat more or less data (e.g., pmacct, bro).

> Are we talking about all connection 5-tuples, bidirectional/total
> transfer byte totals, and open and close timestamps, or more (or less)
> detail than this?

That's about right, though some systems (e.g., pmacct in some
configurations) store a four-tuple of (src,dest,tx,rx), throwing out
the ports and aggregating over the tx and rx flows so that individual
connections can no longer be uniquely identified. What's stored from
Cisco netflow is quite flexible[0]. Other systems like bro default to
storing one record per connection, with all the information in a
five-tuple plus things like IP TOS and byte counts.
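
To make the difference concrete, here is a minimal Python sketch of that
kind of aggregation. The record layout, the field names, and the
aggregate_pairs helper are hypothetical (this is not the actual schema of
pmacct, bro, or netflow), and the choice of which address counts as "src"
is an arbitrary assumption. It only shows how dropping ports and merging
both directions folds distinct connections into one (src,dest,tx,rx) row.

```python
from collections import defaultdict, namedtuple

# Hypothetical unidirectional flow record (not a real pmacct/netflow schema).
Flow = namedtuple("Flow", "proto src sport dst dport nbytes")

def aggregate_pairs(flows):
    """Collapse flows to (src, dest, tx, rx), discarding ports and protocol."""
    totals = defaultdict(lambda: [0, 0])  # (src, dest) -> [tx, rx]
    for f in flows:
        # Assumption: the lexicographically smaller address is treated as "src".
        if f.src <= f.dst:
            totals[(f.src, f.dst)][0] += f.nbytes  # src -> dest counts as tx
        else:
            totals[(f.dst, f.src)][1] += f.nbytes  # dest -> src counts as rx
    return sorted((s, d, tx, rx) for (s, d), (tx, rx) in totals.items())

# Two separate TCP connections between the same pair of hosts.
flows = [
    Flow("tcp", "10.0.0.1", 40000, "10.0.0.2", 443, 1000),
    Flow("tcp", "10.0.0.2", 443, "10.0.0.1", 40000, 5000),
    Flow("tcp", "10.0.0.1", 40001, "10.0.0.2", 443, 200),
    Flow("tcp", "10.0.0.2", 443, "10.0.0.1", 40001, 700),
]
rows = aggregate_pairs(flows)
print(rows)
```

Both connections end up in a single row for the host pair, so the
aggregate alone no longer says how many connections there were or which
ports they used.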

> Are timestamps always included?

Yes, to some granularity (there's not much point in storing connection info
without times, for any of the reasons people normally store connection
info). The most recent system I set up (bro) records connections with
second-precision timestamps; the one before that (pmacct) stored
(src,dest,tx,rx) aggregates over ten-second windows.
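
As a rough illustration of what ten-second aggregation does to timing
information, the sketch below floors each timestamp to the start of its
window and sums the byte counts. The (unix_ts, src, dest, tx, rx) input
layout and the bucket function are assumptions made for this example, not
pmacct's actual storage format.

```python
from collections import defaultdict

WINDOW = 10  # seconds, matching the ten-second aggregates mentioned above

def bucket(records, window=WINDOW):
    """Sum (tx, rx) per (window_start, src, dest).

    records: iterable of (unix_ts, src, dest, tx, rx), a hypothetical layout.
    """
    out = defaultdict(lambda: [0, 0])
    for ts, src, dest, tx, rx in records:
        start = ts - (ts % window)  # floor the timestamp to its window start
        out[(start, src, dest)][0] += tx
        out[(start, src, dest)][1] += rx
    return {key: tuple(val) for key, val in sorted(out.items())}

records = [
    (1439469343, "10.0.0.1", "10.0.0.2", 100, 900),
    (1439469347, "10.0.0.1", "10.0.0.2", 50, 400),
    (1439469352, "10.0.0.1", "10.0.0.2", 10, 20),
]
windows = bucket(records)
for key, (tx, rx) in windows.items():
    print(key, tx, rx)
```

The first two records fall into the same window and are merged, so
anything finer than the window size is lost from the stored data.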

> Are bidirectional transfer bytecounts
> always included?

You mean the number tx + rx, or the tuple tx,rx as opposed to just
tx or rx? It's almost always the second one (tx,rx).

> Are subsampled packet headers (or contents)
> sometimes/often included? 

Contents storage is rare. Some universities store enough data to
reconstruct most packets[1]; other ISPs usually don't. When full
connection data is stored, it's deleted pretty fast (days or weeks at
most).

Storing a subset of data from packet headers (ports, TOS) is very
common, as is keeping counts of things like checksum mismatches.

> What about UDP sessions? IPv6?

UDP is treated the same as TCP. IPv6 is the same as IPv4. ICMP et
cetera are often stored too; these systems are normally thinking more
in terms of IP packets than TCP segments or UDP datagrams.

> I think for various reasons (including this one), we're soon going to
> want some degree of padding traffic on the Tor network at some point
> relatively soon, and having more information about what is typically
> recorded in these cases would be very useful to inform how we might want
> to design padding and connection usage against this and other issues.

arma or others can probably explain why this is a hard problem; I
don't know enough in this area to comment.

> Information about how UDP is treated would also be useful if/when we
> manage to switch to a UDP transport protocol, independent of any
> padding.

I don't think UDP helps you at all here. What makes you think it might?

Sharif

[0] http://www.cisco.com/en/US/technologies/tk648/tk362/technologies_white_paper09186a00800a3db9.html
[1] https://www.bro.org/community/time-machine.html

-- 
OpenPGP: 6FB7 ED25 BFCF 3E22 72AE 6E8C 47D4 CE7F 6B9F DF57