[metrics-team] collector bridge IP:port masking code issue?

Mon Oct 23 07:59:02 UTC 2017

On 2017-10-20 22:46, Iain R. Learmonth wrote:
> Hi,

Hi nusenu and irl,

> On 20/10/17 20:19, nusenu wrote:
>>> To ensure anonymity of bridges, all informations about
>>> bridges in several files are obfuscated. In those files: data are changed, the real
>>> IP address is changed into a 10/8 format, the real port and the fingerprint are
>>> changed to keep anonymity of these special nodes.
> 
> This would appear to be a version of the description of what CollecTor
> does, or at least what comes out of Onionoo.

Yep. And as of a few months ago we even have a specification on Tor Metrics:

https://metrics.torproject.org/bridge-descriptors.html

>>> To find a match between those files and the bridges we have extracted, we
>>> have studied the source code of TOR and written a small program which con-
>>> verts the fingerprint into the hashed fingerprint. The hashed fingerprint is a
>>> simple SHA-1 hash of the fingerprint but format issues appeared during the
>>> implementation.
> 
> It's Tor not TOR. Programs to produce SHA-1 hashes already exist.

Yep. Here's some Python that does this trick:

from binascii import a2b_hex
from hashlib import sha1
# Decode the hex fingerprint to its 20 raw bytes, then hash those bytes.
sha1(a2b_hex("f2044413dac2e02e3d6bcf4735a19bca1de97281")).hexdigest()

My guess is that they omitted the a2b_hex step and passed the hex string
directly to sha1, which hashes the 40 ASCII characters instead of the
20 bytes they encode.
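
To make the difference concrete, here is a minimal sketch that computes
both variants side by side. The function names are made up for
illustration, and the second function only reflects my guess at the
mistake, not anything confirmed by the authors.

```python
from binascii import a2b_hex
from hashlib import sha1

fingerprint = "f2044413dac2e02e3d6bcf4735a19bca1de97281"

def hashed_fingerprint(fingerprint_hex):
    """SHA-1 over the 20 raw bytes that the hex string encodes."""
    return sha1(a2b_hex(fingerprint_hex)).hexdigest()

def hashed_hex_string(fingerprint_hex):
    """Suspected mistake: SHA-1 over the 40 ASCII characters themselves."""
    return sha1(fingerprint_hex.encode("ascii")).hexdigest()

correct = hashed_fingerprint(fingerprint)
wrong = hashed_hex_string(fingerprint)
print(correct)
print(wrong)
print(correct != wrong)  # the two digests differ
```

Note that the first variant does not care whether the fingerprint is
written in upper or lower case, while the second one does, which may be
one of the "format issues" they ran into.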

I hope that the new specification document above makes this clear
enough. Are there ways to make that even clearer? Include sample code in
the specification? Use a more formal notation?

>>> The fingerprint is a 20-byte hexadecimal string which is then converted into
>>> a char string. Then SHA-1 is applied and the hashed fingerprint is converted
>>> again in hexadecimal. At first, we have collected different files like consensus
>>> file regularly to get a list of hashed fingerprints. Automated bridge extraction
>>> enabled to have the real IP address and the real port as well as the fingerprint.
> 
> This sounds like they've abused the mail gateway or similar. With enough
> time, anyone can do this. There's nothing clever going on here. I don't
> know about all the limiting around it, but if you've got 4 months, I'm
> not surprised you can get a lot of bridges out.
> 
> All this does is permit those bridges to be censored. Private bridges
> won't be affected.
> 
> I believe BridgeDB is also written in such a way that others may be able
> to run bridge authorities to distribute the bridges they run, so it's
> not just one source of bridges.

I can't comment on the actual mechanism they used to harvest bridges.

But just in case somebody finds a smart way to learn about bridges,
maybe there should be a way for them to prove their findings that
doesn't involve publishing 2k bridge IP addresses.

Should we think about including something in the descriptor that can
only be "decrypted" when knowing the original fingerprint of a bridge?
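
Just to make that question a bit more concrete, here is a rough sketch
of one possible shape. Nothing like this exists in CollecTor or in the
descriptor format today. It uses a plain hash commitment rather than
actual encryption, and the label and all function names are
hypothetical.

```python
from binascii import a2b_hex
from hashlib import sha256

# Hypothetical domain-separation label; not part of any specification.
LABEL = b"bridge-knowledge-proof"

def make_proof(fingerprint_hex):
    """Computable only by someone who knows the original fingerprint."""
    return sha256(LABEL + a2b_hex(fingerprint_hex)).hexdigest()

def make_commitment(fingerprint_hex):
    """Value the sanitizer could publish next to the hashed fingerprint."""
    return sha256(a2b_hex(make_proof(fingerprint_hex))).hexdigest()

def verify(proof_hex, commitment_hex):
    """Anyone can check a claimed proof against the published commitment."""
    return sha256(a2b_hex(proof_hex)).hexdigest() == commitment_hex

fingerprint = "f2044413dac2e02e3d6bcf4735a19bca1de97281"
commitment = make_commitment(fingerprint)
print(verify(make_proof(fingerprint), commitment))
```

Somebody who harvested bridges could then publish the proofs instead of
the addresses. An obvious limitation is that a proof can be copied by
anyone once it has been published, so this only shows that somebody
learned the fingerprint, not who did.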

All the best,
Karsten
