[tor-dev] Get Stem and zoossh to talk to each other

l.m ter.one.leeboi at hush.com
Sun Aug 16 11:43:07 UTC 2015


Hi Philipp,

First, thank you for the input. I will certainly review your
discussions with other measurement team members. I'm sorry I wasn't
able to attend.

On the subject of databases and why they're a kludge: databases
represent relationships between data as joins. Joins are a construct
that the database must maintain, and that must either persist or be
enforced by integrity constraints. A database may be useful for storing
data in its final form, and for representing relationships between such
entities. It requires computation in an interpreted language, and joins
are not represented using formal math. (In a manner of speaking,
database theory does encompass some abstract math objects in the form
of sets.) Storing data and representing known relationships is what a
database is designed for. Analyzing data and finding dynamic
relationships is something a database will never do well--it's outside
the intended use. Formal (mathematical) methods for representing
semantics can always be proved correct using rigorous methods, and
will always be faster. Imagine if tor's path selection algorithm were
implemented as a database. It would work, but the math-derived
implementation would be vastly superior.

Allow me to clarify further. The formal language described here is
used to derive subset languages. In a manner of speaking, the base
language is a representation of tor's network communication. By adding
additional grammar to this language a researcher can define formally
the semantic relationships that hold particular interest or meaning.
One researcher who is only interested in onionoo-like applications
(which is me in this case, not Karsten) would create a grammar
describing such content. Another, who is interested in a particular
class of analysis, might have another grammar. Right now my objective
in the forks is to make this possible (it currently isn't).
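
As a rough illustration of deriving a subset language from a base one,
here is a minimal Python sketch. It is not the implementation in the
forks. The line shapes are simplified assumptions loosely modeled on
router status lines, and the names BASE_GRAMMAR, extend, parse and
EXIT_GRAMMAR are all hypothetical.

```python
import re

# Base "grammar": one named pattern per simplified line type.
# These shapes are assumptions for illustration, not an exact format.
BASE_GRAMMAR = {
    "router": re.compile(
        r"^r (?P<nickname>\S+) (?P<address>\d+\.\d+\.\d+\.\d+) (?P<or_port>\d+)$"
    ),
    "flags": re.compile(r"^s(?P<flags>( \S+)*)$"),
}


def extend(base, extra):
    """Derive a subset language; extra rules are tried before base rules."""
    derived = dict(extra)
    for name, pattern in base.items():
        derived.setdefault(name, pattern)
    return derived


def parse(grammar, line):
    """Return (rule_name, fields) for the first matching rule, else None."""
    for name, pattern in grammar.items():
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return None


# Hypothetical researcher-specific rule: flag lines that include "Exit".
EXIT_GRAMMAR = extend(
    BASE_GRAMMAR,
    {"exit_flags": re.compile(r"^s(?P<flags>( \S+)* Exit( \S+)*)$")},
)
```

With this sketch, parse(EXIT_GRAMMAR, "s Exit Fast Running") is
classified under the added "exit_flags" rule, while a flag line without
Exit still falls through to the base "flags" rule.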

The advantage is that it's easy to maintain for researchers, easy to
maintain for developers, easy to create proofs about the system, and
easy to implement formal validation methods (which you may really want
for some important classes of research).

So there's really no language to learn per se. It's a formal method
of making all that tor-network gibberish make sense. Once you've
described the semantic meaning it's *all* automatic. Want that
semantic relationship to build a shiny viz in R--automatic. Want those
semantics to trigger an email for censorship--automatic. Would you
rather have a report and a graph describing nodes involved in a
potential attack--automatic. Would you like to create JSON
representations of related entities--automatic.
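
To make the "automatic" part a little more concrete, here is a small
Python sketch of actions attached to a named semantic rule. It only
illustrates the idea and assumes a matcher already produced a rule name
and its fields. The rule name "exit_relay" and the helpers on, dispatch,
to_json and to_report_line are hypothetical.

```python
import json

# Registry mapping a semantic rule name to the actions it triggers.
ACTIONS = {}


def on(rule_name):
    """Decorator that registers an action for a (hypothetical) rule name."""
    def register(func):
        ACTIONS.setdefault(rule_name, []).append(func)
        return func
    return register


@on("exit_relay")
def to_json(fields):
    # JSON representation of the matched entity.
    return json.dumps(fields, sort_keys=True)


@on("exit_relay")
def to_report_line(fields):
    # One line of a plain-text report.
    return "exit relay {nickname} at {address}".format(**fields)


def dispatch(rule_name, fields):
    """Run every action registered for the rule; unknown rules give []."""
    return [action(fields) for action in ACTIONS.get(rule_name, [])]
```

Once a rule matches, dispatch("exit_relay", fields) runs every action
registered for it. Adding an email alert or a graph would mean
registering one more function, not changing the matcher.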

Strangely, in the history of analysis at tor project, no one has
tried, and it is not implemented in any reusable/presentable form. I
very much doubt a potential sponsor would be willing to sponsor work
on metrics-lib, because it's basically useless for analysis (same as
the others I've mentioned). A researcher has to do too much work to
perform analysis to see tor project as having contributed to making it
easy.

I hope that clears things up about having to learn a language. Although
that's also possible, the techniques are not being used here to create
a programming language. The techniques are being used to perform
linguistics on tor data. It's possible however to extend this work to
define a language for programming, but that's not the primary
objective. (An implementation, such as I describe, would make that
possible in a formal way--which is good, of course.)

Regards
--leeroy
