[ooni-dev] Some ideas on the visualization of OONI data

David Stainton dstainton415 at gmail.com
Sat Oct 18 17:06:18 UTC 2014

> To be honest, at the time i wrote the email i knew that mongoDB
> provided sharding which should provide horizontal scaling but at the
> time i didn't know how it worked in mongodb, because i hadn't time to
> dig through the docs. Now, after learning a bit more about mongodb i
> still don't know if i know :) but i agree with you, this is not
> distributed.

Ah, interesting: MongoDB has built-in sharding.

Perhaps you are correct about MongoDB, in that it does seem like it would scale well.

However, we have to carefully evaluate several more criteria before
choosing a data store. For instance, operational costs should always be evaluated:

Is it a pain to set up? (Sharded MongoDB seems heavyweight!)
Is it a pain to add a new replica to a replica set?
How are additional shards added?
Does rebalancing the cluster after adding shards kill performance and take a long time? (Most likely yes.)

> So, i think we should index the reports to provide a query API, this
> still applies, but, should we build a distributed datastore that will
> fit with every deployed collector? or a central repository that grabs
> the reports of the collectors and index them? should we care at all?

Yes... "indexed" reports sound much easier to work with than just the reports.
However, it is not yet clear that we really need the datastore to be distributed.
High, or mostly high, availability might be the actual requirement for this project.
That is much easier to accomplish than a fully distributed datastore!

OK... so if we go with one of these CF (column family) data stores, then we must keep in mind
the types of queries we will need when creating the schema. Another possibility would be Redis.
Its set operations (union, intersection, difference) amount to a set-theoretic query language. Also, I've heard good things about CouchDB.
I think we should look at these different datastore possibilities and discuss potential schema
and query design for our project. I suspect a discussion of schema and query patterns will be more useful
than discussing operational properties, especially if a centralized datastore is good enough.
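
To make the "set-theoretic query" idea a bit more concrete, here is a rough sketch in plain Python. It keeps one set of report IDs per (field, value) pair and answers queries by combining those sets. The field names (probe_cc, test_name), the report IDs and the ReportIndex class are all made up for illustration and are not a proposed schema. In Redis, the same pattern would roughly correspond to SADD for indexing and SINTER / SUNION / SDIFF for querying.

```python
from collections import defaultdict


class ReportIndex:
    """Toy in-memory index: one set of report IDs per (field, value) pair."""

    def __init__(self):
        self._sets = defaultdict(set)

    def add(self, report_id, **fields):
        # Register the report ID under every indexed field value.
        for field, value in fields.items():
            self._sets[(field, value)].add(report_id)

    def ids(self, field, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._sets.get((field, value), set()))


# Hypothetical reports; only the metadata needed for querying is indexed.
index = ReportIndex()
index.add("r1", probe_cc="IR", test_name="http_requests")
index.add("r2", probe_cc="IR", test_name="dns_consistency")
index.add("r3", probe_cc="CN", test_name="http_requests")

# Intersection: http_requests reports measured from IR.
ir_http = index.ids("probe_cc", "IR") & index.ids("test_name", "http_requests")

# Union: reports measured from either IR or CN.
ir_or_cn = index.ids("probe_cc", "IR") | index.ids("probe_cc", "CN")

# Difference: http_requests reports not measured from IR.
http_not_ir = index.ids("test_name", "http_requests") - index.ids("probe_cc", "IR")

print(sorted(ir_http), sorted(ir_or_cn), sorted(http_not_ir))
```

The point of the sketch is that the schema falls out of the queries: every field we want to filter on needs its own sets, so we should list the query patterns first and derive the indexes from them.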
