Re: [tor-dev] Metrics Plans

11 Jun 2013

      ...
I can try experimenting with this later on (when we have the full / needed
importer working, e.g.), but it might be difficult to scale indeed (not
sure, of course). Do you have any specific use cases in mind? (actually
curious, could be interesting to hear.)
The advantage of being able to reconstruct Descriptor instances is
simpler usage (and hence more maintainable code). I.e., usage could be
as simple as...

========================================

from tor.metrics import descriptor_db

# Fetches all of the server descriptors for a given date. These are provided as
# instances of...
#
#   stem.descriptor.server_descriptor.RelayDescriptor

for desc in descriptor_db.get_server_descriptors(2013, 1, 1):
  # print the addresses of only the exits

  if desc.exit_policy.is_exiting_allowed():
    print(desc.address)

========================================

Obviously we'd still want to do raw SQL queries for high-traffic
applications. However, for applications where maintainability trumps
speed, this could be a nice feature to have.
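
As a rough sketch of what could sit behind a call like
get_server_descriptors(), assuming the raw descriptor text is stored in a
table next to its publication date. The 'raw_descriptor' table, its columns,
and the MiniDescriptor stand-in are all hypothetical; in practice the raw
text would presumably be handed to stem's RelayDescriptor rather than the toy
parser used here to keep the example self-contained.

```python
import sqlite3
from datetime import date


class MiniDescriptor(object):
  """Toy stand-in for a stem descriptor; parses only the 'router' line."""

  def __init__(self, raw_content):
    self.raw_content = raw_content
    self.nickname = self.address = None

    for line in raw_content.splitlines():
      parts = line.split()

      if parts and parts[0] == 'router' and len(parts) >= 3:
        self.nickname, self.address = parts[1], parts[2]


def get_server_descriptors(conn, year, month, day):
  """Yields a descriptor object for each raw row published on the given date."""

  published = date(year, month, day).isoformat()
  query = 'SELECT raw_content FROM raw_descriptor WHERE published = ?'

  for (raw_content,) in conn.execute(query, (published,)):
    yield MiniDescriptor(raw_content)


# Hypothetical schema and sample rows, just enough to exercise the function.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE raw_descriptor (published TEXT, raw_content TEXT)')
conn.executemany('INSERT INTO raw_descriptor VALUES (?, ?)', [
  ('2013-01-01', 'router caerSidi 71.35.133.197 9001 0 0\n'),
  ('2013-01-02', 'router other 10.0.0.1 9001 0 0\n'),
])

addresses = [desc.address for desc in get_server_descriptors(conn, 2013, 1, 1)]
print(addresses)
```

The caller never sees SQL or column names, only descriptor objects, which is
where the maintainability gain would come from.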
...
...
* After making the schema update, the importer could then run over this
  raw data table, constructing Descriptor instances from it and
  performing updates for any missing attributes.
I can't say I can easily see the specifics of how all this would work, but
if we had an always-up-to-date data model (mediated by the Stem RelayDescriptor
class, but not necessarily), this might work. (The ORM <-> Stem Descriptor
object mapping itself is trivial, so all is well in that regard.)
I'm not sure if I entirely follow. As I understand it, the importer...

* Reads raw rsynced descriptor data.
* Uses it to construct stem Descriptor instances.
* Persists those to the database.

My suggestion is that for the first step it could read the rsynced
descriptors *or* the raw descriptor content from the database itself.
This means that the importer could be used to not only populate new
descriptors, but also back-fill after a schema update.
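
A minimal sketch of that idea, assuming the importer only needs an iterable
of raw descriptor text and doesn't care where it comes from. Every name here
(read_from_directory, read_from_database, run_importer, the 'raw_descriptor'
table) is made up for illustration, and parse_nickname() is a placeholder for
constructing a stem Descriptor instance.

```python
import os
import sqlite3
import tempfile


def read_from_directory(path):
  """Yields the raw text of each descriptor file under path."""

  for name in sorted(os.listdir(path)):
    with open(os.path.join(path, name)) as descriptor_file:
      yield descriptor_file.read()


def read_from_database(conn):
  """Yields raw descriptor text that's already stored in the database."""

  query = 'SELECT raw_content FROM raw_descriptor ORDER BY rowid'

  for (raw_content,) in conn.execute(query):
    yield raw_content


def parse_nickname(raw_content):
  # Placeholder for constructing a stem Descriptor instance.

  for line in raw_content.splitlines():
    if line.startswith('router '):
      return line.split()[1]

  return None


def run_importer(raw_descriptors):
  """Parses whatever source it's given; persisting is left out of this sketch."""

  return [parse_nickname(raw) for raw in raw_descriptors]


raw = 'router caerSidi 71.35.133.197 9001 0 0\n'

# Source one: a directory standing in for the rsynced descriptor files.
rsync_dir = tempfile.mkdtemp()

with open(os.path.join(rsync_dir, 'descriptor-1'), 'w') as descriptor_file:
  descriptor_file.write(raw)

# Source two: the same raw content kept in a (hypothetical) database table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE raw_descriptor (raw_content TEXT)')
conn.execute('INSERT INTO raw_descriptor VALUES (?)', (raw,))

from_files = run_importer(read_from_directory(rsync_dir))
from_db = run_importer(read_from_database(conn))
```

Both sources feed the same parsing path, so the importer itself wouldn't need
a separate code path for back-filling.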

That is to say, adding a new column would simply be...

* Perform the schema update.
* Run the importer, which...
  * Reads raw descriptor data from the database.
  * Uses it to construct stem Descriptor instances.
  * Performs an UPDATE for anything that's out of sync or missing from
    the database.
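
The steps above could be sketched roughly as follows, using sqlite3 so the
example runs on its own. The 'descriptor' table, its columns, and
parse_fields() (a placeholder for building a stem Descriptor from the raw
text) are assumptions for illustration, not the actual schema or importer.

```python
import sqlite3


def parse_fields(raw_content):
  """Placeholder parser: pulls the nickname and platform out of the raw text."""

  fields = {'nickname': None, 'platform': None}

  for line in raw_content.splitlines():
    keyword, _, value = line.partition(' ')

    if keyword == 'router' and value:
      fields['nickname'] = value.split()[0]
    elif keyword == 'platform':
      fields['platform'] = value

  return fields


def backfill(conn):
  """Re-parses stored raw text, updating rows whose parsed columns differ."""

  updated = 0
  query = 'SELECT id, raw_content, nickname, platform FROM descriptor'

  for row_id, raw_content, nickname, platform in conn.execute(query).fetchall():
    fields = parse_fields(raw_content)

    if (nickname, platform) != (fields['nickname'], fields['platform']):
      conn.execute(
        'UPDATE descriptor SET nickname = ?, platform = ? WHERE id = ?',
        (fields['nickname'], fields['platform'], row_id),
      )

      updated += 1

  conn.commit()
  return updated


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE descriptor (id INTEGER PRIMARY KEY, raw_content TEXT, nickname TEXT)')
conn.execute(
  'INSERT INTO descriptor (raw_content, nickname) VALUES (?, ?)',
  ('router caerSidi 71.35.133.197 9001 0 0\nplatform Tor 0.2.4.8 on Linux\n', 'caerSidi'),
)

# The schema update: a new column that existing rows have no value for yet.
conn.execute('ALTER TABLE descriptor ADD COLUMN platform TEXT')

first_pass = backfill(conn)   # fills in the new column from the raw text
second_pass = backfill(conn)  # nothing left out of sync
platform = conn.execute('SELECT platform FROM descriptor').fetchone()[0]
```

Since the comparison is against freshly parsed values, re-running the
back-fill is harmless; rows that are already in sync are left alone.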

Cheers! -Damian

Damian Johnson