Hi,
forgot to reply to this email earlier on..
On Tue, Jun 11, 2013 at 6:02 PM, Damian Johnson <atagar@torproject.org> wrote:
> I can try experimenting with this later on (when we have the full / needed
> importer working, e.g.), but it might be difficult to scale indeed (not
> sure, of course). Do you have any specific use cases in mind? (actually
> curious, could be interesting to hear.)
The advantage of being able to reconstruct Descriptor instances is
simpler usage (and hence more maintainable code). I.e., usage could be
as simple as...
========================================
from tor.metrics import descriptor_db
# Fetches all of the server descriptors for a given date. These are provided as
# instances of...
#
#   stem.descriptor.server_descriptor.RelayDescriptor
for desc in descriptor_db.get_server_descriptors(2013, 1, 1):
    # print the addresses of only the exits
    if desc.exit_policy.is_exiting_allowed():
        print(desc.address)
========================================
Obviously we'd still want to do raw SQL queries for high traffic
applications. However, for applications where maintainability trumps
speed this could be a nice feature to have.
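To make that trade-off concrete, here is a rough sketch of the two routes side by side. Everything in it is an assumption for illustration: the `descriptor` table layout, the `SimpleDescriptor` stand-in, and the toy `parse_raw` helper are hypothetical, and a real version would build stem RelayDescriptor instances rather than use this parser.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for stem's RelayDescriptor, just enough for this sketch.
SimpleDescriptor = namedtuple('SimpleDescriptor', ['nickname', 'address', 'is_exit'])


def parse_raw(raw):
    """Toy parser (assumption): reads only 'router' and 'accept' lines."""
    nickname, address, is_exit = None, None, False
    for line in raw.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == 'router' and len(fields) >= 3:
            nickname, address = fields[1], fields[2]
        elif fields[0] == 'accept':
            is_exit = True  # crude: any accept rule counts as exiting
    return SimpleDescriptor(nickname, address, is_exit)


def get_server_descriptors(conn, date):
    """Object route: rebuild a descriptor object from each stored raw blob."""
    query = 'SELECT raw FROM descriptor WHERE published = ?'
    for (raw,) in conn.execute(query, (date,)):
        yield parse_raw(raw)


def exit_addresses_sql(conn, date):
    """Raw SQL route: let the database filter on pre-extracted columns."""
    query = ('SELECT address FROM descriptor '
             'WHERE published = ? AND is_exit = 1 ORDER BY address')
    return [row[0] for row in conn.execute(query, (date,))]


def build_demo_db():
    """Creates an in-memory table with two made-up descriptors."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE descriptor '
                 '(raw TEXT, published TEXT, address TEXT, is_exit INTEGER)')
    rows = [
        ('router relayA 10.0.0.1 9001 0 0\naccept *:80\nreject *:*',
         '2013-01-01', '10.0.0.1', 1),
        ('router relayB 10.0.0.2 9001 0 0\nreject *:*',
         '2013-01-01', '10.0.0.2', 0),
    ]
    conn.executemany('INSERT INTO descriptor VALUES (?, ?, ?, ?)', rows)
    return conn


conn = build_demo_db()
via_objects = sorted(d.address
                     for d in get_server_descriptors(conn, '2013-01-01')
                     if d.is_exit)
via_sql = exit_addresses_sql(conn, '2013-01-01')
print(via_objects, via_sql)
```

Both routes give the same answer here. The object route parses every row in Python, which is the part that may not scale, while the SQL route only works for attributes that already have their own column.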
>> * After making the schema update the importer could then run over this
>> raw data table, constructing Descriptor instances from it and
>> performing updates for any missing attributes.
>
> I can't say I can easily see the specifics of how all this would work, but
> if we had an always-up-to-date data model (mediated by Stem Relay Descriptor
> class, but not necessarily), this might work.. (The ORM <-> Stem Descriptor
> object mapping itself is trivial, so all is well in that regard.)
I'm not sure if I entirely follow. As I understand it the importer...
* Reads raw rsynced descriptor data.
* Uses it to construct stem Descriptor instances.
* Persists those to the database.
My suggestion is that for the first step it could read the rsynced
descriptors *or* the raw descriptor content from the database itself.
This means that the importer could be used to not only populate new
descriptors, but also back-fill after a schema update.
That is to say, adding a new column would simply be...
* Perform the schema update.
* Run the importer, which...
  * Reads raw descriptor data from the database.
  * Uses it to construct stem Descriptor instances.
  * Performs an UPDATE for anything that's out of sync or missing from
    the database.
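The back-fill steps above could look roughly like the sketch below. It is only an illustration under assumptions: the `descriptor` table, its columns, and the toy `parse_raw` helper are made up for the example, and the real importer would construct stem Descriptor instances instead of using this parser.

```python
import sqlite3


def parse_raw(raw):
    """Toy parser (assumption): maps each line's first word to the rest."""
    parsed = {}
    for line in raw.splitlines():
        keyword, _, value = line.partition(' ')
        parsed.setdefault(keyword, value)
    return parsed


def backfill(conn, columns):
    """Re-reads raw descriptors from the database and fixes stale columns.

    Column names are interpolated into the SQL text, so they must come
    from trusted code. Returns the number of rows that were updated.
    """
    updated = 0
    select = 'SELECT id, raw, %s FROM descriptor' % ', '.join(columns)
    assignments = ', '.join('%s = ?' % col for col in columns)
    # fetchall() first so we are not updating the table mid-iteration
    for row in conn.execute(select).fetchall():
        desc_id, raw, stored = row[0], row[1], row[2:]
        parsed = parse_raw(raw)
        expected = tuple(parsed.get(col) for col in columns)
        if expected != stored:
            conn.execute('UPDATE descriptor SET %s WHERE id = ?' % assignments,
                         expected + (desc_id,))
            updated += 1
    conn.commit()
    return updated


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE descriptor '
             '(id INTEGER PRIMARY KEY, raw TEXT, platform TEXT)')
conn.executemany('INSERT INTO descriptor (raw, platform) VALUES (?, ?)', [
    ('platform Tor 0.2.3.25 on Linux\ncontact alice@example.com',
     'Tor 0.2.3.25 on Linux'),
    ('platform Tor 0.2.4.12 on FreeBSD\ncontact bob@example.com',
     'Tor 0.2.4.12 on FreeBSD'),
])

# Step 1: the schema update. Existing rows get NULL in the new column.
conn.execute('ALTER TABLE descriptor ADD COLUMN contact TEXT')

# Step 2: run the importer over the stored raw data to back-fill it.
first_pass = backfill(conn, ['platform', 'contact'])
second_pass = backfill(conn, ['platform', 'contact'])
print(first_pass, second_pass)
```

The second pass touches nothing because every row is already in sync, so re-running the importer after a schema change should be harmless.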
Cheers! -Damian