[tor-bugs] #2921 [Metrics]: Improve bulk import of relay descriptors into metrics database

Tor Bug Tracker & Wiki torproject-admin at torproject.org
Fri Apr 15 10:29:19 UTC 2011


#2921: Improve bulk import of relay descriptors into metrics database
-------------------------+--------------------------------------------------
 Reporter:  karsten      |          Owner:  karsten
     Type:  enhancement  |         Status:  new    
 Priority:  normal       |      Milestone:         
Component:  Metrics      |        Version:         
 Keywords:               |         Parent:         
   Points:               |   Actualpoints:         
-------------------------+--------------------------------------------------
 We currently have two ways to import relay descriptors into the metrics
 database:

  - JDBC import:  We have a Java importer that connects to the metrics
 database via JDBC.  We use a few tweaks like committing batches of up to
 500 rows, but importing months of data is still a time-consuming task.

  - psql \copy:  The Java importer can be configured to parse relay
 descriptor files and write files for psql's \copy command.  The
 disadvantage is that \copy does not handle duplicates well, so we have
 to pre-process the bulk import files.
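
 As an illustration of the pre-processing step for \copy files, here is a
 minimal sketch that drops rows whose key has already been seen in the same
 file.  It is not the importer's actual code.  The class name, the method
 name, and the assumption that rows are tab-separated with the key in the
 leading columns are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the existing importer.
final class CopyFileDeduplicator {

    /**
     * Returns the rows in their original order, keeping only the first row
     * for each key. The key is assumed to be the first keyColumns
     * tab-separated fields of a row (an assumption made for this sketch).
     */
    static List<String> dropDuplicateKeys(List<String> rows, int keyColumns) {
        Set<String> seenKeys = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String row : rows) {
            // Limit -1 keeps trailing empty fields instead of discarding them.
            String[] fields = row.split("\t", -1);
            int n = Math.min(keyColumns, fields.length);
            String key = String.join("\t", Arrays.copyOfRange(fields, 0, n));
            // Set.add returns false if the key was already present.
            if (seenKeys.add(key)) {
                unique.add(row);
            }
        }
        return unique;
    }
}
```

 Note that this only removes duplicates within a single import file.  Rows
 that already exist in the database would still need separate handling, and
 the in-memory set of keys may not be practical for very large files.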

 I wonder if there are better approaches than these two, or if there are
 improvements to how we implement them.  It would be good to compare the
 performance of these two approaches, and of any improvements to them, when
 importing 1, 12, and 24 months of data.

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2921>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

