Database schema for metrics portal

kevin berry xckjb88 at gmail.com
Tue May 25 06:02:42 UTC 2010


Karsten,

This is all extremely helpful. I certainly won't be able to test with the
full amount of data, but we can get a look at the performance implications
of running queries on both normalized and unnormalized schemas.

As far as a timeline - I think 2 weeks is a fair amount of time for the
schema to be finished. Should I finish early (which has a good chance of
happening), I can begin working on having R work with the database directly,
or making/modifying a descriptor database importer for this schema. I'm
really looking to get the db stuff down for a basis from which we can build
off. We can discuss this when the time comes.

Thanks,
Kevin


On 21 May 2010 07:06, Karsten Loesing <karsten.loesing at gmx.net> wrote:

> Hi Kevin,
>
> here are some thoughts on importing relay descriptors into a database.
>
> You may already have noticed that there are three documents holding
> information about a relay in the Tor network: network status entry,
> server descriptor, and extra-info descriptor. In the past, I have used
> three tables and joined them whenever necessary. It turned out that join
> operations are highly inefficient and that it doesn't even make sense to
> join extra-info descriptors with network status entries and server
> descriptors.
>
> Here's an example. The current network status consensus has this entry
> for gabelmoo:
>
> r gabelmoo 8gREE9rC4C49a89HNaGbyh3pcoE gL7MFClJBGG6l3KzCXJPh5fQnl8
> 2010-05-21 02:34:42 80.190.246.100 8080 8180
> s Authority Fast Guard HSDir Running Stable Unnamed V2Dir Valid
> v Tor 0.2.2.13-alpha
> w Bandwidth=512
> p reject 1-65535
>
>
> The referenced server descriptor gL7MFCl... in base64 or
> 80becc1429490461ba9772b309724f8797d09e5f in hex has this content:
>
> router gabelmoo 80.190.246.100 8080 0 8180
> platform Tor 0.2.2.13-alpha (git-a3db6300225c5f34) on Linux i686
> opt protocols Link 1 2 Circuit 1
> published 2010-05-21 02:34:42
> opt fingerprint F204 4413 DAC2 E02E 3D6B CF47 35A1 9BCA 1DE9 7281
> uptime 1154088
> bandwidth 512000 1280000 1166966
> opt extra-info-digest CECC7F83F5CD771BF0DF9827C08F561CE84A5DA3
> opt caches-extra-info
> onion-key
> -----BEGIN RSA PUBLIC KEY-----
> MIGJAoGBANbj/L3FZCbIKNdBKMH1tamU/r+9p+X4fI+1i9jDtHt11D4Rz+rsQ4CX
> 8no0NMx2yOHpvq7Cp8scO2ZcH4UjQ/bJIumgGqjuslWMn9KtrbBKdtNzVtOMNjYY
> 7zjZJ8UVmS1/hH2wl2dwZeAVbQ931u9DWX5R/u1ck28EQS8C2Z/jAgMBAAE=
> -----END RSA PUBLIC KEY-----
> signing-key
> -----BEGIN RSA PUBLIC KEY-----
> MIGJAoGBAMjPhp0QAN8WQeP8sDg+mnM7hnPExcXTOfsmt7Sl2k3fHcHeAJnYu10V
> /hb6RBhubi1HRg6fYF5PwZWOppxRj90WX3n2JEcQh88+4tuQNQ2jOxXQ/hBkyf0w
> klrx6Fh8ana1VX+QXInfRW5z3eaANXqvdCvDG5jfUrz6pj8WGPrxAgMBAAE=
> -----END RSA PUBLIC KEY-----
> opt hidden-service-dir
> contact 1024D/F7C11265 Karsten Loesing <karsten dot loesing AT gmx dot net>
> reject *:*
> router-signature
> -----BEGIN SIGNATURE-----
> FxYg32MXPM2aOoa5q244CgSs0/EUw4nKJsxde0sD2f3zav7dXdBJBTrv0GdPETUC
> iv125vk/zCNNljk2c3bbNAKVnFbBYyBXnJ7/BJL+kSAPYTpQGpIw6RXl3NtWhyZa
> 08weBwj1bY+xZzYzgB7yuhPvYzH4F/Ji6qLzRsTfJp0=
> -----END SIGNATURE-----
>
>
> And finally, the referenced extra-info descriptor cecc7f83... is:
>
> extra-info gabelmoo F2044413DAC2E02E3D6BCF4735A19BCA1DE97281
> published 2010-05-21 02:34:42
> write-history 2010-05-21 02:32:48 (900 s)
>
> 544401408,540996608,642924544,583867392,653665280,637046784,684775424,598684672,586347520,658676736,630778880,719997952,486724608,622078976,748093440,589464576,539460608,801062912,825155584,805288960,791465984,678842368,636246016,630557696,679609344,666486784,621635584,758945792,654635008,513009664,617194496,636549120,748435456,865181696,736525312,715303936,742599680,690369536,870909952,873070592,690373632,661690368,703024128,689483776,837165056,626258944,741469184,897375232,802021376,669841408,719076352,717147136,819868672,731346944,717606912,574259200,624780288,682221568,708850688,666567680,734041088,725368832,799907840,881100800,896225280,907387904,854701056,934236160,910132224,856087552,894130176,831595520,846332928,831883264,925052928,840542208,701243392,741879808,707265536,858382336,931992576,926849024,923104256,888730624,934275072,929859584,938267648,938893312,929412096,928772096,876532736,828409856,885524480,806822912,656070656,632248320
> read-history 2010-05-21 02:32:48 (900 s)
>
> 540297216,534199296,628869120,572840960,648410112,631605248,663860224,588089344,580211712,650313728,608541696,701733888,478575616,611846144,726459392,572938240,526454784,786325504,806251520,796734464,787733504,669499392,621448192,617405440,674842624,660013056,599517184,743075840,644049920,506447872,603792384,622928896,742297600,860128256,712641536,694036480,735109120,682270720,852083712,859357184,683282432,656016384,680463360,673383424,831110144,622981120,720413696,891073536,798917632,669814784,705187840,709612544,818158592,729244672,703837184,564427776,625363968,682468352,691994624,660142080,733149184,725949440,785703936,875602944,896345088,909398016,846983168,933785600,917148672,854514688,884723712,822371328,844339200,821991424,913144832,828115968,695621632,737174528,691364864,847837184,934027264,929199104,912046080,877563904,940472320,937701376,935174144,940459008,940481536,938059776,867339264,817305600,884368384,800684032,641873920,621056000
> dirreq-stats-end 2010-05-20 17:59:54 (86400 s)
> dirreq-v3-ips
>
> us=880,de=752,fr=320,it=224,kr=224,ru=200,gb=176,pl=128,ca=96,ng=88,nl=88,ir=80,jp=80,pk=80,br=72,se=72,at=64,au=56,ch=56,es=56,ph=48,ua=48,mx=40,sn=40,th=40,tr=40,be=32,fi=32,in=32,tw=32,ar=24,bg=24,cz=24,dk=24,eg=24,gh=24,gr=24,ie=24,no=24,ro=24,sg=24,vn=24,a1=16,by=16,ci=16,co=16,dz=16,hk=16,hr=16,hu=16,id=16,il=16,kw=16,kz=16,ma=16,my=16,nz=16,pt=16,sa=16,si=16,sk=16,ve=16,za=16,??=8,a2=8,ae=8,ao=8,aw=8,bd=8,bj=8,bn=8,bo=8,bs=8,cl=8,cr=8,cy=8,do=8,ec=8,ee=8,et=8,eu=8,ge=8,gg=8,gt=8,hn=8,ht=8,iq=8,is=8,jo=8,ke=8,kg=8,lb=8,lt=8,lu=8,lv=8,md=8,mn=8,mt=8,mu=8,nc=8,pa=8,pe=8,pf=8,pr=8,ps=8,qa=8,rs=8,sc=8,sd=8,sm=8,so=8,sv=8,sy=8,tn=8,tt=8,tz=8,uz=8,ye=8
> dirreq-v2-ips ng=8
> dirreq-v3-reqs
>
> us=1992,de=1296,fr=568,ru=344,kr=304,gb=280,it=256,ng=208,nl=200,se=176,pl=152,at=136,ca=112,br=104,jp=104,ir=88,ph=88,ar=80,es=80,pk=80,ua=80,au=72,ch=72,tw=64,fi=56,be=48,cz=48,th=48,tr=48,bg=40,in=40,mx=40,sn=40,dk=32,gr=32,no=32,ro=32,sg=32,eg=24,gh=24,hk=24,id=24,ie=24,il=24,pt=24,sk=24,vn=24,a1=16,a2=16,ae=16,by=16,ci=16,cl=16,co=16,dz=16,ee=16,hr=16,hu=16,kw=16,kz=16,lb=16,lv=16,ma=16,mn=16,my=16,nz=16,ps=16,sa=16,si=16,tn=16,ve=16,za=16,??=8,ao=8,aw=8,bd=8,bj=8,bn=8,bo=8,bs=8,cr=8,cy=8,do=8,ec=8,et=8,eu=8,ge=8,gg=8,gt=8,hn=8,ht=8,iq=8,is=8,jo=8,ke=8,kg=8,lt=8,lu=8,md=8,mt=8,mu=8,nc=8,pa=8,pe=8,pf=8,pr=8,qa=8,rs=8,sc=8,sd=8,sm=8,so=8,sv=8,sy=8,tt=8,tz=8,uz=8,ye=8
> dirreq-v2-reqs ng=8
> dirreq-v3-resp
> ok=8000,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=0,busy=0
> dirreq-v2-resp ok=8,unavailable=0,not-found=0,not-modified=0,busy=0
> dirreq-v2-share 0.00%
> dirreq-v3-share 0.00%
> dirreq-v3-direct-dl
>
> complete=3924,timeout=24,running=0,min=529,d1=49744,d2=84867,q1=103848,d3=119618,d4=165491,md=215631,d6=293536,d7=428605,q3=500448,d8=656839,d9=1507344,max=25818600
> dirreq-v2-direct-dl complete=4,timeout=0,running=0
> dirreq-v3-tunneled-dl
>
> complete=3968,timeout=88,running=4,min=161,d1=24326,d2=51796,q1=61586,d3=71895,d4=91512,md=121749,d6=161907,d7=231554,q3=285958,d8=343692,d9=529069,max=25762200
> dirreq-v2-tunneled-dl complete=0,timeout=0,running=0
> entry-stats-end 2010-05-20 17:59:54 (86400 s)
> entry-ips
>
> de=1832,us=1680,kr=912,fr=680,it=672,ru=416,gb=384,pl=352,ir=336,jp=264,ng=240,ca=216,br=208,es=168,in=152,pk=144,at=128,au=128,nl=128,ph=128,se=128,tr=128,ua=112,ch=104,th=104,mx=96,tw=80,cz=72,ar=64,fi=64,sn=64,be=56,hu=56,ie=56,il=56,tn=56,vn=56,dk=48,gh=48,gr=48,id=48,kw=48,ma=48,no=48,ro=48,sa=48,??=40,bg=40,ci=40,eg=40,hk=40,my=40,pt=40,za=40,co=32,dz=32,nz=32,sg=32,a1=24,by=24,cl=24,kz=24,sk=24,ve=24,a2=16,bd=16,bj=16,hr=16,jo=16,ke=16,lb=16,lt=16,lu=16,pe=16,ps=16,rs=16,sd=16,si=16,sy=16,ye=16,ae=8,af=8,al=8,am=8,ao=8,aw=8,az=8,ba=8,bb=8,bh=8,bn=8,bo=8,bs=8,cm=8,cn=8,cr=8,cy=8,dj=8,do=8,ec=8,ee=8,eu=8,fj=8,fo=8,ge=8,gg=8,gt=8,gy=8,hn=8,ht=8,iq=8,is=8,jm=8,kg=8,ky=8,lk=8,lv=8,ly=8,mc=8,md=8,me=8,mg=8,mk=8,ml=8,mo=8,mr=8,mt=8,mu=8,mv=8,nc=8,ni=8,np=8,pa=8,pf=8,pr=8,py=8,qa=8,sc=8,so=8,sv=8,tg=8,tt=8,tz=8,ug=8,uy=8,uz=8
> cell-stats-end 2010-05-20 17:59:54 (86400 s)
> cell-processed-cells 9890,208,69,27,16,7,5,5,4,2
> cell-queued-cells 2.63,0.77,0.02,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> cell-time-in-queue 911,505,39,11,15,9,9,20,46,197
> cell-circuits-per-decile 11482
> exit-stats-end 2010-05-20 17:59:54 (86400 s)
> exit-kibibytes-written other=0
> exit-kibibytes-read other=0
> exit-streams-opened other=0
> router-signature
> -----BEGIN SIGNATURE-----
> N1CbWArG7rBEXa0r6dprMSh/HkdcpjtRyG0zScAJAYp76j4Nm/bIZz8UbDTwauqN
> G9ysHp+qUf4IAuSDrYeLrSo5mrHuQOEDfEfA04eJ9bS2QzTFYZeO6kEFXlSprnc3
> IC23R/isVEpDqM2qaWKnBIo3mT7bKram7GO1weIDfR0=
> -----END SIGNATURE-----
>
>
> One thing you can see here is that all fields in extra-info descriptors
> have their own timestamps. None of the data in extra-info descriptors
> requires knowledge of the corresponding server descriptor or network
> status entry. If you want to analyze data from the extra-info
> descriptors, you can import them into a table of its own.
>
> So, what about joining network status entries and server descriptors?
> There are statistics when it makes sense to join these documents. For
> example, you might want to evaluate assigned relay flags (Stable, Guard;
> contained in network status entry) and uptime (contained in server
> descriptor) at the same time. In particular, it's almost never useful to
> base statistics solely on server descriptors, because relays are not
> even confirmed to be reachable when you only have a server descriptor.
>
> How do you avoid join operations in your queries? I'd think that having
> 2 import tables for network status entries and server descriptors and 1
> unnormalized table containing all data for aggregation/querying might be
> a good approach. You might be able to update that 1 table by using
> triggers on the other 2 tables. You'll have lots of redundant data in
> your database: all fields in your server descriptors can be contained in
> up to 24 rows, as the maximum amount of time a server descriptor is
> valid is 24 hours. But that's probably fine, as there's not so much
> stuff in server descriptors that's useful for statistics anyway. And
> you'll be excited how fast queries are without having to join tables. :)
>
>
> Also, as discussed on #tor-dev, below are some notes on creating a
> database schema for the BIRT-based metrics portal we had a couple of
> months ago (and only for a very short time). I don't think you can
> re-use everything as-is, but maybe you find some of the ideas useful for
> your database schema.
>
> Best,
> --Karsten
>
>
> Data table
> ----------
>
> The data table only includes entries from network statuses, not from
> server descriptors or extra-info descriptors.
>
>  CREATE TABLE statusentry (
>    validafter TIMESTAMP WITHOUT TIME ZONE NOT NULL,
>    fingerprint CHARACTER(40),
>    hashed_fingerprint CHARACTER(40) NOT NULL,
>    bridge BOOLEAN NOT NULL DEFAULT false,
>    descriptor CHARACTER(40) NOT NULL,
>    authority BOOLEAN NOT NULL DEFAULT false,
>    badexit BOOLEAN NOT NULL DEFAULT false,
>    baddirectory BOOLEAN NOT NULL DEFAULT false,
>    exit BOOLEAN NOT NULL DEFAULT false,
>    fast BOOLEAN NOT NULL DEFAULT false,
>    guard BOOLEAN NOT NULL DEFAULT false,
>    hsdir BOOLEAN NOT NULL DEFAULT false,
>    named BOOLEAN NOT NULL DEFAULT false,
>    stable BOOLEAN NOT NULL DEFAULT false,
>    running BOOLEAN NOT NULL DEFAULT false,
>    unnamed BOOLEAN NOT NULL DEFAULT false,
>    valid BOOLEAN NOT NULL DEFAULT false,
>    v2dir BOOLEAN NOT NULL DEFAULT false,
>    v3dir BOOLEAN NOT NULL DEFAULT false,
>    bandwidthmeasured BIGINT CHECK (bandwidthmeasured >= 0),
>    country CHARACTER(2),
>    PRIMARY KEY (validafter, hashed_fingerprint, bridge));
>
>
> Network size
> ------------
>
> The network size is determined as the mean number of running relays or
> bridges on a given day. These numbers are calculated from the network
> status by counting the number of running relays or bridges over the day
> and dividing them by the number of network statuses per that day:
>
>  CREATE VIEW network_size_v AS
>    SELECT DATE(validafter),
>      SUM(CASE WHEN bridge IS FALSE THEN 1 ELSE 0 END) /
>        relay_statuses_per_day.count AS avg_relays,
>      SUM(CASE WHEN bridge IS TRUE THEN 1 ELSE 0 END) /
>        bridge_statuses_per_day.count AS avg_bridges
>    FROM statusentry,
>      (SELECT COUNT(*) AS count, DATE(validafter) AS date
>        FROM (SELECT DISTINCT validafter FROM statusentry
>          WHERE bridge IS FALSE)
>        distinct_consensuses GROUP BY DATE(validafter))
>        relay_statuses_per_day,
>      (SELECT COUNT(*) AS count, DATE(validafter) AS date
>        FROM (SELECT DISTINCT validafter FROM statusentry
>          WHERE bridge IS TRUE)
>        distinct_consensuses GROUP BY DATE(validafter))
>        bridge_statuses_per_day
>    WHERE statusentry.running IS TRUE
>      AND DATE(validafter) = relay_statuses_per_day.date
>      AND DATE(validafter) = bridge_statuses_per_day.date
>    GROUP BY DATE(validafter), relay_statuses_per_day.count,
>      bridge_statuses_per_day.count
>    ORDER BY DATE(validafter);
>
> The resulting view is 'materialized' by creating a table with the
> contents of the view. Finally, the query role gets select rights for the
> materialized view:
>
>  CREATE TABLE network_size_mv as SELECT * FROM network_size_v;
>
>  GRANT SELECT ON network_size_mv TO query;
>
>
> Platforms and relay versions
> ----------------------------
>
> The reports on platforms and relay versions are based on network status
> entries in combination with relay descriptors. The information whether a
> relay was running and contained in a network status comes from the
> statusentry relation, whereas the simplified platform string comes from
> the descriptor relation. For each date, the number of running relays are
> counted, grouped by platform string, and divided by the number of stored
> network statuses at that day:
>
>  CREATE VIEW platforms_v AS
>    SELECT DATE(validafter) AS date, platform,
>    COUNT(*) / relay_statuses_per_day.count AS count
>    FROM (
>      SELECT COUNT(*) AS count, DATE(validafter) AS date
>      FROM (
>        SELECT DISTINCT validafter
>        FROM statusentry
>        WHERE bridge IS FALSE) distinct_consensuses
>      GROUP BY DATE(validafter)) relay_statuses_per_day
>    JOIN statusentry ON relay_statuses_per_day.date = DATE(validafter)
>    LEFT JOIN descriptor
>      ON statusentry.descriptor = descriptor.descriptor
>    WHERE running IS TRUE AND bridge IS FALSE
>    GROUP BY DATE(validafter), platform, relay_statuses_per_day.count,
>      relay_statuses_per_day.date
>    ORDER BY DATE(validafter), platform;
>
> The mean numbers of relays per day running a certain Tor version are
> determined similarly:
>
>  CREATE VIEW versions_v AS
>    SELECT DATE(validafter) AS date,
>      SUBSTRING(version, 1, 5) AS version,
>      COUNT(*) / relay_statuses_per_day.count AS count
>    FROM (
>      SELECT COUNT(*) AS count, DATE(validafter) AS date
>      FROM (
>        SELECT DISTINCT validafter
>        FROM statusentry
>        WHERE bridge IS FALSE) distinct_consensuses
>      GROUP BY DATE(validafter)) relay_statuses_per_day
>    JOIN statusentry ON relay_statuses_per_day.date = DATE(validafter)
>    LEFT JOIN descriptor
>      ON statusentry.descriptor = descriptor.descriptor
>    WHERE running IS TRUE AND bridge IS FALSE
>    GROUP BY DATE(validafter), SUBSTRING(version, 1, 5),
>      relay_statuses_per_day.count, relay_statuses_per_day.date
>    ORDER BY DATE(validafter), SUBSTRING(version, 1, 5);
>
> Finally, two tables are created with the contents of these two views in
> order to accelerate querying, and select rights are granted to the query
> role:
>
>  CREATE TABLE platforms_mv as SELECT * FROM platforms_v;
>
>  CREATE TABLE versions_mv as SELECT * FROM versions_v;
>
>  GRANT SELECT ON platforms_mv TO query;
>
>  GRANT SELECT ON versions_mv TO query;
>
>
> Bridge churn
> ------------
>
> Bridge churn is visualized as time plot with four lines showing how many
> of the running bridges at a given time are still running 1
> day/week/month later. The fewer bridges are missing after those times,
> the more stable the bridge population is.
>
> The necessary query table for this report is constructed in multiple
> steps. To begin with, the set of bridges in network statuses is reduced
> to the first network status of each day:
>
>  CREATE VIEW bridges_day_v AS
>    SELECT DATE(validafter), hashed_fingerprint
>    FROM statusentry
>    WHERE validafter IN (
>      SELECT MIN(validafter)
>      FROM statusentry
>      WHERE bridge IS true AND running IS true
>      GROUP BY DATE(validafter))
>    AND bridge IS true AND running IS true;
>
> The resulting view is materialized, as it is used by multiple other views:
>
>  CREATE TABLE bridges_day_mv AS SELECT * FROM bridges_day_v;
>
> In the next step, four views are created that determine how many bridges
> overlap in the sets of a given date and a date 1 day/week/month in the
> future from then:
>
>  CREATE VIEW bridges_churn_now_v AS
>    SELECT date, COUNT(*)
>    FROM bridges_day_mv
>    GROUP BY date;
>
>  CREATE VIEW bridges_churn_day_v AS
>    SELECT now.date, COUNT(*)
>    FROM bridges_day_mv now, bridges_day_mv future
>    WHERE now.date = DATE(future.date + interval '1 day')
>      AND now.hashed_fingerprint = future.hashed_fingerprint
>    GROUP BY now.date, future.date;
>
>  CREATE VIEW bridges_churn_week_v AS
>    SELECT now.date, COUNT(*)
>    FROM bridges_day_mv now, bridges_day_mv future
>    WHERE now.date = date(future.date + interval '1 week')
>      AND now.hashed_fingerprint = future.hashed_fingerprint
>    GROUP BY now.date, future.date;
>
>  CREATE VIEW bridges_churn_month_v AS
>    SELECT now.date, COUNT(*)
>    FROM bridges_day_mv now, bridges_day_mv future
>    WHERE now.date = date(future.date + interval '1 month')
>      AND now.hashed_fingerprint = future.hashed_fingerprint
>    GROUP BY now.date, future.date;
>
> In the last step, these four views are joined into one common view:
>
>  CREATE VIEW bridges_churn_v AS
>    SELECT bridges_churn_now_v.date, bridges_churn_now_v.count AS now,
>      bridges_churn_day_v.count AS day,
>      bridges_churn_week_v.count AS week,
>      bridges_churn_month_v.count AS month
>    FROM bridges_churn_now_v
>      LEFT JOIN bridges_churn_day_v
>        ON bridges_churn_now_v.date = bridges_churn_day_v.date
>      LEFT JOIN bridges_churn_week_v
>        ON bridges_churn_day_v.date = bridges_churn_week_v.date
>      LEFT JOIN bridges_churn_month_v
>        ON bridges_churn_week_v.date = bridges_churn_month_v.date
>      ORDER BY bridges_churn_now_v.date;
>
> In order to improve query performance, a materialized view is created
> based on the last view and is given select rights to the query role:
>
>  CREATE TABLE bridges_churn_mv AS SELECT * FROM bridges_churn_v;
>
>  GRANT SELECT ON bridges_churn_mv TO query;
>
>
> Bridge churn histogram
> ----------------------
>
> Bridge churn can be visualized as histogram with bars representing the
> number of bridges that have been seen as running for a certain number of
> days throughout 1 month.  The higher the bars for high numbers of days,
> the more stable the bridge population is.
>
> The underlying data set for this report is built from a sequence of
> views, starting with a view on the bridges seen on the first days of a
> month:
>
>  CREATE VIEW bridges_hist_first_day_v AS
>    SELECT DATE_TRUNC('month', validafter) AS first_day,
>      hashed_fingerprint
>    FROM statusentry
>    WHERE DATE(validafter) IN (
>      SELECT DATE(MIN(validafter))
>      FROM statusentry
>      WHERE bridge IS true AND running IS true
>      GROUP BY DATE_TRUNC('month', validafter))
>    AND bridge IS true AND running IS true
>    GROUP BY DATE_TRUNC('month', validafter), hashed_fingerprint;
>
> In the next step, all days are selected on which these bridges have been
> seen as running:
>
>  CREATE VIEW bridges_hist_seen_v AS
>    SELECT DATE(statusentry.validafter), statusentry.hashed_fingerprint
>    FROM statusentry, bridges_hist_first_day_v
>    WHERE DATE_TRUNC('month', statusentry.validafter) =
>        bridges_hist_first_day_v.first_day
>      AND statusentry.hashed_fingerprint =
>        bridges_hist_first_day_v.hashed_fingerprint
>    GROUP BY DATE(statusentry.validafter), statusentry.hashed_fingerprint;
>
> For each distinct bridge in each month, the number of days that this
> bridge has been seen as running are counted:
>
>  CREATE VIEW bridges_hist_pre_v AS
>    SELECT COUNT(*) AS days, DATE_TRUNC('month', date) AS month
>    FROM bridges_hist_seen_v
>    GROUP BY DATE_TRUNC('month', date), hashed_fingerprint;
>
> Finally, the number of bridges for a given number of days are added up:
>
>  CREATE VIEW bridges_hist_v AS
>    SELECT COUNT(*), days, DATE(month) AS month
>    FROM bridges_hist_pre_v
>    GROUP BY month, days
>    ORDER BY month, days;
>
> A materialized view is created and given the select rights for the query
> role:
>
>  CREATE TABLE bridges_hist_mv AS SELECT * FROM bridges_hist_v;
>
>  GRANT SELECT ON bridges_hist_mv TO query;
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20100525/8d4021e5/attachment.htm>


More information about the tor-dev mailing list