Hi George,
You sell yourself short. It was a good first attempt. Now I should clarify. The last time I spoke to Karsten about this, they indicated that the measurement team has other priorities (not obvious from the outdated roadmap). Karsten estimated that it would be a year or more before a replacement can be expected.
I'm just an anon to them, so I cannot change these things. I hope that answers your question.
On the other hand, my interest in the censorship detector started as an improvement to metrics-lib and onionoo. In its basic form the fork takes the data, recognizes patterns using applied linguistics, and performs some actions. Getting the data for analysis of censorship is in some ways a simplification. However, progress will be slower than you might like because the effort here will be split between this and the fork of metrics-lib.
I really do appreciate your interest (and that of Joss) so I'd like to keep this discussion going.
In the paper by Joss Wright et al., events besides just censorship were found to be of use as an indicator of an environment where censoring services leads to an increase in Tor use. This sounds like the database you mention. If such a database included events like China's attack on GitHub, or Turkey blocking Twitter, or various other social-political indicators, this would make for a concrete improvement from the perspective of public-research stakeholders. I was also inspired by a recent paper that showed how linguistics can be applied to sample the social-political discourse to predict events. In the absence of data for a country and service, if social indicators show dissatisfaction with a policy to block the service, you can consider this an entry in the database. Over time this sampling would lead to differing discourses which could be used not just to predict anomalies but to help identify why people use Tor, and what motivates the censor. The only downside here is that I'm not fluent in multiple spoken languages, so there may be some loss of context if the data source is chosen arbitrarily.
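To make the database idea a bit more concrete, here is a rough sketch of what an entry and a lookup might look like. Everything in it is my own assumption for illustration: the `Event` type, its field names, the `kind` labels, and the `events_near` helper do not exist in metrics-lib or onionoo, and the two sample dates are approximate and should be checked against a real source.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Event:
    """One hypothetical entry in the event database."""
    country: str  # two-letter country code, e.g. "tr"
    service: str  # affected service, e.g. "twitter"
    day: date     # day the event was reported
    kind: str     # assumed labels: "block", "attack", "discourse"
    source: str   # where the entry came from


# Illustrative entries only; dates are approximate.
EVENTS: List[Event] = [
    Event("tr", "twitter", date(2014, 3, 20), "block", "news report"),
    Event("cn", "github", date(2015, 3, 26), "attack", "news report"),
]


def events_near(events: List[Event], country: str, day: date,
                window_days: int = 7) -> List[Event]:
    """Return events for `country` within `window_days` of `day`."""
    return [
        e for e in events
        if e.country == country and abs((e.day - day).days) <= window_days
    ]
```

An anomaly flagged by the detector for a given country and day could then be compared against `events_near(EVENTS, country, day)` to see whether a known event offers a plausible explanation. Entries inferred from social indicators would simply use a different `kind` and `source`.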
When it comes to distinguishing reachability and interference, a client may try to use Tor at a laundry center in an otherwise `democratic` and `free` country. This location is independently controlled by the owner, and if they decide to block Tor, that's OK. That shouldn't be included. This type of event is unlikely to influence results much anyway. I do wish the OONI Project could help more here.
That just leaves the Tor Project developer stakeholder. I think I will leave this stakeholder to its own devices. It's questionable to ask someone who's being censored to run any test without some assurance of their safety.
That's all from me for now.
Danke
--leeroy