tor-commits
Threads by month
- ----- 2025 -----
- June
- May
- April
- March
- February
- January
- ----- 2024 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2023 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2022 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2021 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2020 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2019 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2018 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2017 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2016 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2015 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2014 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2013 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2012 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
- January
- ----- 2011 -----
- December
- November
- October
- September
- August
- July
- June
- May
- April
- March
- February
January 2019
- 15 participants
- 2081 discussions

[translation/donatepages-messagespot] Update translations for donatepages-messagespot
by translation@torproject.org 11 Jan '19
by translation@torproject.org 11 Jan '19
11 Jan '19
commit 765ac87c461191783dbbc5831ee0b8d6bbc65486
Author: Translation commit bot <translation(a)torproject.org>
Date: Fri Jan 11 11:45:23 2019 +0000
Update translations for donatepages-messagespot
---
locale/ru/LC_MESSAGES/messages.po | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/locale/ru/LC_MESSAGES/messages.po b/locale/ru/LC_MESSAGES/messages.po
index f3da7aaf7..214c405ed 100644
--- a/locale/ru/LC_MESSAGES/messages.po
+++ b/locale/ru/LC_MESSAGES/messages.po
@@ -706,6 +706,8 @@ msgid ""
"We need people to run relays, write code, organize the community and spread "
"the word about our good work."
msgstr ""
+"Нам нужны люди чтобы поддерживать сеть, писать код, организовывать "
+"сообщество и распространять информацию о нашей хорошей работе."
#: tmp/cache_locale/54/5420828d7720daccac45a05e74a0bdde5ef138020bd4901a7e81ad8817d3f8e8.php:129
msgid "Learn how you can help."
@@ -1207,6 +1209,8 @@ msgid ""
"If you pay taxes in the United States, your donation to Tor is tax "
"deductible to the full extent required by law."
msgstr ""
+"Если вы платите налоги в Соединённых Штатах, ваше пожертвование Tor не "
+"облагается налогом в полной мере, требуемой законом."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:246
msgid "Following is information you may need for reporting purposes:"
1
0

[translation/donatepages-messagespot] Update translations for donatepages-messagespot
by translation@torproject.org 11 Jan '19
by translation@torproject.org 11 Jan '19
11 Jan '19
commit 6c871d816d3dfbd9f1f115114908faa7719ee067
Author: Translation commit bot <translation(a)torproject.org>
Date: Fri Jan 11 11:15:23 2019 +0000
Update translations for donatepages-messagespot
---
locale/ru/LC_MESSAGES/messages.po | 51 ++++++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 4 deletions(-)
diff --git a/locale/ru/LC_MESSAGES/messages.po b/locale/ru/LC_MESSAGES/messages.po
index a4a0072f7..f3da7aaf7 100644
--- a/locale/ru/LC_MESSAGES/messages.po
+++ b/locale/ru/LC_MESSAGES/messages.po
@@ -503,7 +503,7 @@ msgstr "Комментарии"
#: tmp/cache_locale/93/936f5ca9f26662b60293a725343573df95cb28c99d7c3f12b1c94ed37a453012.php:476
#: tmp/cache_locale/04/0421bb9119a5b92b0e2e4a49c25d718283ccfa1495534b2a08ff967a0f4fd06a.php:470
msgid "Donating:"
-msgstr ""
+msgstr "Пожертвование:"
#: tmp/cache_locale/93/936f5ca9f26662b60293a725343573df95cb28c99d7c3f12b1c94ed37a453012.php:483
#: tmp/cache_locale/04/0421bb9119a5b92b0e2e4a49c25d718283ccfa1495534b2a08ff967a0f4fd06a.php:477
@@ -1200,7 +1200,7 @@ msgstr ""
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:240
msgid "Is my donation tax-deductible?"
-msgstr ""
+msgstr "Моё пожертвование не облагается налогом?"
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:244
msgid ""
@@ -1262,12 +1262,16 @@ msgid ""
"That would be a big administrative burden for a small organization, and we "
"don't think it's a good idea for us."
msgstr ""
+"Это было бы большим бременем для маленькой организации, и мы не думаем, что "
+"это хорошая идея для нас."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:286
msgid ""
"However, we would be very happy to hear your ideas and feedback about our "
"work."
msgstr ""
+"Тем не менее, мы были бы рады услышать ваши идеи и получить обратную связь, "
+"связанную с нашей работой."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:288
msgid ""
@@ -1348,6 +1352,8 @@ msgid ""
"This allows our payment processor to verify your identity, process your "
"payment, and prevent fraudulent charges to your credit card."
msgstr ""
+"Это позволяет нашей платёжной системе подтвердить вашу личность, произвести "
+"перевод и предотвратить обвинения в мошенничестве с вашей кредитной картой."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:332
msgid ""
@@ -1366,12 +1372,18 @@ msgid ""
"People who have stolen credit card information often donate to nonprofits as"
" a way of testing whether the card works."
msgstr ""
+"Люди, обладающие краденой информацией о кредитных картах, часто совершают "
+"пожертвования некоммерческим организациям, чтобы проверить, верна ли их "
+"информация о кредитной карте. "
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:344
msgid ""
"These people typically use a very small amount for their testing, and we've "
"found that setting a $1 minimum donation seems to deter them."
msgstr ""
+"Эти люди обычно используют очень маленький объём денежных средств для "
+"тестирования, и мы обнаружили, что минимальная сумма пожертвования в $1, "
+"кажется, отпугивает их."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:350
msgid "Is there a maximum donation?"
@@ -1415,6 +1427,9 @@ msgid ""
"href=\"https://www.torproject.org/donate/donate-"
"options.html.en#cash\">sending us a postal money order</a>."
msgstr ""
+"Вы можете пожертвовать нас с помощью<a class=\"hyperlinks links\" "
+"target=\"_blank\" href=\"https://www.torproject.org/donate/donate-"
+"options.html.en#cash\">денежного почтового перевода</a>."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:376
msgid ""
@@ -1427,12 +1442,15 @@ msgstr ""
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:378
msgid "You can buy cash gift cards and mail them to us."
msgstr ""
+"Вы можете приобрести денежные подарочные карты и отправить нам их почтой."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:380
msgid ""
"There are probably other ways to donate anonymously that we haven't thought "
"of-- maybe you will :)"
msgstr ""
+"Вероятно, существуют другие пути для анонимных пожертвований, о которых мы "
+"не слышали - может быть вы их знаете :)"
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:387
msgid ""
@@ -1459,18 +1477,24 @@ msgid ""
"We are not required to identify donors to any other organization or "
"authority, and we do not."
msgstr ""
+"Мы не обязаны раскрывать имена жертвователей какой-либо организации или "
+"властям, и мы этого не делаем."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:397
msgid ""
"(Also, if you wanted, you could give us $4,999 in late 2018 and $4,999 in "
"early 2019.)"
msgstr ""
+"(А так же, если хотите, вы можете превести нам $4,999 подзно в 2018 и $4,999"
+" рано в 2019 годах.)"
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:403
msgid ""
"In your privacy policy, you say you will never publicly identify me as a "
"donor without my permission."
msgstr ""
+"В вашей политике конфиденциальности указано, что вы никогда не будете "
+"разглашать моё имя, как жертвователя, без моего согласия."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:405
msgid "What does that mean?"
@@ -1485,12 +1509,16 @@ msgid ""
"If you donate to the Tor Project, there will be some people at the Tor "
"Project who know about your donation."
msgstr ""
+"Если вы совершите пожервование Tor Project, будут некоторые люди в самом Tor"
+" Project, которые будут знать о вашем пожертвовании."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:413
msgid ""
"However, we will never publicly identify you as a donor, unless you have "
"given us permission to do so."
msgstr ""
+"Тем не менее, мы никогда не будем разглашать ваше имя как имя жертвователя, "
+"кроме случаем, когда вы даёте нам своё разрешение."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:415
msgid ""
@@ -1498,12 +1526,18 @@ msgid ""
"do anything else that would publicly identify you as someone who has "
"donated."
msgstr ""
+"Это означает, что мы не будем публиковать ваше имя на нашем сайте, "
+"благодарить вас в Twitter или делать что-либо ещё, указывающее не то, что вы"
+" переводили нам пожертвования."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:417
msgid ""
"If we decide we would like to publicly name you as a donor, we will ask you "
"first, and will not do it until and unless you say it's okay."
msgstr ""
+"Если мы захотим опубликовать ваше имя как жертвователя, вы спросим вас об "
+"этом однажды и не будем спрашивать до тех пор, пока вы не скажете, что более"
+" не хотите этого."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:423
msgid ""
@@ -1553,23 +1587,28 @@ msgstr ""
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:448
msgid "What is your donor privacy policy?"
-msgstr ""
+msgstr "Какова политика конфиденциальности по отношению к пожертвовавшим?"
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:452
msgid ""
"Here is the Tor Project <a class=\"hyperlinks links\" target=\"_blank\" "
"href=\"/%langcode%/privacy-policy\">donor privacy policy</a>."
msgstr ""
+"Вот здесь <a class=\"hyperlinks links\" target=\"_blank\" href=\"/%langcode"
+"%/privacy-policy\">политика конфиденциальности для пожервовавших</a> Tor "
+"Project."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:458
msgid "What is your refund policy?"
-msgstr ""
+msgstr "Какова политика возврата пожертвований?"
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:462
msgid ""
"If you want your donation refunded, please tell us by emailing <span "
"class=\"email\">giving(at)torproject.org</span>."
msgstr ""
+"Если вы хотите вернуть ваше пожертвование, пожалуйста сообщите нам об этом "
+"по электронной почте: <span class=\"email\">giving(at)torproject.org</span>."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:464
msgid ""
@@ -1577,10 +1616,14 @@ msgid ""
"amount you donated, your full name, the payment method you used and your "
"country of origin."
msgstr ""
+"Чтобы вернуть пожертвование нам будет необходимо знать дату вашего "
+"пожертвование, количество, ваше полное имя, метод оплаты который вы "
+"использовали и страну происхождения."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:466
msgid "Please also tell us why you're asking for a refund."
msgstr ""
+"Также, пожалуйста, сообщите нам, почему вы просите о возврате средств."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:468
msgid ""
1
0

[translation/support-portal] Update translations for support-portal
by translation@torproject.org 11 Jan '19
by translation@torproject.org 11 Jan '19
11 Jan '19
commit 98afeda0f7d9e75347e29923b5d247743fd27107
Author: Translation commit bot <translation(a)torproject.org>
Date: Fri Jan 11 10:49:36 2019 +0000
Update translations for support-portal
---
contents+ru.po | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/contents+ru.po b/contents+ru.po
index 34c0c0a18..f1e6f6e3a 100644
--- a/contents+ru.po
+++ b/contents+ru.po
@@ -1635,7 +1635,7 @@ msgstr ""
#: http//localhost/tbb/get-rid-of-captchas/
#: (content/tbb/tbb-35/contents+en.lrquestion.seo_slug)
msgid "get-rid-of-captchas"
-msgstr ""
+msgstr "избавиться-от-капч"
#: http//localhost/tbb/run-multible-instances-of-tor-browser/
#: (content/tbb/tbb-36/contents+en.lrquestion.title)
@@ -1783,7 +1783,7 @@ msgstr ""
#: http//localhost/tbb/network-admin-know-i-am-using-tor/
#: (content/tbb/tbb-38/contents+en.lrquestion.seo_slug)
msgid "network-admin-know-i-am-using-tor"
-msgstr ""
+msgstr "сетевой-администратор-знает-что-я-использую-Tor"
#: http//localhost/tbb/tor-browser-issues-facebook-twitter-websites/
#: (content/tbb/tbb-39/contents+en.lrquestion.title)
@@ -2035,7 +2035,7 @@ msgstr ""
#: http//localhost/tbb/make-tor-browser-default-browser/
#: (content/tbb/tbb-6/contents+en.lrquestion.seo_slug)
msgid "make-tor-browser-default-browser"
-msgstr ""
+msgstr "сделать-Tor-браузер-браузером-по-умолчанию"
#: http//localhost/tbb/website-blocking-access-over-tor/
#: (content/tbb/tbb-7/contents+en.lrquestion.title)
@@ -4010,7 +4010,7 @@ msgstr ""
#: http//localhost/misc/tor-glossary/
#: (content/misc/glossary/contents+en.lrquestion.title)
msgid "General Glossary"
-msgstr ""
+msgstr "Общий Голоссарий"
#: http//localhost/misc/tor-glossary/
#: (content/misc/glossary/contents+en.lrquestion.description)
1
0

[translation/donatepages-messagespot] Update translations for donatepages-messagespot
by translation@torproject.org 11 Jan '19
by translation@torproject.org 11 Jan '19
11 Jan '19
commit 14826db7ea0bde3ed209140b4407f9b9670080c1
Author: Translation commit bot <translation(a)torproject.org>
Date: Fri Jan 11 10:45:25 2019 +0000
Update translations for donatepages-messagespot
---
locale/ru/LC_MESSAGES/messages.po | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/locale/ru/LC_MESSAGES/messages.po b/locale/ru/LC_MESSAGES/messages.po
index 6344b5597..a4a0072f7 100644
--- a/locale/ru/LC_MESSAGES/messages.po
+++ b/locale/ru/LC_MESSAGES/messages.po
@@ -12,12 +12,12 @@
# Vladislav Berg <vladislavp(a)tuta.io>, 2018
# Lowrider <pams(a)imail.ru>, 2019
# Kristina Tyskiewicz <savannah5k(a)yahoo.com>, 2019
-# Evgeny Aleksandrov <5678lutya(a)gmail.com>, 2019
# diana azaryan <dianazryn(a)gmail.com>, 2019
+# Evgeny Aleksandrov <5678lutya(a)gmail.com>, 2019
#
msgid ""
msgstr ""
-"Last-Translator: diana azaryan <dianazryn(a)gmail.com>, 2019\n"
+"Last-Translator: Evgeny Aleksandrov <5678lutya(a)gmail.com>, 2019\n"
"Language-Team: Russian (https://www.transifex.com/otf/teams/1519/ru/)\n"
"Language: ru\n"
"Plural-Forms: nplurals=4; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : n%10==0 || (n%10>=5 && n%10<=9) || (n%100>=11 && n%100<=14)? 2 : 3);\n"
@@ -1274,6 +1274,8 @@ msgid ""
"If you're donating using a mechanism that allows for comments, feel free to "
"send your thoughts that way."
msgstr ""
+"Если вы совершаете пожертвование с помощью механизма, позволяющего оставить "
+"сообщение, то не стесняйтесь отправлять свои мысли таким путём"
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:294
msgid "Can I donate while using Tor Browser?"
@@ -1338,6 +1340,8 @@ msgid ""
"required to process your credit card payment, including your billing "
"address."
msgstr ""
+"Если вы переведёте пожертвование с кредитной карты, вас попросят указать "
+"необходимую для перевода денег информацию, включая ваш платёжный адрес."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:330
msgid ""
@@ -1350,6 +1354,8 @@ msgid ""
"We don't ask for information beyond what's required by the payment "
"processor."
msgstr ""
+"Мы не запрашиваем информацию, помимо той, которая необходима для процесса "
+"перевода пожертвований. "
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:338
msgid "Why is there a minimum donation?"
@@ -1415,6 +1421,8 @@ msgid ""
"You can donate via bitcoin if you have bitcoin set up in a way that "
"preserves your anonymity."
msgstr ""
+"Вы можете перевести пожертвование через Bitcoin, если у вас есть способ, "
+"который защищает вашу анонимность."
#: tmp/cache_locale/4a/4ab2d928dab25aeb8c96bb2d1c2ad651173d6c029f40a442edf6925bfd038cd2.php:378
msgid "You can buy cash gift cards and mail them to us."
1
0
commit e82de493279a0e74b55e5fd66a4056a1cecf19c5
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Fri Jan 11 11:39:12 2019 +0100
Simplify Rserve setup.
---
src/main/R/rserver/Rserv.conf | 2 -
src/main/R/rserver/graphs.R | 1539 ------------------------------------
src/main/R/rserver/rserve-init.R | 1609 +++++++++++++++++++++++++++++++++++++-
src/main/R/rserver/tables.R | 58 --
4 files changed, 1600 insertions(+), 1608 deletions(-)
diff --git a/src/main/R/rserver/Rserv.conf b/src/main/R/rserver/Rserv.conf
deleted file mode 100644
index 1fb3039..0000000
--- a/src/main/R/rserver/Rserv.conf
+++ /dev/null
@@ -1,2 +0,0 @@
-workdir /srv/metrics.torproject.org/metrics/website/rserve/workdir
-source rserve-init.R
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
deleted file mode 100644
index 0d7a90c..0000000
--- a/src/main/R/rserver/graphs.R
+++ /dev/null
@@ -1,1539 +0,0 @@
-countrylist <- list(
- "ad" = "Andorra",
- "ae" = "the United Arab Emirates",
- "af" = "Afghanistan",
- "ag" = "Antigua and Barbuda",
- "ai" = "Anguilla",
- "al" = "Albania",
- "am" = "Armenia",
- "an" = "the Netherlands Antilles",
- "ao" = "Angola",
- "aq" = "Antarctica",
- "ar" = "Argentina",
- "as" = "American Samoa",
- "at" = "Austria",
- "au" = "Australia",
- "aw" = "Aruba",
- "ax" = "the Aland Islands",
- "az" = "Azerbaijan",
- "ba" = "Bosnia and Herzegovina",
- "bb" = "Barbados",
- "bd" = "Bangladesh",
- "be" = "Belgium",
- "bf" = "Burkina Faso",
- "bg" = "Bulgaria",
- "bh" = "Bahrain",
- "bi" = "Burundi",
- "bj" = "Benin",
- "bl" = "Saint Bartelemey",
- "bm" = "Bermuda",
- "bn" = "Brunei",
- "bo" = "Bolivia",
- "bq" = "Bonaire, Sint Eustatius and Saba",
- "br" = "Brazil",
- "bs" = "the Bahamas",
- "bt" = "Bhutan",
- "bv" = "the Bouvet Island",
- "bw" = "Botswana",
- "by" = "Belarus",
- "bz" = "Belize",
- "ca" = "Canada",
- "cc" = "the Cocos (Keeling) Islands",
- "cd" = "the Democratic Republic of the Congo",
- "cf" = "Central African Republic",
- "cg" = "Congo",
- "ch" = "Switzerland",
- "ci" = "Côte d'Ivoire",
- "ck" = "the Cook Islands",
- "cl" = "Chile",
- "cm" = "Cameroon",
- "cn" = "China",
- "co" = "Colombia",
- "cr" = "Costa Rica",
- "cu" = "Cuba",
- "cv" = "Cape Verde",
- "cw" = "Curaçao",
- "cx" = "the Christmas Island",
- "cy" = "Cyprus",
- "cz" = "the Czech Republic",
- "de" = "Germany",
- "dj" = "Djibouti",
- "dk" = "Denmark",
- "dm" = "Dominica",
- "do" = "the Dominican Republic",
- "dz" = "Algeria",
- "ec" = "Ecuador",
- "ee" = "Estonia",
- "eg" = "Egypt",
- "eh" = "the Western Sahara",
- "er" = "Eritrea",
- "es" = "Spain",
- "et" = "Ethiopia",
- "fi" = "Finland",
- "fj" = "Fiji",
- "fk" = "the Falkland Islands (Malvinas)",
- "fm" = "the Federated States of Micronesia",
- "fo" = "the Faroe Islands",
- "fr" = "France",
- "ga" = "Gabon",
- "gb" = "the United Kingdom",
- "gd" = "Grenada",
- "ge" = "Georgia",
- "gf" = "French Guiana",
- "gg" = "Guernsey",
- "gh" = "Ghana",
- "gi" = "Gibraltar",
- "gl" = "Greenland",
- "gm" = "Gambia",
- "gn" = "Guinea",
- "gp" = "Guadeloupe",
- "gq" = "Equatorial Guinea",
- "gr" = "Greece",
- "gs" = "South Georgia and the South Sandwich Islands",
- "gt" = "Guatemala",
- "gu" = "Guam",
- "gw" = "Guinea-Bissau",
- "gy" = "Guyana",
- "hk" = "Hong Kong",
- "hm" = "Heard Island and McDonald Islands",
- "hn" = "Honduras",
- "hr" = "Croatia",
- "ht" = "Haiti",
- "hu" = "Hungary",
- "id" = "Indonesia",
- "ie" = "Ireland",
- "il" = "Israel",
- "im" = "the Isle of Man",
- "in" = "India",
- "io" = "the British Indian Ocean Territory",
- "iq" = "Iraq",
- "ir" = "Iran",
- "is" = "Iceland",
- "it" = "Italy",
- "je" = "Jersey",
- "jm" = "Jamaica",
- "jo" = "Jordan",
- "jp" = "Japan",
- "ke" = "Kenya",
- "kg" = "Kyrgyzstan",
- "kh" = "Cambodia",
- "ki" = "Kiribati",
- "km" = "Comoros",
- "kn" = "Saint Kitts and Nevis",
- "kp" = "North Korea",
- "kr" = "the Republic of Korea",
- "kw" = "Kuwait",
- "ky" = "the Cayman Islands",
- "kz" = "Kazakhstan",
- "la" = "Laos",
- "lb" = "Lebanon",
- "lc" = "Saint Lucia",
- "li" = "Liechtenstein",
- "lk" = "Sri Lanka",
- "lr" = "Liberia",
- "ls" = "Lesotho",
- "lt" = "Lithuania",
- "lu" = "Luxembourg",
- "lv" = "Latvia",
- "ly" = "Libya",
- "ma" = "Morocco",
- "mc" = "Monaco",
- "md" = "the Republic of Moldova",
- "me" = "Montenegro",
- "mf" = "Saint Martin",
- "mg" = "Madagascar",
- "mh" = "the Marshall Islands",
- "mk" = "Macedonia",
- "ml" = "Mali",
- "mm" = "Burma",
- "mn" = "Mongolia",
- "mo" = "Macau",
- "mp" = "the Northern Mariana Islands",
- "mq" = "Martinique",
- "mr" = "Mauritania",
- "ms" = "Montserrat",
- "mt" = "Malta",
- "mu" = "Mauritius",
- "mv" = "the Maldives",
- "mw" = "Malawi",
- "mx" = "Mexico",
- "my" = "Malaysia",
- "mz" = "Mozambique",
- "na" = "Namibia",
- "nc" = "New Caledonia",
- "ne" = "Niger",
- "nf" = "Norfolk Island",
- "ng" = "Nigeria",
- "ni" = "Nicaragua",
- "nl" = "the Netherlands",
- "no" = "Norway",
- "np" = "Nepal",
- "nr" = "Nauru",
- "nu" = "Niue",
- "nz" = "New Zealand",
- "om" = "Oman",
- "pa" = "Panama",
- "pe" = "Peru",
- "pf" = "French Polynesia",
- "pg" = "Papua New Guinea",
- "ph" = "the Philippines",
- "pk" = "Pakistan",
- "pl" = "Poland",
- "pm" = "Saint Pierre and Miquelon",
- "pn" = "the Pitcairn Islands",
- "pr" = "Puerto Rico",
- "ps" = "the Palestinian Territory",
- "pt" = "Portugal",
- "pw" = "Palau",
- "py" = "Paraguay",
- "qa" = "Qatar",
- "re" = "Reunion",
- "ro" = "Romania",
- "rs" = "Serbia",
- "ru" = "Russia",
- "rw" = "Rwanda",
- "sa" = "Saudi Arabia",
- "sb" = "the Solomon Islands",
- "sc" = "the Seychelles",
- "sd" = "Sudan",
- "se" = "Sweden",
- "sg" = "Singapore",
- "sh" = "Saint Helena",
- "si" = "Slovenia",
- "sj" = "Svalbard and Jan Mayen",
- "sk" = "Slovakia",
- "sl" = "Sierra Leone",
- "sm" = "San Marino",
- "sn" = "Senegal",
- "so" = "Somalia",
- "sr" = "Suriname",
- "ss" = "South Sudan",
- "st" = "São Tomé and Príncipe",
- "sv" = "El Salvador",
- "sx" = "Sint Maarten",
- "sy" = "the Syrian Arab Republic",
- "sz" = "Swaziland",
- "tc" = "Turks and Caicos Islands",
- "td" = "Chad",
- "tf" = "the French Southern Territories",
- "tg" = "Togo",
- "th" = "Thailand",
- "tj" = "Tajikistan",
- "tk" = "Tokelau",
- "tl" = "East Timor",
- "tm" = "Turkmenistan",
- "tn" = "Tunisia",
- "to" = "Tonga",
- "tr" = "Turkey",
- "tt" = "Trinidad and Tobago",
- "tv" = "Tuvalu",
- "tw" = "Taiwan",
- "tz" = "the United Republic of Tanzania",
- "ua" = "Ukraine",
- "ug" = "Uganda",
- "um" = "the United States Minor Outlying Islands",
- "us" = "the United States",
- "uy" = "Uruguay",
- "uz" = "Uzbekistan",
- "va" = "Vatican City",
- "vc" = "Saint Vincent and the Grenadines",
- "ve" = "Venezuela",
- "vg" = "the British Virgin Islands",
- "vi" = "the United States Virgin Islands",
- "vn" = "Vietnam",
- "vu" = "Vanuatu",
- "wf" = "Wallis and Futuna",
- "ws" = "Samoa",
- "xk" = "Kosovo",
- "ye" = "Yemen",
- "yt" = "Mayotte",
- "za" = "South Africa",
- "zm" = "Zambia",
- "zw" = "Zimbabwe")
-
-countryname <- function(country) {
- res <- countrylist[[country]]
- if (is.null(res))
- res <- "no-man's-land"
- res
-}
-
-# Helper function that takes date limits as input and returns major breaks as
-# output. The main difference to the built-in major breaks is that we're trying
-# harder to align major breaks with first days of weeks (Sundays), months,
-# quarters, or years.
-custom_breaks <- function(input) {
- scales_index <- cut(as.numeric(max(input) - min(input)),
- c(-1, 7, 12, 56, 180, 600, 2000, Inf), labels = FALSE)
- from_print_format <- c("%F", "%F", "%Y-W%U-7", "%Y-%m-01", "%Y-01-01",
- "%Y-01-01", "%Y-01-01")[scales_index]
- from_parse_format <- ifelse(scales_index == 3, "%Y-W%U-%u", "%F")
- by <- c("1 day", "2 days", "1 week", "1 month", "3 months", "1 year",
- "2 years")[scales_index]
- seq(as.Date(as.character(min(input), from_print_format),
- format = from_parse_format), max(input), by = by)
-}
-
-# Helper function that takes date limits as input and returns minor breaks as
-# output. As opposed to the built-in minor breaks, we're not just adding one
-# minor break half way through between two major breaks. Instead, we're plotting
-# a minor break for every day, week, month, or quarter between two major breaks.
-custom_minor_breaks <- function(input) {
- scales_index <- cut(as.numeric(max(input) - min(input)),
- c(-1, 7, 12, 56, 180, 600, 2000, Inf), labels = FALSE)
- from_print_format <- c("%F", "%F", "%F", "%Y-W%U-7", "%Y-%m-01", "%Y-01-01",
- "%Y-01-01")[scales_index]
- from_parse_format <- ifelse(scales_index == 4, "%Y-W%U-%u", "%F")
- by <- c("1 day", "1 day", "1 day", "1 week", "1 month", "3 months",
- "1 year")[scales_index]
- seq(as.Date(as.character(min(input), from_print_format),
- format = from_parse_format), max(input), by = by)
-}
-
-# Helper function that takes breaks as input and returns labels as output. We're
-# going all ISO-8601 here, though we're not just writing %Y-%m-%d everywhere,
-# but %Y-%m or %Y if all breaks are on the first of a month or even year.
-custom_labels <- function(breaks) {
- if (all(format(breaks, format = "%m-%d") == "01-01", na.rm = TRUE)) {
- format(breaks, format = "%Y")
- } else {
- if (all(format(breaks, format = "%d") == "01", na.rm = TRUE)) {
- format(breaks, format = "%Y-%m")
- } else {
- format(breaks, format = "%F")
- }
- }
-}
-
-# Helper function to format numbers in non-scientific notation with spaces as
-# thousands separator.
-formatter <- function(x, ...) {
- format(x, ..., scientific = FALSE, big.mark = " ")
-}
-
-theme_update(
- # Make plot title centered, and leave some room to the plot.
- plot.title = element_text(hjust = 0.5, margin = margin(b = 11)),
-
- # Leave a little more room to the right for long x axis labels.
- plot.margin = margin(5.5, 11, 5.5, 5.5)
-)
-
-# Set the default line size of geom_line() to 1.
-update_geom_defaults("line", list(size = 1))
-
-copyright_notice <- "The Tor Project - https://metrics.torproject.org/"
-
-stats_dir <- "/srv/metrics.torproject.org/metrics/shared/stats/"
-
-rdata_dir <- "/srv/metrics.torproject.org/metrics/shared/RData/"
-
-# Helper function that copies the appropriate no data object to filename.
-copy_no_data <- function(filename) {
- len <- nchar(filename)
- extension <- substr(filename, len - 3, len)
- if (".csv" == extension) {
- write("# No data available for the given parameters.", file=filename)
- } else {
- file.copy(paste(rdata_dir, "no-data-available", extension, sep = ""),
- filename)
- }
-}
-
-# Helper function wrapping calls into error handling.
-robust_call <- function(wrappee, filename) {
- tryCatch(eval(wrappee), error = function(e) copy_no_data(filename),
- finally = if (!file.exists(filename) || file.size(filename) == 0) {
- copy_no_data(filename)
- })
-}
-
-# Write the result of the given FUN, typically a prepare_ function, as .csv file
-# to the given path_p.
-write_data <- function(FUN, ..., path_p) {
- FUN(...) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-# Disable readr's automatic progress bar.
-options(readr.show_progress = FALSE)
-
-prepare_networksize <- function(start_p = NULL, end_p = NULL) {
- read_csv(file = paste(stats_dir, "networksize.csv", sep = ""),
- col_types = cols(
- date = col_date(format = ""),
- relays = col_double(),
- bridges = col_double())) %>%
- filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
-}
-
-plot_networksize <- function(start_p, end_p, path_p) {
- prepare_networksize(start_p, end_p) %>%
- gather(variable, value, -date) %>%
- complete(date = full_seq(date, period = 1),
- variable = c("relays", "bridges")) %>%
- ggplot(aes(x = date, y = value, colour = variable)) +
- geom_line() +
- scale_x_date(name = "", breaks = custom_breaks,
- labels = custom_labels, minor_breaks = custom_minor_breaks) +
- scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
- scale_colour_hue("", breaks = c("relays", "bridges"),
- labels = c("Relays", "Bridges")) +
- ggtitle("Number of relays") +
- labs(caption = copyright_notice)
- ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
-}
-
-prepare_versions <- function(start_p = NULL, end_p = NULL) {
- read_csv(paste(stats_dir, "versions.csv", sep = ""),
- col_types = cols(
- date = col_date(format = ""),
- version = col_character(),
- relays = col_double())) %>%
- filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
-}
-
-plot_versions <- function(start_p, end_p, path_p) {
- s <- prepare_versions(start_p, end_p)
- known_versions <- unique(s$version)
- getPalette <- colorRampPalette(brewer.pal(12, "Paired"))
- colours <- data.frame(breaks = known_versions,
- values = rep(brewer.pal(min(12, length(known_versions)), "Paired"),
- len = length(known_versions)),
- stringsAsFactors = FALSE)
- versions <- s[s$version %in% known_versions, ]
- visible_versions <- sort(unique(versions$version))
- versions <- versions %>%
- complete(date = full_seq(date, period = 1), nesting(version)) %>%
- ggplot(aes(x = date, y = relays, colour = version)) +
- geom_line() +
- scale_x_date(name = "", breaks = custom_breaks,
- labels = custom_labels, minor_breaks = custom_minor_breaks) +
- scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
- scale_colour_manual(name = "Tor version",
- values = colours[colours$breaks %in% visible_versions, 2],
- breaks = visible_versions) +
- ggtitle("Relay versions") +
- labs(caption = copyright_notice)
- ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
-}
-
-prepare_platforms <- function(start_p = NULL, end_p = NULL) {
- read_csv(file = paste(stats_dir, "platforms.csv", sep = ""),
- col_types = cols(
- date = col_date(format = ""),
- platform = col_factor(levels = NULL),
- relays = col_double())) %>%
- filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- mutate(platform = tolower(platform)) %>%
- spread(platform, relays)
-}
-
# Plot the number of relays by platform (Linux, macOS, BSD, Windows, other)
# over time and write the graph to path_p via ggsave().
plot_platforms <- function(start_p, end_p, path_p) {
  prepare_platforms(start_p, end_p) %>%
    # Back to long format; fill in missing dates so gaps plot as NA.
    gather(platform, relays, -date) %>%
    complete(date = full_seq(date, period = 1), nesting(platform)) %>%
    ggplot(aes(x = date, y = relays, colour = platform)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_manual(name = "Platform",
      breaks = c("linux", "macos", "bsd", "windows", "other"),
      labels = c("Linux", "macOS", "BSD", "Windows", "Other"),
      values = c("linux" = "#56B4E9", "macos" = "#333333", "bsd" = "#E69F00",
        "windows" = "#0072B2", "other" = "#009E73")) +
    ggtitle("Relay platforms") +
    labs(caption = copyright_notice)
  # ggsave() uses ggplot2's last_plot() from the chain above.
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read bandwidth.csv and return daily directory-traffic columns (dirread,
# dirwrite) scaled by 8 / 1e9 (bytes -> Gbit). Only rows where both
# isexit and isguard are NA are kept (the flag-independent totals).
prepare_dirbytes <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste0(stats_dir, "bandwidth.csv"),
      col_types = cols(
        date = col_date(format = ""),
        isexit = col_logical(),
        isguard = col_logical(),
        bwread = col_skip(),
        bwwrite = col_skip(),
        dirread = col_double(),
        dirwrite = col_double())) %>%
    filter(if (is.null(start_p)) TRUE else date >= as.Date(start_p)) %>%
    filter(if (is.null(end_p)) TRUE else date <= as.Date(end_p)) %>%
    filter(is.na(isexit), is.na(isguard)) %>%
    mutate(dirread = dirread * 8 / 1e9,
      dirwrite = dirwrite * 8 / 1e9) %>%
    select(date, dirread, dirwrite)
}
-
# Plot written and read directory bytes (Gbit/s) over time; saves to path_p.
plot_dirbytes <- function(start_p, end_p, path_p) {
  prepare_dirbytes(start_p, end_p) %>%
    gather(variable, value, -date) %>%
    complete(date = full_seq(date, period = 1), nesting(variable)) %>%
    ggplot(aes(x = date, y = value, colour = variable)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "",
      breaks = c("dirwrite", "dirread"),
      labels = c("Written dir bytes", "Read dir bytes")) +
    ggtitle("Number of bytes spent on answering directory requests") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read relayflags.csv and return rows restricted to the optional inclusive
# date range and, when flag_p is given, to that set of relay flags.
prepare_relayflags <- function(start_p = NULL, end_p = NULL, flag_p = NULL) {
  read_csv(file = paste0(stats_dir, "relayflags.csv"),
      col_types = cols(
        date = col_date(format = ""),
        flag = col_factor(levels = NULL),
        relays = col_double())) %>%
    filter(if (is.null(start_p)) TRUE else date >= as.Date(start_p)) %>%
    filter(if (is.null(end_p)) TRUE else date <= as.Date(end_p)) %>%
    filter(if (is.null(flag_p)) TRUE else flag %in% flag_p)
}
-
# Plot the number of relays per assigned relay flag; saves to path_p.
# NOTE(review): only six colours are hard-coded below, so flag_p with more
# than six flags would exhaust the palette — confirm callers' limits.
plot_relayflags <- function(start_p, end_p, flag_p, path_p) {
  prepare_relayflags(start_p, end_p, flag_p) %>%
    complete(date = full_seq(date, period = 1), flag = unique(flag)) %>%
    ggplot(aes(x = date, y = relays, colour = flag)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_manual(name = "Relay flags", values = c("#E69F00",
      "#56B4E9", "#009E73", "#EE6A50", "#000000", "#0072B2"),
      breaks = flag_p, labels = flag_p) +
    ggtitle("Number of relays with relay flags assigned") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read torperf-1.1.csv and return per-day download-time quartiles (q1, md,
# q3, converted from milliseconds to seconds via / 1e3), optionally
# filtered by date range, server, and filesize ("50kb", "1mb", else 5 MiB).
prepare_torperf <- function(start_p = NULL, end_p = NULL, server_p = NULL,
    filesize_p = NULL) {
  read_csv(file = paste(stats_dir, "torperf-1.1.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        filesize = col_double(),
        source = col_character(),
        server = col_character(),
        q1 = col_double(),
        md = col_double(),
        q3 = col_double(),
        timeouts = col_skip(),
        failures = col_skip(),
        requests = col_skip())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(server_p)) server == server_p else TRUE) %>%
    # Map the filesize keyword to its byte count before comparing.
    filter(if (!is.null(filesize_p))
      filesize == ifelse(filesize_p == "50kb", 50 * 1024,
      ifelse(filesize_p == "1mb", 1024 * 1024, 5 * 1024 * 1024)) else
      TRUE) %>%
    transmute(date, filesize, source, server, q1 = q1 / 1e3, md = md / 1e3,
      q3 = q3 / 1e3)
}
-
# Plot time-to-download medians with interquartile ribbons per measuring
# source; saves to path_p.
plot_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
  prepare_torperf(start_p, end_p, server_p, filesize_p) %>%
    filter(source != "") %>%
    complete(date = full_seq(date, period = 1), nesting(source)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = source), size = 0.75) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "s"),
      limits = c(0, NA)) +
    scale_fill_hue(name = "Source") +
    scale_colour_hue(name = "Source") +
    # Title mirrors the filesize keyword mapping used in prepare_torperf().
    ggtitle(paste("Time to complete",
      ifelse(filesize_p == "50kb", "50 KiB",
      ifelse(filesize_p == "1mb", "1 MiB", "5 MiB")),
      "request to", server_p, "server")) +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read torperf-1.1.csv and return timeout/failure counts as fractions of
# total requests per day/source/server; rows with zero requests are dropped
# to avoid division by zero.
prepare_torperf_failures <- function(start_p = NULL, end_p = NULL,
    server_p = NULL, filesize_p = NULL) {
  read_csv(file = paste(stats_dir, "torperf-1.1.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        filesize = col_double(),
        source = col_character(),
        server = col_character(),
        q1 = col_skip(),
        md = col_skip(),
        q3 = col_skip(),
        timeouts = col_double(),
        failures = col_double(),
        requests = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Map the filesize keyword to its byte count before comparing.
    filter(if (!is.null(filesize_p))
      filesize == ifelse(filesize_p == "50kb", 50 * 1024,
      ifelse(filesize_p == "1mb", 1024 * 1024, 5 * 1024 * 1024)) else
      TRUE) %>%
    filter(if (!is.null(server_p)) server == server_p else TRUE) %>%
    filter(requests > 0) %>%
    transmute(date, filesize, source, server, timeouts = timeouts / requests,
      failures = failures / requests)
}
-
# Plot timeout and failure rates of download requests as points, faceted
# into a "Timeouts" and a "Failures" panel; saves to path_p.
plot_torperf_failures <- function(start_p, end_p, server_p, filesize_p,
    path_p) {
  prepare_torperf_failures(start_p, end_p, server_p, filesize_p) %>%
    filter(source != "") %>%
    gather(variable, value, -c(date, filesize, source, server)) %>%
    mutate(variable = factor(variable, levels = c("timeouts", "failures"),
      labels = c("Timeouts", "Failures"))) %>%
    ggplot(aes(x = date, y = value, colour = source)) +
    geom_point(size = 2, alpha = 0.5) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = percent, limits = c(0, NA)) +
    scale_colour_hue(name = "Source") +
    facet_grid(variable ~ .) +
    ggtitle(paste("Timeouts and failures of",
      ifelse(filesize_p == "50kb", "50 KiB",
      ifelse(filesize_p == "1mb", "1 MiB", "5 MiB")),
      "requests to", server_p, "server")) +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read buildtimes.csv and return circuit build time quartiles per date,
# source, and hop position, bounded by the optional inclusive date range.
prepare_onionperf_buildtimes <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "buildtimes.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        source = col_character(),
        position = col_double(),
        q1 = col_double(),
        md = col_double(),
        q3 = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
}
-
# Plot circuit build time medians with interquartile ribbons, faceted by
# hop position (1st/2nd/3rd); saves to path_p.
plot_onionperf_buildtimes <- function(start_p, end_p, path_p) {
  prepare_onionperf_buildtimes(start_p, end_p) %>%
    filter(source != "") %>%
    # Turn the numeric position 1..3 into readable facet labels.
    mutate(date = as.Date(date),
      position = factor(position, levels = seq(1, 3, 1),
      labels = c("1st hop", "2nd hop", "3rd hop"))) %>%
    complete(date = full_seq(date, period = 1), nesting(source, position)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = source), size = 0.75) +
    facet_grid(position ~ .) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "ms"),
      limits = c(0, NA)) +
    scale_fill_hue(name = "Source") +
    scale_colour_hue(name = "Source") +
    ggtitle("Circuit build times") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read latencies.csv and return circuit round-trip latency quartiles per
# date/source/server, optionally filtered by date range and server.
prepare_onionperf_latencies <- function(start_p = NULL, end_p = NULL,
    server_p = NULL) {
  read_csv(file = paste(stats_dir, "latencies.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        source = col_character(),
        server = col_character(),
        q1 = col_double(),
        md = col_double(),
        q3 = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(server_p)) server == server_p else TRUE)
}
-
# Plot circuit round-trip latency medians with interquartile ribbons per
# measuring source; saves to path_p.
plot_onionperf_latencies <- function(start_p, end_p, server_p, path_p) {
  prepare_onionperf_latencies(start_p, end_p, server_p) %>%
    filter(source != "") %>%
    complete(date = full_seq(date, period = 1), nesting(source)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = source), size = 0.75) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "ms"),
      limits = c(0, NA)) +
    scale_fill_hue(name = "Source") +
    scale_colour_hue(name = "Source") +
    ggtitle(paste("Circuit round-trip latencies to", server_p, "server")) +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read connbidirect2.csv and return, per date and direction, the 0.25/0.5/
# 0.75 quantile fractions (percent -> fraction via / 100) spread into
# columns q1/md/q3. The "X" prefix makes quantile values valid column names.
prepare_connbidirect <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "connbidirect2.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        direction = col_factor(levels = NULL),
        quantile = col_double(),
        fraction = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    mutate(quantile = paste("X", quantile, sep = ""),
      fraction = fraction / 100) %>%
    spread(quantile, fraction) %>%
    rename(q1 = X0.25, md = X0.5, q3 = X0.75)
}
-
# Plot the fraction of connections used uni-/bidirectionally: medians with
# interquartile ribbons per direction category; saves to path_p.
plot_connbidirect <- function(start_p, end_p, path_p) {
  prepare_connbidirect(start_p, end_p) %>%
    complete(date = full_seq(date, period = 1), nesting(direction)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = direction)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = direction), size = 0.75) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = percent, limits = c(0, NA)) +
    # Colour and fill scales must match so the legend merges into one.
    scale_colour_hue(name = "Medians and interquartile ranges",
      breaks = c("both", "write", "read"),
      labels = c("Both reading and writing", "Mostly writing",
        "Mostly reading")) +
    scale_fill_hue(name = "Medians and interquartile ranges",
      breaks = c("both", "write", "read"),
      labels = c("Both reading and writing", "Mostly writing",
        "Mostly reading")) +
    ggtitle("Fraction of connections used uni-/bidirectionally") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Combine advertised bandwidth (advbw.csv) and consumed bandwidth history
# (bandwidth.csv) into one frame keyed by date and guard/exit flag, with
# values in Gbit/s. Consumed bandwidth averages read+write (hence / 2e9).
prepare_bandwidth_flags <- function(start_p = NULL, end_p = NULL) {
  advbw <- read_csv(file = paste(stats_dir, "advbw.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        isexit = col_logical(),
        isguard = col_logical(),
        advbw = col_double())) %>%
    transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
      variable = "advbw", value = advbw * 8 / 1e9)
  bwhist <- read_csv(file = paste(stats_dir, "bandwidth.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        isexit = col_logical(),
        isguard = col_logical(),
        bwread = col_double(),
        bwwrite = col_double(),
        dirread = col_double(),
        dirwrite = col_double())) %>%
    transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
      variable = "bwhist", value = (bwread + bwwrite) * 8 / 2e9)
  rbind(advbw, bwhist) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Keep only rows where both flags are known (non-NA).
    filter(!is.na(have_exit_flag)) %>%
    filter(!is.na(have_guard_flag)) %>%
    spread(variable, value)
}
-
# Plot advertised vs. consumed bandwidth as stacked areas by guard/exit
# flag combination, faceted into two panels; saves to path_p.
plot_bandwidth_flags <- function(start_p, end_p, path_p) {
  prepare_bandwidth_flags(start_p, end_p) %>%
    gather(variable, value, c(advbw, bwhist)) %>%
    # Collapse the two logical flag columns into one "GUARD_EXIT" key.
    unite(flags, have_guard_flag, have_exit_flag) %>%
    mutate(flags = factor(flags,
      levels = c("FALSE_TRUE", "TRUE_TRUE", "TRUE_FALSE", "FALSE_FALSE"),
      labels = c("Exit only", "Guard and Exit", "Guard only",
        "Neither Guard nor Exit"))) %>%
    mutate(variable = ifelse(variable == "advbw",
      "Advertised bandwidth", "Consumed bandwidth")) %>%
    ggplot(aes(x = date, y = value, fill = flags)) +
    geom_area() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_fill_manual(name = "",
      values = c("#03B3FF", "#39FF02", "#FFFF00", "#AAAA99")) +
    facet_grid(variable ~ .) +
    ggtitle("Advertised and consumed bandwidth by relay flags") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read clients.csv and return estimated directly-connecting relay users per
# date and country, with lower/upper expectation bounds and reporting
# fraction. country_p == "all" selects the aggregate rows (empty country).
# events_p is accepted for a uniform signature but not used here.
prepare_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
    country_p = NULL, events_p = NULL) {
  read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        node = col_character(),
        country = col_character(),
        transport = col_character(),
        version = col_character(),
        lower = col_double(),
        upper = col_double(),
        clients = col_double(),
        frac = col_double()),
      # na = character(): treat empty fields as "" rather than NA, since
      # "" is meaningful (the all-countries/all-transports aggregate).
      na = character()) %>%
    filter(node == "relay") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(country_p))
      country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
    filter(transport == "") %>%
    filter(version == "") %>%
    select(date, country, clients, lower, upper, frac) %>%
    rename(users = clients)
}
-
# Plot directly connecting users for one country (or all), optionally with
# a censorship-event overlay: a gray expected-range ribbon (events_p ==
# "on") plus blue/red points where the estimate leaves that range
# (events_p != "off"). Saves to path_p.
plot_userstats_relay_country <- function(start_p, end_p, country_p, events_p,
    path_p) {
  u <- prepare_userstats_relay_country(start_p, end_p, country_p, events_p) %>%
    complete(date = full_seq(date, period = 1))
  plot <- ggplot(u, aes(x = date, y = users))
  # Event overlays only make sense for a single country with data present.
  if (length(na.omit(u$users)) > 0 & events_p != "off" &
      country_p != "all") {
    upturns <- u[u$users > u$upper, c("date", "users")]
    downturns <- u[u$users < u$lower, c("date", "users")]
    if (events_p == "on") {
      # Clamp negative lower bounds to zero before drawing the ribbon.
      u[!is.na(u$lower) & u$lower < 0, "lower"] <- 0
      plot <- plot +
        geom_ribbon(data = u, aes(ymin = lower, ymax = upper), fill = "gray")
    }
    if (length(upturns$date) > 0)
      plot <- plot +
        geom_point(data = upturns, aes(x = date, y = users), size = 5,
          colour = "dodgerblue2")
    if (length(downturns$date) > 0)
      plot <- plot +
        geom_point(data = downturns, aes(x = date, y = users), size = 5,
          colour = "firebrick2")
  }
  plot <- plot +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(paste("Directly connecting users",
      ifelse(country_p == "all", "",
      paste(" from", countryname(country_p))), sep = "")) +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read clients.csv and return estimated bridge users per date and country.
# country_p == "all" selects the aggregate rows (empty country string).
prepare_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
    country_p = NULL) {
  read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        node = col_character(),
        country = col_character(),
        transport = col_character(),
        version = col_character(),
        lower = col_double(),
        upper = col_double(),
        clients = col_double(),
        frac = col_double()),
      # Empty fields stay "" (not NA); "" marks the aggregate rows.
      na = character()) %>%
    filter(node == "bridge") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(country_p))
      country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
    filter(transport == "") %>%
    filter(version == "") %>%
    select(date, country, clients, frac) %>%
    rename(users = clients)
}
-
# Plot estimated bridge users for one country (or all); saves to path_p.
plot_userstats_bridge_country <- function(start_p, end_p, country_p, path_p) {
  prepare_userstats_bridge_country(start_p, end_p, country_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = users)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(paste("Bridge users",
      ifelse(country_p == "all", "",
      paste(" from", countryname(country_p))), sep = "")) +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read clients.csv and return estimated bridge users per date and pluggable
# transport. When transport_p is NULL or contains "!<OR>", synthesize
# additional "!<OR>" rows summing all non-default-OR transports per date.
prepare_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
    transport_p = NULL) {
  u <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        node = col_character(),
        country = col_character(),
        transport = col_character(),
        version = col_character(),
        lower = col_double(),
        upper = col_double(),
        clients = col_double(),
        frac = col_double())) %>%
    filter(node == "bridge") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Keep only by-transport breakdown rows (country/version unset).
    filter(is.na(country)) %>%
    filter(is.na(version)) %>%
    filter(!is.na(transport)) %>%
    select(date, transport, clients, frac)
  if (is.null(transport_p) || "!<OR>" %in% transport_p) {
    # Sum all pluggable transports into a synthetic "!<OR>" category.
    # Grouping by frac too keeps the per-date frac value on the result.
    n <- u %>%
      filter(transport != "<OR>") %>%
      group_by(date, frac) %>%
      summarize(clients = sum(clients))
    u <- rbind(u, data.frame(date = n$date, transport = "!<OR>",
      clients = n$clients, frac = n$frac))
  }
  u %>%
    filter(if (!is.null(transport_p)) transport %in% transport_p else TRUE) %>%
    select(date, transport, clients, frac) %>%
    rename(users = clients) %>%
    arrange(date, transport)
}
-
# Plot estimated bridge users for one or several pluggable transports;
# with multiple transports, one coloured line each plus a legend mapping
# internal transport codes to display names. Saves to path_p.
plot_userstats_bridge_transport <- function(start_p, end_p, transport_p,
    path_p) {
  if (length(transport_p) > 1) {
    title <- paste("Bridge users by transport")
  } else {
    # Single transport: translate the internal code into a readable title.
    title <- paste("Bridge users using",
      ifelse(transport_p == "<??>", "unknown pluggable transport(s)",
      ifelse(transport_p == "<OR>", "default OR protocol",
      ifelse(transport_p == "!<OR>", "any pluggable transport",
      ifelse(transport_p == "fte", "FTE",
      ifelse(transport_p == "websocket", "Flash proxy/websocket",
      paste("transport", transport_p)))))))
  }
  u <- prepare_userstats_bridge_transport(start_p, end_p, transport_p) %>%
    complete(date = full_seq(date, period = 1), nesting(transport))
  if (length(transport_p) > 1) {
    plot <- ggplot(u, aes(x = date, y = users, colour = transport))
  } else {
    plot <- ggplot(u, aes(x = date, y = users))
  }
  plot <- plot +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(title) +
    labs(caption = copyright_notice)
  if (length(transport_p) > 1) {
    plot <- plot +
      scale_colour_hue(name = "", breaks = transport_p,
        labels = ifelse(transport_p == "<??>", "Unknown PT",
        ifelse(transport_p == "<OR>", "Default OR protocol",
        ifelse(transport_p == "!<OR>", "Any PT",
        ifelse(transport_p == "fte", "FTE",
        ifelse(transport_p == "websocket", "Flash proxy/websocket",
        transport_p))))))
  }
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read clients.csv and return estimated bridge users per date and IP
# version (rows where country and transport are unset), optionally
# restricted to one version via version_p.
prepare_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
    version_p = NULL) {
  read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        node = col_character(),
        country = col_character(),
        transport = col_character(),
        version = col_character(),
        lower = col_double(),
        upper = col_double(),
        clients = col_double(),
        frac = col_double())) %>%
    filter(node == "bridge") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(is.na(country)) %>%
    filter(is.na(transport)) %>%
    filter(if (!is.null(version_p)) version == version_p else TRUE) %>%
    select(date, version, clients, frac) %>%
    rename(users = clients)
}
-
# Plot estimated bridge users by IP version; the title concatenates "IP"
# with version_p (e.g. "v4" -> "IPv4"). Saves to path_p.
plot_userstats_bridge_version <- function(start_p, end_p, version_p, path_p) {
  prepare_userstats_bridge_version(start_p, end_p, version_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = users)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(paste("Bridge users using IP", version_p, sep = "")) +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Return combined bridge-user estimates by country and transport from
# userstats-combined.csv (low/high ranges). For country_p == "all" this
# delegates to prepare_userstats_bridge_country(), which yields a
# different column set — callers handle that case separately.
prepare_userstats_bridge_combined <- function(start_p = NULL, end_p = NULL,
    country_p = NULL) {
  if (!is.null(country_p) && country_p == "all") {
    prepare_userstats_bridge_country(start_p, end_p, country_p)
  } else {
    read_csv(file = paste(stats_dir, "userstats-combined.csv", sep = ""),
        col_types = cols(
          date = col_date(format = ""),
          node = col_skip(),
          country = col_character(),
          transport = col_character(),
          version = col_skip(),
          frac = col_double(),
          low = col_double(),
          high = col_double()),
        # Empty fields stay "" (not NA).
        na = character()) %>%
      filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
      filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
      filter(if (!is.null(country_p)) country == country_p else TRUE) %>%
      select(date, country, transport, low, high, frac) %>%
      arrange(date, country, transport)
  }
}
-
# Plot bridge users by transport for one country, restricted to the top-3
# transports by summed midpoint of the low/high range; for "all" it falls
# back to the plain by-country plot. Saves to path_p.
plot_userstats_bridge_combined <- function(start_p, end_p, country_p, path_p) {
  if (country_p == "all") {
    plot_userstats_bridge_country(start_p, end_p, country_p, path_p)
  } else {
    top <- 3
    u <- prepare_userstats_bridge_combined(start_p, end_p, country_p)
    # Rank transports by the sum of their range midpoints, keep the top-3.
    a <- aggregate(list(mid = (u$high + u$low) / 2),
      by = list(transport = u$transport), FUN = sum)
    a <- a[order(a$mid, decreasing = TRUE)[1:top], ]
    u <- u[u$transport %in% a$transport, ] %>%
      complete(date = full_seq(date, period = 1), nesting(country, transport))
    title <- paste("Bridge users by transport from ",
      countryname(country_p), sep = "")
    ggplot(u, aes(x = as.Date(date), ymin = low, ymax = high,
      fill = transport)) +
      geom_ribbon(alpha = 0.5, size = 0.5) +
      scale_x_date(name = "", breaks = custom_breaks,
        labels = custom_labels, minor_breaks = custom_minor_breaks) +
      scale_y_continuous(name = "", limits = c(0, NA), labels = formatter) +
      scale_colour_hue("Top-3 transports") +
      scale_fill_hue("Top-3 transports") +
      ggtitle(title) +
      labs(caption = copyright_notice) +
      theme(legend.position = "top")
    ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
  }
}
-
# Read advbwdist.csv and return advertised bandwidth (Gbit) at the
# requested percentiles, with "all" and "exits" spread into columns.
# p_p: optional vector of percentiles to keep; NULL keeps all.
prepare_advbwdist_perc <- function(start_p = NULL, end_p = NULL, p_p = NULL) {
  read_csv(file = paste(stats_dir, "advbwdist.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        isexit = col_logical(),
        relay = col_skip(),
        percentile = col_integer(),
        advbw = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Without p_p, keep every row that has a percentile value at all.
    filter(if (!is.null(p_p)) percentile %in% as.numeric(p_p) else
      percentile != "") %>%
    transmute(date, percentile = as.factor(percentile),
      # NA isexit marks the all-relays rows; non-NA marks exits-only.
      variable = ifelse(is.na(isexit), "all", "exits"),
      advbw = advbw * 8 / 1e9) %>%
    spread(variable, advbw) %>%
    rename(p = percentile)
}
-
# Plot the advertised bandwidth distribution per percentile, faceted into
# "All relays" and "Exits only" panels; saves to path_p.
plot_advbwdist_perc <- function(start_p, end_p, p_p, path_p) {
  prepare_advbwdist_perc(start_p, end_p, p_p) %>%
    gather(variable, advbw, -c(date, p)) %>%
    mutate(variable = ifelse(variable == "all", "All relays",
      "Exits only")) %>%
    complete(date = full_seq(date, period = 1), nesting(p, variable)) %>%
    ggplot(aes(x = date, y = advbw, colour = p)) +
    facet_grid(variable ~ .) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "Percentile") +
    ggtitle("Advertised bandwidth distribution") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read advbwdist.csv and return advertised bandwidth (Gbit) of the n-th
# fastest relays, with "all" and "exits" spread into columns.
# n_p: optional vector of ranks to keep; NULL keeps all ranked rows.
prepare_advbwdist_relay <- function(start_p = NULL, end_p = NULL, n_p = NULL) {
  read_csv(file = paste(stats_dir, "advbwdist.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        isexit = col_logical(),
        relay = col_integer(),
        percentile = col_skip(),
        advbw = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Without n_p, keep every row that has a relay rank at all.
    filter(if (!is.null(n_p)) relay %in% as.numeric(n_p) else
      relay != "") %>%
    transmute(date, relay = as.factor(relay),
      # NA isexit marks the all-relays rows; non-NA marks exits-only.
      variable = ifelse(is.na(isexit), "all", "exits"),
      advbw = advbw * 8 / 1e9) %>%
    spread(variable, advbw) %>%
    rename(n = relay)
}
-
# Plot advertised bandwidth of the n-th fastest relays, faceted into
# "All relays" and "Exits only" panels; saves to path_p.
plot_advbwdist_relay <- function(start_p, end_p, n_p, path_p) {
  prepare_advbwdist_relay(start_p, end_p, n_p) %>%
    gather(variable, advbw, -c(date, n)) %>%
    mutate(variable = ifelse(variable == "all", "All relays",
      "Exits only")) %>%
    complete(date = full_seq(date, period = 1), nesting(n, variable)) %>%
    ggplot(aes(x = date, y = advbw, colour = n)) +
    facet_grid(variable ~ .) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "n") +
    ggtitle("Advertised bandwidth of n-th fastest relays") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read hidserv.csv and return the estimated number of unique .onion
# addresses per day (weighted interquartile mean), suppressed to NA when
# less than 1% of relays reported (frac < 0.01).
prepare_hidserv_dir_onions_seen <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        type = col_factor(levels = NULL),
        wmean = col_skip(),
        wmedian = col_skip(),
        wiqm = col_double(),
        frac = col_double(),
        stats = col_skip())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(type == "dir-onions-seen") %>%
    transmute(date, onions = ifelse(frac >= 0.01, wiqm, NA), frac)
}
-
# Plot the estimated number of unique .onion addresses; saves to path_p.
plot_hidserv_dir_onions_seen <- function(start_p, end_p, path_p) {
  prepare_hidserv_dir_onions_seen(start_p, end_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = onions)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", limits = c(0, NA), labels = formatter) +
    ggtitle("Unique .onion addresses") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read hidserv.csv and return estimated onion-service traffic per day,
# converting relayed cells/day to Gbit/s (8 bits x 512-byte cells over
# 86400 seconds); suppressed to NA when frac < 0.01.
prepare_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        type = col_factor(levels = NULL),
        wmean = col_skip(),
        wmedian = col_skip(),
        wiqm = col_double(),
        frac = col_double(),
        stats = col_skip())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(type == "rend-relayed-cells") %>%
    transmute(date,
      relayed = ifelse(frac >= 0.01, wiqm * 8 * 512 / (86400 * 1e9), NA), frac)
}
-
# Plot estimated onion-service traffic in Gbit/s; saves to path_p.
plot_hidserv_rend_relayed_cells <- function(start_p, end_p, path_p) {
  prepare_hidserv_rend_relayed_cells(start_p, end_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = relayed)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    ggtitle("Onion-service traffic") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read webstats.csv and return daily Tor Browser request counts with the
# four request types (tbid/tbsd/tbup/tbur) spread into readably named
# columns (initial downloads, signature downloads, update pings/requests).
prepare_webstats_tb <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
      col_types = cols(
        log_date = col_date(format = ""),
        request_type = col_factor(levels = NULL),
        platform = col_skip(),
        channel = col_skip(),
        locale = col_skip(),
        incremental = col_skip(),
        count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tbid", "tbsd", "tbup", "tbur")) %>%
    group_by(log_date, request_type) %>%
    summarize(count = sum(count)) %>%
    spread(request_type, count) %>%
    rename(date = log_date, initial_downloads = tbid,
      signature_downloads = tbsd, update_pings = tbup,
      update_requests = tbur)
}
-
# Plot Tor Browser downloads and updates, one free-scaled facet per
# request type; saves to path_p.
plot_webstats_tb <- function(start_p, end_p, path_p) {
  prepare_webstats_tb(start_p, end_p) %>%
    gather(request_type, count, -date) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "signature_downloads", "update_pings",
        "update_requests"),
      labels = c("Initial downloads", "Signature downloads", "Update pings",
        "Update requests"))) %>%
    # Drop the grouping left over from prepare_webstats_tb() before
    # completing the date sequence.
    ungroup() %>%
    complete(date = full_seq(date, period = 1), nesting(request_type)) %>%
    ggplot(aes(x = date, y = count)) +
    geom_point() +
    geom_line() +
    facet_grid(request_type ~ ., scales = "free_y") +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA)) +
    ggtitle("Tor Browser downloads and updates") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read webstats.csv and return daily Tor Browser initial downloads (tbid)
# and update pings (tbup) per platform, missing combinations filled with 0.
prepare_webstats_tb_platform <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
      col_types = cols(
        log_date = col_date(format = ""),
        request_type = col_factor(levels = NULL),
        platform = col_factor(levels = NULL),
        channel = col_skip(),
        locale = col_skip(),
        incremental = col_skip(),
        count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tbid", "tbup")) %>%
    group_by(log_date, platform, request_type) %>%
    summarize(count = sum(count)) %>%
    spread(request_type, count, fill = 0) %>%
    rename(date = log_date, initial_downloads = tbid, update_pings = tbup)
}
-
# Plot Tor Browser downloads and update pings by platform, one free-scaled
# facet per request type; saves to path_p.
plot_webstats_tb_platform <- function(start_p, end_p, path_p) {
  prepare_webstats_tb_platform(start_p, end_p) %>%
    gather(request_type, count, -c(date, platform)) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "update_pings"),
      labels = c("Initial downloads", "Update pings"))) %>%
    # Drop leftover grouping before completing the date sequence.
    ungroup() %>%
    complete(date = full_seq(date, period = 1),
      nesting(platform, request_type)) %>%
    ggplot(aes(x = date, y = count, colour = platform)) +
    geom_point() +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    # Single-letter platform codes from the CSV mapped to display names.
    scale_colour_hue(name = "Platform",
      breaks = c("w", "m", "l", "o", ""),
      labels = c("Windows", "macOS", "Linux", "Other", "Unknown")) +
    facet_grid(request_type ~ ., scales = "free_y") +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA),
      legend.position = "top") +
    ggtitle("Tor Browser downloads and updates by platform") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read webstats.csv and prepare Tor Browser download (tbid) and update
# ping (tbup) counts by date and locale.
#
# start_p/end_p: optional ISO date strings bounding the date range
# (NULL = unbounded).
prepare_webstats_tb_locale <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
    col_types = cols(
      log_date = col_date(format = ""),
      request_type = col_factor(levels = NULL),
      platform = col_skip(),
      channel = col_skip(),
      locale = col_factor(levels = NULL),
      incremental = col_skip(),
      count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tbid", "tbup")) %>%
    rename(date = log_date) %>%
    group_by(date, locale, request_type) %>%
    summarize(count = sum(count)) %>%
    # Fix the factor levels so spread() creates both columns even when
    # one request type is absent in the selected range.
    mutate(request_type = factor(request_type, levels = c("tbid", "tbup"))) %>%
    spread(request_type, count, fill = 0) %>%
    rename(initial_downloads = tbid, update_pings = tbup)
}
-
# Plot Tor Browser downloads and updates by locale and save the graph to
# path_p. Only the five locales with the highest total count over the
# selected period are shown individually; all others are lumped into
# "(other)".
plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
  d <- prepare_webstats_tb_locale(start_p, end_p) %>%
    gather(request_type, count, -c(date, locale)) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "update_pings"),
      labels = c("Initial downloads", "Update pings")))
  # Determine the top-5 locales by total request count.
  e <- aggregate(list(count = d$count), by = list(locale = d$locale),
    FUN = sum)
  e <- e[order(e$count, decreasing = TRUE), ]
  # head() rather than e[1:5, ]: avoids NA rows when fewer than five
  # locales are present.
  e <- head(e, 5)
  # as.character() before ifelse(): ifelse() silently strips factor
  # attributes and would return integer level codes for the TRUE case.
  d <- aggregate(list(count = d$count), by = list(date = d$date,
    request_type = d$request_type,
    locale = ifelse(as.character(d$locale) %in% as.character(e$locale),
      as.character(d$locale), "(other)")), FUN = sum)
  d %>%
    complete(date = full_seq(date, period = 1),
      nesting(locale, request_type)) %>%
    ggplot(aes(x = date, y = count, colour = locale)) +
    geom_point() +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "Locale",
      breaks = c(as.character(e$locale), "(other)"),
      labels = c(as.character(e$locale), "Other")) +
    facet_grid(request_type ~ ., scales = "free_y") +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA),
      legend.position = "top") +
    guides(col = guide_legend(nrow = 1)) +
    ggtitle("Tor Browser downloads and updates by locale") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read webstats.csv and prepare Tor Messenger download (tmid) and update
# ping (tmup) counts by date.
#
# start_p/end_p: optional ISO date strings bounding the date range
# (NULL = unbounded).
prepare_webstats_tm <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
    col_types = cols(
      log_date = col_date(format = ""),
      request_type = col_factor(levels = NULL),
      platform = col_skip(),
      channel = col_skip(),
      locale = col_skip(),
      incremental = col_skip(),
      count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tmid", "tmup")) %>%
    group_by(log_date, request_type) %>%
    summarize(count = sum(count)) %>%
    # Fix levels and spread with drop = FALSE so that both columns exist
    # even when one request type is absent in the selected range.
    mutate(request_type = factor(request_type, levels = c("tmid", "tmup"))) %>%
    spread(request_type, count, drop = FALSE, fill = 0) %>%
    rename(date = log_date, initial_downloads = tmid, update_pings = tmup)
}
-
# Plot Tor Messenger downloads and updates as a faceted time series and
# save the graph to path_p.
plot_webstats_tm <- function(start_p, end_p, path_p) {
  prepare_webstats_tm(start_p, end_p) %>%
    gather(request_type, count, -date) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "update_pings"),
      labels = c("Initial downloads", "Update pings"))) %>%
    ungroup() %>%
    # Insert explicit rows for missing dates.
    complete(date = full_seq(date, period = 1), nesting(request_type)) %>%
    ggplot(aes(x = date, y = count)) +
    geom_point() +
    geom_line() +
    facet_grid(request_type ~ ., scales = "free_y") +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA)) +
    ggtitle("Tor Messenger downloads and updates") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read ipv6servers.csv and prepare daily relay counts by IPv6 capability:
# total, announced_ipv6, reachable_ipv6_relay, exiting_ipv6_relay.
#
# start_p/end_p: optional ISO date strings bounding the date range
# (NULL = unbounded).
prepare_relays_ipv6 <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      server = col_factor(levels = NULL),
      guard_relay = col_skip(),
      exit_relay = col_skip(),
      announced_ipv6 = col_logical(),
      exiting_ipv6_relay = col_logical(),
      reachable_ipv6_relay = col_logical(),
      server_count_sum_avg = col_double(),
      advertised_bandwidth_bytes_sum_avg = col_skip())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    filter(server == "relay") %>%
    group_by(valid_after_date) %>%
    # Sum the subsets flagged by the respective logical columns.
    summarize(total = sum(server_count_sum_avg),
      announced = sum(server_count_sum_avg[announced_ipv6]),
      reachable = sum(server_count_sum_avg[reachable_ipv6_relay]),
      exiting = sum(server_count_sum_avg[exiting_ipv6_relay])) %>%
    rename(date = valid_after_date)
}
-
# Plot relay counts by IP version (total, IPv6 announced, IPv6
# reachable, IPv6 exiting) and save the graph to path_p.
plot_relays_ipv6 <- function(start_p, end_p, path_p) {
  prepare_relays_ipv6(start_p, end_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    gather(category, count, -date) %>%
    ggplot(aes(x = date, y = count, colour = category)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "", h.start = 90,
      breaks = c("total", "announced", "reachable", "exiting"),
      labels = c("Total (IPv4) OR", "IPv6 announced OR", "IPv6 reachable OR",
        "IPv6 exiting")) +
    ggtitle("Relays by IP version") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read ipv6servers.csv and prepare daily bridge counts: total and those
# announcing an IPv6 address.
#
# start_p/end_p: optional ISO date strings bounding the date range
# (NULL = unbounded).
prepare_bridges_ipv6 <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      server = col_factor(levels = NULL),
      guard_relay = col_skip(),
      exit_relay = col_skip(),
      announced_ipv6 = col_logical(),
      exiting_ipv6_relay = col_skip(),
      reachable_ipv6_relay = col_skip(),
      server_count_sum_avg = col_double(),
      advertised_bandwidth_bytes_sum_avg = col_skip())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    filter(server == "bridge") %>%
    group_by(valid_after_date) %>%
    summarize(total = sum(server_count_sum_avg),
      announced = sum(server_count_sum_avg[announced_ipv6])) %>%
    rename(date = valid_after_date)
}
-
# Plot bridge counts by IP version (total vs. IPv6 announced) and save
# the graph to path_p.
plot_bridges_ipv6 <- function(start_p, end_p, path_p) {
  prepare_bridges_ipv6(start_p, end_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    gather(category, count, -date) %>%
    ggplot(aes(x = date, y = count, colour = category)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "", h.start = 90,
      breaks = c("total", "announced"),
      labels = c("Total (IPv4) OR", "IPv6 announced OR")) +
    ggtitle("Bridges by IP version") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read ipv6servers.csv and prepare daily advertised-bandwidth totals of
# relays, split by guard/exit flag and IPv6 reachability/exit policy.
# Bandwidth is converted from bytes/s to Gbit/s.
#
# start_p/end_p: optional ISO date strings bounding the date range
# (NULL = unbounded).
prepare_advbw_ipv6 <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      server = col_factor(levels = NULL),
      guard_relay = col_logical(),
      exit_relay = col_logical(),
      announced_ipv6 = col_logical(),
      exiting_ipv6_relay = col_logical(),
      reachable_ipv6_relay = col_logical(),
      server_count_sum_avg = col_skip(),
      advertised_bandwidth_bytes_sum_avg = col_double())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    filter(server == "relay") %>%
    # bytes/s -> Gbit/s: times 8 bits per byte, divided by 1e9.
    mutate(advertised_bandwidth_bytes_sum_avg =
      advertised_bandwidth_bytes_sum_avg * 8 / 1e9) %>%
    group_by(valid_after_date) %>%
    summarize(total = sum(advertised_bandwidth_bytes_sum_avg),
      total_guard = sum(advertised_bandwidth_bytes_sum_avg[guard_relay]),
      total_exit = sum(advertised_bandwidth_bytes_sum_avg[exit_relay]),
      reachable_guard = sum(advertised_bandwidth_bytes_sum_avg[
        reachable_ipv6_relay & guard_relay]),
      reachable_exit = sum(advertised_bandwidth_bytes_sum_avg[
        reachable_ipv6_relay & exit_relay]),
      exiting = sum(advertised_bandwidth_bytes_sum_avg[
        exiting_ipv6_relay])) %>%
    rename(date = valid_after_date)
}
-
# Plot advertised bandwidth by IP version (in Gbit/s) and save the graph
# to path_p.
plot_advbw_ipv6 <- function(start_p, end_p, path_p) {
  prepare_advbw_ipv6(start_p, end_p) %>%
    complete(date = full_seq(date, period = 1)) %>%
    gather(category, advbw, -date) %>%
    ggplot(aes(x = date, y = advbw, colour = category)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "", h.start = 90,
      breaks = c("total", "total_guard", "total_exit", "reachable_guard",
        "reachable_exit", "exiting"),
      labels = c("Total (IPv4) OR", "Guard total (IPv4)", "Exit total (IPv4)",
        "Reachable guard IPv6 OR", "Reachable exit IPv6 OR", "IPv6 exiting")) +
    ggtitle("Advertised bandwidth by IP version") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top") +
    # Six legend entries: wrap them over two rows.
    guides(colour = guide_legend(nrow = 2, byrow = TRUE))
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
# Read totalcw.csv and prepare daily total consensus weights per
# bandwidth authority nickname.
#
# start_p/end_p: optional ISO date strings bounding the date range
# (NULL = unbounded).
prepare_totalcw <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "totalcw.csv", sep = ""),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      nickname = col_character(),
      have_guard_flag = col_logical(),
      have_exit_flag = col_logical(),
      measured_sum_avg = col_double())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    # Sum over guard/exit flag combinations per date and nickname.
    group_by(valid_after_date, nickname) %>%
    summarize(measured_sum_avg = sum(measured_sum_avg)) %>%
    rename(date = valid_after_date, totalcw = measured_sum_avg) %>%
    arrange(date, nickname)
}
-
# Plot total consensus weights across bandwidth authorities and save the
# graph to path_p. Rows with an NA nickname are relabeled "consensus",
# which is also placed first in the legend.
plot_totalcw <- function(start_p, end_p, path_p) {
  prepare_totalcw(start_p, end_p) %>%
    mutate(nickname = ifelse(is.na(nickname), "consensus", nickname)) %>%
    mutate(nickname = factor(nickname,
      levels = c("consensus", unique(nickname[nickname != "consensus"])))) %>%
    ungroup() %>%
    complete(date = full_seq(date, period = 1), nesting(nickname)) %>%
    ggplot(aes(x = date, y = totalcw, colour = nickname)) +
    geom_line(na.rm = TRUE) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "") +
    ggtitle("Total consensus weights across bandwidth authorities") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-
-
diff --git a/src/main/R/rserver/rserve-init.R b/src/main/R/rserver/rserve-init.R
index f160698..57e14f5 100644
--- a/src/main/R/rserver/rserve-init.R
+++ b/src/main/R/rserver/rserve-init.R
@@ -1,12 +1,1603 @@
# Pre-loaded libraries to speed things up.
# library() rather than require(): library() stops with an error when a
# package is missing, while require() only warns and returns FALSE,
# deferring the failure to the first plotting call.
library("ggplot2")
library("RColorBrewer")
library("scales")
library("dplyr")
library("tidyr")
library("readr")
# Mapping from two-letter country codes to human-readable country names;
# looked up via countryname() below.
countrylist <- list(
  "ad" = "Andorra",
  "ae" = "the United Arab Emirates",
  "af" = "Afghanistan",
  "ag" = "Antigua and Barbuda",
  "ai" = "Anguilla",
  "al" = "Albania",
  "am" = "Armenia",
  "an" = "the Netherlands Antilles",
  "ao" = "Angola",
  "aq" = "Antarctica",
  "ar" = "Argentina",
  "as" = "American Samoa",
  "at" = "Austria",
  "au" = "Australia",
  "aw" = "Aruba",
  "ax" = "the Aland Islands",
  "az" = "Azerbaijan",
  "ba" = "Bosnia and Herzegovina",
  "bb" = "Barbados",
  "bd" = "Bangladesh",
  "be" = "Belgium",
  "bf" = "Burkina Faso",
  "bg" = "Bulgaria",
  "bh" = "Bahrain",
  "bi" = "Burundi",
  "bj" = "Benin",
  "bl" = "Saint Bartelemey",
  "bm" = "Bermuda",
  "bn" = "Brunei",
  "bo" = "Bolivia",
  "bq" = "Bonaire, Sint Eustatius and Saba",
  "br" = "Brazil",
  "bs" = "the Bahamas",
  "bt" = "Bhutan",
  "bv" = "the Bouvet Island",
  "bw" = "Botswana",
  "by" = "Belarus",
  "bz" = "Belize",
  "ca" = "Canada",
  "cc" = "the Cocos (Keeling) Islands",
  "cd" = "the Democratic Republic of the Congo",
  "cf" = "Central African Republic",
  "cg" = "Congo",
  "ch" = "Switzerland",
  "ci" = "Côte d'Ivoire",
  "ck" = "the Cook Islands",
  "cl" = "Chile",
  "cm" = "Cameroon",
  "cn" = "China",
  "co" = "Colombia",
  "cr" = "Costa Rica",
  "cu" = "Cuba",
  "cv" = "Cape Verde",
  "cw" = "Curaçao",
  "cx" = "the Christmas Island",
  "cy" = "Cyprus",
  "cz" = "the Czech Republic",
  "de" = "Germany",
  "dj" = "Djibouti",
  "dk" = "Denmark",
  "dm" = "Dominica",
  "do" = "the Dominican Republic",
  "dz" = "Algeria",
  "ec" = "Ecuador",
  "ee" = "Estonia",
  "eg" = "Egypt",
  "eh" = "the Western Sahara",
  "er" = "Eritrea",
  "es" = "Spain",
  "et" = "Ethiopia",
  "fi" = "Finland",
  "fj" = "Fiji",
  "fk" = "the Falkland Islands (Malvinas)",
  "fm" = "the Federated States of Micronesia",
  "fo" = "the Faroe Islands",
  "fr" = "France",
  "ga" = "Gabon",
  "gb" = "the United Kingdom",
  "gd" = "Grenada",
  "ge" = "Georgia",
  "gf" = "French Guiana",
  "gg" = "Guernsey",
  "gh" = "Ghana",
  "gi" = "Gibraltar",
  "gl" = "Greenland",
  "gm" = "Gambia",
  "gn" = "Guinea",
  "gp" = "Guadeloupe",
  "gq" = "Equatorial Guinea",
  "gr" = "Greece",
  "gs" = "South Georgia and the South Sandwich Islands",
  "gt" = "Guatemala",
  "gu" = "Guam",
  "gw" = "Guinea-Bissau",
  "gy" = "Guyana",
  "hk" = "Hong Kong",
  "hm" = "Heard Island and McDonald Islands",
  "hn" = "Honduras",
  "hr" = "Croatia",
  "ht" = "Haiti",
  "hu" = "Hungary",
  "id" = "Indonesia",
  "ie" = "Ireland",
  "il" = "Israel",
  "im" = "the Isle of Man",
  "in" = "India",
  "io" = "the British Indian Ocean Territory",
  "iq" = "Iraq",
  "ir" = "Iran",
  "is" = "Iceland",
  "it" = "Italy",
  "je" = "Jersey",
  "jm" = "Jamaica",
  "jo" = "Jordan",
  "jp" = "Japan",
  "ke" = "Kenya",
  "kg" = "Kyrgyzstan",
  "kh" = "Cambodia",
  "ki" = "Kiribati",
  "km" = "Comoros",
  "kn" = "Saint Kitts and Nevis",
  "kp" = "North Korea",
  "kr" = "the Republic of Korea",
  "kw" = "Kuwait",
  "ky" = "the Cayman Islands",
  "kz" = "Kazakhstan",
  "la" = "Laos",
  "lb" = "Lebanon",
  "lc" = "Saint Lucia",
  "li" = "Liechtenstein",
  "lk" = "Sri Lanka",
  "lr" = "Liberia",
  "ls" = "Lesotho",
  "lt" = "Lithuania",
  "lu" = "Luxembourg",
  "lv" = "Latvia",
  "ly" = "Libya",
  "ma" = "Morocco",
  "mc" = "Monaco",
  "md" = "the Republic of Moldova",
  "me" = "Montenegro",
  "mf" = "Saint Martin",
  "mg" = "Madagascar",
  "mh" = "the Marshall Islands",
  "mk" = "Macedonia",
  "ml" = "Mali",
  "mm" = "Burma",
  "mn" = "Mongolia",
  "mo" = "Macau",
  "mp" = "the Northern Mariana Islands",
  "mq" = "Martinique",
  "mr" = "Mauritania",
  "ms" = "Montserrat",
  "mt" = "Malta",
  "mu" = "Mauritius",
  "mv" = "the Maldives",
  "mw" = "Malawi",
  "mx" = "Mexico",
  "my" = "Malaysia",
  "mz" = "Mozambique",
  "na" = "Namibia",
  "nc" = "New Caledonia",
  "ne" = "Niger",
  "nf" = "Norfolk Island",
  "ng" = "Nigeria",
  "ni" = "Nicaragua",
  "nl" = "the Netherlands",
  "no" = "Norway",
  "np" = "Nepal",
  "nr" = "Nauru",
  "nu" = "Niue",
  "nz" = "New Zealand",
  "om" = "Oman",
  "pa" = "Panama",
  "pe" = "Peru",
  "pf" = "French Polynesia",
  "pg" = "Papua New Guinea",
  "ph" = "the Philippines",
  "pk" = "Pakistan",
  "pl" = "Poland",
  "pm" = "Saint Pierre and Miquelon",
  "pn" = "the Pitcairn Islands",
  "pr" = "Puerto Rico",
  "ps" = "the Palestinian Territory",
  "pt" = "Portugal",
  "pw" = "Palau",
  "py" = "Paraguay",
  "qa" = "Qatar",
  "re" = "Reunion",
  "ro" = "Romania",
  "rs" = "Serbia",
  "ru" = "Russia",
  "rw" = "Rwanda",
  "sa" = "Saudi Arabia",
  "sb" = "the Solomon Islands",
  "sc" = "the Seychelles",
  "sd" = "Sudan",
  "se" = "Sweden",
  "sg" = "Singapore",
  "sh" = "Saint Helena",
  "si" = "Slovenia",
  "sj" = "Svalbard and Jan Mayen",
  "sk" = "Slovakia",
  "sl" = "Sierra Leone",
  "sm" = "San Marino",
  "sn" = "Senegal",
  "so" = "Somalia",
  "sr" = "Suriname",
  "ss" = "South Sudan",
  "st" = "São Tomé and Príncipe",
  "sv" = "El Salvador",
  "sx" = "Sint Maarten",
  "sy" = "the Syrian Arab Republic",
  "sz" = "Swaziland",
  "tc" = "Turks and Caicos Islands",
  "td" = "Chad",
  "tf" = "the French Southern Territories",
  "tg" = "Togo",
  "th" = "Thailand",
  "tj" = "Tajikistan",
  "tk" = "Tokelau",
  "tl" = "East Timor",
  "tm" = "Turkmenistan",
  "tn" = "Tunisia",
  "to" = "Tonga",
  "tr" = "Turkey",
  "tt" = "Trinidad and Tobago",
  "tv" = "Tuvalu",
  "tw" = "Taiwan",
  "tz" = "the United Republic of Tanzania",
  "ua" = "Ukraine",
  "ug" = "Uganda",
  "um" = "the United States Minor Outlying Islands",
  "us" = "the United States",
  "uy" = "Uruguay",
  "uz" = "Uzbekistan",
  "va" = "Vatican City",
  "vc" = "Saint Vincent and the Grenadines",
  "ve" = "Venezuela",
  "vg" = "the British Virgin Islands",
  "vi" = "the United States Virgin Islands",
  "vn" = "Vietnam",
  "vu" = "Vanuatu",
  "wf" = "Wallis and Futuna",
  "ws" = "Samoa",
  "xk" = "Kosovo",
  "ye" = "Yemen",
  "yt" = "Mayotte",
  "za" = "South Africa",
  "zm" = "Zambia",
  "zw" = "Zimbabwe")
-source('graphs.R')
-source('tables.R')
# Look up the human-readable name for a two-letter country code in
# countrylist; unknown codes map to "no-man's-land".
countryname <- function(country) {
  name <- countrylist[[country]]
  if (is.null(name)) "no-man's-land" else name
}
+
# Helper function that takes date limits as input and returns major breaks
# as output. The main difference to the built-in major breaks is that we're
# trying harder to align major breaks with first days of weeks (Sundays),
# months, quarters, or years.
custom_breaks <- function(input) {
  # Pick a scale index from the plotted range in days.
  span_days <- as.numeric(max(input) - min(input))
  idx <- cut(span_days, c(-1, 7, 12, 56, 180, 600, 2000, Inf),
    labels = FALSE)
  # Round the first date down to the start of a week/month/year by
  # printing it with fixed fields and re-parsing it.
  print_fmt <- c("%F", "%F", "%Y-W%U-7", "%Y-%m-01", "%Y-01-01",
    "%Y-01-01", "%Y-01-01")[idx]
  parse_fmt <- ifelse(idx == 3, "%Y-W%U-%u", "%F")
  step <- c("1 day", "2 days", "1 week", "1 month", "3 months", "1 year",
    "2 years")[idx]
  first_break <- as.Date(as.character(min(input), print_fmt),
    format = parse_fmt)
  seq(first_break, max(input), by = step)
}
+
# Helper function that takes date limits as input and returns minor breaks
# as output. As opposed to the built-in minor breaks, we're not just adding
# one minor break half way through between two major breaks. Instead, we're
# plotting a minor break for every day, week, month, or quarter between two
# major breaks.
custom_minor_breaks <- function(input) {
  # Same scale selection as custom_breaks(), shifted one level down so
  # minor breaks are one granularity finer than major breaks.
  span_days <- as.numeric(max(input) - min(input))
  idx <- cut(span_days, c(-1, 7, 12, 56, 180, 600, 2000, Inf),
    labels = FALSE)
  print_fmt <- c("%F", "%F", "%F", "%Y-W%U-7", "%Y-%m-01", "%Y-01-01",
    "%Y-01-01")[idx]
  parse_fmt <- ifelse(idx == 4, "%Y-W%U-%u", "%F")
  step <- c("1 day", "1 day", "1 day", "1 week", "1 month", "3 months",
    "1 year")[idx]
  first_break <- as.Date(as.character(min(input), print_fmt),
    format = parse_fmt)
  seq(first_break, max(input), by = step)
}
+
# Helper function that takes breaks as input and returns labels as output.
# All labels are ISO-8601: %Y when every break is a January 1, %Y-%m when
# every break is the first of a month, and full %F dates otherwise.
custom_labels <- function(breaks) {
  if (all(format(breaks, format = "%m-%d") == "01-01", na.rm = TRUE)) {
    format(breaks, format = "%Y")
  } else if (all(format(breaks, format = "%d") == "01", na.rm = TRUE)) {
    format(breaks, format = "%Y-%m")
  } else {
    format(breaks, format = "%F")
  }
}
+
# Helper function to format numbers in non-scientific notation with spaces
# as thousands separator; further arguments are passed on to format().
formatter <- function(x, ...) format(x, ..., scientific = FALSE,
  big.mark = " ")
+
# Global ggplot2 theme tweaks applied to every plot below.
theme_update(
  # Make plot title centered, and leave some room to the plot.
  plot.title = element_text(hjust = 0.5, margin = margin(b = 11)),

  # Leave a little more room to the right for long x axis labels.
  plot.margin = margin(5.5, 11, 5.5, 5.5)
)

# Set the default line size of geom_line() to 1.
update_geom_defaults("line", list(size = 1))

# Caption attached to every generated graph.
copyright_notice <- "The Tor Project - https://metrics.torproject.org/"

# Directory containing the pre-aggregated .csv files read by the
# prepare_* functions.
stats_dir <- "/srv/metrics.torproject.org/metrics/shared/stats/"

# Directory containing pre-rendered "no data available" placeholder
# files, used by copy_no_data().
rdata_dir <- "/srv/metrics.torproject.org/metrics/shared/RData/"
+
# Helper function that copies the appropriate "no data" object to
# filename: a comment line for .csv output, otherwise the pre-rendered
# no-data placeholder from rdata_dir with the same file extension.
copy_no_data <- function(filename) {
  # Extension is assumed to be four characters, e.g. ".csv" or ".png".
  suffix <- substr(filename, nchar(filename) - 3, nchar(filename))
  if (identical(suffix, ".csv")) {
    write("# No data available for the given parameters.", file = filename)
  } else {
    placeholder <- paste(rdata_dir, "no-data-available", suffix, sep = "")
    file.copy(placeholder, filename)
  }
}
+
# Helper function wrapping calls into error handling: evaluate wrappee,
# copy the "no data" placeholder to filename if it throws, and also if
# the call left no output file (or an empty one) behind.
robust_call <- function(wrappee, filename) {
  tryCatch(eval(wrappee), error = function(e) copy_no_data(filename),
    finally = if (!file.exists(filename) || file.size(filename) == 0) {
      copy_no_data(filename)
    })
}
+
# Write the result of the given FUN, typically a prepare_ function, as
# .csv file to the given path_p. Any further arguments are forwarded to
# FUN.
write_data <- function(FUN, ..., path_p) {
  FUN(...) %>%
    write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}

# Disable readr's automatic progress bar.
options(readr.show_progress = FALSE)
+
# Read networksize.csv and return daily relay and bridge counts,
# optionally restricted to the range [start_p, end_p] (ISO date strings;
# NULL = unbounded).
prepare_networksize <- function(start_p = NULL, end_p = NULL) {
  # paste0() instead of paste(..., sep = "").
  read_csv(file = paste0(stats_dir, "networksize.csv"),
    col_types = cols(
      date = col_date(format = ""),
      relays = col_double(),
      bridges = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
}
+
# Plot the number of running relays and bridges over time and save the
# graph to path_p.
plot_networksize <- function(start_p, end_p, path_p) {
  prepare_networksize(start_p, end_p) %>%
    gather(variable, value, -date) %>%
    # Insert explicit rows for missing dates per series.
    complete(date = full_seq(date, period = 1),
      variable = c("relays", "bridges")) %>%
    ggplot(aes(x = date, y = value, colour = variable)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue("", breaks = c("relays", "bridges"),
      labels = c("Relays", "Bridges")) +
    ggtitle("Number of relays") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read versions.csv and return daily relay counts per tor software
# version, optionally restricted to [start_p, end_p] (ISO date strings;
# NULL = unbounded).
prepare_versions <- function(start_p = NULL, end_p = NULL) {
  read_csv(paste(stats_dir, "versions.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      version = col_character(),
      relays = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
}
+
# Plot the number of relays by tor software version and save the graph
# to path_p.
plot_versions <- function(start_p, end_p, path_p) {
  s <- prepare_versions(start_p, end_p)
  known_versions <- unique(s$version)
  # One stable colour per version from the "Paired" palette, recycled
  # when there are more than 12 versions. (`length.out` spelled out; the
  # unused colorRampPalette helper was removed.)
  colours <- data.frame(breaks = known_versions,
    values = rep(brewer.pal(min(12, length(known_versions)), "Paired"),
      length.out = length(known_versions)),
    stringsAsFactors = FALSE)
  versions <- s[s$version %in% known_versions, ]
  visible_versions <- sort(unique(versions$version))
  versions <- versions %>%
    complete(date = full_seq(date, period = 1), nesting(version)) %>%
    ggplot(aes(x = date, y = relays, colour = version)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_manual(name = "Tor version",
      values = colours[colours$breaks %in% visible_versions, 2],
      breaks = visible_versions) +
    ggtitle("Relay versions") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read platforms.csv and return daily relay counts per platform, one
# column per (lower-cased) platform name, optionally restricted to
# [start_p, end_p] (ISO date strings; NULL = unbounded).
prepare_platforms <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "platforms.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      platform = col_factor(levels = NULL),
      relays = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Lower-case platform names to produce stable column names.
    mutate(platform = tolower(platform)) %>%
    spread(platform, relays)
}
+
# Plot the number of relays by platform (Linux, macOS, BSD, Windows,
# Other) with fixed per-platform colours and save the graph to path_p.
plot_platforms <- function(start_p, end_p, path_p) {
  prepare_platforms(start_p, end_p) %>%
    gather(platform, relays, -date) %>%
    complete(date = full_seq(date, period = 1), nesting(platform)) %>%
    ggplot(aes(x = date, y = relays, colour = platform)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_manual(name = "Platform",
      breaks = c("linux", "macos", "bsd", "windows", "other"),
      labels = c("Linux", "macOS", "BSD", "Windows", "Other"),
      values = c("linux" = "#56B4E9", "macos" = "#333333", "bsd" = "#E69F00",
        "windows" = "#0072B2", "other" = "#009E73")) +
    ggtitle("Relay platforms") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read bandwidth.csv and return daily directory traffic (dirread,
# dirwrite) converted from bytes/s to Gbit/s, optionally restricted to
# [start_p, end_p] (ISO date strings; NULL = unbounded).
prepare_dirbytes <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "bandwidth.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      isexit = col_logical(),
      isguard = col_logical(),
      bwread = col_skip(),
      bwwrite = col_skip(),
      dirread = col_double(),
      dirwrite = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Keep only the rows with NA exit/guard flags, i.e. the totals not
    # broken down by relay flag.
    filter(is.na(isexit)) %>%
    filter(is.na(isguard)) %>%
    # bytes/s -> Gbit/s: times 8 bits per byte, divided by 1e9.
    mutate(dirread = dirread * 8 / 1e9,
      dirwrite = dirwrite * 8 / 1e9) %>%
    select(date, dirread, dirwrite)
}
+
# Plot directory bytes read and written over time and save the graph to
# path_p.
plot_dirbytes <- function(start_p, end_p, path_p) {
  prepare_dirbytes(start_p, end_p) %>%
    gather(variable, value, -date) %>%
    complete(date = full_seq(date, period = 1), nesting(variable)) %>%
    ggplot(aes(x = date, y = value, colour = variable)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "",
      breaks = c("dirwrite", "dirread"),
      labels = c("Written dir bytes", "Read dir bytes")) +
    ggtitle("Number of bytes spent on answering directory requests") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read relayflags.csv and return daily relay counts per relay flag,
# optionally restricted to [start_p, end_p] (ISO date strings) and to
# the flags listed in flag_p (NULL = no restriction).
prepare_relayflags <- function(start_p = NULL, end_p = NULL, flag_p = NULL) {
  read_csv(file = paste(stats_dir, "relayflags.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      flag = col_factor(levels = NULL),
      relays = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(flag_p)) flag %in% flag_p else TRUE)
}
+
# Plot the number of relays carrying the relay flags in flag_p and save
# the graph to path_p. Colours are assigned to flags in the order given
# by flag_p.
plot_relayflags <- function(start_p, end_p, flag_p, path_p) {
  prepare_relayflags(start_p, end_p, flag_p) %>%
    complete(date = full_seq(date, period = 1), flag = unique(flag)) %>%
    ggplot(aes(x = date, y = relays, colour = flag)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_manual(name = "Relay flags", values = c("#E69F00",
      "#56B4E9", "#009E73", "#EE6A50", "#000000", "#0072B2"),
      breaks = flag_p, labels = flag_p) +
    ggtitle("Number of relays with relay flags assigned") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read torperf-1.1.csv and return daily download-time quartiles (q1, md,
# q3, converted from milliseconds to seconds).
#
# start_p/end_p: optional ISO date strings bounding the date range.
# server_p: optional server name to restrict to.
# filesize_p: optional file size key "50kb", "1mb", or anything else for
#   5 MiB; mapped to the corresponding byte count in the data.
prepare_torperf <- function(start_p = NULL, end_p = NULL, server_p = NULL,
    filesize_p = NULL) {
  read_csv(file = paste(stats_dir, "torperf-1.1.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      filesize = col_double(),
      source = col_character(),
      server = col_character(),
      q1 = col_double(),
      md = col_double(),
      q3 = col_double(),
      timeouts = col_skip(),
      failures = col_skip(),
      requests = col_skip())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(server_p)) server == server_p else TRUE) %>%
    filter(if (!is.null(filesize_p))
      filesize == ifelse(filesize_p == "50kb", 50 * 1024,
      ifelse(filesize_p == "1mb", 1024 * 1024, 5 * 1024 * 1024)) else
      TRUE) %>%
    # Milliseconds -> seconds.
    transmute(date, filesize, source, server, q1 = q1 / 1e3, md = md / 1e3,
      q3 = q3 / 1e3)
}
+
# Plot time to complete a fixed-size request per measurement source
# (median line with interquartile ribbon) and save the graph to path_p.
plot_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
  prepare_torperf(start_p, end_p, server_p, filesize_p) %>%
    # Drop rows without a measurement source.
    filter(source != "") %>%
    complete(date = full_seq(date, period = 1), nesting(source)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = source), size = 0.75) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "s"),
      limits = c(0, NA)) +
    scale_fill_hue(name = "Source") +
    scale_colour_hue(name = "Source") +
    ggtitle(paste("Time to complete",
      ifelse(filesize_p == "50kb", "50 KiB",
      ifelse(filesize_p == "1mb", "1 MiB", "5 MiB")),
      "request to", server_p, "server")) +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read torperf-1.1.csv and return daily timeout and failure fractions
# (relative to the number of requests) per source and server.
#
# start_p/end_p: optional ISO date strings bounding the date range.
# server_p: optional server name to restrict to.
# filesize_p: optional file size key "50kb", "1mb", or anything else for
#   5 MiB; mapped to the corresponding byte count in the data.
prepare_torperf_failures <- function(start_p = NULL, end_p = NULL,
    server_p = NULL, filesize_p = NULL) {
  read_csv(file = paste(stats_dir, "torperf-1.1.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      filesize = col_double(),
      source = col_character(),
      server = col_character(),
      q1 = col_skip(),
      md = col_skip(),
      q3 = col_skip(),
      timeouts = col_double(),
      failures = col_double(),
      requests = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(filesize_p))
      filesize == ifelse(filesize_p == "50kb", 50 * 1024,
      ifelse(filesize_p == "1mb", 1024 * 1024, 5 * 1024 * 1024)) else
      TRUE) %>%
    filter(if (!is.null(server_p)) server == server_p else TRUE) %>%
    # Avoid division by zero on days without any requests.
    filter(requests > 0) %>%
    transmute(date, filesize, source, server, timeouts = timeouts / requests,
      failures = failures / requests)
}
+
# Plot fractions of timed-out and failed download requests as points, one
# facet per variable, and save the graph to path_p (8 x 5 in, 150 dpi).
# ggsave() picks up the plot via ggplot2's last_plot() mechanism.
plot_torperf_failures <- function(start_p, end_p, server_p, filesize_p,
    path_p) {
  prepare_torperf_failures(start_p, end_p, server_p, filesize_p) %>%
    filter(source != "") %>%
    # Reshape to long format so timeouts and failures become facet rows.
    gather(variable, value, -c(date, filesize, source, server)) %>%
    mutate(variable = factor(variable, levels = c("timeouts", "failures"),
      labels = c("Timeouts", "Failures"))) %>%
    ggplot(aes(x = date, y = value, colour = source)) +
    geom_point(size = 2, alpha = 0.5) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = percent, limits = c(0, NA)) +
    scale_colour_hue(name = "Source") +
    facet_grid(variable ~ .) +
    ggtitle(paste("Timeouts and failures of",
      ifelse(filesize_p == "50kb", "50 KiB",
      ifelse(filesize_p == "1mb", "1 MiB", "5 MiB")),
      "requests to", server_p, "server")) +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read circuit build time quartiles (q1/md/q3 per date, measuring source,
# and hop position) from buildtimes.csv, optionally restricted to the
# inclusive date range [start_p, end_p]. NULL bounds disable filtering.
prepare_onionperf_buildtimes <- function(start_p = NULL, end_p = NULL) {
  buildtimes <- read_csv(file = paste0(stats_dir, "buildtimes.csv"),
    col_types = cols(
      date = col_date(format = ""),
      source = col_character(),
      position = col_double(),
      q1 = col_double(),
      md = col_double(),
      q3 = col_double()))
  if (!is.null(start_p)) {
    buildtimes <- filter(buildtimes, date >= as.Date(start_p))
  }
  if (!is.null(end_p)) {
    buildtimes <- filter(buildtimes, date <= as.Date(end_p))
  }
  buildtimes
}
+
# Plot circuit build times per hop position (median with interquartile
# ribbon, one facet per hop, one colour per source) and save the graph to
# path_p (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_onionperf_buildtimes <- function(start_p, end_p, path_p) {
  prepare_onionperf_buildtimes(start_p, end_p) %>%
    filter(source != "") %>%
    # Label hop positions 1..3 and fill date gaps so lines break at
    # missing days instead of interpolating across them.
    mutate(date = as.Date(date),
      position = factor(position, levels = seq(1, 3, 1),
      labels = c("1st hop", "2nd hop", "3rd hop"))) %>%
    complete(date = full_seq(date, period = 1), nesting(source, position)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = source), size = 0.75) +
    facet_grid(position ~ .) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "ms"),
      limits = c(0, NA)) +
    scale_fill_hue(name = "Source") +
    scale_colour_hue(name = "Source") +
    ggtitle("Circuit build times") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read circuit round-trip latency quartiles (q1/md/q3 per date, source,
# and server) from latencies.csv. All parameters are optional; a NULL
# value disables the corresponding filter.
prepare_onionperf_latencies <- function(start_p = NULL, end_p = NULL,
    server_p = NULL) {
  latencies <- read_csv(file = paste0(stats_dir, "latencies.csv"),
    col_types = cols(
      date = col_date(format = ""),
      source = col_character(),
      server = col_character(),
      q1 = col_double(),
      md = col_double(),
      q3 = col_double()))
  if (!is.null(start_p)) {
    latencies <- filter(latencies, date >= as.Date(start_p))
  }
  if (!is.null(end_p)) {
    latencies <- filter(latencies, date <= as.Date(end_p))
  }
  if (!is.null(server_p)) {
    latencies <- filter(latencies, server == server_p)
  }
  latencies
}
+
# Plot circuit round-trip latencies (median with interquartile ribbon,
# one colour per source) and save the graph to path_p (8 x 5 in, 150
# dpi). ggsave() uses ggplot2's last_plot().
plot_onionperf_latencies <- function(start_p, end_p, server_p, path_p) {
  prepare_onionperf_latencies(start_p, end_p, server_p) %>%
    filter(source != "") %>%
    # Fill date gaps with NA rows so lines break at missing days.
    complete(date = full_seq(date, period = 1), nesting(source)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = source), size = 0.75) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "ms"),
      limits = c(0, NA)) +
    scale_fill_hue(name = "Source") +
    scale_colour_hue(name = "Source") +
    ggtitle(paste("Circuit round-trip latencies to", server_p, "server")) +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read uni-/bidirectional connection statistics from connbidirect2.csv
# and reshape the 0.25/0.5/0.75 quantile rows into q1/md/q3 columns per
# date and direction, with fractions converted from percent to [0, 1].
# NULL date bounds disable the corresponding filter.
prepare_connbidirect <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "connbidirect2.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      direction = col_factor(levels = NULL),
      quantile = col_double(),
      fraction = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Prefix quantiles with "X" to obtain valid column names for spread().
    mutate(quantile = paste("X", quantile, sep = ""),
      fraction = fraction / 100) %>%
    spread(quantile, fraction) %>%
    rename(q1 = X0.25, md = X0.5, q3 = X0.75)
}
+
# Plot the fraction of connections used uni-/bidirectionally (median with
# interquartile ribbon per direction) and save the graph to path_p
# (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_connbidirect <- function(start_p, end_p, path_p) {
  prepare_connbidirect(start_p, end_p) %>%
    # Fill date gaps with NA rows so lines break at missing days.
    complete(date = full_seq(date, period = 1), nesting(direction)) %>%
    ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = direction)) +
    geom_ribbon(alpha = 0.5) +
    geom_line(aes(colour = direction), size = 0.75) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = percent, limits = c(0, NA)) +
    # Colour and fill scales must match so the legends merge into one.
    scale_colour_hue(name = "Medians and interquartile ranges",
      breaks = c("both", "write", "read"),
      labels = c("Both reading and writing", "Mostly writing",
      "Mostly reading")) +
    scale_fill_hue(name = "Medians and interquartile ranges",
      breaks = c("both", "write", "read"),
      labels = c("Both reading and writing", "Mostly writing",
      "Mostly reading")) +
    ggtitle("Fraction of connections used uni-/bidirectionally") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Combine advertised (advbw.csv) and consumed (bandwidth.csv) bandwidth
# by Guard/Exit flag combination into one data frame with columns advbw
# and bwhist, both in Gbit/s. NULL date bounds disable filtering.
prepare_bandwidth_flags <- function(start_p = NULL, end_p = NULL) {
  advbw <- read_csv(file = paste(stats_dir, "advbw.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      isexit = col_logical(),
      isguard = col_logical(),
      advbw = col_double())) %>%
    # Convert bytes/s to Gbit/s.
    transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
      variable = "advbw", value = advbw * 8 / 1e9)
  bwhist <- read_csv(file = paste(stats_dir, "bandwidth.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      isexit = col_logical(),
      isguard = col_logical(),
      bwread = col_double(),
      bwwrite = col_double(),
      dirread = col_double(),
      dirwrite = col_double())) %>%
    # Average read and written bytes/s, converted to Gbit/s ( / 2e9 ).
    transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
      variable = "bwhist", value = (bwread + bwwrite) * 8 / 2e9)
  rbind(advbw, bwhist) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Keep only rows where both flags are known.
    filter(!is.na(have_exit_flag)) %>%
    filter(!is.na(have_guard_flag)) %>%
    spread(variable, value)
}
+
# Plot advertised and consumed bandwidth as stacked areas by Guard/Exit
# flag combination, one facet per variable, and save the graph to path_p
# (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_bandwidth_flags <- function(start_p, end_p, path_p) {
  prepare_bandwidth_flags(start_p, end_p) %>%
    gather(variable, value, c(advbw, bwhist)) %>%
    # Combine the two logical flags into a single ordered factor that
    # determines stacking order and legend labels.
    unite(flags, have_guard_flag, have_exit_flag) %>%
    mutate(flags = factor(flags,
      levels = c("FALSE_TRUE", "TRUE_TRUE", "TRUE_FALSE", "FALSE_FALSE"),
      labels = c("Exit only", "Guard and Exit", "Guard only",
      "Neither Guard nor Exit"))) %>%
    mutate(variable = ifelse(variable == "advbw",
      "Advertised bandwidth", "Consumed bandwidth")) %>%
    ggplot(aes(x = date, y = value, fill = flags)) +
    geom_area() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_fill_manual(name = "",
      values = c("#03B3FF", "#39FF02", "#FFFF00", "#AAAA99")) +
    facet_grid(variable ~ .) +
    ggtitle("Advertised and consumed bandwidth by relay flags") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read estimated relay users by country from clients.csv. Returns date,
# country, users, and the lower/upper expected range plus the frac column.
#
# country_p == "all" selects the aggregate rows whose country is the
# empty string; other values select that country code. NULL disables a
# filter. events_p is accepted for a uniform interface but not used here.
# na = character() keeps empty strings as "" instead of NA, which the
# country/transport/version filters below rely on.
prepare_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
    country_p = NULL, events_p = NULL) {
  read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      node = col_character(),
      country = col_character(),
      transport = col_character(),
      version = col_character(),
      lower = col_double(),
      upper = col_double(),
      clients = col_double(),
      frac = col_double()),
    na = character()) %>%
    filter(node == "relay") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(country_p))
      country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
    # Keep rows not broken down by transport or version.
    filter(transport == "") %>%
    filter(version == "") %>%
    select(date, country, clients, lower, upper, frac) %>%
    rename(users = clients)
}
+
# Plot directly connecting relay users, optionally highlighting possible
# censorship events: days above the expected upper bound are marked blue,
# days below the lower bound red, and events_p == "on" additionally draws
# the expected range as a gray ribbon. Events are only shown for a single
# country (country_p != "all") and when events_p != "off".
#
# The graph is written to path_p (8 x 5 in, 150 dpi).
plot_userstats_relay_country <- function(start_p, end_p, country_p, events_p,
    path_p) {
  u <- prepare_userstats_relay_country(start_p, end_p, country_p, events_p) %>%
    complete(date = full_seq(date, period = 1))
  plot <- ggplot(u, aes(x = date, y = users))
  # Scalar condition: use short-circuiting && instead of vectorized &,
  # so the country/events checks are only evaluated as scalars.
  if (length(na.omit(u$users)) > 0 && events_p != "off" &&
      country_p != "all") {
    upturns <- u[u$users > u$upper, c("date", "users")]
    downturns <- u[u$users < u$lower, c("date", "users")]
    if (events_p == "on") {
      # Clamp negative lower bounds to zero before drawing the ribbon.
      u[!is.na(u$lower) & u$lower < 0, "lower"] <- 0
      plot <- plot +
        geom_ribbon(data = u, aes(ymin = lower, ymax = upper), fill = "gray")
    }
    if (length(upturns$date) > 0)
      plot <- plot +
        geom_point(data = upturns, aes(x = date, y = users), size = 5,
            colour = "dodgerblue2")
    if (length(downturns$date) > 0)
      plot <- plot +
        geom_point(data = downturns, aes(x = date, y = users), size = 5,
            colour = "firebrick2")
  }
  plot <- plot +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(paste("Directly connecting users",
      ifelse(country_p == "all", "",
      paste(" from", countryname(country_p))), sep = "")) +
    labs(caption = copyright_notice)
  # Pass the plot explicitly rather than relying on last_plot().
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Read estimated bridge users by country from clients.csv. Returns date,
# country, users, and frac.
#
# country_p == "all" selects the aggregate rows whose country is the
# empty string; other values select that country code. NULL disables a
# filter. na = character() keeps empty strings as "" instead of NA.
prepare_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
    country_p = NULL) {
  read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      node = col_character(),
      country = col_character(),
      transport = col_character(),
      version = col_character(),
      lower = col_double(),
      upper = col_double(),
      clients = col_double(),
      frac = col_double()),
    na = character()) %>%
    filter(node == "bridge") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(if (!is.null(country_p))
      country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
    # Keep rows not broken down by transport or version.
    filter(transport == "") %>%
    filter(version == "") %>%
    select(date, country, clients, frac) %>%
    rename(users = clients)
}
+
# Plot estimated bridge users for one country (or all countries when
# country_p == "all") and save the graph to path_p (8 x 5 in, 150 dpi).
# ggsave() uses ggplot2's last_plot().
plot_userstats_bridge_country <- function(start_p, end_p, country_p, path_p) {
  prepare_userstats_bridge_country(start_p, end_p, country_p) %>%
    # Fill date gaps with NA rows so the line breaks at missing days.
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = users)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(paste("Bridge users",
      ifelse(country_p == "all", "",
      paste(" from", countryname(country_p))), sep = "")) +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read estimated bridge users by transport from clients.csv. Returns
# date, transport, users, and frac, sorted by date and transport.
#
# Besides the transports found in the data, a synthetic "!<OR>" transport
# is computed as the per-date sum of all transports other than "<OR>",
# i.e., users connecting via any pluggable transport. It is added when
# transport_p is NULL or explicitly requests "!<OR>".
prepare_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
    transport_p = NULL) {
  u <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      node = col_character(),
      country = col_character(),
      transport = col_character(),
      version = col_character(),
      lower = col_double(),
      upper = col_double(),
      clients = col_double(),
      frac = col_double())) %>%
    filter(node == "bridge") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Keep rows broken down by transport only (country and version NA).
    filter(is.na(country)) %>%
    filter(is.na(version)) %>%
    filter(!is.na(transport)) %>%
    select(date, transport, clients, frac)
  if (is.null(transport_p) || "!<OR>" %in% transport_p) {
    # Sum all pluggable-transport users per date into a "!<OR>" row;
    # frac is constant per date, so grouping by it preserves the value.
    n <- u %>%
      filter(transport != "<OR>") %>%
      group_by(date, frac) %>%
      summarize(clients = sum(clients))
    u <- rbind(u, data.frame(date = n$date, transport = "!<OR>",
      clients = n$clients, frac = n$frac))
  }
  u %>%
    filter(if (!is.null(transport_p)) transport %in% transport_p else TRUE) %>%
    select(date, transport, clients, frac) %>%
    rename(users = clients) %>%
    arrange(date, transport)
}
+
# Plot estimated bridge users by transport and save the graph to path_p
# (8 x 5 in, 150 dpi). With a single transport_p value the graph shows
# one black line and a transport-specific title; with multiple values it
# shows one coloured line per transport with a legend.
plot_userstats_bridge_transport <- function(start_p, end_p, transport_p,
    path_p) {
  if (length(transport_p) > 1) {
    # Single-argument paste() was redundant here; a plain string suffices.
    title <- "Bridge users by transport"
  } else {
    # Map special transport codes to human-readable names.
    title <- paste("Bridge users using",
      ifelse(transport_p == "<??>", "unknown pluggable transport(s)",
      ifelse(transport_p == "<OR>", "default OR protocol",
      ifelse(transport_p == "!<OR>", "any pluggable transport",
      ifelse(transport_p == "fte", "FTE",
      ifelse(transport_p == "websocket", "Flash proxy/websocket",
      paste("transport", transport_p)))))))
  }
  u <- prepare_userstats_bridge_transport(start_p, end_p, transport_p) %>%
    complete(date = full_seq(date, period = 1), nesting(transport))
  if (length(transport_p) > 1) {
    plot <- ggplot(u, aes(x = date, y = users, colour = transport))
  } else {
    plot <- ggplot(u, aes(x = date, y = users))
  }
  plot <- plot +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(title) +
    labs(caption = copyright_notice)
  if (length(transport_p) > 1) {
    plot <- plot +
      scale_colour_hue(name = "", breaks = transport_p,
        labels = ifelse(transport_p == "<??>", "Unknown PT",
        ifelse(transport_p == "<OR>", "Default OR protocol",
        ifelse(transport_p == "!<OR>", "Any PT",
        ifelse(transport_p == "fte", "FTE",
        ifelse(transport_p == "websocket", "Flash proxy/websocket",
        transport_p))))))
  }
  # Pass the plot explicitly rather than relying on last_plot().
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Read estimated bridge users by IP version from clients.csv. Returns
# date, version, users, and frac. version_p selects a single version
# value; NULL keeps all. NULL date bounds disable filtering.
prepare_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
    version_p = NULL) {
  read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      node = col_character(),
      country = col_character(),
      transport = col_character(),
      version = col_character(),
      lower = col_double(),
      upper = col_double(),
      clients = col_double(),
      frac = col_double())) %>%
    filter(node == "bridge") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Keep rows broken down by version only (country and transport NA).
    filter(is.na(country)) %>%
    filter(is.na(transport)) %>%
    filter(if (!is.null(version_p)) version == version_p else TRUE) %>%
    select(date, version, clients, frac) %>%
    rename(users = clients)
}
+
# Plot estimated bridge users for one IP version (version_p, e.g. "v4",
# concatenated into the title as "IP<version_p>") and save the graph to
# path_p (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_userstats_bridge_version <- function(start_p, end_p, version_p, path_p) {
  prepare_userstats_bridge_version(start_p, end_p, version_p) %>%
    # Fill date gaps with NA rows so the line breaks at missing days.
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = users)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    ggtitle(paste("Bridge users using IP", version_p, sep = "")) +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read combined bridge-user estimates by country and transport from
# userstats-combined.csv, with low/high range columns. For the "all"
# country the per-country breakdown is unavailable, so the plain bridge
# user counts from prepare_userstats_bridge_country() are returned
# instead (with a different column layout; the caller handles both).
prepare_userstats_bridge_combined <- function(start_p = NULL, end_p = NULL,
    country_p = NULL) {
  if (!is.null(country_p) && country_p == "all") {
    prepare_userstats_bridge_country(start_p, end_p, country_p)
  } else {
    # na = character() keeps empty strings as "" instead of NA.
    read_csv(file = paste(stats_dir, "userstats-combined.csv", sep = ""),
      col_types = cols(
        date = col_date(format = ""),
        node = col_skip(),
        country = col_character(),
        transport = col_character(),
        version = col_skip(),
        frac = col_double(),
        low = col_double(),
        high = col_double()),
      na = character()) %>%
      filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
      filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
      filter(if (!is.null(country_p)) country == country_p else TRUE) %>%
      select(date, country, transport, low, high, frac) %>%
      arrange(date, country, transport)
  }
}
+
# Plot combined bridge-user estimates by transport for one country as
# low/high ribbons for the top-3 transports, and save the graph to path_p
# (8 x 5 in, 150 dpi). For country_p == "all" there is no per-transport
# breakdown, so the plain bridge-users-by-country graph is drawn instead.
plot_userstats_bridge_combined <- function(start_p, end_p, country_p, path_p) {
  if (country_p == "all") {
    plot_userstats_bridge_country(start_p, end_p, country_p, path_p)
  } else {
    top <- 3
    u <- prepare_userstats_bridge_combined(start_p, end_p, country_p)
    # Rank transports by their summed mid-range estimate and keep at most
    # `top`. head() avoids the all-NA rows that the previous
    # a[order(...)[1:top], ] produced when fewer than `top` transports
    # were present.
    a <- aggregate(list(mid = (u$high + u$low) / 2),
      by = list(transport = u$transport), FUN = sum)
    a <- head(a[order(a$mid, decreasing = TRUE), ], top)
    u <- u[u$transport %in% a$transport, ] %>%
      complete(date = full_seq(date, period = 1), nesting(country, transport))
    title <- paste("Bridge users by transport from ",
      countryname(country_p), sep = "")
    plot <- ggplot(u, aes(x = as.Date(date), ymin = low, ymax = high,
      fill = transport)) +
      geom_ribbon(alpha = 0.5, size = 0.5) +
      scale_x_date(name = "", breaks = custom_breaks,
        labels = custom_labels, minor_breaks = custom_minor_breaks) +
      scale_y_continuous(name = "", limits = c(0, NA), labels = formatter) +
      scale_colour_hue("Top-3 transports") +
      scale_fill_hue("Top-3 transports") +
      ggtitle(title) +
      labs(caption = copyright_notice) +
      theme(legend.position = "top")
    # Pass the plot explicitly rather than relying on last_plot().
    ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
  }
}
+
# Read the advertised bandwidth distribution by percentile from
# advbwdist.csv and return one row per date and percentile with columns
# `all` and `exits` in Gbit/s. p_p optionally restricts percentiles;
# NULL date bounds disable filtering.
prepare_advbwdist_perc <- function(start_p = NULL, end_p = NULL, p_p = NULL) {
  read_csv(file = paste(stats_dir, "advbwdist.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      isexit = col_logical(),
      relay = col_skip(),
      percentile = col_integer(),
      advbw = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Without p_p, percentile != "" keeps only rows with a non-missing
    # percentile (NA comparisons are dropped by filter).
    filter(if (!is.null(p_p)) percentile %in% as.numeric(p_p) else
      percentile != "") %>%
    # Convert bytes/s to Gbit/s; NA isexit marks the all-relays rows.
    transmute(date, percentile = as.factor(percentile),
      variable = ifelse(is.na(isexit), "all", "exits"),
      advbw = advbw * 8 / 1e9) %>%
    spread(variable, advbw) %>%
    rename(p = percentile)
}
+
# Plot the advertised bandwidth distribution (one line per percentile,
# facets for all relays vs. exits only) and save the graph to path_p
# (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_advbwdist_perc <- function(start_p, end_p, p_p, path_p) {
  prepare_advbwdist_perc(start_p, end_p, p_p) %>%
    gather(variable, advbw, -c(date, p)) %>%
    mutate(variable = ifelse(variable == "all", "All relays",
      "Exits only")) %>%
    # Fill date gaps with NA rows so lines break at missing days.
    complete(date = full_seq(date, period = 1), nesting(p, variable)) %>%
    ggplot(aes(x = date, y = advbw, colour = p)) +
    facet_grid(variable ~ .) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "Percentile") +
    ggtitle("Advertised bandwidth distribution") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read advertised bandwidth of the n-th fastest relays from advbwdist.csv
# and return one row per date and rank n with columns `all` and `exits`
# in Gbit/s. n_p optionally restricts ranks; NULL date bounds disable
# filtering.
prepare_advbwdist_relay <- function(start_p = NULL, end_p = NULL, n_p = NULL) {
  read_csv(file = paste(stats_dir, "advbwdist.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      isexit = col_logical(),
      relay = col_integer(),
      percentile = col_skip(),
      advbw = col_double())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    # Without n_p, relay != "" keeps only rows with a non-missing rank
    # (NA comparisons are dropped by filter).
    filter(if (!is.null(n_p)) relay %in% as.numeric(n_p) else
      relay != "") %>%
    # Convert bytes/s to Gbit/s; NA isexit marks the all-relays rows.
    transmute(date, relay = as.factor(relay),
      variable = ifelse(is.na(isexit), "all", "exits"),
      advbw = advbw * 8 / 1e9) %>%
    spread(variable, advbw) %>%
    rename(n = relay)
}
+
# Plot advertised bandwidth of the n-th fastest relays (one line per n,
# facets for all relays vs. exits only) and save the graph to path_p
# (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_advbwdist_relay <- function(start_p, end_p, n_p, path_p) {
  prepare_advbwdist_relay(start_p, end_p, n_p) %>%
    gather(variable, advbw, -c(date, n)) %>%
    mutate(variable = ifelse(variable == "all", "All relays",
      "Exits only")) %>%
    # Fill date gaps with NA rows so lines break at missing days.
    complete(date = full_seq(date, period = 1), nesting(n, variable)) %>%
    ggplot(aes(x = date, y = advbw, colour = n)) +
    facet_grid(variable ~ .) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "n") +
    ggtitle("Advertised bandwidth of n-th fastest relays") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read extrapolated unique .onion address counts from hidserv.csv.
# Returns date, onions (the weighted-IQM estimate, or NA when less than
# 1% of relays reported, making the extrapolation unreliable), and frac.
# NULL date bounds disable filtering.
prepare_hidserv_dir_onions_seen <- function(start_p = NULL, end_p = NULL) {
  hidserv <- read_csv(file = paste0(stats_dir, "hidserv.csv"),
    col_types = cols(
      date = col_date(format = ""),
      type = col_factor(levels = NULL),
      wmean = col_skip(),
      wmedian = col_skip(),
      wiqm = col_double(),
      frac = col_double(),
      stats = col_skip()))
  hidserv %>%
    filter(type == "dir-onions-seen") %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    transmute(date, onions = ifelse(frac >= 0.01, wiqm, NA), frac)
}
+
# Plot the estimated number of unique .onion addresses and save the
# graph to path_p (8 x 5 in, 150 dpi). ggsave() uses ggplot2's
# last_plot().
plot_hidserv_dir_onions_seen <- function(start_p, end_p, path_p) {
  prepare_hidserv_dir_onions_seen(start_p, end_p) %>%
    # Fill date gaps with NA rows so the line breaks at missing days.
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = onions)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", limits = c(0, NA), labels = formatter) +
    ggtitle("Unique .onion addresses") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read extrapolated onion-service traffic from hidserv.csv. Returns
# date, relayed (in Gbit/s, or NA when less than 1% of relays reported),
# and frac. NULL date bounds disable filtering.
prepare_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
    col_types = cols(
      date = col_date(format = ""),
      type = col_factor(levels = NULL),
      wmean = col_skip(),
      wmedian = col_skip(),
      wiqm = col_double(),
      frac = col_double(),
      stats = col_skip())) %>%
    filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
    filter(type == "rend-relayed-cells") %>%
    # Convert cells per day to Gbit/s: 512 bytes per cell, 8 bits per
    # byte, 86400 seconds per day, 1e9 bits per Gbit.
    transmute(date,
      relayed = ifelse(frac >= 0.01, wiqm * 8 * 512 / (86400 * 1e9), NA), frac)
}
+
# Plot estimated onion-service traffic in Gbit/s and save the graph to
# path_p (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_hidserv_rend_relayed_cells <- function(start_p, end_p, path_p) {
  prepare_hidserv_rend_relayed_cells(start_p, end_p) %>%
    # Fill date gaps with NA rows so the line breaks at missing days.
    complete(date = full_seq(date, period = 1)) %>%
    ggplot(aes(x = date, y = relayed)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    ggtitle("Onion-service traffic") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read Tor Browser web server request counts from webstats.csv and
# return daily totals with one column per request type: tbid (initial
# downloads), tbsd (signature downloads), tbup (update pings), tbur
# (update requests). NULL date bounds disable filtering.
prepare_webstats_tb <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
    col_types = cols(
      log_date = col_date(format = ""),
      request_type = col_factor(levels = NULL),
      platform = col_skip(),
      channel = col_skip(),
      locale = col_skip(),
      incremental = col_skip(),
      count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tbid", "tbsd", "tbup", "tbur")) %>%
    # Sum over the skipped breakdown columns to get daily totals.
    group_by(log_date, request_type) %>%
    summarize(count = sum(count)) %>%
    spread(request_type, count) %>%
    rename(date = log_date, initial_downloads = tbid,
      signature_downloads = tbsd, update_pings = tbup,
      update_requests = tbur)
}
+
# Plot Tor Browser downloads and updates, one facet per request type
# with free y scales, and save the graph to path_p (8 x 5 in, 150 dpi).
# ggsave() uses ggplot2's last_plot().
plot_webstats_tb <- function(start_p, end_p, path_p) {
  prepare_webstats_tb(start_p, end_p) %>%
    gather(request_type, count, -date) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "signature_downloads", "update_pings",
      "update_requests"),
      labels = c("Initial downloads", "Signature downloads", "Update pings",
      "Update requests"))) %>%
    # Ungroup before complete(); fill date gaps so lines break there.
    ungroup() %>%
    complete(date = full_seq(date, period = 1), nesting(request_type)) %>%
    ggplot(aes(x = date, y = count)) +
    geom_point() +
    geom_line() +
    facet_grid(request_type ~ ., scales = "free_y") +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA)) +
    ggtitle("Tor Browser downloads and updates") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read Tor Browser request counts by platform from webstats.csv and
# return daily totals per platform with columns initial_downloads (tbid)
# and update_pings (tbup); missing combinations are filled with 0.
# NULL date bounds disable filtering.
prepare_webstats_tb_platform <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
    col_types = cols(
      log_date = col_date(format = ""),
      request_type = col_factor(levels = NULL),
      platform = col_factor(levels = NULL),
      channel = col_skip(),
      locale = col_skip(),
      incremental = col_skip(),
      count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tbid", "tbup")) %>%
    group_by(log_date, platform, request_type) %>%
    summarize(count = sum(count)) %>%
    spread(request_type, count, fill = 0) %>%
    rename(date = log_date, initial_downloads = tbid, update_pings = tbup)
}
+
# Plot Tor Browser downloads and updates by platform (one colour per
# platform, one facet per request type) and save the graph to path_p
# (8 x 5 in, 150 dpi). ggsave() uses ggplot2's last_plot().
plot_webstats_tb_platform <- function(start_p, end_p, path_p) {
  prepare_webstats_tb_platform(start_p, end_p) %>%
    gather(request_type, count, -c(date, platform)) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "update_pings"),
      labels = c("Initial downloads", "Update pings"))) %>%
    # Ungroup before complete(); fill date gaps so lines break there.
    ungroup() %>%
    complete(date = full_seq(date, period = 1),
      nesting(platform, request_type)) %>%
    ggplot(aes(x = date, y = count, colour = platform)) +
    geom_point() +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    # Map single-letter platform codes to readable legend labels.
    scale_colour_hue(name = "Platform",
      breaks = c("w", "m", "l", "o", ""),
      labels = c("Windows", "macOS", "Linux", "Other", "Unknown")) +
    facet_grid(request_type ~ ., scales = "free_y") +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA),
      legend.position = "top") +
    ggtitle("Tor Browser downloads and updates by platform") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read Tor Browser request counts by locale from webstats.csv and return
# daily totals per locale with columns initial_downloads (tbid) and
# update_pings (tbup); missing combinations are filled with 0. NULL date
# bounds disable filtering.
prepare_webstats_tb_locale <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
    col_types = cols(
      log_date = col_date(format = ""),
      request_type = col_factor(levels = NULL),
      platform = col_skip(),
      channel = col_skip(),
      locale = col_factor(levels = NULL),
      incremental = col_skip(),
      count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tbid", "tbup")) %>%
    rename(date = log_date) %>%
    group_by(date, locale, request_type) %>%
    summarize(count = sum(count)) %>%
    # Restrict factor levels so spread() creates exactly these columns.
    mutate(request_type = factor(request_type, levels = c("tbid", "tbup"))) %>%
    spread(request_type, count, fill = 0) %>%
    rename(initial_downloads = tbid, update_pings = tbup)
}
+
# Plot Tor Browser downloads and updates by locale, keeping the top-5
# locales by total count and lumping the rest into "(other)", and save
# the graph to path_p (8 x 5 in, 150 dpi).
plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
  d <- prepare_webstats_tb_locale(start_p, end_p) %>%
    gather(request_type, count, -c(date, locale)) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "update_pings"),
      labels = c("Initial downloads", "Update pings")))
  # Rank locales by total count and keep at most five. head() avoids the
  # NA rows that the previous e[1:5, ] produced when fewer than five
  # locales were present.
  e <- d
  e <- aggregate(list(count = e$count), by = list(locale = e$locale), FUN = sum)
  e <- head(e[order(e$count, decreasing = TRUE), ], 5)
  # Lump everything outside the top locales into "(other)".
  d <- aggregate(list(count = d$count), by = list(date = d$date,
    request_type = d$request_type,
    locale = ifelse(d$locale %in% e$locale, d$locale, "(other)")), FUN = sum)
  plot <- d %>%
    complete(date = full_seq(date, period = 1),
      nesting(locale, request_type)) %>%
    ggplot(aes(x = date, y = count, colour = locale)) +
    geom_point() +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "Locale",
      breaks = c(e$locale, "(other)"),
      labels = c(as.character(e$locale), "Other")) +
    facet_grid(request_type ~ ., scales = "free_y") +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA),
      legend.position = "top") +
    guides(col = guide_legend(nrow = 1)) +
    ggtitle("Tor Browser downloads and updates by locale") +
    labs(caption = copyright_notice)
  # Pass the plot explicitly rather than relying on last_plot().
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Read Tor Messenger request counts from webstats.csv and return daily
# totals with columns initial_downloads (tmid) and update_pings (tmup).
# NULL date bounds disable filtering.
prepare_webstats_tm <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
    col_types = cols(
      log_date = col_date(format = ""),
      request_type = col_factor(levels = NULL),
      platform = col_skip(),
      channel = col_skip(),
      locale = col_skip(),
      incremental = col_skip(),
      count = col_double())) %>%
    filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
    filter(request_type %in% c("tmid", "tmup")) %>%
    group_by(log_date, request_type) %>%
    summarize(count = sum(count)) %>%
    # Restrict factor levels and spread with drop = FALSE so both columns
    # exist (filled with 0) even if one request type is absent.
    mutate(request_type = factor(request_type, levels = c("tmid", "tmup"))) %>%
    spread(request_type, count, drop = FALSE, fill = 0) %>%
    rename(date = log_date, initial_downloads = tmid, update_pings = tmup)
}
+
# Plot Tor Messenger downloads and updates, one facet per request type
# with free y scales, and save the graph to path_p (8 x 5 in, 150 dpi).
# ggsave() uses ggplot2's last_plot().
plot_webstats_tm <- function(start_p, end_p, path_p) {
  prepare_webstats_tm(start_p, end_p) %>%
    gather(request_type, count, -date) %>%
    mutate(request_type = factor(request_type,
      levels = c("initial_downloads", "update_pings"),
      labels = c("Initial downloads", "Update pings"))) %>%
    # Ungroup before complete(); fill date gaps so lines break there.
    ungroup() %>%
    complete(date = full_seq(date, period = 1), nesting(request_type)) %>%
    ggplot(aes(x = date, y = count)) +
    geom_point() +
    geom_line() +
    facet_grid(request_type ~ ., scales = "free_y") +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
      strip.background = element_rect(fill = NA)) +
    ggtitle("Tor Messenger downloads and updates") +
    labs(caption = copyright_notice)
  ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
+
# Read IPv6 capability statistics for relays from ipv6servers.csv and
# return daily relay counts: total, and those announcing, reachable
# over, and exiting via IPv6. NULL date bounds disable filtering.
prepare_relays_ipv6 <- function(start_p = NULL, end_p = NULL) {
  read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      server = col_factor(levels = NULL),
      guard_relay = col_skip(),
      exit_relay = col_skip(),
      announced_ipv6 = col_logical(),
      exiting_ipv6_relay = col_logical(),
      reachable_ipv6_relay = col_logical(),
      server_count_sum_avg = col_double(),
      advertised_bandwidth_bytes_sum_avg = col_skip())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    filter(server == "relay") %>%
    # Sum average server counts per day, restricted to the rows matching
    # each IPv6 capability via logical subsetting.
    group_by(valid_after_date) %>%
    summarize(total = sum(server_count_sum_avg),
      announced = sum(server_count_sum_avg[announced_ipv6]),
      reachable = sum(server_count_sum_avg[reachable_ipv6_relay]),
      exiting = sum(server_count_sum_avg[exiting_ipv6_relay])) %>%
    rename(date = valid_after_date)
}
+
# Plot relay counts by IP version and save the graph to path_p.
#
# @param start_p First date to include.
# @param end_p Last date to include.
# @param path_p Output file path passed to ggsave().
plot_relays_ipv6 <- function(start_p, end_p, path_p) {
  plot <- prepare_relays_ipv6(start_p, end_p) %>%
    # Insert explicit rows for missing days so lines show gaps.
    complete(date = full_seq(date, period = 1)) %>%
    gather(category, count, -date) %>%
    ggplot(aes(x = date, y = count, colour = category)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "", h.start = 90,
      breaks = c("total", "announced", "reachable", "exiting"),
      labels = c("Total (IPv4) OR", "IPv6 announced OR", "IPv6 reachable OR",
        "IPv6 exiting")) +
    ggtitle("Relays by IP version") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  # Pass the plot explicitly instead of relying on ggsave()'s default
  # last_plot(), which depends on global state set as a side effect of `+.gg`.
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Prepare daily bridge counts by IPv6 capability.
#
# Reads ipv6servers.csv, optionally restricts rows to [start_p, end_p] (NULL
# means unbounded), keeps bridge rows only, and sums average bridge counts
# per day: all bridges and bridges announcing an IPv6 address. Reachability
# and exiting columns are skipped because they are not measured for bridges.
#
# @param start_p Optional first date, or NULL.
# @param end_p Optional last date, or NULL.
# @return A data frame with columns date, total, announced.
prepare_bridges_ipv6 <- function(start_p = NULL, end_p = NULL) {
  # paste0() rather than paste(..., sep = "") for path concatenation.
  read_csv(file = paste0(stats_dir, "ipv6servers.csv"),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      server = col_factor(levels = NULL),
      guard_relay = col_skip(),
      exit_relay = col_skip(),
      announced_ipv6 = col_logical(),
      exiting_ipv6_relay = col_skip(),
      reachable_ipv6_relay = col_skip(),
      server_count_sum_avg = col_double(),
      advertised_bandwidth_bytes_sum_avg = col_skip())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    filter(server == "bridge") %>%
    group_by(valid_after_date) %>%
    summarize(total = sum(server_count_sum_avg),
      announced = sum(server_count_sum_avg[announced_ipv6])) %>%
    rename(date = valid_after_date)
}
+
# Plot bridge counts by IP version and save the graph to path_p.
#
# @param start_p First date to include.
# @param end_p Last date to include.
# @param path_p Output file path passed to ggsave().
plot_bridges_ipv6 <- function(start_p, end_p, path_p) {
  plot <- prepare_bridges_ipv6(start_p, end_p) %>%
    # Insert explicit rows for missing days so lines show gaps.
    complete(date = full_seq(date, period = 1)) %>%
    gather(category, count, -date) %>%
    ggplot(aes(x = date, y = count, colour = category)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "", h.start = 90,
      breaks = c("total", "announced"),
      labels = c("Total (IPv4) OR", "IPv6 announced OR")) +
    ggtitle("Bridges by IP version") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top")
  # Pass the plot explicitly instead of relying on ggsave()'s default
  # last_plot(), which depends on global state set as a side effect of `+.gg`.
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Prepare daily advertised bandwidth totals by IPv6 capability and flags.
#
# Reads ipv6servers.csv, optionally restricts rows to [start_p, end_p] (NULL
# means unbounded), keeps relay rows only, converts advertised bandwidth from
# bytes/s to Gbit/s, and sums per day: all relays, guards, exits, IPv6
# reachable guards/exits, and relays exiting over IPv6.
#
# @param start_p Optional first date, or NULL.
# @param end_p Optional last date, or NULL.
# @return A data frame with columns date, total, total_guard, total_exit,
#   reachable_guard, reachable_exit, exiting (all in Gbit/s).
prepare_advbw_ipv6 <- function(start_p = NULL, end_p = NULL) {
  # paste0() rather than paste(..., sep = "") for path concatenation.
  read_csv(file = paste0(stats_dir, "ipv6servers.csv"),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      server = col_factor(levels = NULL),
      guard_relay = col_logical(),
      exit_relay = col_logical(),
      announced_ipv6 = col_logical(),
      exiting_ipv6_relay = col_logical(),
      reachable_ipv6_relay = col_logical(),
      server_count_sum_avg = col_skip(),
      advertised_bandwidth_bytes_sum_avg = col_double())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    filter(server == "relay") %>%
    # Convert bytes/s to Gbit/s: * 8 bits per byte, / 1e9 for giga.
    mutate(advertised_bandwidth_bytes_sum_avg =
      advertised_bandwidth_bytes_sum_avg * 8 / 1e9) %>%
    group_by(valid_after_date) %>%
    # Logical columns are used directly as subset masks of the bandwidth
    # column; combined masks select e.g. IPv6-reachable guards only.
    summarize(total = sum(advertised_bandwidth_bytes_sum_avg),
      total_guard = sum(advertised_bandwidth_bytes_sum_avg[guard_relay]),
      total_exit = sum(advertised_bandwidth_bytes_sum_avg[exit_relay]),
      reachable_guard = sum(advertised_bandwidth_bytes_sum_avg[
        reachable_ipv6_relay & guard_relay]),
      reachable_exit = sum(advertised_bandwidth_bytes_sum_avg[
        reachable_ipv6_relay & exit_relay]),
      exiting = sum(advertised_bandwidth_bytes_sum_avg[
        exiting_ipv6_relay])) %>%
    rename(date = valid_after_date)
}
+
# Plot advertised bandwidth by IP version and save the graph to path_p.
#
# @param start_p First date to include.
# @param end_p Last date to include.
# @param path_p Output file path passed to ggsave().
plot_advbw_ipv6 <- function(start_p, end_p, path_p) {
  plot <- prepare_advbw_ipv6(start_p, end_p) %>%
    # Insert explicit rows for missing days so lines show gaps.
    complete(date = full_seq(date, period = 1)) %>%
    gather(category, advbw, -date) %>%
    ggplot(aes(x = date, y = advbw, colour = category)) +
    geom_line() +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = unit_format(unit = "Gbit/s"),
      limits = c(0, NA)) +
    scale_colour_hue(name = "", h.start = 90,
      breaks = c("total", "total_guard", "total_exit", "reachable_guard",
        "reachable_exit", "exiting"),
      labels = c("Total (IPv4) OR", "Guard total (IPv4)", "Exit total (IPv4)",
        "Reachable guard IPv6 OR", "Reachable exit IPv6 OR", "IPv6 exiting")) +
    ggtitle("Advertised bandwidth by IP version") +
    labs(caption = copyright_notice) +
    theme(legend.position = "top") +
    # Six legend entries do not fit on one line of the 8-inch-wide plot.
    guides(colour = guide_legend(nrow = 2, byrow = TRUE))
  # Pass the plot explicitly instead of relying on ggsave()'s default
  # last_plot(), which depends on global state set as a side effect of `+.gg`.
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Prepare total consensus weights per bandwidth authority.
#
# Reads totalcw.csv, optionally restricts rows to [start_p, end_p] (NULL
# means unbounded), and sums measured consensus weights per day and
# bandwidth authority nickname. Note: read_csv() parses an empty nickname
# field (the consensus total) as NA; callers are expected to handle that.
#
# @param start_p Optional first date, or NULL.
# @param end_p Optional last date, or NULL.
# @return A data frame with columns date, nickname, totalcw, sorted by
#   date and nickname.
prepare_totalcw <- function(start_p = NULL, end_p = NULL) {
  # paste0() rather than paste(..., sep = "") for path concatenation.
  read_csv(file = paste0(stats_dir, "totalcw.csv"),
    col_types = cols(
      valid_after_date = col_date(format = ""),
      nickname = col_character(),
      have_guard_flag = col_logical(),
      have_exit_flag = col_logical(),
      measured_sum_avg = col_double())) %>%
    filter(if (!is.null(start_p))
      valid_after_date >= as.Date(start_p) else TRUE) %>%
    filter(if (!is.null(end_p))
      valid_after_date <= as.Date(end_p) else TRUE) %>%
    # Aggregate over the guard/exit flag dimensions.
    group_by(valid_after_date, nickname) %>%
    summarize(measured_sum_avg = sum(measured_sum_avg)) %>%
    rename(date = valid_after_date, totalcw = measured_sum_avg) %>%
    arrange(date, nickname)
}
+
# Plot total consensus weights across bandwidth authorities and save the
# graph to path_p.
#
# @param start_p First date to include.
# @param end_p Last date to include.
# @param path_p Output file path passed to ggsave().
plot_totalcw <- function(start_p, end_p, path_p) {
  plot <- prepare_totalcw(start_p, end_p) %>%
    # An NA nickname stands for the consensus total; label it as such and
    # make it the first factor level so it is listed first in the legend.
    mutate(nickname = ifelse(is.na(nickname), "consensus", nickname)) %>%
    mutate(nickname = factor(nickname,
      levels = c("consensus", unique(nickname[nickname != "consensus"])))) %>%
    ungroup() %>%
    # Insert explicit rows for missing days so lines show gaps.
    complete(date = full_seq(date, period = 1), nesting(nickname)) %>%
    ggplot(aes(x = date, y = totalcw, colour = nickname)) +
    geom_line(na.rm = TRUE) +
    scale_x_date(name = "", breaks = custom_breaks,
      labels = custom_labels, minor_breaks = custom_minor_breaks) +
    scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
    scale_colour_hue(name = "") +
    ggtitle("Total consensus weights across bandwidth authorities") +
    labs(caption = copyright_notice)
  # Pass the plot explicitly instead of relying on ggsave()'s default
  # last_plot(), which depends on global state set as a side effect of `+.gg`.
  ggsave(filename = path_p, plot = plot, width = 8, height = 5, dpi = 150)
}
+
# Look up human-readable country names for a vector of country codes.
#
# Uses vapply() rather than sapply() so the result is guaranteed to be a
# character vector (each countryname() call must yield a single string),
# even for empty input.
#
# @param countries Character vector of country codes.
# @return Named character vector of country names.
countrynames <- function(countries) {
  vapply(countries, countryname, character(1))
}
+
# Write a CSV file with the top-10 countries by estimated daily users.
#
# Reads the shared clients.csv, restricts it to [start, end] and the given
# node type, averages daily user counts per country, and writes the ten
# countries with the most users (code, name, absolute and relative counts).
#
# @param start First date to include (string, "YYYY-MM-DD").
# @param end Last date to include; clipped to today, because statistics for
#   future dates cannot exist yet.
# @param node Node type to select, either "relay" or "bridge".
# @param path Output CSV file path.
write_userstats <- function(start, end, node, path) {
  end <- min(end, as.character(Sys.Date()))
  # Named `clients` rather than `c`, which would shadow base::c() here.
  clients <- read.csv(paste0(
    "/srv/metrics.torproject.org/metrics/shared/stats/", "clients.csv"),
    stringsAsFactors = FALSE)
  clients <- clients[clients$date >= start & clients$date <= end &
    clients$country != "" & clients$transport == "" &
    clients$version == "" & clients$node == node, ]
  u <- data.frame(country = clients$country, users = clients$clients,
    stringsAsFactors = FALSE)
  u <- u[!is.na(u$users), ]
  # Mean daily users per country over the selected period.
  u <- aggregate(list(users = u$users), by = list(country = u$country),
    mean)
  # Total is computed before dropping pseudo codes, so relative numbers are
  # fractions of all users, not only of users in real countries.
  total <- sum(u$users)
  # Drop unknown/anonymous-proxy/satellite/continent pseudo country codes.
  u <- u[!(u$country %in% c("zy", "??", "a1", "a2", "o1", "ap", "eu")), ]
  u <- u[order(u$users, decreasing = TRUE), ]
  # head() rather than u[1:10, ], which would pad the result with NA rows
  # if fewer than 10 countries remain.
  u <- head(u, 10)
  u <- data.frame(
    cc = as.character(u$country),
    country = sub('the ', '', countrynames(as.character(u$country))),
    abs = round(u$users),
    rel = sprintf("%.2f", round(100 * u$users / total, 2)))
  write.csv(u, path, quote = FALSE, row.names = FALSE)
}
+
# Convenience wrapper: write top-10 user statistics for relay users.
write_userstats_relay <- function(start, end, path) {
  write_userstats(start, end, node = 'relay', path = path)
}
+
# Convenience wrapper: write top-10 user statistics for bridge users.
write_userstats_bridge <- function(start, end, path) {
  write_userstats(start, end, node = 'bridge', path = path)
}
+
# Write a CSV file with the top-10 countries by possible censorship events.
#
# Reads the shared clients.csv, restricts it to [start, end] and relay
# users, counts per country how often the estimated user number exceeded
# the upper ("upturn") or fell below the lower ("downturn") prediction
# bound, and writes the ten countries with the most downturns/upturns.
#
# @param start First date to include (string, "YYYY-MM-DD").
# @param end Last date to include; clipped to today, because statistics for
#   future dates cannot exist yet.
# @param path Output CSV file path.
write_userstats_censorship_events <- function(start, end, path) {
  end <- min(end, as.character(Sys.Date()))
  # Named `clients` rather than `c`, which would shadow base::c() here.
  clients <- read.csv(paste0(
    "/srv/metrics.torproject.org/metrics/shared/stats/", "clients.csv"),
    stringsAsFactors = FALSE)
  clients <- clients[clients$date >= start & clients$date <= end &
    clients$country != '' & clients$transport == '' &
    clients$version == '' & clients$node == 'relay', ]
  # One indicator row per day and country: did the estimate leave the
  # predicted interval upward or downward?
  r <- data.frame(date = clients$date, country = clients$country,
    upturn = ifelse(!is.na(clients$upper) &
      clients$clients > clients$upper, 1, 0),
    downturn = ifelse(!is.na(clients$lower) &
      clients$clients < clients$lower, 1, 0))
  r <- aggregate(r[, c("upturn", "downturn")],
    by = list(country = r$country), sum)
  # Keep only known country codes.
  r <- r[(r$country %in% names(countrylist)), ]
  r <- r[order(r$downturn, r$upturn, decreasing = TRUE), ]
  # head() rather than r[1:10, ], which would pad the result with NA rows
  # if fewer than 10 countries remain.
  r <- head(r, 10)
  r <- data.frame(cc = r$country,
    country = sub('the ', '', countrynames(as.character(r$country))),
    downturns = r$downturn,
    upturns = r$upturn)
  write.csv(r, path, quote = FALSE, row.names = FALSE)
}
diff --git a/src/main/R/rserver/tables.R b/src/main/R/rserver/tables.R
deleted file mode 100644
index 28bd3d5..0000000
--- a/src/main/R/rserver/tables.R
+++ /dev/null
@@ -1,58 +0,0 @@
-countrynames <- function(countries) {
- sapply(countries, countryname)
-}
-
-write_userstats <- function(start, end, node, path) {
- end <- min(end, as.character(Sys.Date()))
- c <- read.csv(paste("/srv/metrics.torproject.org/metrics/shared/stats/",
- "clients.csv", sep = ""), stringsAsFactors = FALSE)
- c <- c[c$date >= start & c$date <= end & c$country != '' &
- c$transport == '' & c$version == '' & c$node == node, ]
- u <- data.frame(country = c$country, users = c$clients,
- stringsAsFactors = FALSE)
- u <- u[!is.na(u$users), ]
- u <- aggregate(list(users = u$users), by = list(country = u$country),
- mean)
- total <- sum(u$users)
- u <- u[!(u$country %in% c("zy", "??", "a1", "a2", "o1", "ap", "eu")), ]
- u <- u[order(u$users, decreasing = TRUE), ]
- u <- u[1:10, ]
- u <- data.frame(
- cc = as.character(u$country),
- country = sub('the ', '', countrynames(as.character(u$country))),
- abs = round(u$users),
- rel = sprintf("%.2f", round(100 * u$users / total, 2)))
- write.csv(u, path, quote = FALSE, row.names = FALSE)
-}
-
-write_userstats_relay <- function(start, end, path) {
- write_userstats(start, end, 'relay', path)
-}
-
-write_userstats_bridge <- function(start, end, path) {
- write_userstats(start, end, 'bridge', path)
-}
-
-write_userstats_censorship_events <- function(start, end, path) {
- end <- min(end, as.character(Sys.Date()))
- c <- read.csv(paste("/srv/metrics.torproject.org/metrics/shared/stats/",
- "clients.csv", sep = ""), stringsAsFactors = FALSE)
- c <- c[c$date >= start & c$date <= end & c$country != '' &
- c$transport == '' & c$version == '' & c$node == 'relay', ]
- r <- data.frame(date = c$date, country = c$country,
- upturn = ifelse(!is.na(c$upper) &
- c$clients > c$upper, 1, 0),
- downturn = ifelse(!is.na(c$lower) &
- c$clients < c$lower, 1, 0))
- r <- aggregate(r[, c("upturn", "downturn")],
- by = list(country = r$country), sum)
- r <- r[(r$country %in% names(countrylist)), ]
- r <- r[order(r$downturn, r$upturn, decreasing = TRUE), ]
- r <- r[1:10, ]
- r <- data.frame(cc = r$country,
- country = sub('the ', '', countrynames(as.character(r$country))),
- downturns = r$downturn,
- upturns = r$upturn)
- write.csv(r, path, quote = FALSE, row.names = FALSE)
-}
-
1
0

11 Jan '19
commit a94a3844644041f7c1f6e0a4451e19ce12cae9e8
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Jan 10 22:32:28 2019 +0100
Switch to readr's read_csv() everywhere.
---
src/main/R/rserver/graphs.R | 230 +++++++++++++++++++++++++++++++++-----------
1 file changed, 175 insertions(+), 55 deletions(-)
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index 82a51e7..205afbe 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -359,8 +359,11 @@ write_data <- function(FUN, ..., path_p) {
options(readr.show_progress = FALSE)
prepare_networksize <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "networksize.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "networksize.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ relays = col_double(),
+ bridges = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
}
@@ -416,8 +419,11 @@ plot_versions <- function(start_p, end_p, path_p) {
}
prepare_platforms <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "platforms.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "platforms.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ platform = col_factor(levels = NULL),
+ relays = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
mutate(platform = tolower(platform)) %>%
@@ -443,12 +449,19 @@ plot_platforms <- function(start_p, end_p, path_p) {
}
prepare_dirbytes <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "bandwidth.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "bandwidth.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ isexit = col_logical(),
+ isguard = col_logical(),
+ bwread = col_skip(),
+ bwwrite = col_skip(),
+ dirread = col_double(),
+ dirwrite = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- filter(isexit == "") %>%
- filter(isguard == "") %>%
+ filter(is.na(isexit)) %>%
+ filter(is.na(isguard)) %>%
mutate(dirread = dirread * 8 / 1e9,
dirwrite = dirwrite * 8 / 1e9) %>%
select(date, dirread, dirwrite)
@@ -473,8 +486,11 @@ plot_dirbytes <- function(start_p, end_p, path_p) {
}
prepare_relayflags <- function(start_p = NULL, end_p = NULL, flag_p = NULL) {
- read.csv(paste(stats_dir, "relayflags.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "relayflags.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ flag = col_factor(levels = NULL),
+ relays = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(flag_p)) flag %in% flag_p else TRUE)
@@ -483,7 +499,7 @@ prepare_relayflags <- function(start_p = NULL, end_p = NULL, flag_p = NULL) {
plot_relayflags <- function(start_p, end_p, flag_p, path_p) {
prepare_relayflags(start_p, end_p, flag_p) %>%
complete(date = full_seq(date, period = 1), flag = unique(flag)) %>%
- ggplot(aes(x = date, y = relays, colour = as.factor(flag))) +
+ ggplot(aes(x = date, y = relays, colour = flag)) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
@@ -498,8 +514,18 @@ plot_relayflags <- function(start_p, end_p, flag_p, path_p) {
prepare_torperf <- function(start_p = NULL, end_p = NULL, server_p = NULL,
filesize_p = NULL) {
- read.csv(paste(stats_dir, "torperf-1.1.csv", sep = ""),
- colClasses = c("date" = "Date", "source" = "character")) %>%
+ read_csv(file = paste(stats_dir, "torperf-1.1.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ filesize = col_double(),
+ source = col_character(),
+ server = col_character(),
+ q1 = col_double(),
+ md = col_double(),
+ q3 = col_double(),
+ timeouts = col_skip(),
+ failures = col_skip(),
+ requests = col_skip())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(server_p)) server == server_p else TRUE) %>%
@@ -535,8 +561,18 @@ plot_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
prepare_torperf_failures <- function(start_p = NULL, end_p = NULL,
server_p = NULL, filesize_p = NULL) {
- read.csv(paste(stats_dir, "torperf-1.1.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "torperf-1.1.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ filesize = col_double(),
+ source = col_character(),
+ server = col_character(),
+ q1 = col_skip(),
+ md = col_skip(),
+ q3 = col_skip(),
+ timeouts = col_double(),
+ failures = col_double(),
+ requests = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(filesize_p))
@@ -573,8 +609,14 @@ plot_torperf_failures <- function(start_p, end_p, server_p, filesize_p,
}
prepare_onionperf_buildtimes <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "buildtimes.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "buildtimes.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ source = col_character(),
+ position = col_double(),
+ q1 = col_double(),
+ md = col_double(),
+ q3 = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
}
@@ -604,8 +646,14 @@ plot_onionperf_buildtimes <- function(start_p, end_p, path_p) {
prepare_onionperf_latencies <- function(start_p = NULL, end_p = NULL,
server_p = NULL) {
- read.csv(paste(stats_dir, "latencies.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "latencies.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ source = col_character(),
+ server = col_character(),
+ q1 = col_double(),
+ md = col_double(),
+ q3 = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(server_p)) server == server_p else TRUE)
@@ -631,8 +679,12 @@ plot_onionperf_latencies <- function(start_p, end_p, server_p, path_p) {
}
prepare_connbidirect <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "connbidirect2.csv", sep = ""),
- colClasses = c("date" = "Date", "direction" = "factor")) %>%
+ read_csv(file = paste(stats_dir, "connbidirect2.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ direction = col_factor(),
+ quantile = col_double(),
+ fraction = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
mutate(quantile = paste("X", quantile, sep = ""),
@@ -665,19 +717,30 @@ plot_connbidirect <- function(start_p, end_p, path_p) {
}
prepare_bandwidth_flags <- function(start_p = NULL, end_p = NULL) {
- advbw <- read.csv(paste(stats_dir, "advbw.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ advbw <- read_csv(file = paste(stats_dir, "advbw.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ isexit = col_logical(),
+ isguard = col_logical(),
+ advbw = col_double())) %>%
transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
variable = "advbw", value = advbw * 8 / 1e9)
- bwhist <- read.csv(paste(stats_dir, "bandwidth.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ bwhist <- read_csv(file = paste(stats_dir, "bandwidth.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ isexit = col_logical(),
+ isguard = col_logical(),
+ bwread = col_double(),
+ bwwrite = col_double(),
+ dirread = col_double(),
+ dirwrite = col_double())) %>%
transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
variable = "bwhist", value = (bwread + bwwrite) * 8 / 2e9)
rbind(advbw, bwhist) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- filter(have_exit_flag != "") %>%
- filter(have_guard_flag != "") %>%
+ filter(!is.na(have_exit_flag)) %>%
+ filter(!is.na(have_guard_flag)) %>%
spread(variable, value)
}
@@ -685,7 +748,8 @@ plot_bandwidth_flags <- function(start_p, end_p, path_p) {
prepare_bandwidth_flags(start_p, end_p) %>%
gather(variable, value, c(advbw, bwhist)) %>%
unite(flags, have_guard_flag, have_exit_flag) %>%
- mutate(flags = factor(flags, levels = c("f_t", "t_t", "t_f", "f_f"),
+ mutate(flags = factor(flags,
+ levels = c("FALSE_TRUE", "TRUE_TRUE", "TRUE_FALSE", "FALSE_FALSE"),
labels = c("Exit only", "Guard and Exit", "Guard only",
"Neither Guard nor Exit"))) %>%
mutate(variable = ifelse(variable == "advbw",
@@ -968,14 +1032,19 @@ plot_userstats_bridge_combined <- function(start_p, end_p, country_p, path_p) {
}
prepare_advbwdist_perc <- function(start_p = NULL, end_p = NULL, p_p = NULL) {
- read.csv(paste(stats_dir, "advbwdist.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "advbwdist.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ isexit = col_logical(),
+ relay = col_skip(),
+ percentile = col_integer(),
+ advbw = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(p_p)) percentile %in% as.numeric(p_p) else
percentile != "") %>%
transmute(date, percentile = as.factor(percentile),
- variable = ifelse(isexit == "t", "exits", "all"),
+ variable = ifelse(is.na(isexit), "all", "exits"),
advbw = advbw * 8 / 1e9) %>%
spread(variable, advbw) %>%
rename(p = percentile)
@@ -1000,14 +1069,19 @@ plot_advbwdist_perc <- function(start_p, end_p, p_p, path_p) {
}
prepare_advbwdist_relay <- function(start_p = NULL, end_p = NULL, n_p = NULL) {
- read.csv(paste(stats_dir, "advbwdist.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "advbwdist.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ isexit = col_logical(),
+ relay = col_integer(),
+ percentile = col_skip(),
+ advbw = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(n_p)) relay %in% as.numeric(n_p) else
relay != "") %>%
transmute(date, relay = as.factor(relay),
- variable = ifelse(isexit != "t", "all", "exits"),
+ variable = ifelse(is.na(isexit), "all", "exits"),
advbw = advbw * 8 / 1e9) %>%
spread(variable, advbw) %>%
rename(n = relay)
@@ -1032,8 +1106,15 @@ plot_advbwdist_relay <- function(start_p, end_p, n_p, path_p) {
}
prepare_hidserv_dir_onions_seen <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "hidserv.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ type = col_factor(),
+ wmean = col_skip(),
+ wmedian = col_skip(),
+ wiqm = col_double(),
+ frac = col_double(),
+ stats = col_skip())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(type == "dir-onions-seen") %>%
@@ -1053,8 +1134,15 @@ plot_hidserv_dir_onions_seen <- function(start_p, end_p, path_p) {
}
prepare_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "hidserv.csv", sep = ""),
- colClasses = c("date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ type = col_factor(),
+ wmean = col_skip(),
+ wmedian = col_skip(),
+ wiqm = col_double(),
+ frac = col_double(),
+ stats = col_skip())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(type == "rend-relayed-cells") %>%
@@ -1257,8 +1345,17 @@ plot_webstats_tm <- function(start_p, end_p, path_p) {
}
prepare_relays_ipv6 <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "ipv6servers.csv", sep = ""),
- colClasses = c("valid_after_date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
+ col_types = cols(
+ valid_after_date = col_date(format = ""),
+ server = col_factor(),
+ guard_relay = col_skip(),
+ exit_relay = col_skip(),
+ announced_ipv6 = col_logical(),
+ exiting_ipv6_relay = col_logical(),
+ reachable_ipv6_relay = col_logical(),
+ server_count_sum_avg = col_double(),
+ advertised_bandwidth_bytes_sum_avg = col_skip())) %>%
filter(if (!is.null(start_p))
valid_after_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p))
@@ -1266,9 +1363,9 @@ prepare_relays_ipv6 <- function(start_p = NULL, end_p = NULL) {
filter(server == "relay") %>%
group_by(valid_after_date) %>%
summarize(total = sum(server_count_sum_avg),
- announced = sum(server_count_sum_avg[announced_ipv6 == "t"]),
- reachable = sum(server_count_sum_avg[reachable_ipv6_relay == "t"]),
- exiting = sum(server_count_sum_avg[exiting_ipv6_relay == "t"])) %>%
+ announced = sum(server_count_sum_avg[announced_ipv6]),
+ reachable = sum(server_count_sum_avg[reachable_ipv6_relay]),
+ exiting = sum(server_count_sum_avg[exiting_ipv6_relay])) %>%
complete(valid_after_date = full_seq(valid_after_date, period = 1)) %>%
gather(total, announced, reachable, exiting, key = "category",
value = "count") %>%
@@ -1295,8 +1392,17 @@ plot_relays_ipv6 <- function(start_p, end_p, path_p) {
}
prepare_bridges_ipv6 <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "ipv6servers.csv", sep = ""),
- colClasses = c("valid_after_date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
+ col_types = cols(
+ valid_after_date = col_date(format = ""),
+ server = col_factor(),
+ guard_relay = col_skip(),
+ exit_relay = col_skip(),
+ announced_ipv6 = col_logical(),
+ exiting_ipv6_relay = col_skip(),
+ reachable_ipv6_relay = col_skip(),
+ server_count_sum_avg = col_double(),
+ advertised_bandwidth_bytes_sum_avg = col_skip())) %>%
filter(if (!is.null(start_p))
valid_after_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p))
@@ -1304,7 +1410,7 @@ prepare_bridges_ipv6 <- function(start_p = NULL, end_p = NULL) {
filter(server == "bridge") %>%
group_by(valid_after_date) %>%
summarize(total = sum(server_count_sum_avg),
- announced = sum(server_count_sum_avg[announced_ipv6 == "t"])) %>%
+ announced = sum(server_count_sum_avg[announced_ipv6])) %>%
complete(valid_after_date = full_seq(valid_after_date, period = 1)) %>%
rename(date = valid_after_date)
}
@@ -1327,8 +1433,17 @@ plot_bridges_ipv6 <- function(start_p, end_p, path_p) {
}
prepare_advbw_ipv6 <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "ipv6servers.csv", sep = ""),
- colClasses = c("valid_after_date" = "Date")) %>%
+ read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
+ col_types = cols(
+ valid_after_date = col_date(format = ""),
+ server = col_factor(),
+ guard_relay = col_logical(),
+ exit_relay = col_logical(),
+ announced_ipv6 = col_logical(),
+ exiting_ipv6_relay = col_logical(),
+ reachable_ipv6_relay = col_logical(),
+ server_count_sum_avg = col_skip(),
+ advertised_bandwidth_bytes_sum_avg = col_double())) %>%
filter(if (!is.null(start_p))
valid_after_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p))
@@ -1338,14 +1453,14 @@ prepare_advbw_ipv6 <- function(start_p = NULL, end_p = NULL) {
advertised_bandwidth_bytes_sum_avg * 8 / 1e9) %>%
group_by(valid_after_date) %>%
summarize(total = sum(advertised_bandwidth_bytes_sum_avg),
- total_guard = sum(advertised_bandwidth_bytes_sum_avg[guard_relay != "f"]),
- total_exit = sum(advertised_bandwidth_bytes_sum_avg[exit_relay != "f"]),
+ total_guard = sum(advertised_bandwidth_bytes_sum_avg[guard_relay]),
+ total_exit = sum(advertised_bandwidth_bytes_sum_avg[exit_relay]),
reachable_guard = sum(advertised_bandwidth_bytes_sum_avg[
- reachable_ipv6_relay != "f" & guard_relay != "f"]),
+ reachable_ipv6_relay & guard_relay]),
reachable_exit = sum(advertised_bandwidth_bytes_sum_avg[
- reachable_ipv6_relay != "f" & exit_relay != "f"]),
+ reachable_ipv6_relay & exit_relay]),
exiting = sum(advertised_bandwidth_bytes_sum_avg[
- exiting_ipv6_relay != "f"])) %>%
+ exiting_ipv6_relay])) %>%
complete(valid_after_date = full_seq(valid_after_date, period = 1)) %>%
rename(date = valid_after_date)
}
@@ -1372,8 +1487,13 @@ plot_advbw_ipv6 <- function(start_p, end_p, path_p) {
}
prepare_totalcw <- function(start_p = NULL, end_p = NULL) {
- read.csv(paste(stats_dir, "totalcw.csv", sep = ""),
- colClasses = c("valid_after_date" = "Date", "nickname" = "character")) %>%
+ read_csv(file = paste(stats_dir, "totalcw.csv", sep = ""),
+ col_types = cols(
+ valid_after_date = col_date(format = ""),
+ nickname = col_character(),
+ have_guard_flag = col_logical(),
+ have_exit_flag = col_logical(),
+ measured_sum_avg = col_double())) %>%
filter(if (!is.null(start_p))
valid_after_date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p))
1
0

11 Jan '19
commit a9fac06cc120a13dd35ba77983dc2d54aacc75ed
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Fri Jan 11 09:30:53 2019 +0100
Add levels = NULL to col_factor().
This is not required for readr_1.3.0 which runs locally here but
required for readr_1.1.1 which runs on the server.
---
src/main/R/rserver/graphs.R | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index 205afbe..18a9d3e 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -682,7 +682,7 @@ prepare_connbidirect <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "connbidirect2.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
- direction = col_factor(),
+ direction = col_factor(levels = NULL),
quantile = col_double(),
fraction = col_double())) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -1109,7 +1109,7 @@ prepare_hidserv_dir_onions_seen <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
- type = col_factor(),
+ type = col_factor(levels = NULL),
wmean = col_skip(),
wmedian = col_skip(),
wiqm = col_double(),
@@ -1137,7 +1137,7 @@ prepare_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "hidserv.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
- type = col_factor(),
+ type = col_factor(levels = NULL),
wmean = col_skip(),
wmedian = col_skip(),
wiqm = col_double(),
@@ -1167,7 +1167,7 @@ prepare_webstats_tb <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
- request_type = col_factor(),
+ request_type = col_factor(levels = NULL),
platform = col_skip(),
channel = col_skip(),
locale = col_skip(),
@@ -1210,8 +1210,8 @@ prepare_webstats_tb_platform <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
- request_type = col_factor(),
- platform = col_factor(),
+ request_type = col_factor(levels = NULL),
+ platform = col_factor(levels = NULL),
channel = col_skip(),
locale = col_skip(),
incremental = col_skip(),
@@ -1253,10 +1253,10 @@ prepare_webstats_tb_locale <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
- request_type = col_factor(),
+ request_type = col_factor(levels = NULL),
platform = col_skip(),
channel = col_skip(),
- locale = col_factor(),
+ locale = col_factor(levels = NULL),
incremental = col_skip(),
count = col_double())) %>%
filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
@@ -1308,7 +1308,7 @@ prepare_webstats_tm <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
- request_type = col_factor(),
+ request_type = col_factor(levels = NULL),
platform = col_skip(),
channel = col_skip(),
locale = col_skip(),
@@ -1348,7 +1348,7 @@ prepare_relays_ipv6 <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
col_types = cols(
valid_after_date = col_date(format = ""),
- server = col_factor(),
+ server = col_factor(levels = NULL),
guard_relay = col_skip(),
exit_relay = col_skip(),
announced_ipv6 = col_logical(),
@@ -1395,7 +1395,7 @@ prepare_bridges_ipv6 <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
col_types = cols(
valid_after_date = col_date(format = ""),
- server = col_factor(),
+ server = col_factor(levels = NULL),
guard_relay = col_skip(),
exit_relay = col_skip(),
announced_ipv6 = col_logical(),
@@ -1436,7 +1436,7 @@ prepare_advbw_ipv6 <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "ipv6servers.csv", sep = ""),
col_types = cols(
valid_after_date = col_date(format = ""),
- server = col_factor(),
+ server = col_factor(levels = NULL),
guard_relay = col_logical(),
exit_relay = col_logical(),
announced_ipv6 = col_logical(),
1
0

11 Jan '19
commit f55e63d986ed9c1054ce19ff0d4a19b1c0bce26d
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Jan 10 09:54:39 2019 +0100
Split up huge plot_userstats function.
The mere size of this function made it hard to impossible to refactor
things to using more recent R packages dplyr and tidyr. Now there are
four plot_userstats_* functions with accompanying prepare_userstats_*
that make the corresponding write_userstats_* functions really small.
---
src/main/R/rserver/graphs.R | 269 +++++++++++++++++++-------------------------
1 file changed, 115 insertions(+), 154 deletions(-)
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index d3ea90a..ba8862c 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -751,9 +751,9 @@ write_bandwidth_flags <- function(start_p = NULL, end_p = NULL, path_p) {
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
-plot_userstats <- function(start_p, end_p, node_p, variable_p, value_p,
- events_p, path_p) {
- c <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
+prepare_userstats_relay_country <- function(start_p, end_p, country_p,
+ events_p) {
+ read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
node = col_character(),
@@ -763,97 +763,26 @@ plot_userstats <- function(start_p, end_p, node_p, variable_p, value_p,
lower = col_double(),
upper = col_double(),
clients = col_double(),
- frac = col_skip()),
+ frac = col_double()),
na = character()) %>%
- filter(node == node_p)
- u <- c[c$date >= start_p & c$date <= end_p, c("date", "country", "transport",
- "version", "lower", "upper", "clients")]
- u <- rbind(u, data.frame(date = start_p,
- country = ifelse(variable_p == "country" & value_p != "all", value_p, ""),
- transport = ifelse(variable_p == "transport", value_p, ""),
- version = ifelse(variable_p == "version", value_p, ""),
- lower = 0, upper = 0, clients = 0))
- if (node_p == "relay") {
- if (value_p != "all") {
- u <- u[u$country == value_p, ]
- title <- paste("Directly connecting users from", countryname(value_p))
- } else {
- u <- u[u$country == "", ]
- title <- "Directly connecting users"
- }
- u <- aggregate(list(lower = u$lower, upper = u$upper,
- users = u$clients),
- by = list(date = as.Date(u$date, "%Y-%m-%d"),
- value = u$country),
- FUN = sum)
- } else if (variable_p == "transport") {
- if ("!<OR>" %in% value_p) {
- n <- u[u$transport != "" & u$transport != "<OR>", ]
- n <- aggregate(list(lower = n$lower, upper = n$upper,
- clients = n$clients),
- by = list(date = n$date),
- FUN = sum)
- u <- rbind(u, data.frame(date = n$date,
- country = "", transport = "!<OR>",
- version = "", lower = n$lower,
- upper = n$upper, clients = n$clients))
- }
- if (length(value_p) > 1) {
- u <- u[u$transport %in% value_p, ]
- u <- aggregate(list(lower = u$lower, upper = u$upper,
- users = u$clients),
- by = list(date = as.Date(u$date, "%Y-%m-%d"),
- value = u$transport),
- FUN = sum)
- title <- paste("Bridge users by transport")
- } else {
- u <- u[u$transport == value_p, ]
- u <- aggregate(list(lower = u$lower, upper = u$upper,
- users = u$clients),
- by = list(date = as.Date(u$date, "%Y-%m-%d"),
- value = u$transport),
- FUN = sum)
- title <- paste("Bridge users using",
- ifelse(value_p == "<??>", "unknown pluggable transport(s)",
- ifelse(value_p == "<OR>", "default OR protocol",
- ifelse(value_p == "!<OR>", "any pluggable transport",
- ifelse(value_p == "fte", "FTE",
- ifelse(value_p == "websocket", "Flash proxy/websocket",
- paste("transport", value_p)))))))
- }
- } else if (variable_p == "version") {
- u <- u[u$version == value_p, ]
- title <- paste("Bridge users using IP", value_p, sep = "")
- u <- aggregate(list(lower = u$lower, upper = u$upper,
- users = u$clients),
- by = list(date = as.Date(u$date, "%Y-%m-%d"),
- value = u$version),
- FUN = sum)
- } else {
- if (value_p != "all") {
- u <- u[u$country == value_p, ]
- title <- paste("Bridge users from", countryname(value_p))
- } else {
- u <- u[u$country == "" & u$transport == "" & u$version == "", ]
- title <- "Bridge users"
- }
- u <- aggregate(list(lower = u$lower, upper = u$upper,
- users = u$clients),
- by = list(date = as.Date(u$date, "%Y-%m-%d"),
- value = u$country),
- FUN = sum)
- }
- u <- merge(x = u, all.y = TRUE, y = data.frame(expand.grid(
- date = seq(from = as.Date(start_p, "%Y-%m-%d"),
- to = as.Date(end_p, "%Y-%m-%d"), by = "1 day"),
- value = ifelse(value_p == "all", "", value_p))))
- if (length(value_p) > 1) {
- plot <- ggplot(u, aes(x = date, y = users, colour = value))
- } else {
- plot <- ggplot(u, aes(x = date, y = users))
- }
+ filter(node == "relay") %>%
+ filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
+ filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
+ filter(if (!is.null(country_p))
+ country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
+ filter(transport == "") %>%
+ filter(version == "") %>%
+ select(date, country, clients, lower, upper, frac) %>%
+ rename(users = clients)
+}
+
+plot_userstats_relay_country <- function(start_p, end_p, country_p, events_p,
+ path_p) {
+ u <- prepare_userstats_relay_country(start_p, end_p, country_p, events_p) %>%
+ complete(date = full_seq(date, period = 1))
+ plot <- ggplot(u, aes(x = date, y = users))
if (length(na.omit(u$users)) > 0 & events_p != "off" &
- variable_p == "country" & length(value_p) == 1 && value_p != "all") {
+ country_p != "all") {
upturns <- u[u$users > u$upper, c("date", "users")]
downturns <- u[u$users < u$lower, c("date", "users")]
if (events_p == "on") {
@@ -875,69 +804,20 @@ plot_userstats <- function(start_p, end_p, node_p, variable_p, value_p,
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
- ggtitle(title) +
+ ggtitle(paste("Directly connecting users",
+ ifelse(country_p == "all", "",
+ paste(" from", countryname(country_p))), sep = "")) +
labs(caption = copyright_notice)
- if (length(value_p) > 1) {
- plot <- plot +
- scale_colour_hue(name = "", breaks = value_p,
- labels = ifelse(value_p == "<??>", "Unknown PT",
- ifelse(value_p == "<OR>", "Default OR protocol",
- ifelse(value_p == "!<OR>", "Any PT",
- ifelse(value_p == "fte", "FTE",
- ifelse(value_p == "websocket", "Flash proxy/websocket",
- value_p))))))
- }
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-plot_userstats_relay_country <- function(start_p, end_p, country_p, events_p,
- path_p) {
- plot_userstats(start_p, end_p, "relay", "country", country_p, events_p,
- path_p)
-}
-
-plot_userstats_bridge_country <- function(start_p, end_p, country_p, path_p) {
- plot_userstats(start_p, end_p, "bridge", "country", country_p, "off", path_p)
-}
-
-plot_userstats_bridge_transport <- function(start_p, end_p, transport_p,
- path_p) {
- plot_userstats(start_p, end_p, "bridge", "transport", transport_p, "off",
- path_p)
-}
-
-plot_userstats_bridge_version <- function(start_p, end_p, version_p, path_p) {
- plot_userstats(start_p, end_p, "bridge", "version", version_p, "off", path_p)
-}
-
write_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
country_p = NULL, events_p = NULL, path_p) {
- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
- col_types = cols(
- date = col_date(format = ""),
- node = col_character(),
- country = col_character(),
- transport = col_character(),
- version = col_character(),
- lower = col_double(),
- upper = col_double(),
- clients = col_double(),
- frac = col_double()),
- na = character()) %>%
- filter(node == "relay") %>%
- filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- filter(if (!is.null(country_p))
- country == ifelse(country_p == "all", "", country_p) else TRUE) %>%
- filter(transport == "") %>%
- filter(version == "") %>%
- select(date, country, clients, lower, upper, frac) %>%
- rename(users = clients) %>%
+ prepare_userstats_relay_country(start_p, end_p, country_p, events_p) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
-write_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
- country_p = NULL, path_p) {
+prepare_userstats_bridge_country <- function(start_p, end_p, country_p) {
read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -958,12 +838,32 @@ write_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
filter(transport == "") %>%
filter(version == "") %>%
select(date, country, clients, frac) %>%
- rename(users = clients) %>%
+ rename(users = clients)
+}
+
+plot_userstats_bridge_country <- function(start_p, end_p, country_p, path_p) {
+ prepare_userstats_bridge_country(start_p, end_p, country_p) %>%
+ complete(date = full_seq(date, period = 1)) %>%
+ ggplot(aes(x = date, y = users)) +
+ geom_line() +
+ scale_x_date(name = "", breaks = custom_breaks,
+ labels = custom_labels, minor_breaks = custom_minor_breaks) +
+ scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
+ ggtitle(paste("Bridge users",
+ ifelse(country_p == "all", "",
+ paste(" from", countryname(country_p))), sep = "")) +
+ labs(caption = copyright_notice)
+ ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
+}
+
+write_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
+ country_p = NULL, path_p) {
+ prepare_userstats_bridge_country(start_p, end_p, country_p) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
-write_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
- transport_p = NULL, path_p) {
+prepare_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
+ transport_p = NULL) {
u <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -992,15 +892,58 @@ write_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
}
u %>%
filter(if (!is.null(transport_p)) transport %in% transport_p else TRUE) %>%
- group_by(date, transport) %>%
select(date, transport, clients, frac) %>%
rename(users = clients) %>%
- arrange(date, transport) %>%
+ arrange(date, transport)
+}
+
+plot_userstats_bridge_transport <- function(start_p, end_p, transport_p,
+ path_p) {
+ if (length(transport_p) > 1) {
+ title <- paste("Bridge users by transport")
+ } else {
+ title <- paste("Bridge users using",
+ ifelse(transport_p == "<??>", "unknown pluggable transport(s)",
+ ifelse(transport_p == "<OR>", "default OR protocol",
+ ifelse(transport_p == "!<OR>", "any pluggable transport",
+ ifelse(transport_p == "fte", "FTE",
+ ifelse(transport_p == "websocket", "Flash proxy/websocket",
+ paste("transport", transport_p)))))))
+ }
+ u <- prepare_userstats_bridge_transport(start_p, end_p, transport_p) %>%
+ complete(date = full_seq(date, period = 1), nesting(transport))
+ if (length(transport_p) > 1) {
+ plot <- ggplot(u, aes(x = date, y = users, colour = transport))
+ } else {
+ plot <- ggplot(u, aes(x = date, y = users))
+ }
+ plot <- plot +
+ geom_line() +
+ scale_x_date(name = "", breaks = custom_breaks,
+ labels = custom_labels, minor_breaks = custom_minor_breaks) +
+ scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
+ ggtitle(title) +
+ labs(caption = copyright_notice)
+ if (length(transport_p) > 1) {
+ plot <- plot +
+ scale_colour_hue(name = "", breaks = transport_p,
+ labels = ifelse(transport_p == "<??>", "Unknown PT",
+ ifelse(transport_p == "<OR>", "Default OR protocol",
+ ifelse(transport_p == "!<OR>", "Any PT",
+ ifelse(transport_p == "fte", "FTE",
+ ifelse(transport_p == "websocket", "Flash proxy/websocket",
+ transport_p))))))
+ }
+ ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
+}
+
+write_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
+ transport_p = NULL, path_p) {
+ prepare_userstats_bridge_transport(start_p, end_p, transport_p) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
-write_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
- version_p = NULL, path_p) {
+prepare_userstats_bridge_version <- function(start_p, end_p, version_p) {
read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -1019,7 +962,25 @@ write_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
filter(is.na(transport)) %>%
filter(if (!is.null(version_p)) version == version_p else TRUE) %>%
select(date, version, clients, frac) %>%
- rename(users = clients) %>%
+ rename(users = clients)
+}
+
+plot_userstats_bridge_version <- function(start_p, end_p, version_p, path_p) {
+ prepare_userstats_bridge_version(start_p, end_p, version_p) %>%
+ complete(date = full_seq(date, period = 1)) %>%
+ ggplot(aes(x = date, y = users)) +
+ geom_line() +
+ scale_x_date(name = "", breaks = custom_breaks,
+ labels = custom_labels, minor_breaks = custom_minor_breaks) +
+ scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
+ ggtitle(paste("Bridge users using IP", version_p, sep = "")) +
+ labs(caption = copyright_notice)
+ ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
+}
+
+write_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
+ version_p = NULL, path_p) {
+ prepare_userstats_bridge_version(start_p, end_p, version_p) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
1
0

11 Jan '19
commit 0d2f1e2afd5f4b9e5c533d256586bb03d7466d5f
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Jan 10 15:39:04 2019 +0100
Make write_* functions obsolete.
In most cases these functions would call their prepare_* equivalents,
possibly tweak the result, and write it to a .csv file. This patch
moves all those tweaks to the prepare_* functions, possibly reverts
them in the plot_* functions, and makes the write_* functions
obsolete.
The result is not only less code. We're also going to find bugs in
written .csv files sooner, because the same code is now run for
writing graph files, and the latter happens much more often.
---
src/main/R/rserver/graphs.R | 414 +++++++--------------
.../torproject/metrics/web/RObjectGenerator.java | 2 +-
2 files changed, 140 insertions(+), 276 deletions(-)
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index 27f399d..82a51e7 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -348,10 +348,17 @@ robust_call <- function(wrappee, filename) {
})
}
+# Write the result of the given FUN, typically a prepare_ function, as .csv file
+# to the given path_p.
+write_data <- function(FUN, ..., path_p) {
+ FUN(...) %>%
+ write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
+}
+
# Disable readr's automatic progress bar.
options(readr.show_progress = FALSE)
-prepare_networksize <- function(start_p, end_p) {
+prepare_networksize <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "networksize.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -375,12 +382,7 @@ plot_networksize <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_networksize <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_networksize(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_versions <- function(start_p, end_p) {
+prepare_versions <- function(start_p = NULL, end_p = NULL) {
read_csv(paste(stats_dir, "versions.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -413,42 +415,34 @@ plot_versions <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_versions <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_versions(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_platforms <- function(start_p, end_p) {
+prepare_platforms <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "platforms.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
+ filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
+ mutate(platform = tolower(platform)) %>%
+ spread(platform, relays)
}
plot_platforms <- function(start_p, end_p, path_p) {
prepare_platforms(start_p, end_p) %>%
+ gather(platform, relays, -date) %>%
ggplot(aes(x = date, y = relays, colour = platform)) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
scale_colour_manual(name = "Platform",
- breaks = c("Linux", "macOS", "BSD", "Windows", "Other"),
- values = c("Linux" = "#56B4E9", "macOS" = "#333333", "BSD" = "#E69F00",
- "Windows" = "#0072B2", "Other" = "#009E73")) +
+ breaks = c("linux", "macos", "bsd", "windows", "other"),
+ labels = c("Linux", "macOS", "BSD", "Windows", "Other"),
+ values = c("linux" = "#56B4E9", "macos" = "#333333", "bsd" = "#E69F00",
+ "windows" = "#0072B2", "other" = "#009E73")) +
ggtitle("Relay platforms") +
labs(caption = copyright_notice)
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_platforms <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_platforms(start_p, end_p) %>%
- mutate(platform = tolower(platform)) %>%
- spread(platform, relays) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_dirbytes <- function(start_p, end_p, path_p) {
+prepare_dirbytes <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "bandwidth.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -478,12 +472,7 @@ plot_dirbytes <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_dirbytes <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_dirbytes(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_relayflags <- function(start_p, end_p, flag_p) {
+prepare_relayflags <- function(start_p = NULL, end_p = NULL, flag_p = NULL) {
read.csv(paste(stats_dir, "relayflags.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -507,13 +496,8 @@ plot_relayflags <- function(start_p, end_p, flag_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_relayflags <- function(start_p = NULL, end_p = NULL, flag_p = NULL,
- path_p) {
- prepare_relayflags(start_p, end_p, flag_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
+prepare_torperf <- function(start_p = NULL, end_p = NULL, server_p = NULL,
+ filesize_p = NULL) {
read.csv(paste(stats_dir, "torperf-1.1.csv", sep = ""),
colClasses = c("date" = "Date", "source" = "character")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -528,7 +512,7 @@ prepare_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
}
plot_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
- prepare_torperf(start_p, end_p, server_p, filesize_p, path_p) %>%
+ prepare_torperf(start_p, end_p, server_p, filesize_p) %>%
filter(source != "") %>%
complete(date = full_seq(date, period = 1), nesting(source)) %>%
ggplot(aes(x = date, y = md, ymin = q1, ymax = q3, fill = source)) +
@@ -549,13 +533,8 @@ plot_torperf <- function(start_p, end_p, server_p, filesize_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_torperf <- function(start_p = NULL, end_p = NULL, server_p = NULL,
- filesize_p = NULL, path_p) {
- prepare_torperf(start_p, end_p, server_p, filesize_p, path_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_torperf_failures <- function(start_p, end_p, server_p, filesize_p) {
+prepare_torperf_failures <- function(start_p = NULL, end_p = NULL,
+ server_p = NULL, filesize_p = NULL) {
read.csv(paste(stats_dir, "torperf-1.1.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -593,24 +572,13 @@ plot_torperf_failures <- function(start_p, end_p, server_p, filesize_p,
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_torperf_failures <- function(start_p = NULL, end_p = NULL,
- server_p = NULL, filesize_p = NULL, path_p) {
- prepare_torperf_failures(start_p, end_p, server_p, filesize_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_onionperf_buildtimes <- function(start_p, end_p) {
+prepare_onionperf_buildtimes <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "buildtimes.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE)
}
-write_onionperf_buildtimes <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_onionperf_buildtimes(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
plot_onionperf_buildtimes <- function(start_p, end_p, path_p) {
prepare_onionperf_buildtimes(start_p, end_p) %>%
filter(source != "") %>%
@@ -634,20 +602,15 @@ plot_onionperf_buildtimes <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-prepare_onionperf_latencies <- function(start_p, end_p, server_p) {
- read.csv(paste(stats_dir, "latencies.csv", sep = ""),
+prepare_onionperf_latencies <- function(start_p = NULL, end_p = NULL,
+ server_p = NULL) {
+ read.csv(paste(stats_dir, "latencies.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(if (!is.null(server_p)) server == server_p else TRUE)
}
-write_onionperf_latencies <- function(start_p = NULL, end_p = NULL,
- server_p = NULL, path_p) {
- prepare_onionperf_latencies(start_p, end_p, server_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
plot_onionperf_latencies <- function(start_p, end_p, server_p, path_p) {
prepare_onionperf_latencies(start_p, end_p, server_p) %>%
filter(source != "") %>%
@@ -667,21 +630,22 @@ plot_onionperf_latencies <- function(start_p, end_p, server_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-prepare_connbidirect <- function(start_p, end_p) {
+prepare_connbidirect <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "connbidirect2.csv", sep = ""),
colClasses = c("date" = "Date", "direction" = "factor")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
mutate(quantile = paste("X", quantile, sep = ""),
fraction = fraction / 100) %>%
- spread(quantile, fraction)
+ spread(quantile, fraction) %>%
+ rename(q1 = X0.25, md = X0.5, q3 = X0.75)
}
plot_connbidirect <- function(start_p, end_p, path_p) {
prepare_connbidirect(start_p, end_p) %>%
- ggplot(aes(x = date, y = X0.5, colour = direction)) +
+ ggplot(aes(x = date, y = md, colour = direction)) +
geom_line(size = 0.75) +
- geom_ribbon(aes(x = date, ymin = X0.25, ymax = X0.75,
+ geom_ribbon(aes(x = date, ymin = q1, ymax = q3,
fill = direction), alpha = 0.5, show.legend = FALSE) +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
@@ -700,13 +664,7 @@ plot_connbidirect <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_connbidirect <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_connbidirect(start_p, end_p) %>%
- rename(q1 = X0.25, md = X0.5, q3 = X0.75) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_bandwidth_flags <- function(start_p, end_p) {
+prepare_bandwidth_flags <- function(start_p = NULL, end_p = NULL) {
advbw <- read.csv(paste(stats_dir, "advbw.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
transmute(date, have_guard_flag = isguard, have_exit_flag = isexit,
@@ -719,11 +677,13 @@ prepare_bandwidth_flags <- function(start_p, end_p) {
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
filter(have_exit_flag != "") %>%
- filter(have_guard_flag != "")
+ filter(have_guard_flag != "") %>%
+ spread(variable, value)
}
plot_bandwidth_flags <- function(start_p, end_p, path_p) {
prepare_bandwidth_flags(start_p, end_p) %>%
+ gather(variable, value, c(advbw, bwhist)) %>%
unite(flags, have_guard_flag, have_exit_flag) %>%
mutate(flags = factor(flags, levels = c("f_t", "t_t", "t_f", "f_f"),
labels = c("Exit only", "Guard and Exit", "Guard only",
@@ -745,14 +705,8 @@ plot_bandwidth_flags <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_bandwidth_flags <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_bandwidth_flags(start_p, end_p) %>%
- spread(variable, value) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_userstats_relay_country <- function(start_p, end_p, country_p,
- events_p) {
+prepare_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
+ country_p = NULL, events_p = NULL) {
read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -811,13 +765,8 @@ plot_userstats_relay_country <- function(start_p, end_p, country_p, events_p,
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_userstats_relay_country <- function(start_p = NULL, end_p = NULL,
- country_p = NULL, events_p = NULL, path_p) {
- prepare_userstats_relay_country(start_p, end_p, country_p, events_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_userstats_bridge_country <- function(start_p, end_p, country_p) {
+prepare_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
+ country_p = NULL) {
read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -856,12 +805,6 @@ plot_userstats_bridge_country <- function(start_p, end_p, country_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_userstats_bridge_country <- function(start_p = NULL, end_p = NULL,
- country_p = NULL, path_p) {
- prepare_userstats_bridge_country(start_p, end_p, country_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
prepare_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
transport_p = NULL) {
u <- read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
@@ -937,13 +880,8 @@ plot_userstats_bridge_transport <- function(start_p, end_p, transport_p,
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_userstats_bridge_transport <- function(start_p = NULL, end_p = NULL,
- transport_p = NULL, path_p) {
- prepare_userstats_bridge_transport(start_p, end_p, transport_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_userstats_bridge_version <- function(start_p, end_p, version_p) {
+prepare_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
+ version_p = NULL) {
read_csv(file = paste(stats_dir, "clients.csv", sep = ""),
col_types = cols(
date = col_date(format = ""),
@@ -978,27 +916,28 @@ plot_userstats_bridge_version <- function(start_p, end_p, version_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_userstats_bridge_version <- function(start_p = NULL, end_p = NULL,
- version_p = NULL, path_p) {
- prepare_userstats_bridge_version(start_p, end_p, version_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_userstats_bridge_combined <- function(start_p, end_p, country_p) {
- read_csv(file = paste(stats_dir, "userstats-combined.csv", sep = ""),
- col_types = cols(
- date = col_date(format = ""),
- node = col_skip(),
- country = col_character(),
- transport = col_character(),
- version = col_skip(),
- frac = col_double(),
- low = col_double(),
- high = col_double()),
- na = character()) %>%
- filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
- filter(if (!is.null(country_p)) country == country_p else TRUE)
+prepare_userstats_bridge_combined <- function(start_p = NULL, end_p = NULL,
+ country_p = NULL) {
+ if (!is.null(country_p) && country_p == "all") {
+ prepare_userstats_bridge_country(start_p, end_p, country_p)
+ } else {
+ read_csv(file = paste(stats_dir, "userstats-combined.csv", sep = ""),
+ col_types = cols(
+ date = col_date(format = ""),
+ node = col_skip(),
+ country = col_character(),
+ transport = col_character(),
+ version = col_skip(),
+ frac = col_double(),
+ low = col_double(),
+ high = col_double()),
+ na = character()) %>%
+ filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
+ filter(if (!is.null(end_p)) date <= as.Date(end_p) else TRUE) %>%
+ filter(if (!is.null(country_p)) country == country_p else TRUE) %>%
+ select(date, country, transport, low, high, frac) %>%
+ arrange(date, country, transport)
+ }
}
plot_userstats_bridge_combined <- function(start_p, end_p, country_p, path_p) {
@@ -1028,19 +967,7 @@ plot_userstats_bridge_combined <- function(start_p, end_p, country_p, path_p) {
}
}
-write_userstats_bridge_combined <- function(start_p = NULL, end_p = NULL,
- country_p = NULL, path_p) {
- if (!is.null(country_p) && country_p == "all") {
- write_userstats_bridge_country(start_p, end_p, country_p, path_p)
- } else {
- prepare_userstats_bridge_combined(start_p, end_p, country_p) %>%
- select(date, country, transport, low, high, frac) %>%
- arrange(date, country, transport) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
- }
-}
-
-prepare_advbwdist_perc <- function(start_p, end_p, p_p) {
+prepare_advbwdist_perc <- function(start_p = NULL, end_p = NULL, p_p = NULL) {
read.csv(paste(stats_dir, "advbwdist.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -1048,15 +975,18 @@ prepare_advbwdist_perc <- function(start_p, end_p, p_p) {
filter(if (!is.null(p_p)) percentile %in% as.numeric(p_p) else
percentile != "") %>%
transmute(date, percentile = as.factor(percentile),
- variable = ifelse(is.na(isexit), "all", "exits"),
- advbw = advbw * 8 / 1e9)
+ variable = ifelse(isexit == "t", "exits", "all"),
+ advbw = advbw * 8 / 1e9) %>%
+ spread(variable, advbw) %>%
+ rename(p = percentile)
}
plot_advbwdist_perc <- function(start_p, end_p, p_p, path_p) {
prepare_advbwdist_perc(start_p, end_p, p_p) %>%
+ gather(variable, advbw, -c(date, p)) %>%
mutate(variable = ifelse(variable == "all", "All relays",
"Exits only")) %>%
- ggplot(aes(x = date, y = advbw, colour = percentile)) +
+ ggplot(aes(x = date, y = advbw, colour = p)) +
facet_grid(variable ~ .) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
@@ -1069,15 +999,7 @@ plot_advbwdist_perc <- function(start_p, end_p, p_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_advbwdist_perc <- function(start_p = NULL, end_p = NULL, p_p = NULL,
- path_p) {
- prepare_advbwdist_perc(start_p, end_p, p_p) %>%
- spread(variable, advbw) %>%
- rename(p = percentile) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_advbwdist_relay <- function(start_p, end_p, n_p) {
+prepare_advbwdist_relay <- function(start_p = NULL, end_p = NULL, n_p = NULL) {
read.csv(paste(stats_dir, "advbwdist.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -1086,14 +1008,17 @@ prepare_advbwdist_relay <- function(start_p, end_p, n_p) {
relay != "") %>%
transmute(date, relay = as.factor(relay),
variable = ifelse(isexit != "t", "all", "exits"),
- advbw = advbw * 8 / 1e9)
+ advbw = advbw * 8 / 1e9) %>%
+ spread(variable, advbw) %>%
+ rename(n = relay)
}
plot_advbwdist_relay <- function(start_p, end_p, n_p, path_p) {
prepare_advbwdist_relay(start_p, end_p, n_p) %>%
+ gather(variable, advbw, -c(date, n)) %>%
mutate(variable = ifelse(variable == "all", "All relays",
"Exits only")) %>%
- ggplot(aes(x = date, y = advbw, colour = relay)) +
+ ggplot(aes(x = date, y = advbw, colour = n)) +
facet_grid(variable ~ .) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
@@ -1106,15 +1031,7 @@ plot_advbwdist_relay <- function(start_p, end_p, n_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_advbwdist_relay <- function(start_p = NULL, end_p = NULL, n_p = NULL,
- path_p) {
- prepare_advbwdist_relay(start_p, end_p, n_p) %>%
- spread(variable, advbw) %>%
- rename(n = relay) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_hidserv_dir_onions_seen <- function(start_p, end_p) {
+prepare_hidserv_dir_onions_seen <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "hidserv.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -1135,13 +1052,7 @@ plot_hidserv_dir_onions_seen <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_hidserv_dir_onions_seen <- function(start_p = NULL, end_p = NULL,
- path_p) {
- prepare_hidserv_dir_onions_seen(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_hidserv_rend_relayed_cells <- function(start_p, end_p) {
+prepare_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "hidserv.csv", sep = ""),
colClasses = c("date" = "Date")) %>%
filter(if (!is.null(start_p)) date >= as.Date(start_p) else TRUE) %>%
@@ -1164,13 +1075,7 @@ plot_hidserv_rend_relayed_cells <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_hidserv_rend_relayed_cells <- function(start_p = NULL, end_p = NULL,
- path_p) {
- prepare_hidserv_rend_relayed_cells(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_webstats_tb <- function(start_p, end_p) {
+prepare_webstats_tb <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
@@ -1184,17 +1089,22 @@ prepare_webstats_tb <- function(start_p, end_p) {
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
filter(request_type %in% c("tbid", "tbsd", "tbup", "tbur")) %>%
group_by(log_date, request_type) %>%
- summarize(count = sum(count))
+ summarize(count = sum(count)) %>%
+ spread(request_type, count) %>%
+ rename(date = log_date, initial_downloads = tbid,
+ signature_downloads = tbsd, update_pings = tbup,
+ update_requests = tbur)
}
plot_webstats_tb <- function(start_p, end_p, path_p) {
- d <- prepare_webstats_tb(start_p, end_p)
- levels(d$request_type) <- list(
- "Initial downloads" = "tbid",
- "Signature downloads" = "tbsd",
- "Update pings" = "tbup",
- "Update requests" = "tbur")
- ggplot(d, aes(x = log_date, y = count)) +
+ prepare_webstats_tb(start_p, end_p) %>%
+ gather(request_type, count, -date) %>%
+ mutate(request_type = factor(request_type,
+ levels = c("initial_downloads", "signature_downloads", "update_pings",
+ "update_requests"),
+ labels = c("Initial downloads", "Signature downloads", "Update pings",
+ "Update requests"))) %>%
+ ggplot(aes(x = date, y = count)) +
geom_point() +
geom_line() +
facet_grid(request_type ~ ., scales = "free_y") +
@@ -1208,16 +1118,7 @@ plot_webstats_tb <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_webstats_tb <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_webstats_tb(start_p, end_p) %>%
- rename(date = log_date) %>%
- spread(request_type, count) %>%
- rename(initial_downloads = tbid, signature_downloads = tbsd,
- update_pings = tbup, update_requests = tbur) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_webstats_tb_platform <- function(start_p, end_p) {
+prepare_webstats_tb_platform <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
@@ -1231,15 +1132,18 @@ prepare_webstats_tb_platform <- function(start_p, end_p) {
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
filter(request_type %in% c("tbid", "tbup")) %>%
group_by(log_date, platform, request_type) %>%
- summarize(count = sum(count))
+ summarize(count = sum(count)) %>%
+ spread(request_type, count, fill = 0) %>%
+ rename(date = log_date, initial_downloads = tbid, update_pings = tbup)
}
plot_webstats_tb_platform <- function(start_p, end_p, path_p) {
- d <- prepare_webstats_tb_platform(start_p, end_p)
- levels(d$request_type) <- list(
- "Initial downloads" = "tbid",
- "Update pings" = "tbup")
- ggplot(d, aes(x = log_date, y = count, colour = platform)) +
+ prepare_webstats_tb_platform(start_p, end_p) %>%
+ gather(request_type, count, -c(date, platform)) %>%
+ mutate(request_type = factor(request_type,
+ levels = c("initial_downloads", "update_pings"),
+ labels = c("Initial downloads", "Update pings"))) %>%
+ ggplot(aes(x = date, y = count, colour = platform)) +
geom_point() +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
@@ -1257,15 +1161,7 @@ plot_webstats_tb_platform <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_webstats_tb_platform <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_webstats_tb_platform(start_p, end_p) %>%
- rename(date = log_date) %>%
- spread(request_type, count, fill = 0) %>%
- rename(initial_downloads = tbid, update_pings = tbup) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_webstats_tb_locale <- function(start_p, end_p) {
+prepare_webstats_tb_locale <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
@@ -1320,12 +1216,7 @@ plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_webstats_tb_locale <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_webstats_tb_locale(start_p, end_p) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_webstats_tm <- function(start_p, end_p) {
+prepare_webstats_tm <- function(start_p = NULL, end_p = NULL) {
read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
@@ -1339,15 +1230,19 @@ prepare_webstats_tm <- function(start_p, end_p) {
filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
filter(request_type %in% c("tmid", "tmup")) %>%
group_by(log_date, request_type) %>%
- summarize(count = sum(count))
+ summarize(count = sum(count)) %>%
+ mutate(request_type = factor(request_type, levels = c("tmid", "tmup"))) %>%
+ spread(request_type, count, drop = FALSE) %>%
+ rename(date = log_date, initial_downloads = tmid, update_pings = tmup)
}
plot_webstats_tm <- function(start_p, end_p, path_p) {
- d <- prepare_webstats_tm(start_p, end_p)
- levels(d$request_type) <- list(
- "Initial downloads" = "tmid",
- "Update pings" = "tmup")
- ggplot(d, aes(x = log_date, y = count)) +
+ prepare_webstats_tm(start_p, end_p) %>%
+ gather(request_type, count, -date) %>%
+ mutate(request_type = factor(request_type,
+ levels = c("initial_downloads", "update_pings"),
+ labels = c("Initial downloads", "Update pings"))) %>%
+ ggplot(aes(x = date, y = count)) +
geom_point() +
geom_line() +
facet_grid(request_type ~ ., scales = "free_y") +
@@ -1361,16 +1256,7 @@ plot_webstats_tm <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_webstats_tm <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_webstats_tm(start_p, end_p) %>%
- rename(date = log_date) %>%
- mutate(request_type = factor(request_type, levels = c("tmid", "tmup"))) %>%
- spread(request_type, count, drop = FALSE) %>%
- rename(initial_downloads = tmid, update_pings = tmup) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_relays_ipv6 <- function(start_p, end_p) {
+prepare_relays_ipv6 <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "ipv6servers.csv", sep = ""),
colClasses = c("valid_after_date" = "Date")) %>%
filter(if (!is.null(start_p))
@@ -1385,12 +1271,15 @@ prepare_relays_ipv6 <- function(start_p, end_p) {
exiting = sum(server_count_sum_avg[exiting_ipv6_relay == "t"])) %>%
complete(valid_after_date = full_seq(valid_after_date, period = 1)) %>%
gather(total, announced, reachable, exiting, key = "category",
- value = "count")
+ value = "count") %>%
+ rename(date = valid_after_date) %>%
+ spread(category, count)
}
plot_relays_ipv6 <- function(start_p, end_p, path_p) {
prepare_relays_ipv6(start_p, end_p) %>%
- ggplot(aes(x = valid_after_date, y = count, colour = category)) +
+ gather(category, count, -date) %>%
+ ggplot(aes(x = date, y = count, colour = category)) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
@@ -1405,14 +1294,7 @@ plot_relays_ipv6 <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_relays_ipv6 <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_relays_ipv6(start_p, end_p) %>%
- rename(date = valid_after_date) %>%
- spread(category, count) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_bridges_ipv6 <- function(start_p, end_p) {
+prepare_bridges_ipv6 <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "ipv6servers.csv", sep = ""),
colClasses = c("valid_after_date" = "Date")) %>%
filter(if (!is.null(start_p))
@@ -1424,12 +1306,13 @@ prepare_bridges_ipv6 <- function(start_p, end_p) {
summarize(total = sum(server_count_sum_avg),
announced = sum(server_count_sum_avg[announced_ipv6 == "t"])) %>%
complete(valid_after_date = full_seq(valid_after_date, period = 1)) %>%
- gather(total, announced, key = "category", value = "count")
+ rename(date = valid_after_date)
}
plot_bridges_ipv6 <- function(start_p, end_p, path_p) {
prepare_bridges_ipv6(start_p, end_p) %>%
- ggplot(aes(x = valid_after_date, y = count, colour = category)) +
+ gather(category, count, -date) %>%
+ ggplot(aes(x = date, y = count, colour = category)) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
@@ -1443,14 +1326,7 @@ plot_bridges_ipv6 <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_bridges_ipv6 <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_bridges_ipv6(start_p, end_p) %>%
- rename(date = valid_after_date) %>%
- spread(category, count) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_advbw_ipv6 <- function(start_p, end_p) {
+prepare_advbw_ipv6 <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "ipv6servers.csv", sep = ""),
colClasses = c("valid_after_date" = "Date")) %>%
filter(if (!is.null(start_p))
@@ -1458,6 +1334,8 @@ prepare_advbw_ipv6 <- function(start_p, end_p) {
filter(if (!is.null(end_p))
valid_after_date <= as.Date(end_p) else TRUE) %>%
filter(server == "relay") %>%
+ mutate(advertised_bandwidth_bytes_sum_avg =
+ advertised_bandwidth_bytes_sum_avg * 8 / 1e9) %>%
group_by(valid_after_date) %>%
summarize(total = sum(advertised_bandwidth_bytes_sum_avg),
total_guard = sum(advertised_bandwidth_bytes_sum_avg[guard_relay != "f"]),
@@ -1469,14 +1347,13 @@ prepare_advbw_ipv6 <- function(start_p, end_p) {
exiting = sum(advertised_bandwidth_bytes_sum_avg[
exiting_ipv6_relay != "f"])) %>%
complete(valid_after_date = full_seq(valid_after_date, period = 1)) %>%
- gather(total, total_guard, total_exit, reachable_guard, reachable_exit,
- exiting, key = "category", value = "advbw") %>%
- mutate(advbw = advbw * 8 / 1e9)
+ rename(date = valid_after_date)
}
plot_advbw_ipv6 <- function(start_p, end_p, path_p) {
prepare_advbw_ipv6(start_p, end_p) %>%
- ggplot(aes(x = valid_after_date, y = advbw, colour = category)) +
+ gather(category, advbw, -date) %>%
+ ggplot(aes(x = date, y = advbw, colour = category)) +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
@@ -1494,14 +1371,7 @@ plot_advbw_ipv6 <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_advbw_ipv6 <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_advbw_ipv6(start_p, end_p) %>%
- rename(date = valid_after_date) %>%
- spread(category, advbw) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
-
-prepare_totalcw <- function(start_p, end_p) {
+prepare_totalcw <- function(start_p = NULL, end_p = NULL) {
read.csv(paste(stats_dir, "totalcw.csv", sep = ""),
colClasses = c("valid_after_date" = "Date", "nickname" = "character")) %>%
filter(if (!is.null(start_p))
@@ -1509,7 +1379,9 @@ prepare_totalcw <- function(start_p, end_p) {
filter(if (!is.null(end_p))
valid_after_date <= as.Date(end_p) else TRUE) %>%
group_by(valid_after_date, nickname) %>%
- summarize(measured_sum_avg = sum(measured_sum_avg))
+ summarize(measured_sum_avg = sum(measured_sum_avg)) %>%
+ rename(date = valid_after_date, totalcw = measured_sum_avg) %>%
+ arrange(date, nickname)
}
plot_totalcw <- function(start_p, end_p, path_p) {
@@ -1517,10 +1389,8 @@ plot_totalcw <- function(start_p, end_p, path_p) {
mutate(nickname = ifelse(nickname == "", "consensus", nickname)) %>%
mutate(nickname = factor(nickname,
levels = c("consensus", unique(nickname[nickname != "consensus"])))) %>%
- complete(valid_after_date = full_seq(valid_after_date, period = 1),
- nesting(nickname)) %>%
- ggplot(aes(x = valid_after_date, y = measured_sum_avg,
- colour = nickname)) +
+ complete(date = full_seq(date, period = 1), nesting(nickname)) %>%
+ ggplot(aes(x = date, y = totalcw, colour = nickname)) +
geom_line(na.rm = TRUE) +
scale_x_date(name = "", breaks = custom_breaks,
labels = custom_labels, minor_breaks = custom_minor_breaks) +
@@ -1531,10 +1401,4 @@ plot_totalcw <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-write_totalcw <- function(start_p = NULL, end_p = NULL, path_p) {
- prepare_totalcw(start_p, end_p) %>%
- rename(date = valid_after_date, totalcw = measured_sum_avg) %>%
- arrange(date, nickname) %>%
- write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
-}
diff --git a/src/main/java/org/torproject/metrics/web/RObjectGenerator.java b/src/main/java/org/torproject/metrics/web/RObjectGenerator.java
index a529830..6a142e8 100644
--- a/src/main/java/org/torproject/metrics/web/RObjectGenerator.java
+++ b/src/main/java/org/torproject/metrics/web/RObjectGenerator.java
@@ -122,7 +122,7 @@ public class RObjectGenerator implements ServletContextListener {
StringBuilder queryBuilder = new StringBuilder();
queryBuilder.append("robust_call(as.call(list(");
if ("csv".equalsIgnoreCase(fileType)) {
- queryBuilder.append("write_");
+ queryBuilder.append("write_data, prepare_");
/* When we checked parameters above we also put in defaults for missing
* parameters. This is okay for graphs, but we want to support CSV files
* with empty parameters. Using the parameters we got here. */
1
0

[metrics-web/master] Simplify plot_webstats_tb_locale function.
by karsten@torproject.org 11 Jan '19
by karsten@torproject.org 11 Jan '19
11 Jan '19
commit 2b34cd2023a3e59057f4274afb0d7b8163282a18
Author: Karsten Loesing <karsten.loesing(a)gmx.net>
Date: Thu Jan 10 10:41:48 2019 +0100
Simplify plot_webstats_tb_locale function.
---
src/main/R/rserver/graphs.R | 61 ++++++++++++++++++++-------------------------
1 file changed, 27 insertions(+), 34 deletions(-)
diff --git a/src/main/R/rserver/graphs.R b/src/main/R/rserver/graphs.R
index ba8862c..27f399d 100644
--- a/src/main/R/rserver/graphs.R
+++ b/src/main/R/rserver/graphs.R
@@ -1265,8 +1265,8 @@ write_webstats_tb_platform <- function(start_p = NULL, end_p = NULL, path_p) {
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
-plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
- d <- read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
+prepare_webstats_tb_locale <- function(start_p, end_p) {
+ read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
col_types = cols(
log_date = col_date(format = ""),
request_type = col_factor(),
@@ -1274,20 +1274,35 @@ plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
channel = col_skip(),
locale = col_factor(),
incremental = col_skip(),
- count = col_double()))
- d <- d[d$log_date >= start_p & d$log_date <= end_p &
- d$request_type %in% c("tbid", "tbup"), ]
- levels(d$request_type) <- list(
- "Initial downloads" = "tbid",
- "Update pings" = "tbup")
+ count = col_double())) %>%
+ filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
+ filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
+ filter(request_type %in% c("tbid", "tbup")) %>%
+ rename(date = log_date) %>%
+ group_by(date, locale, request_type) %>%
+ summarize(count = sum(count)) %>%
+ mutate(request_type = factor(request_type, levels = c("tbid", "tbup"))) %>%
+ spread(request_type, count, fill = 0) %>%
+ rename(initial_downloads = tbid, update_pings = tbup)
+}
+
+plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
+ d <- prepare_webstats_tb_locale(start_p, end_p) %>%
+ gather(request_type, count, -c(date, locale)) %>%
+ mutate(request_type = factor(request_type,
+ levels = c("initial_downloads", "update_pings"),
+ labels = c("Initial downloads", "Update pings")))
e <- d
e <- aggregate(list(count = e$count), by = list(locale = e$locale), FUN = sum)
e <- e[order(e$count, decreasing = TRUE), ]
e <- e[1:5, ]
- d <- aggregate(list(count = d$count), by = list(log_date = d$log_date,
+ d <- aggregate(list(count = d$count), by = list(date = d$date,
request_type = d$request_type,
locale = ifelse(d$locale %in% e$locale, d$locale, "(other)")), FUN = sum)
- ggplot(d, aes(x = log_date, y = count, colour = locale)) +
+ d %>%
+ complete(date = full_seq(date, period = 1),
+ nesting(locale, request_type)) %>%
+ ggplot(aes(x = date, y = count, colour = locale)) +
geom_point() +
geom_line() +
scale_x_date(name = "", breaks = custom_breaks,
@@ -1295,7 +1310,7 @@ plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
scale_y_continuous(name = "", labels = formatter, limits = c(0, NA)) +
scale_colour_hue(name = "Locale",
breaks = c(e$locale, "(other)"),
- labels = c(e$locale, "Other")) +
+ labels = c(as.character(e$locale), "Other")) +
facet_grid(request_type ~ ., scales = "free_y") +
theme(strip.text.y = element_text(angle = 0, hjust = 0, size = rel(1.5)),
strip.background = element_rect(fill = NA),
@@ -1305,30 +1320,8 @@ plot_webstats_tb_locale <- function(start_p, end_p, path_p) {
ggsave(filename = path_p, width = 8, height = 5, dpi = 150)
}
-# Ideally, this function would share code with plot_webstats_tb_locale
-# by using a common prepare_webstats_tb_locale function. This just
-# turned out to be a bit harder than for other functions, because
-# plot_webstats_tb_locale needs the preliminary data frame e for its
-# breaks and labels. Left as future work.
write_webstats_tb_locale <- function(start_p = NULL, end_p = NULL, path_p) {
- read_csv(file = paste(stats_dir, "webstats.csv", sep = ""),
- col_types = cols(
- log_date = col_date(format = ""),
- request_type = col_factor(),
- platform = col_skip(),
- channel = col_skip(),
- locale = col_factor(),
- incremental = col_skip(),
- count = col_double())) %>%
- filter(if (!is.null(start_p)) log_date >= as.Date(start_p) else TRUE) %>%
- filter(if (!is.null(end_p)) log_date <= as.Date(end_p) else TRUE) %>%
- filter(request_type %in% c("tbid", "tbup")) %>%
- rename(date = log_date) %>%
- group_by(date, locale, request_type) %>%
- summarize(count = sum(count)) %>%
- mutate(request_type = factor(request_type, levels = c("tbid", "tbup"))) %>%
- spread(request_type, count, fill = 0) %>%
- rename(initial_downloads = tbid, update_pings = tbup) %>%
+ prepare_webstats_tb_locale(start_p, end_p) %>%
write.csv(path_p, quote = FALSE, row.names = FALSE, na = "")
}
1
0