[tor-dev] Help with rebasing arma's n23-5 tor branch to current master

Karsten Loesing karsten at torproject.org
Tue Sep 10 10:17:13 UTC 2013


On 9/7/13 5:57 PM, Charlie Belmer wrote:
> Hey Karsten & Dev team,
> 
> I didn't mean to take this off channel, so my apologies for that.
> 
> I ran a few more tests this week (and my apologies on the slowness of some
> of this work - life has been hectic this past month), but I am not finding
> much of use. I could not find any errors which are not also present in when
> N23 is disabled. If you want to browse the logs for yourself, I uploaded a
> partial log from a run to http://www.charliebelmer.com/scallion.n23.log(730MB).

That log ends at simulated time 0:19:34, so 5:30 minutes before the
actual requests start.  Can you post both full logs somewhere, maybe
gzipped to save bandwidth?

> The most telling entries I found so far are below, but I see the same
> errors whether UseN23 is set to 1 or not, so it is likely not the root
> cause. I'll keep digging.

Okay.

Another option would be to run your branch in Chutney to find the bug
and then return to Shadow for performance measurements.  That's what I
did with #9166 and it helped a lot.  Let me know if you need help with that.

Thanks!
Karsten


> Error 1 (Seems to occur when opening channel):
> 0:0:49:406270 [thread-2] 0:9:41:348157618 [scallion-info]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] bw_weights_error_t
> networkstatus_check_weights(int64_t, int64_t, int64_t, int64_t, int64_t,
> int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t,
> int64_t, int)(): Bw weight mismatch 1. G=214 M=12174 E=165 D=27654 T=40207
> Wmd=4769 Wme=-724848 Wmg=0 Wed=461 Wee=734848 Wgd=4769 Wgg=10000
> Wme=-724848 Wmg=0
> 0:0:49:407692 [thread-2] 0:9:41:348157618 [scallion-info]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] bw_weights_error_t
> networkstatus_check_weights(int64_t, int64_t, int64_t, int64_t, int64_t,
> int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t, int64_t,
> int64_t, int)(): Bw weight mismatch 1. G=214 M=12174 E=165 D=27654 T=40207
> Wmd=4769 Wme=-724848 Wmg=0 Wed=461 Wee=734848 Wgd=4769 Wgg=10000
> Wme=-724848 Wmg=0
> 
> Error 2:
> 0:0:54:227263 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void conn_read_callback(int,
> short, void *)(): socket 1000264 wants to read.
> 0:0:54:227281 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int
> connection_read_to_buf(connection_t *, ssize_t *, int *)(): 1000264:
> starting, inbuf_datalen 0 (0 pending in tls object). at_most 16448.
> 0:0:54:227302 [thread-2] 0:10:55:012000001 [shadow-debug]
> [4uthority1-73.1.0.0] [_tcp_endOfFileSignalled] 73.1.0.0:10021 (descriptor
> 1000264) <-> 91.1.0.0:9111: signaling close to user, socket no longer usable
> 0:0:54:227321 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] TLS error: unexpected close
> while reading (SSL negotiation finished successfully)
> 0:0:54:227338 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int tor_tls_read(tor_tls_t *,
> char *, size_t)(): read returned r=0, err=-8
> 0:0:54:227354 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int
> connection_read_to_buf(connection_t *, ssize_t *, int *)(): TLS connection
> closed on read. Closing. (Nickname nonexit10, address 91.1.0.0)
> 0:0:54:227373 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
> channel_close_for_error(channel_t *)(): Closing channel 0x7f546d03ce40 due
> to lower-layer error
> 0:0:54:227389 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
> channel_change_state(channel_t *, channel_state_t)(): Changing state of
> channel 0x7f546d03ce40 (global ID 21) from "open" to "closing"
> 0:0:54:227420 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
> channel_remove_from_digest_map(channel_t *)(): Removed channel
> 0x7f546d03ce40 (global ID 21) from identity map in state closing (4) with
> digest BBF25A366EA5376B0026CE6EDE3DF536211429BE
> 0:0:54:227440 [thread-2] 0:10:55:012000001 [shadow-debug]
> [4uthority1-73.1.0.0] [tcp_close] 73.1.0.0:10021 (descriptor 1000264) <->
> 91.1.0.0:9111:  user closed connection
> 0:0:54:227452 [thread-2] 0:10:55:012000001 [shadow-debug]
> [4uthority1-73.1.0.0] [_tcp_setState] 73.1.0.0:10021 (descriptor 1000264)
> <-> 91.1.0.0:9111: moved from TCP state 'TCPS_CLOSEWAIT' to 'TCPS_LASTACK'
> 0:0:54:227473 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
> connection_mark_for_close_internal_(connection_t *, int, const char *)():
> Calling connection_mark_for_close_internal_() on an OR conn at
> /home/coffee/shadow/shadow/build/tor/src/or/connection.c:2932
> 0:0:54:227491 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int
> conn_close_if_marked(int)(): Cleaning up connection (fd -1).
> 0:0:54:227509 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
> circuit_n_chan_done(channel_t *, int)(): chan to NULL/91.1.0.0:9111,
> status=0
> 0:0:54:227546 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] void
> channel_change_state(channel_t *, channel_state_t)(): Changing state of
> channel 0x7f546d03ce40 (global ID 21) from "closing" to "channel error"
> 0:0:54:227565 [thread-2] 0:10:55:012000001 [scallion-debug]
> [4uthority1-73.1.0.0] [scalliontor_logmsg_cb] int
> connection_remove(connection_t *)(): removing socket -1 (type OR), n_conns
> now 26
> 
> 
> 
> 
> On Tue, Sep 3, 2013 at 10:47 AM, Karsten Loesing <karsten at torproject.org>wrote:
> 
>> On 9/3/13 4:32 PM, Charlie Belmer wrote:
>>> Hi Karsten,
>>>
>>> I have run several tests, but I haven't been able to satisfy myself yet.
>> I
>>> attached the results from my latest round of tests for your review. I am
>>> comparing the n23 branch (UseN23 1 set in each torrc) with the initial
>> head
>>> I merged n23 with. It seems that for some reason, the n23 branch
>> completes
>>> far fewer downloads, though I haven't yet found the cause, or found
>>> anything interesting in the logs.
>>>
>>> $ grep "fg-download-complete" m1.large-tor/data/scallion.log | wc -l
>>> 8827
>>> $ grep "fg-download-complete" m1.large-n23/scallion.log | wc -l
>>> 308
>>>
>>> I would definitely appreciate any thoughts, insights, or advice!
>>
>> Interesting!  Can you re-run the n23 simulation on info or even debug
>> log level?  That might reveal some more information.  You'll do that by
>> running "scallion --log-level=INFO".  A tiny network should be
>> sufficient for that.  Watch out for filling disks.
>>
>> If you want me to look at the scallion.log file, feel free to post it
>> somewhere, and I'll see if something looks unusual to me.
>>
>> Thanks,
>> Karsten
>>
>>
>>> On Tue, Sep 3, 2013 at 8:58 AM, Karsten Loesing <karsten at torproject.org
>>> wrote:
>>>
>>>> Hi Charlie,
>>>>
>>>> did you have the chance to simulate your rebased n23-5 branch using
>> Shadow?
>>>>
>>>> If not, maybe I can help by simulating it once and writing down more
>>>> specific instructions for you?
>>>>
>>>> Thanks!
>>>> Karsten
>>>>
>>>>
>>>> On 8/8/13 10:36 AM, Karsten Loesing wrote:
>>>>> (Moving this discussion back to tor-dev@ with Charlie's permission.)
>>>>>
>>>>> On 8/6/13 4:10 AM, Charlie Belmer wrote:
>>>>>> Hey Karsten,
>>>>>>
>>>>>> I've been reading up on Shadow & Scallion to do some performance
>>>> testing on
>>>>>> N23. Before I go too far down that road, I wanted to check in with
>> you.
>>>>>>
>>>>>> Are there existing blueprints or best practices I should be following?
>>>>>
>>>>> Thanks for looking into this!
>>>>>
>>>>> So, I wouldn't call the following notes blueprints or best practices,
>>>>> but they are a description of what I would do to run Shadow simulations
>>>>> of your branch.
>>>>>
>>>>> First of all, be sure to read the Shadow wiki which has contains lots
>> of
>>>>> interesting stuff: https://github.com/shadow/shadow/wiki
>>>>>
>>>>> The next step would be to get a vanilla Tor master (that your branch is
>>>>> based on) running in a minimal or tiny Shadow network.  It's probably
>>>>> easiest to use a local Ubuntu VM for this.  Or you can start an
>> m1.large
>>>>> EC2 instance for the tiny network, or an even smaller instance for the
>>>>> minimal network.
>>>>>
>>>>> Once that simulation succeeds, you'll want to run another simulation
>>>>> using your branch, extract performance metrics using Shadow's analyze
>>>>> script, produce a PDF with graphs also using the analyze script, and
>>>>> then wonder if things are faster than before.
>>>>>
>>>>> The minimal or tiny networks are mostly there to learn that things are
>>>>> working as expected.  You'll probably want larger networks for real
>>>>> performance results, but starting small to get everything working is
>>>>> probably easier.
>>>>>
>>>>> Sounds doable, I guess.  However, here's where things start to get
>>>>> tricky: in order to simulate Tor master (or a branch based on it) in
>>>>> Shadow, you'll have to perform quite a few extra steps like compiling
>>>>> your own CMake and Clang/LLVM.  Not really hard, but you need to get
>> all
>>>>> pieces together before Shadow does what you want.
>>>>>
>>>>> I'm pasting detailed instructions that I used to simulate Steven's uTP
>>>>> branch in Shadow below as an example.  Not everything applies to your
>>>>> branch, but it should give you the general idea.
>>>>>
>>>>> If something remains unclear, please feel free to ask!
>>>>>
>>>>>> Depending how it goes, I'd also look at re-trying adaptive 23 or
>> tuning
>>>> the
>>>>>> existing implementation.
>>>>>
>>>>> Neat!  We should probably include Mashael in that discussion once you
>>>>> have a working and simulated n23-5 branch.  Maybe she has ideas what
>>>>> parts need tweaking.  I'm cc'ing her in this mail so she knows what
>>>>> you're working on.
>>>>>
>>>>> In any case, having a working rebased n23-5 branch is already a big
>> step
>>>>> forward.
>>>>>
>>>>> Thanks!
>>>>> Karsten
>>>>>
>>>>>
>>>>> === START OF INSTRUCTIONS ===
>>>>>
>>>>> Launch new EC2 instance running Ubuntu Server 13.04 64 bit
>>>>>
>>>>> Set ulimit -n to 25000, see Shadow wiki for details
>>>>>
>>>>
>> https://github.com/shadow/shadow/wiki/Preparing-your-machine#system-configs-and-limits
>>>>>
>>>>> $ sudo apt-get update
>>>>> $ sudo apt-get upgrade
>>>>>
>>>>> Clone and test-compile modified libutp
>>>>>
>>>>> $ sudo apt-get install build-essential git automake libssl-dev
>>>> libevent-dev
>>>>> $ mkdir src
>>>>> $ cd src/
>>>>> $ git clone https://github.com/sjmurdoch/libutp
>>>>> $ cd libutp/
>>>>> $ make
>>>>> $ cd ../
>>>>>
>>>>> Clone and test-compile Tor branch utp
>>>>>
>>>>> $ git clone https://git.torproject.org/tor.git
>>>>> $ cd tor/
>>>>> $ git remote add sjm217 https://git.torproject.org/sjm217/tor.git
>>>>> $ git fetch sjm217
>>>>> $ git checkout -b utp sjm217/utp
>>>>> $ ./autogen.sh
>>>>> $ LDFLAGS="-L/home/ubuntu/src/libutp"
>> CFLAGS="-I/home/ubuntu/src/libutp"
>>>>> LIBS="-lutp -lrt" ./configure --disable-asciidoc
>>>>> $ make
>>>>> $ make distclean
>>>>> $ cd ../
>>>>>
>>>>> Modify Tor to use uTP for all links (or "0 &&" for not using uTP for
>> any
>>>>> link)
>>>>>
>>>>> diff --git a/src/or/channeltls.c b/src/or/channeltls.c
>>>>> index 0551b73..b7b36e1 100644
>>>>> --- a/src/or/channeltls.c
>>>>> +++ b/src/or/channeltls.c
>>>>> @@ -418,7 +418,7 @@ channel_tls_connect(const tor_addr_t *addr,
>> uint16_t
>>>>> port,
>>>>>    /* Create a uTP connection */
>>>>>    tor_addr_to_sockaddr(addr, port, (struct sockaddr*)&sin,
>> sizeof(sin));
>>>>>    tor_addr_to_str(addr_str, addr, sizeof(addr_str), 0);
>>>>> -  if (!strncmp(addr_str, "128.232.10.129", sizeof(addr_str))) {
>>>>> +  if (1 || !strncmp(addr_str, "128.232.10.129", sizeof(addr_str))) {
>>>>>      log_info(LD_CHANNEL,
>>>>>               "Trying uTP connection to %s", addr_str);
>>>>>      tlschan->utp = UTP_Create(tor_UTPSendToProc, tlschan, (const
>> struct
>>>>> sockaddr*)&sin,
>>>>>
>>>>> Install CMake 2.8.10
>>>>>
>>>>> $ wget http://www.cmake.org/files/v2.8/cmake-2.8.10.2.tar.gz
>>>>> $ tar xf cmake-2.8.10.2.tar.gz
>>>>> $ cd cmake-2.8.10.2/
>>>>> $ ./configure
>>>>> $ make
>>>>> $ export PATH=~/src/cmake-2.8.10.2/bin:$PATH
>>>>> $ cd ../
>>>>> $ which cmake # result: /home/ubuntu/src/cmake-2.8.10.2/bin/cmake
>>>>>
>>>>> Build Clang and LLVM from source
>>>>>
>>>>> $ wget http://llvm.org/releases/3.2/llvm-3.2.src.tar.gz
>>>>> $ wget http://llvm.org/releases/3.2/clang-3.2.src.tar.gz
>>>>> $ tar xaf llvm-3.2.src.tar.gz
>>>>> $ tar xaf clang-3.2.src.tar.gz
>>>>> $ cp -R clang-3.2.src llvm-3.2.src/tools/clang
>>>>> $ cd llvm-3.2.src/
>>>>> $ mkdir build
>>>>> $ cd build/
>>>>> $ cmake -DCMAKE_INSTALL_PREFIX=/home/ubuntu/.local ../.
>>>>> $ make
>>>>> $ make install
>>>>> $ export PATH=~/.local/bin/:$PATH
>>>>> $ cd ../../
>>>>> $ which clang # result: /home/ubuntu/.local/bin//clang
>>>>>
>>>>> Install Shadow
>>>>>
>>>>> $ sudo apt-get install gcc xz-utils make automake autoconf cmake tidy
>>>>> libtidy-dev libglib2.0 libglib2.0-dev dstat pdftk python2.7
>>>>> python-matplotlib python-numpy python-scipy
>>>>> $ git clone https://github.com/shadow/shadow.git
>>>>> $ cd shadow
>>>>> $ ./setup dependencies
>>>>> $ ./setup build -g -i /home/ubuntu/src/libutp -i
>>>>> /usr/include/x86_64-linux-gnu/c++/4.7 -l /home/ubuntu/src/libutp
>>>>> --tor-prefix /home/ubuntu/src/tor --tor-lib utp
>>>>> $ ./setup install
>>>>> $ export PATH=~/.shadow/bin/:$PATH
>>>>>
>>>>> Simulate tiny network in Shadow
>>>>>
>>>>> $ cd resource/examples/scallion/
>>>>> $ tar xf tiny-4GB-m1.large.tar.xz
>>>>> $ cd tiny-4GB-m1.large/
>>>>> $ scallion
>>>>>
>>>>> Graph simulation results
>>>>>
>>>>> Add one line to beginning of analyze.py:
>>>>>
>>>>> import matplotlib; matplotlib.use('Agg')
>>>>>
>>>>> For additional traffic simulations:
>>>>>
>>>>> $ grep -v "\[traffic-" scallion.log > scallion-notraffic.log
>>>>>
>>>>> $ cd ~/src/
>>>>> $ mkdir results
>>>>> $ cd results/
>>>>> $ python ~/src/shadow/contrib/analyze.py parse --output uTP-1
>>>>>
>>>>
>> ~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-utp-1/scallion.log
>>>>> $ python ~/src/shadow/contrib/analyze.py parse --output uTP-0
>>>>>
>>>>
>> ~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-utp-0/scallion.log
>>>>> $ python ~/src/shadow/contrib/analyze.py parse --output vanilla
>>>>>
>>>>
>> ~/src/shadow/resource/examples/scallion/tiny-4GB-m1.large/data-vanilla/scallion.log
>>>>> $ python ~/src/shadow/contrib/analyze.py plot --title "sjm217's utp
>>>>> branch (#9166)" --prefix "sjm217-utp" --data vanilla/ "vanilla
>>>>> 0.2.4.4-alpha" --data uTP-1/ "uTP for all links" --data uTP-0/ "uTP for
>>>>> none of the links"
>>>>>
>>>>> === END OF INSTRUCTIONS ===
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 



More information about the tor-dev mailing list