On Sun, Aug 19, 2012 at 8:25 PM, vmon vmonmoonshine@gmail.com wrote:
Hey Zack,
I know you are super busy packing for your big move, but I thought as tomorrow is the hard deadline for GSoC, is not a bad ideas to give you some final update.
I appreciate it. I am still in the middle of moving and don't have the time to go over your changes in detail, but I read over the diff between 'master' and 'f--payload-servers' and I like it for the most part. I don't think we want to _merge_ it just yet, but it's definitely something we can use. I like the URL dictionary concept and the use of query parameters, and it looks like you've got some good cleanups in here too.
I have a few specific change requests for you. First: boost should not be used. I know it's convenient when you're writing the code, but it causes so many deployment headaches that it's best avoided. It *appears* that you're only using one thing, boost::filesystem::exists, to check whether a file exists before attempting to open it. That's an antipattern. The correct approach is, don't check first, just attempt to open the file. If it succeeds, the file existed. If it fails, then check whether errno == ENOENT to find out whether the problem is that the file doesn't exist. (If you needed boost for something else, please let me know and we'll figure out a replacement.) Second: do not include the header <iostream>, ever. It is only *necessary* to include that header to get 'cin', 'cout', or 'cerr', and none of those should ever be used in this program. (For what you were doing you probably want to include <istream> instead.) (This will also mean that you don't need to add std::__ioinit to the global variables whitelist.) Third: I appreciate your having fixed my sloppy fwrite()s, but please, whenever you print an error message as a direct consequence of an I/O operation failing, include strerror(errno). That is, instead of log_debug("Error writing data"); it should be log_warn("Error writing data: %s", strerror(errno));
... I really gotta find time to write a style guide for this thing :-/
Also, could you please explain in detail the problem you were having with the DECLARE macros? If they are getting in your way, then we should change them, but I do not understand what the issue is.
So most of last week I spent on the conflict that occurred between libcurl and libevent having their eyes on the same socket. I tried different ugly and uglier solutions but finally settled on this one which is not that bad:
http://stackoverflow.com/questions/12021217/how-to-ask-libcurl-not-to-listen...
After that, I'm passing all time line tests now.
Great.
However, when I look at the client log, I see that some of the sequence no are missing in the recv: like this
203.7169 [debug] recv: <3.81> receiving block 42 <d=0 p=9 f=DAT> T:203.7170: ckt 3 <ntp 1 outq 0>: recv 42 <d=0 p=9 f=DAT>
and the next recv is like this:
203.7926 [debug] recv: <3.99> receiving block 60 <d=0 p=1 f=DAT> T:203.7927: ckt 3 <ntp 1 outq 0>: recv 60 <d=0 p=1 f=DAT>
Is this for sure an error and loss of data? or there is a normal situation that such a thing can happen? For example when the whole block can't be sent over one connection? (otherwise how do the tl tests all pass?).
Blocks are cut to fit, so it never happens that "the whole block can't be sent over one connection". What _can_ happen is that blocks arrive in a different order than they are received. Sometimes they get reordered to an extent that seems incredible, e.g. block 60 arriving immediately after block 42 as in your example (I suspect if you look at the complete log you'll see that what actually happened is block 42 was *delayed* until after blocks 43 through 59 had already arrived).
That's perfectly normal by itself, but there is a bug. If block K+128 arrives while the receiver is still waiting for block K, the receiver will declare a protocol error and kill the entire circuit. That's by design; the bug is that the sender has no way of knowing that the receiver is still waiting for block K. This is what I'm supposed to be fixing right now (by introducing explicit acknowledgments).
Further more, I merged the payload scraper and stegotorus. I this way, http_apache checks for the payload database and if it's not there it calls the scraper (it needs apache to be installed on the same machine). These all are commited on the github.
Sounds good. When we get a config file we'll want to be able to say that we expect to use one or the other, but for now this is a good approach.
Are we now exclusively using libcurl to generate requests or is there still legacy hand-generation code in there?
I started writing a unit test that fetch a webpage directly (using curl) and through stegotorus and check if they are the same. I thought curl is best approximation of a browser that is available to us. But as I don't think I can finish that till tomorrow, I'm going to spend the remaining time on documenting the 'uri transport protocol' and deal with small issues.
After we are done with the evaluation, I'll go back to unit testing.
Sounds good.
zw