[tbb-dev] Very very slow tor-browser-build build; catastrophic recursion in rbm?

Wed Feb 12 13:40:38 UTC 2020

On Tue, 11 Feb 2020, David Fifield wrote:

> I did a build of tor-browser-build build and it took even longer than I
> remember it taking -- from Friday until Tuesday, roughly. This is on a
> VPS with six 2 GHz CPUs and 16 GB RAM. Some projects I know take a long
> time to build, like rust and firefox, but it seemed like most of the
> time was being spent in the "rbm" process, not in building the projects.
> 
> I don't know much about profiling Perl, but I tried a tool called
> NYTProf that produces HTML output:
> https://people.torproject.org/~dcf/graphs/nytprof-rbm-20200212-94fce31e/

Thanks, I think this profile can help find a few things to improve.

> 
> The above NYTProf output is not from a full build. It's from modifying
> projects/snowflake/config only and then building snowflake for only one
> target; still it took almost an hour.
> 	perl -d:NYTProf rbm/rbm build snowflake --target testbuild --target torbrowser-linux-x86_64
> 	nytprofhtml

Was it a new build of snowflake where some of the dependencies had to be
built, or a build where only snowflake had to be built?

In the first case, I think part of the reason why it takes time is that
snowflake has many small dependencies, and each one is built in a
separate container, so some time is spent extracting/removing chroots.

If only snowflake and no other dependencies was built, then I don't know
why it took almost one hour. In my case, adding a blank line to
projects/snowflake/build and rebuilding it takes around 8 minutes.

> 
> To my untrained eye it looks like there is some kind of exponential
> recursion happening around the RBM::config_p function. See here, where
> in building one project the function was called 89 million times:
> https://people.torproject.org/~dcf/graphs/nytprof-rbm-20200212-94fce31e/#subs_table
> Clicking through to the per-line view, we see that 45 million of those
> calls were from RBM::get_target, and most of the rest were from
> RBM::config:
> https://people.torproject.org/~dcf/graphs/nytprof-rbm-20200212-94fce31e/root-tor-browser-build-rbm-lib-RBM-pm-8-line.html#89
> 
> You know how, if you write a Fibonacci number function and don't memoize
> the recursive calls, it ends up re-computing the same subtrees over and
> over and doing exponential work?
> 	def fib(n):
> 	    if n == 0 or n == 1:
> 		return 1
> 	    return fib(n - 1) + fib(n - 2)
> It feels like something like that is happening here. There's a pattern
> where RBM::process_template calls a chain of functions that reaches
> RBM::config_p, which eventually calls RBM::process_template again. It's
> not an infinite loop, but it seems to be doing the same work many times.
> I looked at it for a bit but my Perl isn't strong enough to understand
> specifically what's happening.
> 
> Notice how there are three loops of RBM::process_template before doing
> an RBM::config_p that calls back into further RBM::process_template.

Yes, there is a lot of recursive calls.

For example the filename in projects/tor-browser/config is defined like
this:

  filename: 'tor-browser-[% c("version") %]-[% c("var/osname") %]-[% c("var/build_id") %]'

computing this filename will require computing var/build_id, which is
defined like this:

  build_id: '[% sha256(c("var/build_id_txt", { buildconf => { num_procs => 4 } })).substr(0, 6) %]'
  build_id_txt: |
    [% c("version") %]
    [% IF c("git_hash") || c("hg_hash"); GET c("abbrev"); END; %]
    [% IF c("var/container/use_container") && ! c("var/container/global_disable") -%]
    [% c("var/container/suite") %]
    [% c("var/container/arch") %]
    [% END -%]
    input_files: [% c("input_files_id") %]
    build:
    [% c("build", { filename => 'f', output_dir => '/out' }) %]

so this require recursively computing filenames of all other projects,
which requires computing sha256sum of all input files.

Maybe there is some caching we can add to avoid computing some things
multiple times. However in some cases we cannot avoid computing almost
the same thing twice because we cannot know if an option can change the
result, for example with the following two lines:

  filename: 'name-[% c("version", { option_1 => 1 }) %]'
  filename: 'name-[% c("version", { option_1 => 2 }) %]'

> 
> Some more evidence is that while I was waiting for rbm to do something,
> I was watching the modification times of the git_clones:
> 	ls -ltrhd git_clones/*/.git | tail -n 15
> It was constantly touching all different projects, but re-touching the
> same ones over and over.

I opened a ticket for this:
https://trac.torproject.org/projects/tor/ticket/33283

I think this patch should avoid doing multiple `git checkout` and `git
log` on the same projects:
https://gitweb.torproject.org/user/boklm/rbm.git/commit/?h=bug_33283&id=ce6b9136fc9e006536b34844946d67aa92cc0c21

When running this:

  rbm/rbm showconf tor-browser filename --target torbrowser-linux-x86_64 --target alpha

in my case the patch improves the time to run this from 3m25 to 2m49.

An other possible improvement is in the function input_file_need_dl
where we can avoid computing multiple times the sha256sum of the same
files.

Nicolas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tbb-dev/attachments/20200212/7bd9a55b/attachment.sig>