On Fri, Nov 16, 2018 at 10:22:24AM +0100, Linus Nordberg wrote:
- Are we limited to using Solr, as mentioned in #25322, or can we explore other options?
I have vague memories that Isa and Hiro explored other options, like outsourcing it to DuckDuckGo, but apparently the user flow was horrible. So I don't know what constraints we want now, but there is some history of exploring other options.
- User-facing tpo web sites are "on the static rotation" because that's how we can keep them up and running given the resources at hand. Adding dynamic content, i.e. anything that is not "oh, that URL corresponds to this file, let's send it to the user", would not be possible on our current set of VMs given the load we see on user-facing tpo websites. This means that one of the proposed solutions, with web servers proxying requests to a separate service, search.tpo, is not an option.
If there's some way to limit the number of searches (proxypasses) running at once, so that a crawler can't take down all of our static webservers by filling all of their slots, this idea might still be worth exploring. I feel a bit bad putting in place something that is so obviously going to be a source of ongoing pain, but I don't know of any clearly better options that match all the other goals.
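To make the "limit the number of searches going at once" idea concrete, here is a rough sketch in Python. It is not a proposal for how our webservers would actually do it; the class name, the slot count, and the 503 response are all made up for illustration. The point is just that a request which can't get a slot is refused right away instead of queueing up and tying up a worker.

```python
import threading


class SearchSlotLimiter:
    """Hypothetical helper: caps how many proxied searches run at once."""

    def __init__(self, max_concurrent):
        # max_concurrent is an assumed tuning knob, not a measured value.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_search(self, handler, query):
        # Non-blocking acquire: if every slot is busy, refuse immediately
        # rather than letting the request wait and hold a webserver worker.
        if not self._slots.acquire(blocking=False):
            return 503, "search busy, try again later"
        try:
            return 200, handler(query)
        finally:
            # Always free the slot, even if the handler raises.
            self._slots.release()
```

With a limit of, say, 4, a crawler hammering the search box would at most occupy 4 slots, and everything past that gets a cheap refusal while static pages keep being served.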
Another argument against proxying is that it breaks the expectation of end-to-end security given by HTTPS.
If we're proxying to another service running *on that same machine*, then I think we're ok on this point. It would only be a problem if we had some separate, central search service. So for example, if Solr is our choice, we could run a replicated Solr on each webserver.
--Roger