Optimized bisection

kevincox · September 25, 2020, 7:31pm

I'm considering using Bors for a somewhat peculiar set of requirements. I have a project that has very slow CI. CI is so slow that it currently runs on a daily fixed schedule. I would like to apply a "merge queue" style strategy to avoid having a broken master branch as with a once-a-day cadence missing even a couple of runs can be a substantial delay for updates. Bors obviously sounds perfect for this project as it makes a merge queue setup easy and supports batching.

However I am concerned that if we experience a failure that bisection will take a long time to complete. nixpkgs gets about 100 commits a day so a simple bisection would take 6-7 runs to identify the culprit (assuming that there is one). In order to avoid blocking up for a week upon every failure[1] it will be necessary to implement a smarter search than running CI on each commit.

[1]: This might actually be much quicker due to hopefully failing quickly however there is nothing to guarantee that.

Ideas

Ordering

One option to transfer some information from the original failure to the new build. For example a list of failing tests. In that case we could run those tests first which should significantly reduce the time to identify if this build is failing.

However this isn't perfect either. Imagine that the previous build took 12h and the last commit was the problematic one. We would still need to do 7 runs and take 3 days to resolve the issue.

Scanning

This is basically building on Ordering by only running the failed tests on the first pass. For example we could do an "optimistic bisection" with just the relevant tests, then once we believe we have found the culprit we can run a full build at a commit that is expected to pass.

In this case imagine that we can run the failed tests in 1h. We would need 8 runs, but the first 7 should take 1h each. The final run would be very slow but is expected to succeed.

Implications

I would need to think about the details a bit more. Especially if there is more than one failure present and how the behaviour changes with fix commits thrown into the mix. However it seems like this should make the project feasible if implemented.

notriddle · September 26, 2020, 12:32am

Other suggestion, based on tactics I’ve heard of used in other enormous projects:

Have a bot parse your CI logs, and use a heuristic to determine which PR probably caused the failure, and automatically cancel the culprit.

I think this could be implemented as a GitHub action. It would recognize bors “Build Failed” comments, pull down the linked-to log, check if the failed Derivation matched any of the files on the current PR, and post the “bors r-“ comment as a result.

kevincox · September 26, 2020, 12:53am

I'm not sure I completely understand. Is this what you mean:

Have a bot watch for Bors reporting a failure.
Gather info on the failed tests from that run.
For each commit master..staging:
a. use the failed info to identify "likely to fail commits"
b. Comment bors r- to cancel those commits.

I guess this kinda works but it seems very janky. Some issues:

The next CI run will have started, therefore be competing with this bot for resources. Ideally the filtering would happen before the next CI run starts so that it can use the resources to filter the bad commits quickly. I guess we could add some sort of interlock to the CI and the bot to avoid this.
There is a race condition here that probably doesn't matter. But the bot doesn't really have a good way to know that master..staging is a split of the failed run it saw. It could either show up before Bors has a chance to push the new batch or it could show up after CI has finished testing (for example if the bot is down for a while and tries to catch up).

I think the last point is the one that bothers me the most. It seems like the bot would be based off of assuming what is happening with little insight into what Bors is actually thinking. For example what if the batch had only one commit and so no bisection happened. The bot might start running the analysis on the next batch thinking that it is a bisection. Or am I missing some API that solves this?