Hey! We recently had a very "merge conflicting" PR go through bors-NG in a way that caused numerous batch failures, and we're hoping to improve the way that these PRs are handled. I'll start with an example to explain the scenario, then share some ideas we have.
Scenario
The PR in question is #50314, and it went on this journey:
- Batch 202 (#50335 #50314 #50309 #50225 #49536) - Failed (merge conflict)
- Batch 203 (#49536) - Succeeded
- Batch 204 (#50342 #50335 #50314 #50309 #50307) - Failed (merge conflict)
- Batch 205 (#50309 #50307) - Succeeded
- Batch 206 (#50342 #50336 #50335 #50314 #50225) - Failed (merge conflict)
- Batch 207 (#50314 #50225) - Failed (merge conflict)
- Batch 208 (#50342 #50336 #50335) - Succeeded
- Batch 209 (#50225) - Succeeded
- Batch 210 (#50314 #50305 #50026) - Failed (merge conflict)
- Batch 211 (#50306 #50293 #50026 #49940 #49742) - Succeeded
- Batch 212 (#50352 #50341 #50314 #50305 #50242 #50208 #50154) - Failed (merge conflict)
- Batch 213 (#50354) - Succeeded
- Batch 214 (#50242 #50208 #50154) - Succeeded
- Batch 215 (#50363 #50361 #50360 #50356 #50352 #50346 #50343 #50341 #50305 #50288 #50260 #50139 #50102 #49952) - Succeeded
- Batch 216 (#50368 #50314 #50289 #50279) - Succeeded
After #50314 caused batches 202, 204, 206, 207, 210, and 212 to fail following the first bors merge, it was manually cancelled. The PR was then updated (rebased) and submitted with bors merge again, and it succeeded in batch 216.
Observations
A couple of patterns worth pointing out:
- many of the batches that #50314 was bisected into collected additional PRs as it was waiting in the queue.
- batches that result in a bisect are allowed to collect PRs newly submitted with bors merge
Proposals/thoughts
Ideally, we would like #50314 to quickly reach the point where it is clearly identified as not being mergeable. We think that point is a batch containing only #50314, which would isolate the erroneous PR from the other PRs.
Following on from this, we have some ideas, and wanted to get your feedback on them before continuing.
- When merging patches for a batch, instead of failing fast and doing a batch bisect, use a different flow with specialized merge-conflict-aware logic. It could group the patches into two new batches: one with the patches that merge successfully (run first), and one with the patches that do not (queued to run second).
- When a batch is bisected, prevent additional patches from entering the batch, so that the bad PR is confined to a smaller batch. I think of this as "freezing" a waiting batch, so no more patches can be linked to it.
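To make the two ideas a bit more concrete, here is a rough sketch in Python. It is only an illustration of the proposed flow, not bors-NG's actual (Elixir) code: `Batch`, `split_on_conflict`, `bisect`, `enqueue`, and the `try_merge` callback are all hypothetical names invented for this example, and `try_merge` stands in for whatever bors does when it attempts to merge a patch onto the staging branch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Batch:
    """Hypothetical stand-in for a bors batch: a list of PR numbers."""
    patches: List[int]
    frozen: bool = False  # idea 2: a frozen batch accepts no new patches


def split_on_conflict(
    patches: List[int],
    try_merge: Callable[[List[int], int], bool],
) -> Tuple[Batch, Batch]:
    """Idea 1: rather than failing the whole batch on the first conflict,
    set conflicting patches aside and keep merging the rest.

    try_merge(already_merged, patch) is an assumed callback that returns
    True when the patch merges cleanly on top of the patches merged so far.
    """
    mergeable: List[int] = []
    conflicting: List[int] = []
    for patch in patches:
        if try_merge(mergeable, patch):
            mergeable.append(patch)
        else:
            conflicting.append(patch)
    # The mergeable batch would run first, the conflicting one second.
    return Batch(mergeable), Batch(conflicting)


def bisect(batch: Batch) -> Tuple[Batch, Batch]:
    """Idea 2: both halves of a bisected batch are frozen."""
    mid = len(batch.patches) // 2
    return (
        Batch(batch.patches[:mid], frozen=True),
        Batch(batch.patches[mid:], frozen=True),
    )


def enqueue(patch: int, waiting: List[Batch]) -> Batch:
    """Add a newly approved patch to the first unfrozen waiting batch,
    or open a new batch if every waiting batch is frozen."""
    for batch in waiting:
        if not batch.frozen:
            batch.patches.append(patch)
            return batch
    new_batch = Batch([patch])
    waiting.append(new_batch)
    return new_batch
```

With something like this, a batch shaped like 202 would be split into one batch holding the patches that merge cleanly and another holding only the conflicting patch, and PRs approved while a bisected batch is waiting would go into a fresh batch instead of joining the frozen halves.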
Cheers,
Adam