Limiting the scope of merge failures in batches

Hey! We recently had a very "merge conflicting" PR go through bors-NG in a way that caused numerous batch failures, and we're hoping to improve the way that these PRs are handled. I'll start with an example to explain the scenario, then share some ideas we have.

Scenario

The PR in question is #50314, and it went on this journey:

  1. Batch 202 (#50335 #50314 #50309 #50225 #49536) - Failed (merge conflict)
  2. Batch 203 (#49536) - Succeeded
  3. Batch 204 (#50342 #50335 #50314 #50309 #50307) - Failed (merge conflict)
  4. Batch 205 (#50309 #50307) - Succeeded
  5. Batch 206 (#50342 #50336 #50335 #50314 #50225) - Failed (merge conflict)
  6. Batch 207 (#50314 #50225) - Failed (merge conflict)
  7. Batch 208 (#50342 #50336 #50335) - Succeeded
  8. Batch 209 (#50225) - Succeeded
  9. Batch 210 (#50314 #50305 #50026) - Failed (merge conflict)
  10. Batch 211 (#50306 #50293 #50026 #49940 #49742) - Succeeded
  11. Batch 212 (#50352 #50341 #50314 #50305 #50242 #50208 #50154) - Failed (merge conflict)
  12. Batch 213 (#50354) - Succeeded
  13. Batch 214 (#50242 #50208 #50154) - Succeeded
  14. Batch 215 (#50363 #50361 #50360 #50356 #50352 #50346 #50343 #50341 #50305 #50288 #50260 #50139 #50102 #49952) - Succeeded
  15. Batch 216 (#50368 #50314 #50289 #50279) - Succeeded

After #50314 caused batches 202, 204, 206, 207, 210, and 212 to fail following the first bors merge, it was manually cancelled. The PR was then updated (rebased), and bors merge was run again; this time it succeeded in batch 216.

Observations

A couple of patterns worth pointing out:

  • Many of the batches that #50314 was bisected into collected additional PRs while it was waiting in the queue.
  • Batches produced by a bisect are allowed to collect newly bors merged PRs.
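To illustrate why the second pattern matters, here is a simplified model. It is an assumption, not bors-NG's actual scheduler: each failing batch is halved, the bad PR lands in a half of ceil(n/2) PRs, and `joins_per_round` newly merged PRs join that half while it waits. The function name `bisect_rounds` is hypothetical.

```python
import math


def bisect_rounds(initial_size: int, joins_per_round: int, max_rounds: int = 20) -> list[int]:
    """Track the size of the batch holding a single bad PR across bisects.

    Simplified model (an assumption, not bors-NG's real logic): the failing
    batch is halved, the bad PR ends up in a half of ceil(n / 2) PRs, and
    `joins_per_round` new PRs join that half before it runs again.
    """
    sizes = [initial_size]
    size = initial_size
    for _ in range(max_rounds):
        if size == 1:
            break  # the bad PR is isolated in a batch of its own
        size = math.ceil(size / 2) + joins_per_round
        sizes.append(size)
    return sizes


print(bisect_rounds(5, 0))      # [5, 3, 2, 1]
print(bisect_rounds(5, 2)[:4])  # [5, 5, 5, 5]
```

With no PRs joining, a batch of 5 isolates the bad PR after three bisects. With two PRs joining per round, the batch never shrinks, which is consistent with #50314 never reaching a batch of its own in the scenario above.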

Proposals/thoughts

Ideally, #50314 would be identified as unmergeable much sooner. We think that means it should end up in a batch containing only #50314, which would isolate the erroneous PR from other PRs.

Following on from this, we have a couple of ideas and wanted to get your feedback on them before going further.

  1. When merging patches for a batch, instead of failing fast and bisecting the batch, use a separate flow with merge-conflict-aware logic. It could split the patches into 2 new batches: one with the patches that merge cleanly (run first), and one with the patches that do not (queued to run second).

  2. When a batch is bisected, prevent additional patches from joining the resulting batches, so the bad PR is confined to progressively smaller batches. I think of this as "freezing" a waiting batch so that no more patches can be linked to it.
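A minimal sketch of both ideas together, assuming a toy model of merge conflicts: a patch replaces the content of one file and conflicts if that file does not hold the expected old content. `Patch`, `Batch`, `try_apply`, and `partition_batch` are hypothetical names and not taken from bors-NG, and the PR numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    """Toy patch (hypothetical model): replace `old` with `new` in `path`."""
    pr: int
    path: str
    old: str
    new: str


@dataclass
class Batch:
    patches: list[Patch] = field(default_factory=list)
    frozen: bool = False  # idea 2: a frozen batch accepts no new PRs

    def add(self, patch: Patch) -> bool:
        """Link a patch to this batch; refuse if the batch is frozen."""
        if self.frozen:
            return False
        self.patches.append(patch)
        return True


def try_apply(tree: dict[str, str], patch: Patch) -> bool:
    """Apply `patch` to `tree` in place; return False on a conflict."""
    if tree.get(patch.path) != patch.old:
        return False
    tree[patch.path] = patch.new
    return True


def partition_batch(base: dict[str, str], batch: Batch) -> tuple[Batch, Batch]:
    """Idea 1: split a batch into (clean, conflicting) instead of bisecting.

    Patches are applied in order onto a copy of `base`. Clean patches stay
    applied, so later patches are checked against them. Both new batches
    are frozen (idea 2) so the queue cannot grow them while they wait.
    """
    tree = dict(base)
    clean, conflicting = Batch(), Batch()
    for patch in batch.patches:
        target = clean if try_apply(tree, patch) else conflicting
        target.add(patch)
    clean.frozen = conflicting.frozen = True
    return clean, conflicting


base = {"a.txt": "v1", "b.txt": "v1"}
batch = Batch([
    Patch(101, "a.txt", "v1", "v2"),
    Patch(102, "b.txt", "v0", "v9"),  # stale base: conflicts
    Patch(103, "b.txt", "v1", "v3"),
])
clean, conflicting = partition_batch(base, batch)
print([p.pr for p in clean.patches])        # [101, 103]
print([p.pr for p in conflicting.patches])  # [102]
```

One design caveat: the split is order-dependent. If two PRs conflict only with each other, whichever comes first lands in the clean batch. A conflicting batch of size 1 is exactly the isolated, clearly unmergeable case described above.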

Cheers,
Adam
