Use a squash merge in Bors

Summary: Implement a configuration option for bors to produce "squash merge" commits. When this option is enabled, instead of attaching all commits under the approved pull requests to a batch merge commit, bors will generate one, single, commit for each approved pull request.

I allow this RFC document to be modified and redistributed under the terms of the Apache-2.0 license,

Motivation

Currently, Bors merges in the entire commit history of a PR. This ends up polluting the git history with huge numbers of small commits that don't matter. Github solves this by offering a "Squash and merge" functionality. This will compress the entire commit history into a single commit and provide a good commit title with a link back to the PR.

Squash and merge functionality is widely used in the Github community and is a blocker for rolling out Bors to more locations.

There are several open bugs/feature requests that want the functionality of a squash merge in Bors. Eg: Is anyone working on squash-merge for bors and https://github.com/bors-ng/bors-ng/issues/194

Guide-level explanation

To have Bors use the Github "squash-and-merge" functionality.

Edit your bors.toml file and add use_squash_merge = true. (The default value of use_squash_merge is false.)

When you run bors r+, bors will operate as normal, except when your batch passes:

  1. All changes in each PR will be squashed together, so there is one commit per PR.
  2. The commit message will be set to use the PR title with a link to the PR, and the the commit details will contain the commit messages that have been squashed together.

Reference-level explanation

The basic functional loop of this feature is:

  1. Create the staging branch
  2. If the use_squash_merge flag is set to true for the report then
  3. For each Patch
  • Get the Git tree of the final commit in each patch
  • Create a commit on the staging branch with the commit tree (this is a squash commit)
  • Generate a commit title from the PR title
  • Generate a commit text body from the PR description
  • Create a merge commit (if needed) of all squashed commits
  1. Run the regular Bors build checks
  2. When a batch passes all required checks - For each Patch in a Batch
  • Change the title of the patch from "$title" to "[Merged By Bors] - $title"

Drawbacks

This is an opt-in feature. I think most people would get a better experience if it was just enabled by default.

Rationale and alternatives

  • Why is this design the best in the space of possible designs?

This change leaves all other aspects of the code-base alone. The functionality is exclusively opt-in, and does not require any additional services, process, or resources to be installed in the Bors container. The code changes are non-obtrusive

  • What other designs have been considered and what is the rationale for not choosing them?

One alternative is using the green button merge functionality. This works but could allow some code to be merged without being fully validated against the tests.

  • What is the impact of not doing this?

This is a highly demanded feature and not implementing it will hamper adoption of Bors in the wider world.

Prior art

See Is anyone working on squash-merge for bors and https://github.com/bors-ng/bors-ng/issues/194 for prior discussions of this.

Future possibilities

The alternative option could become easy to support if Github adds a more robust Git manipulation API.

@MarkRobinson Can you elaborate on what you mean by this?

As far as I can tell, Bors does not use GitHub's Pull Requests API. Rather, it uses GitHub's Git Database API. Specifically, Bors uses the "create a commit" endpoint to synthesize merge commits and the "update a reference" endpoint to force-push staging and trying and fast-forward push `master.

I would suggest performing the squash-merge on the integration branch, such that the integration -> master path stays unchanged and the two branches (and their CI) can remain correlated.

Generally speaking, if it's not useful to keep the commits it's not actually useful to keep their individual messages either. Especially as in the best case these commit messages are useful when correlated to their content (in the worst case the commit messages are useless).

Why not use the PR body as commit body instead? If the contents of the individual commit message are useful for some reason, they can be copied back in the PR description.

One alternative is checking out the git tree and doing a rebase through git to squash the commits onto staging. This would preserve the nice functionality of atomic commits to master for all changes. The downside is that it would require:

  • git to be added into the Bors container
  • Sufficient disk space to checkout the entire git tree (non-trivial and potentially quick slow)
  • Integrate a git interface into Bors and deal with those errors

Squashing can actually be implemented rather easily through the github API:

  • create a regular merge of one branch into the other
  • get the tree object of the resulting merge commit
  • create a new commit with that tree and only the relevant parent
  • reset/update the branch (I don't know if bors currently uses a sacrificial branch or if it does everything on the staging branch)

The problem with this approach is that GitHub will not recognize this as "merging" the pull request. Thus, you will not get the nice purple "merged" badge (merged). You could configure Bors to "close" the pull request, but then you will have the red "closed" badge (closed) instead of the "merged" badge.

I don't consider that an issue (or at least not once which should limit tooling, it should be fixed in and by github, and should be possible since they're clearly messing with that flag when using the rebase or squash modes on the PR merge button / API) but yes.

Here's what we could do: Close the pull request, but edit the title to start with some text (customizable, with default value [Merged by Bors]) to indicate that the pull request wasn't really closed but was merged by Bors.

That way we don't have to wait for the GitHub team to fix this behavior.

Why not use the PR body as commit body instead? If the contents of the individual commit message are useful for some reason, they can be copied back in the PR description.

I was aiming to copy what github does. I will happily agree that the PR body is a much better commit message than the individual commits.

1 Like

DilumAluthge

Use the Github merge API w/ "squash" option enabled

@MarkRobinson Can you elaborate on what you mean by this?

Sure. There is a Github API that allows you to invoke the PR merge functionality. You can trigger it with the 'squash-and-merge' functionality by setting the mode to 'squash' when you invoke it.

Woah! That's really cool. Thanks for the tip.

I'm fine with accepting this RFC, as long as you add the alternative approaches that people are bringing up to the Alternatives section, and you verify that the code that ends up in master will always be identical to the code that gets tested in staging.

The problem is that I'm not sure if merging is actually deterministic, and even if it is the GitHub Data API might merge differently than the GitHub PR API.

If you have to mark the PR's as closed, we can live with it, even though I'd rather ensure they come up merged.

As far as I can tell, if you "merge" a pull request using GitHub's Pull Request API, you add new commits to master. In other words, the commits that you add to master will not be the same commits that were tested in staging.

In my opinion, one of the most important features of Bors is that master is only ever fast-forwarded to the same exact commit that was tested in staging.

This is my concern: How can we ensure that the code in master is identical to the code in staging? Currently, Bors ensures this by only fast-forwarding master to the same commit that passed in staging. But since the Pull Request API is creating new commits and adding them to master, we can't do that.

Could we maybe compare the tree hash of the final code in master versus the code in staging? Certainly if the tree hashes are identical, then the code is identical. But if the tree hashes don't match, does that mean that the code is definitely different? Or could you have different tree hashes for the same code content?

The only other way to ensure the code is identical would be to copy the code locally and compare it ourselves, which will require a lot of disk space and will never be feasible for the public Bors instance.

:-1: Because of these concerns, I am opposed to adding the use of GitHub's Pull Request API to Bors, and thus I would oppose this RFC in its current form

However, if you change the RFC to use @xmo-odoo's suggestion for squashing, then I wouldn't oppose this RFC. (Certainly it is unfortunate that you'd get the "Closed" instead of "Merged", but that is preferable to the concern that the commits in master are different fro the commits in staging.)

Of course, I'm not a project maintainer, so ultimately it's not my decision.

I think @xmo-odoo suggestion is a fine one. I'll change the RFC to use it. It does complicate how we would configure if the feature is enabled.

  1. Use an env var/global configuration
  2. Use the toml set in the first Patch
  3. Use the value set on master

I'd prefer (3) myself but I would rather be consistent with what bors does generally in these situations.

I prefer #2, since it's how bors normally treats its toml file.

1 Like

Incidentally, I asked github support about the ability to mark PRs as merged (to get the purple badge instead of the red "closed" one) and they replied that:

I'm not aware of any plans to provide a feature for marking a pull request as merged like you described. With that in mind, I don't expect that will happen anytime soon.

So there you are, probably not going to be a thing unless either someone has an "in" and can suggest the feature to a dev who'd be interested or possibly there's enough interest for the feature being expressed on the forum or support (or both).

Thanks for reaching out to GitHub support!

I agree, it seems unlikely that we will ever see that feature.

It's really nbd, support is responsive and reactive even if often nothing comes off of it.

I would strongly recommend anyone interested in this sort of feature ask them about it, and ping them one in a while, my understanding is it's probably the least bad way of in essence sending github feature requests (AFAIK there's no public tracker or anything, and I don't think the community forums handle feature requests outside of API) (except now that I'm thinking about it, this is an API feature request, so I should probably open a thread) (edit: done).

1 Like

I upvoted your post and added a comment of support.

Thanks for the feedback everyone. Given what people have suggested here's what I think I should change.

  1. Have Bors created Squash commits using the Git Tree API. Then apply those commits to the staging branch. Run the regular bors commit evaluation process.

  2. Have Bors change the title of a PR from "$title" to "[Merged by Bors] - $title".

  3. Have the commit title be the name of the PR, and the commit message be the PR description.

  4. We will read the first Patch in the batch during the commit process to determine if we should use the old way or the new of merging patches.

1 Like

I've updated the RFC with the comments that everyone has suggested.

2 Likes

Hey All,

I've written up an implementation. You can see it at: https://github.com/bors-ng/bors-ng/pull/718. Please add comments or suggestions. I'm very new to Elixir, let me know what needs to be updated to be idiomatic.

1 Like