Sudden crash with concurrent batches

Hi, we work on clap-rs/clap and we use the public instance of bors.

I tried to r+ three PRs in quick succession, namely #1712, #1713, and #1714. #1714 happened to fall into batch 48629, while the other two fell into another batch whose number I didn't notice.

Somehow, 48629 was cancelled, and that caused the other batch to crash. I don't believe anybody on the team could have cancelled it.

What could it be?

@notriddle Do you have any idea about this?

We also can't seem to use bors r+ before CI passes: the PR just doesn't get merged once CI passes. I would love to see whether there are any related logs. If we figure out the problem, I can contribute a fix.

The best place I can think of to start looking is the History tab on the bors dashboard. If you have permission to see it, this is the link to it.

There, I can see that 1714 was canceled because of a crash:

{{:badmatch, {:error, :post_commit_status}},
 [
   {BorsNG.GitHub, :post_commit_status!, 2,
    [file: 'lib/github/github.ex', line: 227]},
   {Enum, :"-each/2-lists^foreach/1-0-", 2,
    [file: 'lib/enum.ex', line: 769]},
   {Enum, :each, 2, [file: 'lib/enum.ex', line: 769]},
   {BorsNG.Worker.Batcher, :do_handle_cast, 2,
    [file: 'lib/worker/batcher.ex', line: 195]},
   {BorsNG.Worker.Batcher, :handle_cast, 2,
    [file: 'lib/worker/batcher.ex', line: 90]},
   {:gen_server, :try_dispatch, 4,
    [file: 'gen_server.erl', line: 637]},
   {:gen_server, :handle_msg, 6,
    [file: 'gen_server.erl', line: 711]},
   {:proc_lib, :init_p_do_apply, 3,
    [file: 'proc_lib.erl', line: 249]}
 ]}

I'm not sure why it failed to post the commit status, but I'm adding a pull request which will add a bit more diagnostic detail to this error, in case it happens again.
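For anyone reading the trace above, here is a minimal sketch of how this kind of crash arises. It is not the bors-ng source: the module `StatusSketch` and its functions are hypothetical stand-ins, and I am only assuming that the real `post_commit_status!` asserts success with a pattern match, which is what the `:badmatch` reason suggests.

```elixir
defmodule StatusSketch do
  # Hypothetical stand-in for the GitHub call; NOT bors-ng's actual code.
  # It returns :ok on success, or the error tuple seen in the trace.
  def post_commit_status(:success), do: :ok
  def post_commit_status(:failure), do: {:error, :post_commit_status}

  # "Bang" variant: asserts success with a pattern match.
  # Any result other than :ok raises MatchError. In a GenServer callback
  # that does not rescue it, the process exits with the reason
  # {{:badmatch, value}, stacktrace}.
  def post_commit_status!(outcome) do
    :ok = post_commit_status(outcome)
  end

  # Non-crashing alternative: branch on the result instead of asserting it.
  def post_commit_status_safely(outcome) do
    case post_commit_status(outcome) do
      :ok -> :ok
      {:error, reason} -> {:error, reason}
    end
  end
end

# Outside a GenServer, the failed match surfaces as a MatchError
# whose term is the value that did not match :ok.
result =
  try do
    StatusSketch.post_commit_status!(:failure)
  rescue
    e in MatchError -> {:raised, e.term}
  end

IO.inspect(result)
```

Under these assumptions, one failed status post is enough to take down the whole batcher process, which would explain why a batch gets cancelled without anyone on the team cancelling it.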

We just got a similar crash:

https://app.bors.tech/repositories/6173/log

{{:badmatch,
  {:error, :post_commit_status, 401,
   "{\"message\":\"Bad credentials\",\"documentation_url\":\"https://developer.github.com/v3\"}"}},
 [
   {BorsNG.GitHub, :post_commit_status!, 2,
    [file: 'lib/github/github.ex', line: 227]},
   {BorsNG.Worker.Batcher, :maybe_complete_batch, 1,
    [file: 'lib/worker/batcher.ex', line: 535]},
   {BorsNG.Worker.Batcher, :handle_cast, 2,
    [file: 'lib/worker/batcher.ex', line: 90]},
   {:gen_server, :try_dispatch, 4,
    [file: 'gen_server.erl', line: 637]},
   {:gen_server, :handle_msg, 6,
    [file: 'gen_server.erl', line: 711]},
   {:proc_lib, :init_p_do_apply, 3,
    [file: 'proc_lib.erl', line: 249]}
 ]}

This is definitely a spurious error; a retry passes.

@notriddle We had a crash with a PR yesterday.

{{:badmatch,
  {:error, :merge_branch,
   %Tesla.Env{
     __client__: %Tesla.Client{
       adapter: {Tesla.Adapter.Httpc, :call,
        [
          [
            ssl: [
              verify: :verify_peer,
              verify_fun: {&:ssl_verify_hostname.verify_fun/3,
               [check_hostname: 'api.github.com']},
              cacertfile: '/app/_build/prod/lib/certifi/priv/cacerts.pem'
            ]
          ]
        ]},
       fun: nil,
       post: [],
       pre: [
         {Tesla.Middleware.BaseUrl, :call,
          ["https://api.github.com"]},
         {Tesla.Middleware.Headers, :call,
          [
            [
              {"authorization",
               "token v1.f7527151d67fdae656bf60ca7bd3136a497ac63c"},
              {"accept", "application/vnd.github.v3+json"},
              {"user-agent", "bors-ng https://bors.tech"}
            ]
          ]},
         {Tesla.Middleware.Retry, :call,
          [[delay: 100, max_retries: 5]]}
       ]
     },
     __module__: Tesla,
     body: "{\"message\":\"Resource not accessible by integration\",\"documentation_url\":\"https://developer.github.com/v3/repos/merging/#perform-a-merge\"}",
     headers: [
       {"date", "Mon, 20 Apr 2020 22:17:18 GMT"},
       {"server", "GitHub.com"},
       {"vary", "Accept-Encoding, Accept, X-Requested-With"},
       {"content-length", "137"},
       {"content-type", "application/json; charset=utf-8"},
       {"status", "403 Forbidden"},
       {"x-ratelimit-limit", "5000"},
       {"x-ratelimit-remaining", "4985"},
       {"x-ratelimit-reset", "1587424624"},
       {"x-github-media-type", "github.v3; format=json"},
       {"access-control-expose-headers",
        "ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset"},
       {"access-control-allow-origin", "*"},
       {"strict-transport-security",
        "max-age=31536000; includeSubdomains; preload"},
       {"x-frame-options", "deny"},
       {"x-content-type-options", "nosniff"},
       {"x-xss-protection", "1; mode=block"},
       {"referrer-policy",
        "origin-when-cross-origin, strict-origin-when-cross-origin"},
       {"content-security-policy", "default-src 'none'"},
       {"x-github-request-id",
        "B8B2:7B95:791AC7:13D1FD0:5E9E1F6E"}
     ],
     method: :post,
     opts: [],
     query: [],
     status: 403,
     url: "https://api.github.com/repositories/31315121/merges"
   }}},
 [
   {BorsNG.GitHub, :merge_branch!, 2,
    [file: 'lib/github/github.ex', line: 157]},
   {Enum, :"-reduce/3-lists^foldl/2-0-", 3,
    [file: 'lib/enum.ex', line: 1940]},
   {BorsNG.Worker.Batcher, :start_waiting_batch, 1,
    [file: 'lib/worker/batcher.ex', line: 429]},
   {BorsNG.Worker.Batcher, :poll_, 1,
    [file: 'lib/worker/batcher.ex', line: 292]},
   {BorsNG.Worker.Batcher, :handle_info, 2,
    [file: 'lib/worker/batcher.ex', line: 236]},
   {:gen_server, :try_dispatch, 4,
    [file: 'gen_server.erl', line: 637]},
   {:gen_server, :handle_msg, 6,
    [file: 'gen_server.erl', line: 711]},
   {:proc_lib, :init_p_do_apply, 3,
    [file: 'proc_lib.erl', line: 249]}
 ]}

Was that pull request editing a GitHub Action config file?

Nope. https://github.com/clap-rs/clap/pull/1834/files. If you look at our repository history at app.bors.tech, it kept crashing no matter how many times we tried.

I'm confused by GitHub's behavior here.

It looks like the failing PR doesn't have any GitHub Actions changes, but this pull request, the one that came immediately before it, did change your actions configuration. That probably has something to do with it, because the "Resource not accessible by integration" error has always been attributed to GHA before, but I'm not sure what exactly is going on here.

This looks like the same thing that I described in this issue comment. In short:

[Even though] the external fork PR doesn't modify the github actions files, [...] there's a commit in master that did, so a merge commit into staging or staging.tmp fails.

Edit: I forgot to mention the fix. Simply merge master back into the external fork PR, and bors should no longer crash when trying to merge.