Sudden crash with concurrent batches

Hi, we work on clap-rs/clap and we use the public instance of bors.

I tried to r+ three PRs in quick succession: #1712, #1713, and #1714. #1714 ended up in batch 48629, while the other two went into another batch whose number I didn't note.

Somehow, batch 48629 was cancelled, and that made the other batch crash. I don't believe anyone on the team could have cancelled it.

What could it be?

@notriddle Do you have any idea about this?

We also can't seem to use bors r+ before CI passes: the PR just doesn't get merged once CI passes. I'd love to see any related logs, if there are any. If we figure out the problem, I can contribute a fix.

The best place I can think of to start looking is the History tab on the bors dashboard. If you have permission to see it, this is the link to it.

There, I can see that #1714 was canceled because of a crash:

{{:badmatch, {:error, :post_commit_status}},
 [
   {BorsNG.GitHub, :post_commit_status!, 2,
    [file: 'lib/github/github.ex', line: 227]},
   {Enum, :"-each/2-lists^foreach/1-0-", 2,
    [file: 'lib/enum.ex', line: 769]},
   {Enum, :each, 2, [file: 'lib/enum.ex', line: 769]},
   {BorsNG.Worker.Batcher, :do_handle_cast, 2,
    [file: 'lib/worker/batcher.ex', line: 195]},
   {BorsNG.Worker.Batcher, :handle_cast, 2,
    [file: 'lib/worker/batcher.ex', line: 90]},
   {:gen_server, :try_dispatch, 4,
    [file: 'gen_server.erl', line: 637]},
   {:gen_server, :handle_msg, 6,
    [file: 'gen_server.erl', line: 711]},
   {:proc_lib, :init_p_do_apply, 3,
    [file: 'proc_lib.erl', line: 249]}
 ]}

I'm not sure why it failed to post the commit status, but I'm opening a pull request that adds a bit more diagnostic detail to this error, in case it happens again.
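To illustrate why the trace above says so little, here is a minimal sketch of how a bare error tuple hides the cause, and how carrying the HTTP status and response body in the tuple would surface it in the crash report. This is not bors-ng's actual code: `StatusSketch`, `fake_post/2`, and the other function names are hypothetical, and treating 201 as the only success status is an assumption made for the example.

```elixir
# Hypothetical sketch; names are illustrative and not taken from bors-ng.
defmodule StatusSketch do
  # Stand-in for an HTTP call: echoes back the status and body it is given
  # instead of talking to GitHub.
  def fake_post(status, body), do: {:ok, status, body}

  # Terse variant: any non-201 response collapses to a bare error tuple,
  # so the status code and body are lost.
  def post_terse(status, body) do
    case fake_post(status, body) do
      {:ok, 201, _} -> :ok
      _ -> {:error, :post_commit_status}
    end
  end

  # Detailed variant: keeps the HTTP status and response body in the error.
  def post_detailed(status, body) do
    case fake_post(status, body) do
      {:ok, 201, _} -> :ok
      {:ok, code, resp} -> {:error, :post_commit_status, code, resp}
    end
  end

  # Bang wrapper: a failed match raises MatchError, which a GenServer crash
  # report prints as {:badmatch, term}, including whatever the term carries.
  def post_detailed!(status, body) do
    :ok = post_detailed(status, body)
  end
end
```

With the detailed variant, a failing call such as `StatusSketch.post_detailed!(401, "Bad credentials")` raises a `MatchError` whose term includes the status and body, so the cause is visible directly in the log.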

We just got a similar crash:

https://app.bors.tech/repositories/6173/log

{{:badmatch,
  {:error, :post_commit_status, 401,
   "{\"message\":\"Bad credentials\",\"documentation_url\":\"https://developer.github.com/v3\"}"}},
 [
   {BorsNG.GitHub, :post_commit_status!, 2,
    [file: 'lib/github/github.ex', line: 227]},
   {BorsNG.Worker.Batcher, :maybe_complete_batch, 1,
    [file: 'lib/worker/batcher.ex', line: 535]},
   {BorsNG.Worker.Batcher, :handle_cast, 2,
    [file: 'lib/worker/batcher.ex', line: 90]},
   {:gen_server, :try_dispatch, 4,
    [file: 'gen_server.erl', line: 637]},
   {:gen_server, :handle_msg, 6,
    [file: 'gen_server.erl', line: 711]},
   {:proc_lib, :init_p_do_apply, 3,
    [file: 'proc_lib.erl', line: 249]}
 ]}

This is definitely a spurious error; a retry passes.
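Since a manual retry succeeds, one possible mitigation is to retry the failing call a few times before giving up. The sketch below is only an illustration under that assumption: `RetrySketch.with_retry/3` is a hypothetical helper, not part of bors-ng, and whether a 401 should be retried at all is a judgment call that this thread does not settle.

```elixir
# Hypothetical sketch; not bors-ng code.
defmodule RetrySketch do
  # Calls `fun` up to `attempts` times, sleeping between tries and doubling
  # the delay after each failure.
  def with_retry(fun, attempts \\ 3, delay_ms \\ 100)

  def with_retry(fun, attempts, delay_ms) when attempts > 1 do
    case fun.() do
      :ok ->
        :ok

      {:ok, _} = ok ->
        ok

      _error ->
        Process.sleep(delay_ms)
        with_retry(fun, attempts - 1, delay_ms * 2)
    end
  end

  # Last attempt: return whatever the call gives back, error or not.
  def with_retry(fun, _attempts, _delay_ms), do: fun.()
end
```

A caller would wrap the non-raising version of the call, for example `RetrySketch.with_retry(fn -> post_status.() end)`, where `post_status` is a placeholder for whatever function returns `:ok` or an error tuple.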