Hello! Also, a question about HEAD moving


#1

Hi Michael!

@couchand here. Thought I should reach out and say hi. Thanks for
building bors-ng, it’s really neat! I’m super excited to get us on it, I
think it will help out our workflow a lot.

We’re sure hammering on it here, and we haven’t even officially switched
over. Actually, that may be part of the problem…

Today we’ve had a bunch of people try to start using Bors, but we still
haven’t disabled manual merges, so there have been some collisions. Also our
build takes (optimistically) about 15 minutes.

Bors seems to keep getting wedged, maybe when a manual merge interrupts.
It looks like it’s supposed to comment “Cancelled” on the PR, but it’s just
crashing, and sometimes the batch disappears without a trace. I’d file an
issue on GitHub, but I’m not sure if this is related to #301 or another
issue already listed.

Here are a handful of the crash reports, in case that’s useful to poke
through. Hope I’m not being too annoying by e-mailing you about this.

Let me know if I should file this as a GitHub issue, add it to #301, or
something else.

Thanks for your hard work!
Andrew

4/2/2018, 1:48:57 PM Crash batch

{{%HTTPoison.Error{id: nil, reason: :timeout},
  [{HTTPoison, :request!, 5,
    [file: 'lib/httpoison.ex', line: 66]},
   {BorsNG.GitHub.Server, :do_handle_call, 3,
    [file: 'lib/github/github/server.ex', line: 268]},
   {BorsNG.GitHub.Server, :use_token!, 3,
    [file: 'lib/github/github/server.ex', line: 493]},
   {:gen_server, :try_handle_call, 4,
    [file: 'gen_server.erl', line: 615]},
   {:gen_server, :handle_msg, 5,
    [file: 'gen_server.erl', line: 647]},
   {:proc_lib, :init_p_do_apply, 3,
    [file: 'proc_lib.erl', line: 247]}]},
 {GenServer, :call,
  [BorsNG.GitHub,
   {:post_commit_status,
    {{:installation, 117629}, 16563587},
    {"8f8c83d9853162411cad9c3f1db00acd1d6d297b", :error,
     "Canceled",
     "https://bors.crdb.io/repositories/56/log#batch-42"}},
   5000]}}

4/2/2018, 12:44:06 PM Crash batch

{%Ecto.ConstraintError{constraint: "statuses_identifier_batch_id_index",
  message: "constraint error when attempting to insert struct:\n\n
* unique: statuses_identifier_batch_id_index\n\nIf you would like to
convert this constraint into an error, please\ncall
unique_constraint/3 in your changeset and define the
proper\nconstraint name. The changeset has not defined any
constraint.\n",
  type: :unique},
 [{Ecto.Repo.Schema, :"-constraints_to_errors/3-fun-1-", 4,
   [file: 'lib/ecto/repo/schema.ex', line: 574]},
  {Enum, :"-map/2-lists^map/1-0-", 2,
   [file: 'lib/enum.ex', line: 1229]},
  {Ecto.Repo.Schema, :constraints_to_errors, 3,
   [file: 'lib/ecto/repo/schema.ex', line: 559]},
  {Ecto.Repo.Schema, :"-do_insert/4-fun-1-", 14,
   [file: 'lib/ecto/repo/schema.ex', line: 222]},
  {Ecto.Repo.Schema, :"-wrap_in_transaction/6-fun-0-", 3,
   [file: 'lib/ecto/repo/schema.ex', line: 774]},
  {Ecto.Adapters.SQL, :"-do_transaction/3-fun-0-", 3,
   [file: 'lib/ecto/adapters/sql.ex', line: 576]},
  {DBConnection, :transaction_run, 4,
   [file: 'lib/db_connection.ex', line: 1283]},
  {DBConnection, :run_begin, 3,
   [file: 'lib/db_connection.ex', line: 1207]}]}

4/2/2018, 1:48:56 PM Crash batch

{:timeout,
 {GenServer, :call,
  [BorsNG.GitHub,
   {:post_commit_status,
    {{:installation, 117629}, 16563587},
    {"8f8c83d9853162411cad9c3f1db00acd1d6d297b", :ok,
     "Build succeeded",
     "https://bors.crdb.io/repositories/56/log#batch-42"}},
   5000]}}

4/2/2018, 12:33:28 PM Crash batch

{:timeout,
 {GenServer, :call,
  [BorsNG.GitHub,
   {:delete_branch, {{:installation, 117629}, 16563587},
    {"staging.tmp"}}, 5000]}}

4/2/2018, 11:18:25 AM Crash batch

{:function_clause,
 [{BorsNG.Worker.Batcher, :start_waiting_merged_batch,
   [%BorsNG.Database.Batch{__meta__: #Ecto.Schema.Metadata<:loaded, "batches">,
     commit: nil, id: 38,
     inserted_at: ~N[2018-04-02 14:48:18.100195],
     into_branch: "master", last_polled: 1522680498,
     patches: #Ecto.Association.NotLoaded<association :patches is not loaded>,
     priority: 0,
     project: %BorsNG.Database.Project{__meta__:
#Ecto.Schema.Metadata<:loaded, "projects">,
      auto_member_required_perm: nil,
      auto_reviewer_required_perm: :admin,
      batch_delay_sec: 10, batch_poll_period_sec: 1800,
      batch_timeout_sec: 7200, id: 56,
      inserted_at: ~N[2018-03-26 18:55:33.600532],
      installation: #Ecto.Association.NotLoaded<association
:installation is not loaded>,
      installation_id: 1,
      members: #Ecto.Association.NotLoaded<association :members is not loaded>,
      name: "cockroachdb/cockroach", repo_xref: 16563587,
      staging_branch: "staging", trying_branch: "trying",
      updated_at: ~N[2018-03-26 18:55:33.600539],
      users: #Ecto.Association.NotLoaded<association :users is not loaded>},
     project_id: 56, state: :waiting, timeout_at: 0,
     updated_at: ~N[2018-04-02 14:48:18.100203]}, [],
    %{commit: "de7e7a104b608bb2b3ceff37a7dfe5d832dbf39d",
      tree: "d93031b1cfffbea8695d85e386990fd9db1042ac"},
    "7397b8d8147f511182184797bc47934b3a30efd9"],
   [file: 'lib/worker/batcher.ex', line: 284]},
  {BorsNG.Worker.Batcher, :start_waiting_batch, 1,
   [file: 'lib/worker/batcher.ex', line: 269]},
  {BorsNG.Worker.Batcher, :poll, 1,
   [file: 'lib/worker/batcher.ex', line: 200]},
  {BorsNG.Worker.Batcher, :handle_info, 2,
   [file: 'lib/worker/batcher.ex', line: 182]},
  {:gen_server, :try_dispatch, 4,
   [file: 'gen_server.erl', line: 601]},
  {:gen_server, :handle_msg, 5,
   [file: 'gen_server.erl', line: 667]},
  {:proc_lib, :init_p_do_apply, 3,
   [file: 'proc_lib.erl', line: 247]}]}

4/2/2018, 10:48:23 AM Crash batch

{:timeout,
 {GenServer, :call,
  [BorsNG.GitHub,
   {:get_reviews, {{:installation, 117629}, 16563587},
    {24174}}, 5000]}}

#2

Hi @couchand

I checked that there wasn’t anything sensitive in the crash reports; there are no keys or anything, so it’s safe to share. Normally, you’d want to use the forum for support; that way, people can Google their errors and find solutions.


#3

Now, I’m going to try to answer the question :sunglasses:

The first error means that there was a timeout when bors tried to make an API call to GitHub. It usually happens either because GitHub themselves are having trouble or because your bot has gotten into a very chatty mood and you’re getting rate limited. I’m guessing the latter, since it’s cancelling a bunch of pull requests at once.
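If you want to paper over it in the meantime, the general shape of a fix is to use HTTPoison.request/5, which returns an {:ok, _} / {:error, _} tuple instead of raising the way request!/5 does, and retry timeouts with a short backoff. A rough sketch, not bors-ng’s actual code:

```elixir
defmodule GitHubRetry do
  # Sketch only: catch %HTTPoison.Error{reason: :timeout} and retry a
  # couple of times with a short pause instead of crashing the caller.
  def request_with_retry(method, url, body, headers, attempts \\ 3) do
    case HTTPoison.request(method, url, body, headers, recv_timeout: 5_000) do
      {:error, %HTTPoison.Error{reason: :timeout}} when attempts > 1 ->
        # GitHub may just be slow, or we may be getting throttled;
        # wait a second and try again.
        Process.sleep(1_000)
        request_with_retry(method, url, body, headers, attempts - 1)

      other ->
        other
    end
  end
end
```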

The second error should never happen under any circumstances. I don’t have access to my laptop this week, so I’ll look into it next week.
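Looking at the message, Ecto’s own suggestion is probably the right band-aid: declare the unique index on the changeset so a duplicate insert comes back as {:error, changeset} instead of raising. Roughly like this (I’m guessing the field names from the index name, so treat it as a sketch):

```elixir
# Sketch only: declaring the index on the changeset turns a duplicate
# insert into {:error, changeset} instead of a raised
# Ecto.ConstraintError. Field names are guessed from the index name.
def changeset(status, params) do
  status
  |> Ecto.Changeset.cast(params, [:identifier, :batch_id])
  |> Ecto.Changeset.unique_constraint(:identifier,
       name: :statuses_identifier_batch_id_index)
end
```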

The third, fourth, and sixth ones are similar to the first, but in this case it’s a subsystem that’s taking too long to respond instead of GitHub itself. This almost certainly means there’s a backlog of API calls being processed, probably because a bunch of them got cancelled.

The fifth one has me confused. Next week I’ll be able to use my laptop, instead of just a smartphone, to look into this.


#4

Sounds like we’re uncovering some interesting issues. I’ll wait on most of them until you have a chance to dig in; let me know if you need any more information. In the meantime we’ve seen a few that are clearly the same issue as #301 – with the :badmatch.

Regarding API limits, do you have a sense of how much activity would be required to hit it? The limit seems awfully high, and while we are definitely churning through a lot of cancelled/restarted PRs, getting to 5000 requests an hour is surprising, to me at least.


#5

The 5000 is the timeout; that is, Bors is set up to time out if an API call takes more than five seconds.
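It’s the third argument to GenServer.call/3, in milliseconds:

```elixir
# The 5000 in the crash reports is this timeout: if the server takes
# longer than five seconds to reply, the caller exits with
# {:timeout, {GenServer, :call, ...}}, which is exactly the shape of
# the dumps above. (`server` and `request` are placeholders.)
GenServer.call(server, request, 5_000)
```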


#6

Oh, I meant the 5000 requests per hour of the GitHub API rate limiting.


#7

@couchand

I found and fixed the immediate cause of the fifth error. I also fixed a couple of other things that have been causing crashes, though I can’t tell how much of that is related to your problems.