Postgres timeout error with large GitHub organization


We are currently testing whether bors-ng fits our needs for improving our GitHub merge workflow.

We are running bors on Kubernetes, connected to an RDS Postgres instance, against GitHub Enterprise.

The issue we have encountered occurs when adding a repository from a GitHub organization with 3.7k users.
During the initial sync, bors paginates through the collaborators for the repository (122 pages),
and then we get the following error:

16:13:28.646 pid=<0.2031.0> [error] Postgrex.Protocol (#PID<0.2031.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.2212.0> timed out because it checked out the connection for longer than 15000ms
16:13:28.648 pid=<0.2212.0> [error] Error in process #PID<0.2212.0> on node :"bors@" with exit value:
{%DBConnection.ConnectionError{message: "tcp recv: closed"},
[{Ecto.Adapters.Postgres.Connection, :execute, 4, [file: 'lib/ecto/adapters/postgres/connection.ex', line: 92]},
{Ecto.Adapters.SQL, :sql_call, 6, [file: 'lib/ecto/adapters/sql.ex', line: 256]},
{Ecto.Adapters.SQL, :struct, 8, [file: 'lib/ecto/adapters/sql.ex', line: 542]},
{Ecto.Repo.Schema, :apply, 4, [file: 'lib/ecto/repo/schema.ex', line: 547]},
{Ecto.Repo.Schema, :"-do_insert/4-fun-1-", 14, [file: 'lib/ecto/repo/schema.ex', line: 213]},
{Ecto.Repo.Schema, :"-wrap_in_transaction/6-fun-0-", 3, [file: cto/repo/schema.ex', line: 774]},
{DBConnection, :transaction_nested, 2, [file: 'lib/db_connection.ex', line: 1374]},
{DBConnection, :transaction_meter, 3, [file: 'lib/db_connection.ex', line:

The database is reporting that the connection was closed:

:LOG: checkpoint complete: wrote 37 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=3.730 s, sync=0.001 s, total=3.741 s; sync files=10, longest=0.001 s, average=0.000 s; distance=65754 kB, estimate=65754 kB
2021-01-20 15:56:54 UTC:[29690]:LOG: unexpected EOF on client connection with an open transaction

I wasn't able to find an easy way to increase the timeout. Since this is the first time I've worked with Elixir, I will look into where the timeout can be increased. If anyone has encountered similar issues and has already fixed them, any help would be greatly appreciated. :slight_smile:

This is really a great tool, and I hope we can integrate it and later start contributing as well.

I think this PR would probably help with it.

Thanks for the quick response. Without the transaction it should work without issues. I patched the code to increase the timeout and the operation took >60 seconds to complete successfully. We will test the patch later today as well.
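
For anyone else hitting this, here is a minimal sketch of the timeout workaround, assuming a standard Ecto setup. The names `:my_app` and `MyApp.Repo` are placeholders, not bors-ng's actual application or repo module, and 120000 ms is only an illustrative value. `:timeout` is the Ecto repo option whose 15000 ms default matches the error above.

```elixir
# config/config.exs
# Placeholder app and repo names, not taken from bors-ng.
# On Elixir versions older than 1.9, `use Mix.Config` replaces `import Config`.
import Config

# Raise the default per-operation timeout (in milliseconds) for this repo.
# Ecto's default is 15_000, which is the limit reported in the log above.
config :my_app, MyApp.Repo,
  timeout: 120_000
```

The same option can also be passed per call, for example `MyApp.Repo.transaction(fun, timeout: 120_000)`, which limits the change to the one long-running operation. Either way this only hides the slow sync; avoiding the long-lived transaction is the better fix.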

We've tested the patch and it works as expected. Users are added and no timeout is seen. Thanks a lot :slight_smile: