
Bug: Crash on start if can't connect to bitcoind #1022

Open
dpc opened this issue Mar 18, 2024 · 3 comments
Labels: bug (Something isn't working)


dpc commented Mar 18, 2024

Describe the bug

The core question is: should a daemon like electrs crash on start if it can't connect to bitcoind?

Starting electrs 0.10.1 on x86_64 linux with Config { network: Regtest, db_path: "/build/devimint-7736-767/electrs/regtest", daemon_dir: "/build/devimint-7736-767/bitcoin/regtest", daemon_auth: UserPass("bitcoin", "<sensitive>"), daemon_rpc_addr: 127.0.0.1:17057, daemon_p2p_addr: 127.0.0.1:26493, electrum_rpc_addr: 127.0.0.1:23330, monitoring_addr: 127.0.0.1:24286, wait_duration: 10s, jsonrpc_timeout: 15s, index_batch_size: 10, index_lookup_limit: None, reindex_last_blocks: 0, auto_reindex: true, ignore_mempool: false, sync_once: false, skip_block_download_wait: false, disable_electrum_rpc: false, server_banner: "Welcome to electrs 0.10.1 (Electrum Rust Server)!", signet_magic: fabfb5da, args: [] }
[2024-03-18T06:20:13.921Z INFO  electrs::metrics::metrics_impl] serving Prometheus metrics on 127.0.0.1:24286
[2024-03-18T06:20:13.921Z INFO  electrs::server] serving Electrum RPC on 127.0.0.1:23330
[2024-03-18T06:20:13.942Z INFO  electrs::db] "/build/devimint-7736-767/electrs/regtest": 0 SST files, 0 GB, 0 Grows
[2024-03-18T06:20:13.943Z INFO  electrs::db] closing DB at /build/devimint-7736-767/electrs/regtest
Error: electrs failed

Caused by:
    0: bitcoind RPC polling failed
    1: daemon not available
    2: JSON-RPC error: transport error: Couldn't connect to host: Connection refused (os error 111)

Note the first timestamp: 20:13.921

The whole test suite started here:

2024-03-18T06:20:13.911817Z  INFO devimint: Setting up test dir path=/build/devimint-7736-767

Timestamp: 20:13.911

bitcoind was spawned in the background earlier, but only became available for querying a few seconds later. Yet about 30 ms into the test suite, electrs had already given up on it.

It seems like all the Bitcoin daemons we're using behave this way: lightningd, lnd, and electrs. That makes me wonder: is this some shared design decision that I never learned about, or just a weird coincidence? :D All three are written in different languages by different teams.

Sure, in a real deployment there will always be some kind of supervisor to restart things, but still... I would expect daemons never to shut down just because they can't connect to another networked service. What's the point, if the supervisor is just going to start them again?

The context is: I'm trying to optimize our test suite's startup time by letting more things start in parallel, etc. It would be nice if I could start some daemons at around the same time as bitcoind, instead of postponing everything until bitcoind takes a shower, brushes its teeth, eats breakfast and is finally ready for work.

dpc added the bug label Mar 18, 2024
romanz self-assigned this Mar 18, 2024

448-OG commented Apr 4, 2024

Can I work on this if the issue hasn't been solved yet?


448-OG commented Apr 4, 2024

Would it solve the issue to check whether a daemon like bitcoind is running every few seconds (at an interval set in the config file), logging an ERROR on each failed retry and an INFO once the connection succeeds?
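A minimal sketch of the retry loop proposed above, in Rust since electrs is written in Rust. The names `wait_for_daemon`, `interval`, and `max_attempts` are hypothetical and not part of electrs; `connect` stands in for whatever call establishes the bitcoind RPC connection, and the stderr/stdout lines stand in for ERROR/INFO log records. Passing `None` retries forever; `Some(n)` gives up after `n` failed attempts.

```rust
use std::fmt::Display;
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not electrs API): call `connect` until it succeeds,
/// sleeping `interval` between attempts. Logs every failure and the final success.
/// With `max_attempts = Some(n)`, returns the last error after `n` failures.
fn wait_for_daemon<T, E, F>(mut connect: F, interval: Duration, max_attempts: Option<u32>) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    E: Display,
{
    let mut attempt: u32 = 0;
    loop {
        attempt += 1;
        match connect() {
            Ok(conn) => {
                println!("INFO connected to daemon after {} attempt(s)", attempt);
                return Ok(conn);
            }
            Err(e) => {
                eprintln!("ERROR daemon not available (attempt {}): {}", attempt, e);
                if let Some(max) = max_attempts {
                    if attempt >= max {
                        return Err(e); // bounded mode: give up
                    }
                }
                thread::sleep(interval);
            }
        }
    }
}
```

For example, a `connect` closure that returns `Err("Connection refused")` twice and then `Ok(client)` makes `wait_for_daemon(connect, Duration::from_secs(1), None)` return `Ok(client)` after three attempts.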


dpc commented Apr 4, 2024

There's some design debate to be had here, I guess. Does electrs really need to connect to bitcoind on start? If so, it could probably block for some time, sleeping and retrying, until it eventually gives up. If not, then whatever it does on start should be converted into the normal operation loop. What I mean by that: in normal operation electrs probably runs some loop or listens for notifications, and can tolerate temporary connectivity issues by just retrying. Whatever currently fails on start when bitcoind is unreachable could perhaps become part of that high-level operation loop and be retried the same way if anything goes wrong.
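The second option, folding the startup connection into the normal operation loop, might look roughly like the sketch below. `Daemon`, `tip_height`, `run`, and `FlakyDaemon` are hypothetical illustrations rather than electrs code, and the loop is bounded by `ticks` only so the example terminates; a real server would loop until asked to shut down.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical abstraction over the bitcoind RPC connection (not electrs API).
trait Daemon {
    /// Try to fetch the current chain tip height.
    fn tip_height(&mut self) -> Result<u64, String>;
}

/// Operation loop that treats "bitcoind unreachable" as transient:
/// the error is logged and simply retried on the next tick, both at
/// startup and later. Returns the last height it synced to, if any.
fn run<D: Daemon>(daemon: &mut D, poll_interval: Duration, ticks: usize) -> Option<u64> {
    let mut indexed_height: Option<u64> = None;
    for _ in 0..ticks {
        match daemon.tip_height() {
            Ok(height) => {
                if indexed_height != Some(height) {
                    println!("INFO syncing index up to height {}", height);
                    indexed_height = Some(height);
                }
            }
            Err(e) => eprintln!("WARN bitcoind not reachable yet, will retry: {}", e),
        }
        thread::sleep(poll_interval);
    }
    indexed_height
}

/// Mock daemon that refuses connections for its first `down_for` calls,
/// like a bitcoind that is still starting up.
struct FlakyDaemon {
    calls: u32,
    down_for: u32,
    height: u64,
}

impl Daemon for FlakyDaemon {
    fn tip_height(&mut self) -> Result<u64, String> {
        self.calls += 1;
        if self.calls <= self.down_for {
            Err("Connection refused (os error 111)".to_string())
        } else {
            Ok(self.height)
        }
    }
}
```

The design point is that an unreachable daemon goes through the same `Err` branch whether it happens at startup or an hour later, so a test harness could launch electrs and bitcoind concurrently.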

Development

No branches or pull requests

3 participants