
Fulcrum stopped processing mempool txs (Windows Server 2012) #217

Open
pkoutsogiannis opened this issue Dec 1, 2023 · 34 comments
Labels: Requires Investigation (Not clear if bug here or bug outside of Fulcrum), Windows

Comments

@pkoutsogiannis

pkoutsogiannis commented Dec 1, 2023

We are using Fulcrum 1.9.7 (Release f27fc28)

We encountered the following issue 2 times in the past month:

Fulcrum stopped processing mempool txs without any log entry. We issued a stop command, but Fulcrum hung and we had to kill the process and restart it.

[2023-12-01 11:11:35.940] 51632 mempool txs involving 323803 addresses
[2023-12-01 11:12:45.967] 51897 mempool txs involving 324605 addresses
[2023-12-01 11:13:55.989] 52183 mempool txs involving 325474 addresses
[2023-12-01 11:15:05.989] 52451 mempool txs involving 326368 addresses
[2023-12-01 11:16:16.037] 52718 mempool txs involving 327421 addresses
[2023-12-01 11:17:26.076] 53005 mempool txs involving 328511 addresses
[2023-12-01 13:03:37.850] <AdminSrv 127.0.0.1:8000> New TCP Client.3419140 127.0.0.1:55881, 1 client total
[2023-12-01 13:03:37.959] Received 'stop' command from admin RPC, shutting down ...
[2023-12-01 13:03:37.959] Shutdown requested
[2023-12-01 13:03:37.959] Stopping Stats HTTP Servers ...
[2023-12-01 13:03:37.959] Stopping Controller ...

(we had to kill the process after 5 minutes)

The conf file:

datadir = d:\fulcrum_data
bitcoind = 127.0.0.1:8332
rpcuser = redacted
rpcpassword = redacted
tcp = 10.190.89.8:50001
peering = false
announce = false
public_tcp_port = 50001
admin = 8000
stats = 8081
db_mem = 1024
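
For context, the 'stop' command in the log above was issued through the admin port configured here (admin = 8000). With the FulcrumAdmin helper script that ships with Fulcrum, that call would look roughly like the following, assuming it is run on the same host (exact flags may vary by version):

FulcrumAdmin -p 8000 stop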

@cculianu
Owner

cculianu commented Dec 1, 2023

We are using Fulcrum 1.9.7 (Release f27fc28)
We encountered the following issue 2 times in the past month:

Fulcrum 1.9.7 has only been out for ~1 week. There was indeed a hang bug back in version 1.9.4 or so.

I see from the log this hang happened today -- but were you for sure on 1.9.7?

@pkoutsogiannis
Author

pkoutsogiannis commented Dec 1, 2023

The first occurrence was with 1.9.6 last month, which is why we upgraded to 1.9.7.

The log is from today.

@cculianu
Owner

cculianu commented Dec 1, 2023

Darn. Ok, I will investigate. I added some optimizations to make mempool synch much faster, but they had a bunch of bugs. I thought I squashed them all, but apparently maybe not.

@cculianu
Owner

cculianu commented Dec 1, 2023

In the meantime you could just go back to Fulcrum 1.9.3, I guess, or hang in there.

@cculianu added the Requires Investigation (Not clear if bug here or bug outside of Fulcrum) label on Dec 1, 2023
@pkoutsogiannis
Author

We are now running fulcrum with -d so that we can catch any helpful information for you.
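
For reference, that is roughly the following invocation on Windows; the config file path is illustrative, and -d turns on debug logging:

Fulcrum.exe -d D:\fulcrum.conf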

@cculianu
Owner

cculianu commented Dec 1, 2023

We are now running fulcrum with -d so that we can catch any helpful information for you.

Yes, this is extremely helpful. Thank you.

@pkoutsogiannis
Author

I forgot to mention that we are using the Windows binary on Windows Server 2016.

Keep up the good work.

@cculianu
Owner

cculianu commented Dec 1, 2023

I forgot to mention that we are using the Windows binary on Windows Server 2016.

Keep up the good work.

Ahhh! That is helpful information! Thank you. I pray this is a Windows-specific problem (but it may not be).

Question: Were you running Fulcrum prior to 1.9.4 (1.9.3, etc.) for any extended period, and if so, did you ever notice this problem then?

@pkoutsogiannis
Author

It started after upgrading from 1.9.3 to 1.9.6

@pkoutsogiannis
Author

We indeed had 1.9.3 running for an extended period without this issue.

@pkoutsogiannis
Author

We had 1.9.3 running for at least a month on the Windows 2016 machine.

We also have a 1.9.7 instance running on a Windows 11 machine, and it is still error-free. We also had 1.9.6 running there without issues. The only differences are the Windows version and that we have fast-sync=4098 and db_max_open_files=500 set.
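
For reference, the only fulcrum.conf lines that differ on the Windows 11 instance are roughly these (illustrative, based on the values above):

fast-sync = 4098
db_max_open_files = 500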

@cculianu
Owner

cculianu commented Dec 1, 2023

The only differences are the Windows version and that we have fast-sync=4098 and db_max_open_files=500 set.

Yeah that shouldn't matter. I am curious if the Windows 11 machine ever has problems or not. Keep me updated. I will thoroughly review the code.

FWIW I actually have a Windows laptop here (Windows 10) that's been running BTC Fulcrum for a week now with no hang (and before that 1.9.6 with no hang). I will continue to monitor the situation and also look for bugs in my code.

:/

Do let me know what happens; I'll investigate this further in the meantime.

@pkoutsogiannis
Author

Note: the Windows 11 machine is much faster than the Windows 2016 machine; I am mentioning this just in case there is some race condition.

@cculianu
Owner

cculianu commented Dec 1, 2023

What are the specs on the slow machine? And is bitcoind running locally on both machines, or is one connecting to the bitcoind process on the other?

@pkoutsogiannis
Author

pkoutsogiannis commented Dec 1, 2023

These are two separate, unrelated machines; each runs bitcoind and Fulcrum locally on the same machine.

Windows 2016:

cpu: Intel Xeon E5-2620 2.10 GHz
memory: 64 GB
disk: 2 TB SSD

bitcoind config:

txindex=1
server=1
listen=0
rpcbind=127.0.0.1
rpcallowip=127.0.0.1
rpcuser = redacted
rpcpassword = redacted
rpcworkqueue=1000
zmqpubhashblock=tcp://127.0.0.1:8433


Windows 11:

cpu: AMD Ryzen 5 5560U
memory: 16 GB
disk: 2 TB SSD (Samsung 990 PRO NVMe M.2, PCIe 4.0)

bitcoind config:

txindex=1
server=1
listen=0
rpcbind=127.0.0.1
rpcallowip=127.0.0.1
rpcuser = redacted
rpcpassword = redacted
rpcworkqueue=1000
zmqpubhashblock=tcp://127.0.0.1:8433
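
Side note: since both nodes publish hashblock notifications on tcp://127.0.0.1:8433, the active ZMQ publishers can be double-checked on either machine with bitcoind's getzmqnotifications RPC, e.g.:

bitcoin-cli getzmqnotifications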

@cculianu
Owner

cculianu commented Dec 3, 2023

You know, in my experience setting rpcworkqueue=1000 on bitcoind is asking for trouble. If bitcoind can't keep up with requests, it's best for it to error out early; having a queue of 1000 requests lined up may lead to ridiculous timeouts. You are better off having bitcoind saturate its rpcworkqueue early. There is a reason Core has this defaulting to 16. I am not sure what docs you read that recommended raising this; can you tell me where you read that you should raise it?
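
For reference, restoring the stock behaviour is just a matter of removing the override from bitcoin.conf, or pinning it to the documented default explicitly:

# bitcoin.conf: use Bitcoin Core's default RPC work queue depth
rpcworkqueue=16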

Question: Are you hitting bitcoind directly to do any processing outside of Fulcrum? For example: are you doing expensive calls to bitcoind (such as mining, scantxoutset, etc) outside of Fulcrum via bitcoind's RPC?

@pkoutsogiannis
Author

pkoutsogiannis commented Dec 3, 2023

The rpcworkqueue was set to 1000 for no particular reason; we picked it up as a recommendation from someone on the team a few months ago.

Both bitcoind instances are used solely by Fulcrum. The Fulcrum on Windows 2016 (the one that hung) is not even used by any client, since it serves as a backup service; it just sits there idle.

@pkoutsogiannis
Author

pkoutsogiannis commented Dec 3, 2023

Shall we change rpcworkqueue back to 16 and restart Fulcrum in debug mode again?

@cculianu
Owner

cculianu commented Dec 3, 2023

Well, I actually don't think that was the problem, since Fulcrum should have been able to exit in a timely manner regardless; it shouldn't hang like that either way. And if RPC is only used by Fulcrum, that wouldn't matter anyway: Fulcrum doesn't make "expensive" calls that eat a ton of time (such as mining or scantxoutset).

Your choice: you can leave it as-is, or set it to the default just to see if that fixes it. Up to you.

@pkoutsogiannis
Author

Since there are no RPC calls other than Fulcrum's, I will leave it running as it is and will update you with the debug log if it hangs again.

@cculianu
Owner

cculianu commented Jan 7, 2024

Is no news good news? Has it been running smoothly all this time?

@pkoutsogiannis
Author

I am monitoring it every day and so far there has been no incident.

@pkoutsogiannis
Author

pkoutsogiannis commented Jan 19, 2024

We got bad news. Unfortunately, it stopped processing mempool txs. Also, after issuing a stop command, it got stuck at the 'joining thread' log line and I had to kill the process.

[2024-01-19 05:37:21.127] (Debug) 54798 mempool txs involving 289259 addresses (exclusive lock held for 2.030 msec)
[2024-01-19 05:37:23.375] (Debug) getrawmempool: got reply with 54829 items, 0 ignored, 0 dropped, 31 new (reply took: 154.112 msec, processing took: 58.417 msec)
[2024-01-19 05:37:23.375] (Debug) Thread started
[2024-01-19 05:37:23.391] (Debug) downloaded 31 txs (failed: 0, ignored: 0), elapsed so far: 0.221 secs
[2024-01-19 05:37:23.391] (Debug) Precached 30/33 inputs in 7.265 msec, of which 1.305 msec was spent processing, thread exiting.
[2024-01-19 05:37:23.391] (Debug) 54829 mempool txs involving 289272 addresses (exclusive lock held for 1.259 msec)
[2024-01-19 05:37:25.610] (Debug) getrawmempool: got reply with 54866 items, 0 ignored, 0 dropped, 37 new (reply took: 154.447 msec, processing took: 59.890 msec)
[2024-01-19 05:37:25.610] (Debug) Thread started
[2024-01-19 05:37:25.626] (Debug) downloaded 37 txs (failed: 0, ignored: 0), elapsed so far: 0.223 secs
[2024-01-19 05:37:25.626] (Debug) Precached 5/37 inputs in 8.223 msec, of which 0.171 msec was spent processing, thread exiting.
[2024-01-19 05:37:25.626] (Debug) 54866 mempool txs involving 289295 addresses (exclusive lock held for 1.005 msec)
[2024-01-19 05:37:27.861] (Debug) getrawmempool: got reply with 54902 items, 0 ignored, 0 dropped, 36 new (reply took: 154.117 msec, processing took: 58.803 msec)
[2024-01-19 05:37:27.861] (Debug) Thread started
[2024-01-19 05:37:27.876] (Debug) downloaded 36 txs (failed: 0, ignored: 0), elapsed so far: 0.222 secs
[2024-01-19 05:37:27.876] (Debug) Precached 19/51 inputs in 11.076 msec, of which 5.321 msec was spent processing, thread exiting.
[2024-01-19 05:37:27.892] (Debug) 54902 mempool txs involving 289315 addresses
[2024-01-19 05:37:30.112] (Debug) getrawmempool: got reply with 54972 items, 0 ignored, 0 dropped, 70 new (reply took: 155.114 msec, processing took: 59.566 msec)
[2024-01-19 05:37:30.112] (Debug) Thread started
[2024-01-19 05:37:30.127] (Debug) downloaded 70 txs (failed: 0, ignored: 0), elapsed so far: 0.228 secs
[2024-01-19 05:37:30.127] (Debug) Precached 11/79 inputs in 12.933 msec, of which 3.781 msec was spent processing, thread exiting.
[2024-01-19 05:37:30.143] (Debug) 54972 mempool txs involving 289350 addresses (exclusive lock held for 0.663 msec)
[2024-01-19 05:37:32.368] (Debug) getrawmempool: got reply with 55042 items, 0 ignored, 0 dropped, 70 new (reply took: 155.610 msec, processing took: 59.486 msec)
[2024-01-19 05:37:32.368] (Debug) Thread started
[2024-01-19 05:37:32.383] (Debug) downloaded 70 txs (failed: 0, ignored: 0), elapsed so far: 0.228 secs
[2024-01-19 05:37:32.383] (Debug) Precached 16/79 inputs in 12.287 msec, of which 2.329 msec was spent processing, thread exiting.
[2024-01-19 05:37:32.399] (Debug) 55042 mempool txs involving 289382 addresses (exclusive lock held for 0.568 msec)
[2024-01-19 05:38:30.649] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 05:48:20.928] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 05:48:45.662] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 05:56:54.910] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:03:31.190] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:04:50.111] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:06:11.095] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:09:30.563] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:17:58.624] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:21:13.498] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:22:37.826] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:27:51.403] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:43:26.227] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 06:48:44.695] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 07:04:55.910] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 07:34:55.918] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ...
[2024-01-19 07:36:16.606] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 07:42:56.948] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 07:44:59.385] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 07:58:01.772] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 08:08:30.442] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 08:22:17.314] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 08:22:55.220] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 08:25:58.985] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 08:55:59.009] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ...
[2024-01-19 09:09:57.787] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:16:59.895] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:22:32.987] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:27:51.829] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:28:18.939] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:45:58.794] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:48:46.512] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:58:24.682] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 09:58:46.619] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 10:12:07.944] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 10:27:17.019] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 10:30:35.049] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 10:58:49.089] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 11:02:24.635] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 11:04:13.323] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 11:26:58.364] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 11:37:34.065] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:04:43.246] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:21:38.289] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:32:47.942] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:34:44.333] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:42:07.050] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:57:28.015] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 12:57:54.858] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 13:15:32.854] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 13:23:07.431] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 13:35:54.506] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 13:40:07.927] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 13:41:39.051] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 13:49:55.143] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 14:11:10.169] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 14:14:24.747] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 14:28:02.665] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 14:33:37.024] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 14:35:20.195] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 15:05:20.204] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ...
[2024-01-19 15:08:05.203] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 15:30:07.745] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 16:00:07.753] <ZMQ Notifier (hashblock)> (Debug) Idle timeout elapsed (1800.0 sec), reconnecting socket ...
[2024-01-19 16:08:08.158] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 16:18:23.015] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 16:20:43.545] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 16:45:05.727] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 16:52:12.491] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 17:05:32.191] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 17:11:31.658] <ZMQ Notifier (hashblock)> (Debug) topic: "hashblock", parts: 3, bytes: 45
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) Got connection from: 127.0.0.1:61727
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) on_connected 30481655
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> New TCP Client.30481655 127.0.0.1:61727, 1 client total
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) TCP Client.30481655 (id: 30481655) 127.0.0.1:61727 socket disconnected
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) TCP Client.30481655 (id: 30481655) 127.0.0.1:61727 lost connection
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) killClient (id: 30481655)
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) do_disconnect (abort) 30481655
[2024-01-19 17:19:58.860] <AdminSrv 127.0.0.1:8000> (Debug) Client 30481655 destructing
[2024-01-19 17:19:58.969] Received 'stop' command from admin RPC, shutting down ...
[2024-01-19 17:19:58.969] Shutdown requested
[2024-01-19 17:19:58.969] (Debug) void App::cleanup()
[2024-01-19 17:19:58.969] Stopping Stats HTTP Servers ...
[2024-01-19 17:19:58.969] (Debug) HttpSrv 127.0.0.1:8081 thread is running, joining thread
[2024-01-19 17:19:58.969] (Debug) HttpSrv 127.0.0.1:8081 cleaned up 5 signal/slot connections
[2024-01-19 17:19:58.969] (Debug) ~AbstractTcpServer
[2024-01-19 17:19:58.969] Stopping Controller ...
[2024-01-19 17:19:58.969] (Debug) Controller thread is running, joining thread

@cculianu
Owner

So there must be some issue, at least on Windows. You used the provided Windows binary, correct?

I’ll have to investigate this when I get some free time.
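
One thing that might help next time it wedges, before killing it: capture a full memory dump of the hung Fulcrum process so the stuck threads can be inspected afterwards. With Sysinternals ProcDump that is roughly the following (the PID and output file name are placeholders):

procdump -ma <fulcrum-pid> fulcrum_hang.dmp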

@pkoutsogiannis
Author

So there must be some issue, at least on Windows. You used the provided Windows binary, correct?

Correct.

@pkoutsogiannis
Author

pkoutsogiannis commented Jan 19, 2024

I have reverted to 1.9.3 and will monitor this as well.

Kudos for the excellent work.

@cculianu
Owner

Yeah, if 1.9.3 never hangs I can just undo the optimization I added for a threaded prefetcher of coins. It only shaves a few seconds off synchmempool on large mempools (60k+ txns), but if it means there is some instability for whatever reason, it's gone. Do let me know how 1.9.3 works out.

@pkoutsogiannis
Author

pkoutsogiannis commented Jan 23, 2024

Fulcrum (1.9.3) hung, and we had to kill the process after it did not stop following a stop command. Maybe the problem is with the specific OS (Windows Server 2012 R2), since the other instance running on Windows 11 has never hung so far.

[2024-01-23 02:47:42.621] Block height 826926, downloading new blocks ...
[2024-01-23 02:47:43.220] Processed 1 new block with 2723 txs (6938 inputs, 3854 outputs, 6974 addresses), verified ok.
[2024-01-23 02:47:43.222] Block height 826926, up-to-date
[2024-01-23 02:47:53.249] 37937 mempool txs involving 270164 addresses
[2024-01-23 02:49:03.268] 38151 mempool txs involving 271709 addresses
[2024-01-23 02:50:13.326] 38328 mempool txs involving 272184 addresses
[2024-01-23 02:51:23.313] 38527 mempool txs involving 272953 addresses
[2024-01-23 02:52:33.317] 38696 mempool txs involving 273690 addresses
[2024-01-23 02:53:43.316] 38849 mempool txs involving 274557 addresses
[2024-01-23 02:54:03.597] Block height 826927, downloading new blocks ...
[2024-01-23 02:54:04.463] Processed 1 new block with 1088 txs (7525 inputs, 4586 outputs, 9123 addresses), verified ok.
[2024-01-23 02:54:04.465] Block height 826927, up-to-date
[2024-01-23 02:54:44.472] 38372 mempool txs involving 268470 addresses
[2024-01-23 13:18:24.078] <AdminSrv 127.0.0.1:8000> New TCP Client.2081457 127.0.0.1:61892, 1 client total
[2024-01-23 13:18:24.187] Received 'stop' command from admin RPC, shutting down ...
[2024-01-23 13:18:24.187] Shutdown requested
[2024-01-23 13:18:24.187] Stopping Stats HTTP Servers ...
[2024-01-23 13:18:24.187] Stopping Controller ...

@pkoutsogiannis
Author

pkoutsogiannis commented Jan 23, 2024

The instance (1.9.7) running on Windows 11 that has never hung has been up and running since Dec 6th, 2023.

@cculianu
Owner

And just to be clear: the one that hung was 1.9.3, right? So it definitely isn't my new mempool changes.

Ok, in a way this is good news, but in another way it's bad, since if Fulcrum is triggering some OS-specific issue, that's incredibly hard to troubleshoot.

Good to know it’s not my recent changes though. That’s a relief!

@cculianu
Owner

Is there any way you can install a service pack or somehow update the Windows Server 2012 box? Who knows, maybe that magically fixes it?

@cculianu changed the title from "Fulcrum stopped processing mempool txs" to "Fulcrum stopped processing mempool txs (Windows Server 2012)" on Jan 23, 2024
@pkoutsogiannis
Author

pkoutsogiannis commented Jan 23, 2024

I already have all service packs installed on Windows Server 2012. I will continue monitoring the Windows 11 instance, though, to confirm that the problem is OS-specific.

Keep up the good work!

@cculianu
Owner

Thanks man. It was a relief, though, to learn that this is not specific to 1.9.5+ but some other, unknown issue. Oh, FYI: there is a new 1.9.8; the major change is that it calculates fees more accurately for BTC.

I am starting to suspect the hang may somehow happen within RocksDB. One thing I could do is make a custom build of the Windows binary that uses the latest RocksDB 8.10.0; that's one option here (but it would require me to spend 3-4 hours mucking about with the docker builder, and I am not sure I have that much free time this week for that).
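
As an aside, a quick way to confirm which Fulcrum build a given instance is actually running is to query the stats HTTP server it already exposes (stats = 8081); a minimal check, assuming the JSON report is served at the /stats path, is:

curl -s http://127.0.0.1:8081/stats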
