Fulcrum stopped processing mempool txs (Windows Server 2012) #217
Comments
Fulcrum 1.9.7 has only been out for ~1 week. There was indeed a hang bug back in version 1.9.4 or so. I see from the log this hang happened today -- but were you for sure on 1.9.7? |
The first occurrence was with 1.9.6 last month, which is why we upgraded to 1.9.7. The log is from today. |
Darn. Ok.. I will investigate. I added some optimizations to make mempool synch much faster but they had a bunch of bugs. I thought I squashed them all but apparently maybe not. Will investigate. |
In the meantime you could just go back to Fulcrum 1.9.3 I guess or.. hang in there. |
We are now running fulcrum with -d so that we can catch any helpful information for you. |
Yes, this is extremely helpful. Thank you. |
I forgot to mention that we are using the windows binary on windows server 2016. Keep up the good work. |
Ahhh! That is helpful information! Thank you. I pray this is a Windows-specific problem (but it may not be). Question: Were you running Fulcrum versions prior to 1.9.4 (1.9.3, etc.) for any extended periods, and if so, did you ever notice this problem then? |
It started after upgrading from 1.9.3 to 1.9.6 |
We had 1.9.3 running for an extended period indeed without this issue. |
We had 1.9.3 running for at least a month on the Windows 2016 machine. We also have a 1.9.7 instance running on a Windows 11 machine, which is still error free. We also ran 1.9.6 there without issues. The only differences are the Windows version and that we have fast-sync=4098 and db_max_open_files=500 set. |
Yeah that shouldn't matter. I am curious if the Windows 11 machine ever has problems or not. Keep me updated. I will thoroughly review the code. FWIW I actually have a Windows laptop here (Windows 10) that's been running BTC Fulcrum for a week now with no hang (and before that 1.9.6 with no hang). I will continue to monitor the situation and also look for bugs in my code. :/ Do let me know what happens. I'll investigate this further in the meantime. |
Note: The windows 11 machine is much faster than the windows 2016 machine, I am mentioning this just in case of some race condition. |
What are the specs on the slow machine? And.. is bitcoind running locally on both machines or is one connecting to the bitcoind process on the other? |
These are two separate, unrelated machines, each running bitcoind and Fulcrum locally. Windows 2016: CPU: Intel Xeon E5-2620 2.10GHz, bitcoind config: txindex=1. Windows 11: CPU: AMD Ryzen 5 5560U, bitcoind config: txindex=1. |
You know in my experience setting the Question: Are you hitting bitcoind directly to do any processing outside of Fulcrum? For example: are you doing expensive calls to bitcoind (such as mining, |
The rpcworkqueue was set to 1000 for no particular reason; someone on the team recommended it a few months ago. Both bitcoind instances are used solely by Fulcrum. Fulcrum on Windows 2016 (the one which hung) is not even used by any client, since it serves as a backup service. It just sits there idle. |
Shall we change the rpcworkqueue back to 16 and restart fulcrum in debug mode again? |
Well I actually don't think that was the problem -- since anyway Fulcrum should have been able to exit in a timely manner. It shouldn't hang like that either way. And if you say RPC is only used by Fulcrum.. anyway Fulcrum doesn't make "expensive" calls that eat a ton of time (such as mining or scantxoutset). Your choice .. can leave it as-is.. or set it to default just to see if "that fixed it". Up to you. |
Since there are no other rpc calls except fulcrum I will leave it running as it is and will update you if it hangs again with the debug log. |
Is no news good news? Has it been running smoothly all this time? |
I am monitoring it every day, and so far there has been no incident. |
We've got bad news. Unfortunately it stopped processing mempool txs. Also, after issuing a stop command, it got stuck at the "joining thread" log line and I had to kill the process. [2024-01-19 05:37:21.127] (Debug) 54798 mempool txs involving 289259 addresses (exclusive lock held for 2.030 msec) |
So there must be some issue, at least on Windows. You used the provided Windows binary, correct? I’ll have to investigate this when I get some free time. |
Correct. |
I have reverted back to 1.9.3 and I will monitor this as well. Kudos for the excellent work. |
Yeah if 1.9.3 never hangs I can just undo the optimization I added for a threaded prefetcher of coins. It only shaves a few seconds off the synchmempool on large mempools (60k txns+).. but if it means there is some instability with it for whatever reason it's gone. Do let me know how 1.9.3 works out. |
Fulcrum (1.9.3) hung, and we had to kill the process because it did not stop after we issued a stop command. Maybe the problem is with the specific OS (Windows Server 2012 R2), since the other instance running on Windows 11 has never hung so far. [2024-01-23 02:47:42.621] Block height 826926, downloading new blocks ... |
The instance (1.9.7) running on Windows 11, which has never hung, has been up and running since Dec 6th 2023. |
And just to be clear — the one that hung was 1.9.3 right? So it definitely isn’t my new mempool changes. Ok in a way this is good news but in another way it’s bad since if Fulcrum is triggering some OS specific issues that’s incredibly hard to troubleshoot. Good to know it’s not my recent changes though. That’s a relief! |
Is there any way you can install a service pack or somehow update the Windows Server 2012 box? Who knows maybe that magically fixes it? |
I already have all service packs installed on Windows Server 2012. I will continue monitoring the Windows 11 instance, though, to confirm that the problem is OS-specific. Keep up the good work! |
Thanks man. This was a relief though to learn that it's not specific to 1.9.5+, but some other unknown issue. Oh -- there is a new 1.9.8 FYI -- the major change is it calculates fees more accurately for BTC. I am starting to suspect the hang somehow may happen within rocksdb. One thing I could do is make a custom build of the Windows binary that uses the latest RocksDB 8.10.0 -- that's one option here (but that would require me to spend 3-4 hours mucking about the docker builder to build it, and I am not sure I have that much free time this week for that). |
We are using Fulcrum 1.9.7 (Release f27fc28)
We encountered the following issue 2 times in the past month:
Fulcrum stopped processing mempool txs without any log entry. We issued a stop command, but Fulcrum hung and we had to kill the process and restart it.
[2023-12-01 11:11:35.940] 51632 mempool txs involving 323803 addresses
[2023-12-01 11:12:45.967] 51897 mempool txs involving 324605 addresses
[2023-12-01 11:13:55.989] 52183 mempool txs involving 325474 addresses
[2023-12-01 11:15:05.989] 52451 mempool txs involving 326368 addresses
[2023-12-01 11:16:16.037] 52718 mempool txs involving 327421 addresses
[2023-12-01 11:17:26.076] 53005 mempool txs involving 328511 addresses
[2023-12-01 13:03:37.850] <AdminSrv 127.0.0.1:8000> New TCP Client.3419140 127.0.0.1:55881, 1 client total
[2023-12-01 13:03:37.959] Received 'stop' command from admin RPC, shutting down ...
[2023-12-01 13:03:37.959] Shutdown requested
[2023-12-01 13:03:37.959] Stopping Stats HTTP Servers ...
[2023-12-01 13:03:37.959] Stopping Controller ...
(we had to kill the process after 5 minutes)
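For anyone who wants to catch this kind of silent stall sooner, here is a minimal watchdog sketch in Python. It is not part of Fulcrum. It only assumes the log line format shown above (including the "(Debug)" variant that appears when running with -d). The function names and the 10-minute threshold are hypothetical choices for illustration; in the excerpt above the mempool lines arrive roughly every 70 seconds, so any threshold comfortably above that should work.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
#   [2023-12-01 11:11:35.940] 51632 mempool txs involving 323803 addresses
# and the debug-mode variant with "(Debug)" after the timestamp.
MEMPOOL_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"(?:\(Debug\)\s+)?(?P<txs>\d+) mempool txs involving (?P<addrs>\d+) addresses"
)


def mempool_timestamps(lines):
    """Return the timestamps of all mempool status lines, in file order."""
    stamps = []
    for line in lines:
        match = MEMPOOL_LINE.match(line.strip())
        if match:
            stamps.append(
                datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            )
    return stamps


def seconds_since_last_update(lines, now):
    """Seconds between the last mempool line and `now`, or None if none seen."""
    stamps = mempool_timestamps(lines)
    if not stamps:
        return None
    return (now - stamps[-1]).total_seconds()


def is_stalled(lines, now, threshold=timedelta(minutes=10)):
    """True if the last mempool line is older than `threshold` (assumed value)."""
    age = seconds_since_last_update(lines, now)
    return age is not None and age > threshold.total_seconds()
```

A scheduled task could read the tail of the log file, call `is_stalled(lines, datetime.now())`, and raise an alert when it returns True. Applied to the excerpt above, the last mempool line is at 11:17:26 and the next log entry of any kind is at 13:03:37, so a check run at any point after about 11:27 would have flagged the stall.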
The conf file:
datadir = d:\fulcrum_data
bitcoind = 127.0.0.1:8332
rpcuser = redacted
rpcpassword = redacted
tcp = 10.190.89.8:50001
peering = false
announce = false
public_tcp_port = 50001
admin = 8000
stats = 8081
db_mem = 1024