contrib: add tool to convert compact-serialized UTXO set to SQLite database #27432
This also closes #21670 ;-)
494be8c to 3ce180a
What is the rationale for encoding as text rather than bytes? SQLite can store byte values as BLOBs.
Fair question. There was already some discussion in #24952 about whether to store txids/scriptPubKeys as TEXT or BLOB, see #24952 (review), #24952 (comment) and #24952 (comment). The two main points were:
Considering the scriptPubKey column individually, there is no good reason to use TEXT rather than BLOB, but I went for TEXT mostly for consistency reasons, to avoid mixing TEXT and BLOB in different columns when both hold binary data. That said, I'm also very open to using BLOB instead; it's just a matter of trade-offs.
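To make the TEXT vs. BLOB trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names `as_text` and `as_blob` and the 32-byte value are made up for illustration and are not taken from the PR's script.

```python
import sqlite3

# Made-up 32-byte value standing in for a txid (not a real transaction id).
txid = bytes(range(32))

con = sqlite3.connect(":memory:")
# Hypothetical table names, only used to compare the two storage classes.
con.execute("CREATE TABLE as_text (txid TEXT)")
con.execute("CREATE TABLE as_blob (txid BLOB)")
con.execute("INSERT INTO as_text VALUES (?)", (txid.hex(),))
con.execute("INSERT INTO as_blob VALUES (?)", (txid,))

# length() counts characters for TEXT and bytes for BLOB.
text_len = con.execute("SELECT length(txid) FROM as_text").fetchone()[0]
blob_len = con.execute("SELECT length(txid) FROM as_blob").fetchone()[0]

# A BLOB can still be rendered as hex at query time; SQLite's hex()
# returns uppercase digits, so lower() is applied to match bytes.hex().
blob_as_hex = con.execute("SELECT lower(hex(txid)) FROM as_blob").fetchone()[0]

print(text_len, blob_len, blob_as_hex == txid.hex())
```

The hex TEXT form takes twice as many characters as the BLOB takes bytes, but it can be pasted into queries and compared against explorer output without conversion; the BLOB form is more compact and can be converted with `hex()` when needed.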
Approach ACK. Seems like a fine idea to me.
It's a python conversion script: can't you just add a command-line option for the resulting db to have hex txids or big/little endian blobs if there's user demand for it? Hex encoding seems a fine default to me, for what it's worth. If people end up wanting lots of different options (convert scriptPubKeys to addresses? some way to update the db to a new state, rather than just create a new one?) maybe it would make sense for this script to have its own repo even; but while it stays simple/small, seems fine for contrib.
Concept ACK, will test soon
Concept ACK
tACK 3ce180a
Left two nits which don't need addressing unless being re-touched, but overall this works well in testing and seems like a useful contrib script. Converting the output to json also worked as described in the comments above.
Concept ACK
73b9cab to a1c1cf3
Sorry for the extra-late reply, missed this message and the CI fail. Rebased on master and resolved the silent merge conflict (caused by the module move
Good idea, planning to tackle this as a follow-up.
a1c1cf3 to 9668255
Good point, changed to draft state for now.
9668255 to 6e34600
🚧 At least one of the CI tasks failed. Make sure to run all tests locally. Possibly this is due to a silent merge conflict. Leave a comment here if you need help tracking down a confusing failure.
6e34600 to 254633a
Concept ACK
def decompress_script(f):
    """Equivalent of `DecompressScript()` (see compressor module)."""
    size = read_varint(f)  # sizes 0-5 encode compressed script types
TIL we compress certain standard scriptPubKey types.
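For readers who, like the reviewer, had not seen the compressed script types before, the following is a rough sketch of how such a decompression routine can look. It is written from memory of the scheme used by the compressor module (size 0 for P2PKH, 1 for P2SH, 2-3 for P2PK with a compressed key, 4-5 for P2PK with an uncompressed key, anything else a raw script of length size minus 6). It is not the PR's actual code, the helper `decompress_pubkey` is a name chosen here for illustration, and the size checks on raw scripts are omitted.

```python
import io

P = 2**256 - 2**32 - 977  # secp256k1 field prime

def read_varint(f):
    """Decode Bitcoin Core's VARINT (MSB base-128 with a +1 offset per continuation)."""
    n = 0
    while True:
        byte = f.read(1)[0]
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1
        else:
            return n

def decompress_pubkey(x_bytes, odd):
    """Hypothetical helper: rebuild the 65-byte key from x and the parity of y."""
    x = int.from_bytes(x_bytes, "big")
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)  # square root works because P % 4 == 3
    if pow(y, 2, P) != y_sq:
        raise ValueError("x coordinate is not on the curve")
    if (y & 1) != odd:
        y = P - y
    return b"\x04" + x_bytes + y.to_bytes(32, "big")

def decompress_script(f):
    size = read_varint(f)  # sizes 0-5 encode compressed script types
    if size == 0:  # P2PKH: DUP HASH160 <20 bytes> EQUALVERIFY CHECKSIG
        return b"\x76\xa9\x14" + f.read(20) + b"\x88\xac"
    if size == 1:  # P2SH: HASH160 <20 bytes> EQUAL
        return b"\xa9\x14" + f.read(20) + b"\x87"
    if size in (2, 3):  # P2PK, compressed key: size doubles as the key prefix
        return b"\x21" + bytes([size]) + f.read(32) + b"\xac"
    if size in (4, 5):  # P2PK, uncompressed key: low bit of size is y's parity
        return b"\x41" + decompress_pubkey(f.read(32), size & 1) + b"\xac"
    return f.read(size - 6)  # any other script is stored raw

# Usage: a compressed P2PKH entry is 1 size byte plus the 20-byte hash.
example = decompress_script(io.BytesIO(b"\x00" + bytes(20)))
print(example.hex())
```

The point of the scheme is that the most common output types shrink to 21 or 33 bytes on disk, while every other script pays only a small length-prefix overhead.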
254633a to 217bc3b
Rebased on #29612, supporting the latest format with enhanced metadata (magic bytes, version, network magic, block height, block hash, coins count).
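As an illustration of what reading such a metadata header could involve, here is a hedged sketch. The field order follows the list in the comment above, but the magic value `utxo\xff`, the field widths (2-byte version, 4-byte network magic, 4-byte height, 32-byte hash, 8-byte count), the little-endian encoding, and the function name `read_snapshot_header` are all assumptions made for this example; the authoritative layout is whatever #29612 and the script itself define.

```python
import io
import struct

ASSUMED_MAGIC = b"utxo\xff"  # assumption for this sketch, verify against the real format

def read_snapshot_header(f):
    """Parse an assumed header layout; widths and order are illustrative only."""
    magic = f.read(len(ASSUMED_MAGIC))
    if magic != ASSUMED_MAGIC:
        raise ValueError("unexpected magic bytes")
    (version,) = struct.unpack("<H", f.read(2))      # assumed uint16, little-endian
    network_magic = f.read(4)                        # assumed 4 raw bytes
    (height,) = struct.unpack("<I", f.read(4))       # assumed uint32, little-endian
    block_hash = f.read(32)[::-1].hex()              # shown byte-reversed, as hashes usually are
    (coins_count,) = struct.unpack("<Q", f.read(8))  # assumed uint64, little-endian
    return {
        "version": version,
        "network_magic": network_magic.hex(),
        "height": height,
        "block_hash": block_hash,
        "coins_count": coins_count,
    }

# Usage with a synthetic header (all values are made up).
fake = (ASSUMED_MAGIC + struct.pack("<H", 2) + bytes.fromhex("f9beb4d9")
        + struct.pack("<I", 844861) + bytes(range(32)) + struct.pack("<Q", 3))
print(read_snapshot_header(io.BytesIO(fake)))
```

Checking the magic bytes and version up front lets a converter fail early with a clear message when it is handed a dump in an older or unrelated format.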
217bc3b to 15b0c48
15b0c48 to 0a89179
utACK 0a89179 -- no deep review, but being able to mess around with the utxo set as an sqlite dump is handy, and adding a script to contrib seems low risk. As of height 844861, the raw dump is ~12GB and the sqlite version is ~25GB, so having a bunch of free space is helpful if you want to try it out.
Problem description
There is demand from users to get the UTXO set in the form of a SQLite database (#24628). Bitcoin Core currently only supports dumping the UTXO set in a binary compact-serialized format, which was crafted specifically for AssumeUTXO snapshots (see PR #16899), with the primary goal of being as compact as possible. Previous PRs tried to extend the `dumptxoutset` RPC with new formats, either in human-readable form (e.g. #18689, #24202), or most recently, directly as SQLite database (#24952). Neither approach is optimal: due to the huge size of the ever-growing UTXO set, with already more than 80 million entries on mainnet, human-readable formats are practically useless, and very likely one of the first steps would be to put them in some form of database anyway. Directly adding SQLite3 dumping support, on the other hand, introduces an additional dependency to the non-wallet part of bitcoind and the risk of increased maintenance burden (see e.g. #24952 (comment), #24628 (comment)).
Proposed solution
This PR follows the "external tooling" route by adding a simple Python script for achieving the same goal in a two-step process (first create the compact-serialized UTXO set via `dumptxoutset`, then convert it to SQLite via the new script). Executive summary:
- the created database contains a table `utxos` with the following schema: `(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)`
[1] note that there are some rare cases of operating systems like FreeBSD though, where the sqlite3 module has to be installed explicitly (see #26819)
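A small sketch of working with a table of that schema via Python's `sqlite3` module follows. It uses an in-memory database and three made-up rows rather than a real converted UTXO set, so the values carry no meaning beyond showing the column types.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # a real run would open the converted database file
con.execute(
    "CREATE TABLE utxos(txid TEXT, vout INT, value INT, "
    "coinbase INT, height INT, scriptpubkey TEXT)"
)

# Made-up rows: txid and scriptpubkey are hex strings, value is in satoshis.
rows = [
    ("aa" * 32, 0, 5_000_000_000, 1, 1, "51"),
    ("bb" * 32, 1, 1_000, 0, 2, "0014" + "cc" * 20),
    ("bb" * 32, 2, 2_500, 0, 2, "6a"),
]
con.executemany("INSERT INTO utxos VALUES (?, ?, ?, ?, ?, ?)", rows)

# Total value of all unspent outputs.
(total,) = con.execute("SELECT SUM(value) FROM utxos").fetchone()

# Number of outputs and total value per creation height.
per_height = con.execute(
    "SELECT height, COUNT(*), SUM(value) FROM utxos GROUP BY height ORDER BY height"
).fetchall()

# Unspent coinbase outputs.
(coinbase_count,) = con.execute(
    "SELECT COUNT(*) FROM utxos WHERE coinbase = 1"
).fetchone()

print(total, per_height, coinbase_count)
```

On a full mainnet database, queries that filter on `txid` or `scriptpubkey` will likely benefit from creating an index on those columns first.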
A functional test is also added that creates UTXO set entries with various output script types (standard and also non-standard, e.g. large scripts) and verifies that the UTXO sets of both formats match by comparing corresponding MuHashes. One MuHash is supplied by the bitcoind instance via `gettxoutsetinfo muhash`, the other is calculated in the test by reading back the created SQLite database entries and hashing them with the test framework's `MuHash3072` module.
Manual test instructions
I'd suggest doing manual tests by comparing MuHashes as well. For that, I wrote a Go tool some time ago that calculates the MuHash of a SQLite database in the created format (I tried to write a similar tool in Python, but it's painfully slow).
For a demonstration of what can be done with the resulting database, see #24952 (review) for some example queries. Thanks go to LarryRuane, who gave me the idea of rewriting this script in Python and adding it to `contrib`.