
Enforce incentive compatibility for all RBF replacements #26451

Closed
wants to merge 4 commits into from

Conversation

sdaftuar
Member

@sdaftuar sdaftuar commented Nov 4, 2022

Opening this for discussion.

Currently it's possible under our RBF rules to evict a transaction with a higher mining score from our mempool, in favor of a transaction with a higher total fee but lower feerate. This patch would fix this incentive incompatibility by requiring that any new transaction's mining score (defined as the minimum of its individual feerate and its feerate including ancestors) be greater than the individual feerates of all transactions that would be evicted.

Because this new feerate criterion includes the ancestors of a new transaction in the score, we are able to eliminate the prohibition against including new unconfirmed parents in a replacement transaction, a slight relaxation of the prior rules.
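A minimal sketch of the proposed check, in Python for brevity (the actual change is C++ validation code; all names and data shapes here are illustrative, not the PR's implementation):

```python
def mining_score(fee, vsize, ancestor_fee, ancestor_vsize):
    """Mining score = min(individual feerate, feerate including ancestors)."""
    individual = fee / vsize
    with_ancestors = (fee + ancestor_fee) / (vsize + ancestor_vsize)
    return min(individual, with_ancestors)

def replacement_allowed(new_tx, conflicts):
    """new_tx: its own fee/vsize plus the totals of its in-mempool ancestors
    (sats and vbytes); conflicts: the txs that would be evicted. The new tx's
    mining score must beat every conflict's individual feerate."""
    score = mining_score(new_tx["fee"], new_tx["vsize"],
                         new_tx["ancestor_fee"], new_tx["ancestor_vsize"])
    return all(score > c["fee"] / c["vsize"] for c in conflicts)
```

Note how a low-feerate unconfirmed parent drags the score down via the `min`, which is what lets the "no new unconfirmed inputs" rule be dropped.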

@sdaftuar
Member Author

sdaftuar commented Nov 4, 2022

The new test included in the second commit demonstrates the problem highlighted in the OP (the test will fail on master).

@instagibbs
Member

Because this new feerate criterion includes the ancestors of a new transaction in the score, we are able to eliminate the prohibition against including new unconfirmed parents in a replacement transaction, a slight relaxation of the prior rules.

This is the elimination of BIP125 (or doc/policy/mempool-replacements.md) rule #2, if I understand correctly.

One interesting but understandable wrinkle is that in common use cases, adding new unconfirmed inputs (to CPFP) is likely uneconomical if one is replacing high-feerate children, even if the original ancestors are low feerate. I ran into this case when thinking about V3, which has a similar rule in place: dc8b611#diff-d18bbdec91d0f4825b512a31f34666b000c7b7e2e05a3d43570c4b971532616fR158

Some basic testing of this new-unconfirmed-inputs case should make it obvious.

@sdaftuar
Member Author

sdaftuar commented Nov 4, 2022

This is the elimination of BIP125 (or doc/policy/mempool-replacements.md) rule #2, if I understand correctly.

Yes, this PR would eliminate rule 2, and modify rule 6.

I agree with you that including more unconfirmed inputs makes it very difficult to replace high-feerate children under the policy I'm suggesting. If you look at the test I've added, the feerate needed in order to get the new transaction into the mempool is extremely high, illustrating the point that fixing incentive compatibility appears to make the pinning problem much, much worse.

However, if we are interested in incentive compatibility of our replacement rules, I believe this is the most straightforward set of rules that we could adopt which would ensure that we never evict something that has a better mining score than a new incoming transaction.

@instagibbs
Member

I'm asking for a test case of rule#2 being removed, as I don't think that's covered yet?

@sdaftuar
Member Author

sdaftuar commented Nov 4, 2022

@instagibbs Done - I added back a modified version of the prior test, to demonstrate new unconfirmed inputs are permitted.

@instagibbs
Member

concept ACK, would also need documentation in doc/policy/mempool-replacements.md of course

@sdaftuar
Member Author

sdaftuar commented Nov 4, 2022

I actually just realized that this PR does not guarantee incentive compatible replacements, either. Will push up a test case to illustrate it (maybe it'll be fun for reviewers to figure out why in the meantime).

Member

@glozow glozow left a comment


Concept ACK to enforcing replacements are better mining candidates. IIUC this approach seems similar to #23121 (but using min(ancestor, individual) feerate) and the CheckMinerScores() check from #25038 (applied to individual replacements as well)?

@petertodd
Contributor

Comment: whether or not this is the right trade-off depends on the state of the mempool. At one extreme, if the mempool is entirely empty, we'd be better off with the higher-fee transaction regardless. If the mempool were full, we may even be better off with a higher-feerate but lower-fee-paying transaction.

Personally I'd lean towards the latter assumption. But it's worth discussing.

@instagibbs
Member

Based on discussions, I think the issue is that a new unconfirmed input could be included that actually would have been mined anyway, thus giving the replacement child a higher estimated miner score than it otherwise should have, raising the "min" in the function between the ancestor and individual score.

Member

@ariard ariard left a comment


Concept ACK

// the highest possible mining score of all its conflicts.
// - The motivation for this check is to ensure that the replacement transaction is preferable for
// block-inclusion, compared to what would be removed from the mempool.
if (const auto err_string{PaysMoreThanConflicts(ws.m_all_conflicting, newFeeRate, hash)}) {
Member

I think this type of policy change would benefit from a more binding definition of "incentive compatibility". If we speak about a world where the network mempools' backlog always exceeds MAX_BLOCK_WEIGHT, optimizing for mining score based on feerate sounds the most obvious. If we add the dimension of mempool congestion variance, where emptiness is a possible outcome, I think a miner could adopt a "pure replace-by-fee" mempool acceptance policy until reaching MAX_BLOCK_WEIGHT.

Another layer of complexity in devising "incentive compatibility" could be to realize that the best plausible mempool feerate is a combination of the replacement policy and an order of events. E.g., if you evict a 2 sat/vb A for a 4 sat/vb A', though after 1 min you receive a 4 sat/vb B, child of A, your mempool is at a loss. This scenario sounds far from implausible to me in a shared-UTXO future (e.g. 2-of-2 LN channels), where spenders are building chains of transactions blind to every co-spender. A miner mempool optimization could be to run a cache of all the replacements.
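The order-of-events loss in that scenario can be made concrete with assumed equal sizes (1,000 vb each, an assumption not in the comment):

```python
# The 2 sat/vb vs 4 sat/vb order-of-events scenario, sizes assumed 1,000 vb.
VSIZE = 1000
fee_A       = 2 * VSIZE  # A: 2 sat/vb, in the mempool
fee_A_prime = 4 * VSIZE  # A': 4 sat/vb, conflicts with A
fee_B       = 4 * VSIZE  # B: 4 sat/vb, child of A, arrives a minute later

# If A was already replaced by A', B is orphaned and unusable:
fees_after_replacing = fee_A_prime            # 4,000 sats
# If A had been kept, B would CPFP it:
fees_after_keeping   = fee_A + fee_B          # 6,000 sats

assert fees_after_keeping > fees_after_replacing
```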

On this subject, earlier this year on the ML: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2022-February/019921.html

Of course, we could abstract away the order-of-events complexity by assuming all received transactions follow templates (e.g. nversion=3) where the design assumes only the most efficient package/chain of transactions propagates. However, I'm not sure how representative our network transaction-relay "transaction patterns" (current and future) are of potential miner mempools running with DEFAULT_ANCESTOR_LIMIT=$HUGE and out-of-band transaction-relay. If our design gives margin for non-marginal fee asymmetries, in a future where fees contribute significantly to the block reward, you should expect some miners to make the effort of capturing them.

All thoughts for future work; I think this is a good improvement for now.

Member

If our design gives margin for non-marginal fee asymmetries

There will always be a margin whatever design we pick, as you can imagine a spending pattern that doesn't fit; we can try to shrink it as best we can.

Member

There will always be a margin whatever design we pick, as you can imagine a spending pattern that doesn't fit; we can try to shrink it as best we can.

Yes I think we should try to shrink it as best we can. Note the goal of avoiding non-marginal fee asymmetries that we might allow with policy changes such as nVersion=3.

@ajtowns
Contributor

ajtowns commented Nov 5, 2022

Would it be better to make this change as part of a rethink for package rbf?

Currently it's possible in our RBF rules to evict a transaction with higher mining score from our mempool, in favor of a transaction with a higher total fee but lower feerate.

In particular, doesn't this block some cases where you could currently replace a pair of txs [A,B] with [C,D] (A and C conflicting; B spending A, and D spending C) where D has the highest mining score? That is, without package relay, you have to consider whether to evict both of A,B while only considering C, which will often lead you to make a suboptimal choice. With package relay, you can wait and consider C's inclusion only when you also have D available, and use that extra information to make a better decision...

Getting the incentive compatible answer here seems hard even with package relay (if you receive C on its own first, you should reject it; but if you add its txid to a hard reject filter and forget it, then when D arrives you'll reject it for having a rejected parent, rather than recalculating the fee and seeing that it is worth replacing). Incentive compatibility may also result in the mempool repeatedly swapping between A and C, eg if they were pre-signed transactions being CPFPed by a series of B1/D1/B2/D2/.. CPFP txs taking a scorched earth approach of incrementally burning the potential profits to fees, though that's presumably solved by requiring Bn/Dn to pay an incremental relay fee for the package, not just itself.

Other than that, this seems reasonable to me.

Based on discussions, I think the issue is that a new unconfirmed input could be included that actually would have been mined anyway, thus giving the replacement child a higher estimated miner score than it otherwise should have, raising the "min" in the function between the ancestor and individual score.

Can you give a concrete example of this? As far as I can see that case only works when the new conflict C replacing A+B, spends some new unconfirmed tx Y, and "Y + X + C" has a lower fee rate than "Y + X + A + B". But we compare C's feerate with A and B's raw feerate, and only accept the replacement if that's larger; and having Y as a parent can only serve to reduce C's feerate from raw? So having mined Y and X, it seems to me that mining C is always superior to A or A+B, except in the edge case where you've run out of block space.

@instagibbs
Member

instagibbs commented Nov 5, 2022

I seem unable to construct an example that's worrying.

For the sake of discussion and math simplicity, assume we allow 0-fee txs into the mempool, no incremental fee increase is required, all txs are the same size, and we aren't optimizing for flat fees but feerate.

A: 0sat/vbyte parent of D (in mempool)
B: 0sat/vbyte parent of D' (in mempool)
C: Xsat/vbyte parent of D' (in mempool)
D: 10sat/vbyte child of A (in mempool)
D': 10 sat/vbyte child of {A,B,C} (proposed RBF tx)

Using this example and tweaking what X is, it seems the miner gets paid optimally either way, regardless of the "stolen valor" from the ancestor C feerate. If it's really high, the RBF succeeds, D is swapped out for D', and fees are unchanged (modulo incremental fees in reality). If it's low, it fails, and fees are unchanged.

Perhaps the multi-conflict case is key to this, but I await specific counterexamples with the same assumptions as above.

@0xB10C
Contributor

0xB10C commented Nov 5, 2022

fwiw: I came across a bunch of mainnet replacements recently that would be affected by this tightening of rule 6. Let me know if numbers or examples would be helpful.

@jonatack
Contributor

jonatack commented Nov 5, 2022

Interesting.

Verified both added tests fail on master as expected with No exception raised and replacement-adds-unconfirmed, respectively, without the following removals:

make new tests pass on master

--- a/test/functional/feature_rbf.py
+++ b/test/functional/feature_rbf.py
@@ -311,7 +311,7 @@ class ReplaceByFeeTest(BitcoinTestFramework):
         )["hex"]
 
         # This will raise an exception due to insufficient fee
-        assert_raises_rpc_error(-26, "insufficient fee", self.nodes[0].sendrawtransaction, tx1b_hex, 0)
+        # assert_raises_rpc_error(-26, "insufficient fee", self.nodes[0].sendrawtransaction, tx1b_hex, 0)
 
         # Increasing the fee_per_output sufficiently should cause the
         # transaction to succeed. We have to exceed the individual feerates of
@@ -384,8 +384,8 @@ class ReplaceByFeeTest(BitcoinTestFramework):
             amount_per_output=1 * COIN,
         )["hex"]
 
-        tx2_txid = self.nodes[0].sendrawtransaction(tx2_hex, 0)
-        assert tx2_txid in self.nodes[0].getrawmempool()
+        # tx2_txid = self.nodes[0].sendrawtransaction(tx2_hex, 0)
+        # assert tx2_txid in self.nodes[0].getrawmempool()


Initially tweaking the tests, it appears these fee_per_output test values can be much closer; the following values pass for me:

--- a/test/functional/feature_rbf.py
+++ b/test/functional/feature_rbf.py
@@ -307,7 +307,7 @@ class ReplaceByFeeTest(BitcoinTestFramework):
             utxos_to_spend=[tx0_outpoint],
             sequence=0,
             num_outputs=100,
-            fee_per_output=20000,
+            fee_per_output=209663,
         )["hex"]
 
         # This will raise an exception due to insufficient fee
@@ -320,7 +320,7 @@ class ReplaceByFeeTest(BitcoinTestFramework):
                 utxos_to_spend=[tx0_outpoint],
                 sequence=0,
                 num_outputs=100,
-                fee_per_output=220000,
+                fee_per_output=209664,
         )["hex"]

Aside: test/functional/wallet_listtransactions.py may need to be updated, pending future changes. With the current change it raises insufficient fee, rejecting replacement both in the CI and run locally.

./test/functional/wallet_listtransactions.py

$ test/functional/wallet_listtransactions.py 
2022-11-05T19:20:28.923000Z TestFramework (INFO): Initializing test directory /var/folders/bz/mn3hr6td37bczwp7j89frs4w0000gn/T/bitcoin_func_test_6n7cj_o0
2022-11-05T19:20:30.334000Z TestFramework (INFO): Test simple send from node0 to node1
2022-11-05T19:20:31.490000Z TestFramework (INFO): Test confirmations change after mining a block
2022-11-05T19:20:32.516000Z TestFramework (INFO): Test send-to-self on node0
2022-11-05T19:20:32.653000Z TestFramework (INFO): Test sendmany from node1: twice to self, twice to node0
2022-11-05T19:20:33.956000Z TestFramework (INFO): Test 'include_watchonly' feature (legacy wallet)
2022-11-05T19:20:35.187000Z TestFramework (INFO): Test txs w/o opt-in RBF (bip125-replaceable=no)
2022-11-05T19:20:40.454000Z TestFramework (INFO): Test txs with opt-in RBF (bip125-replaceable=yes)
2022-11-05T19:20:45.618000Z TestFramework (INFO): Test tx with unknown RBF state (bip125-replaceable=unknown)
2022-11-05T19:20:45.620000Z TestFramework (ERROR): JSONRPC error
Traceback (most recent call last):
  File "/Users/jon/bitcoin/bitcoin/test/functional/test_framework/test_framework.py", line 133, in main
    self.run_test()
  File "/Users/jon/bitcoin/bitcoin/test/functional/wallet_listtransactions.py", line 109, in run_test
    self.run_rbf_opt_in_test()
  File "/Users/jon/bitcoin/bitcoin/test/functional/wallet_listtransactions.py", line 197, in run_rbf_opt_in_test
    txid_3b = self.nodes[0].sendrawtransaction(tx3_b_signed, 0)
  File "/Users/jon/bitcoin/bitcoin/test/functional/test_framework/coverage.py", line 49, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/Users/jon/bitcoin/bitcoin/test/functional/test_framework/authproxy.py", line 144, in __call__
    raise JSONRPCException(response['error'], status)
test_framework.authproxy.JSONRPCException: insufficient fee, rejecting replacement d7b92bf468ed0d0d952df107069c359849e5c8d432412bed03398fbad851e82a; new feerate 0.00409641 BTC/kvB <= old feerate 0.00909090 BTC/kvB (-26)

@sdaftuar
Member Author

sdaftuar commented Nov 5, 2022

@instagibbs I pushed up a test that I believe shows the insufficiency of the logic. Sorry if the test is a little ugly, but if you inspect you'll see that there are three initial unrelated transactions in the mempool:
txA - feerate 961 sat/vbyte
txP1 - feerate 9.6 sat/vbyte
txP2 - feerate 9615 sat/vbyte

Then we construct transaction A' that conflicts with A but spends P1 and P2. The feerate of A' by itself is 1073 sat/vbyte, but the feerate of (A', P1) is 715 sat/vbyte. So assuming blocks are full with other similarly good transactions, a miner would rather include P2 + A in a block than include P2 + P1 + A'.
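The structure of this counterexample can be worked through with round, assumed sizes (1,000 vb each; the PR's test uses different sizes, so the derived feerates below differ slightly from those quoted above):

```python
VSIZE = 1000               # all sizes assumed equal; sats and vbytes throughout
fee_A  = 961_000           # txA: 961 sat/vb, conflicts with A'
fee_P1 = 9_600             # txP1: 9.6 sat/vb, new unconfirmed parent of A'
fee_P2 = 9_615_000         # txP2: 9615 sat/vb, new unconfirmed parent of A'
fee_Ap = 1_073_000         # A': 1073 sat/vb on its own

individual = fee_Ap / VSIZE
with_ancestors = (fee_Ap + fee_P1 + fee_P2) / (3 * VSIZE)
score = min(individual, with_ancestors)

# The proposed check passes: P2's huge feerate inflates A''s ancestor score...
assert score > fee_A / VSIZE

# ...but P2 would be mined on its own merits anyway. The marginal package a
# miner actually gains by accepting the replacement is {P1, A'}, which is
# worse than simply keeping A:
marginal = (fee_Ap + fee_P1) / (2 * VSIZE)
assert marginal < fee_A / VSIZE
```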

@DrahtBot
Contributor

DrahtBot commented Nov 6, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK instagibbs, glozow, ariard

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28813 (rpc: permit any ancestor package through submitpackage by glozow)
  • #28785 (validation: return more helpful results for reconsiderable fee failures and skipped transactions by glozow)
  • #28121 (include verbose debug messages in testmempoolaccept reject-reason by pinheadmz)
  • #26711 (validate package transactions with their in-package ancestor sets by glozow)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@instagibbs
Member

@sdaftuar ok, my example actually fails this as well. I got confused and compared total fees vs the "package feerate".

@ajtowns
Contributor

ajtowns commented Nov 7, 2022

@instagibbs I pushed up a test that I believe shows the insufficiency of the logic. Sorry if the test is a little ugly, but if you inspect you'll see that there are three initial unrelated transactions in the mempool:
txA - feerate 961 sat/vbyte
txP1 - feerate 9.6 sat/vbyte
txP2 - feerate 9615 sat/vbyte

Then we construct transaction A' that conflicts with A but spends P1 and P2. The feerate of A' by itself is 1073 sat/vbyte, but the feerate of (A', P1) is 715 sat/vbyte. So assuming blocks are full with other similarly good transactions, a miner would rather include P2 + A in a block than include P2 + P1 + A'.

I think with rounder numbers this looks like:

  • P2 1000vb, 10Msat (10,000sat/vb)
  • A 1000vb, 1Msat (1,000sat/vb)
  • B 1000vb, 1.1Msat (1,100sat/vb) (conflicts with A, spends P1 and P2)
  • C 1000vb, 600ksat (600sat/vb)
  • P1 1000vb, 10ksat (10sat/vb)

With A in the mempool, your block ordering would be [P2, A, C, P1]. With B in the mempool, your block ordering would be [P2, C, P1, B]. That has the usual complexities with limited block sizes -- if we were at the last 2000vb or 3000vb in the block, we'd prefer to have A; but if we had 4000vb in the block (and no other txs), we'd prefer B.
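These orderings can be reproduced with a small greedy ancestor-package selection sketch (a simplification of ancestor-score-based block building; names and the heuristic's details are illustrative, and it deliberately ignores block size limits, which is exactly the caveat raised above):

```python
def block_order(txs):
    """Greedy block building by best remaining ancestor-package feerate.
    txs: {name: (fee_sats, vsize_vb, [parent names])}."""
    mined, order = set(), []

    def unmined_ancestors(name):
        found, stack = set(), [name]
        while stack:
            for p in txs[stack.pop()][2]:
                if p not in mined and p not in found:
                    found.add(p)
                    stack.append(p)
        return found

    while len(order) < len(txs):
        best_pkg, best_rate = None, -1.0
        for name in txs:
            if name in mined:
                continue
            pkg = unmined_ancestors(name) | {name}
            rate = sum(txs[t][0] for t in pkg) / sum(txs[t][1] for t in pkg)
            if rate > best_rate:
                best_pkg, best_rate = pkg, rate
        # Include the winning package, parents before children.
        for t in sorted(best_pkg, key=lambda t: len(unmined_ancestors(t))):
            mined.add(t)
            order.append(t)
    return order
```

Feeding it the A-mempool and B-mempool above yields [P2, A, C, P1] and [P2, C, P1, B]; adding the later D example yields [P1, D, P2, A, C] and [P1, D, P2, B, C], matching the orderings in this thread.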

Even if we fixed that, and only counted P1 towards B's score and thus chose to keep A, we could just go a layer down due to the possibility of another tx cpfp'ing P1: if we added a tx,

  • D 1000vb, 100Msat (100,000 sat/vb) (spends P1, doesn't conflict with anything)

then our block ordering would be [P1, D, P2, A, C] or [P1, D, P2, B, C] and we would always prefer to have B in our mempool, no matter how much space we had left in the block. And then if a tx E came in, that conflicts with D, but doesn't spend P1, we'd go back to preferring A rather than B.

That is to say: there are at least two "non-local" issues that affect whether we prefer A or B: how close A or B would be to the end of the block, and whether other higher-scoring transactions are already cpfp'ing in-mempool parents.

I think maybe what we're trying to achieve is something more simple, like "relay tx replacements if and only if they're obviously sensible", ie:

  • if the signers of a tx are willing to spend more in fees (rules 3/4) on a replacement tx
  • and we're sure the replacement tx is more attractive to miners than the original (rules 2/5/6)
  • then include it in our mempool and relay it onwards

If we can't be sure which of A/B is best because it could change depending on unrelated txs in the mempool, then we just don't do anything, though of course the signers could always contact miners directly. We'll still get this wrong at edge cases due to the block limit (a larger replacement with more fees might not fit in a block when the smaller original would, and despite having lower fees, the next best alternative might be even lower than that), but the only way to avoid that would be to require the tx always be smaller, which seems unreasonably limiting.

In light of that, rather than just doing away with rule 2, maybe examining it explicitly might be better? If you're replacing A with B, and A's in-mempool parents are X,Y and B's are X,Z; then maybe you want to compare the minimum score of (B, B+Z.ancestor_score) compared to the minimum score of (A, A+Y.ancestor_score)? For multiple parents, I think you'd want to sort them by their ancestors scores (smallest to largest), and continue until your accumulated score is lower than next parent's ancestor score. That will have some false negatives still, but I think it would avoid false positives (allowing replacements that are less appealing to miners) and might not be too complicated?
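One reading of that multi-parent rule, as an illustrative Python sketch (not proposed code; `parents` carries each parent's ancestor-package totals):

```python
def conservative_score(tx_fee, tx_vsize, parents):
    """Fold the replacement's in-mempool parents in from the lowest ancestor
    feerate up, stopping once the accumulated feerate falls below the next
    parent's ancestor feerate (such a parent would be mined on its own
    merits before the replacement, so it shouldn't drag the score).
    parents: list of (ancestor_fee_sats, ancestor_vsize_vb)."""
    fee, vsize = tx_fee, tx_vsize
    for anc_fee, anc_vsize in sorted(parents, key=lambda p: p[0] / p[1]):
        if fee / vsize < anc_fee / anc_vsize:
            break
        fee += anc_fee
        vsize += anc_vsize
    return fee / vsize
```

Applied to the earlier txA/txP1/txP2 counterexample (1,000 vb sizes assumed), A' scores (1,073,000 + 9,600) / 2,000 = 541.3 sat/vb, below txA's 961 sat/vb, so the incentive-incompatible replacement is rejected; P2 is excluded because its 9,615 sat/vb ancestor feerate exceeds the accumulated score.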

@instagibbs
Member

I think maybe what we're trying to achieve is something more simple, like "relay tx replacements if and only if they're obviously sensible"

Agree on goals. Future work is needed to make fee bumping reliable anyways to remove this out-of-band bump risk.

I think removing shared ancestors from the ancestor calculation is "obviously" right. The symmetric comparison seems easier to reason about, in that we can switch the sides around from an ordering perspective and get identical results, i.e. it does not seem to introduce a pinning vector. If deemed "incentive compatible", it seems tighter than the CheckMinerScores check in https://github.com/bitcoin/bitcoin/pull/25038/files#diff-fa5cb2d84034ff72cdb9d479b17cf8c744a9bf3fc932b3a77c1a017edd767dfaR185 which suffers from an order-dependent pinning vector when adding new unconfirmed inputs.

@sdaftuar
Member Author

sdaftuar commented Nov 7, 2022

I think maybe what we're trying to achieve is something more simple, like "relay tx replacements if and only if they're obviously sensible"

Yes this is how I'm thinking about it as well.

In light of that, rather than just doing away with rule 2, maybe examining it explicitly might be better? If you're replacing A with B, and A's in-mempool parents are X,Y and B's are X,Z; then maybe you want to compare the minimum score of (B, B+Z.ancestor_score) compared to the minimum score of (A, A+Y.ancestor_score)?

@ajtowns I don't think this works, because if Y has other, higher feerate children besides A, then (A+Y).ancestor_score can understate the miner's incentive to include (Y, A) in a block -- Y might already be scheduled for inclusion due to some unrelated descendant. An easy heuristic to avoid that issue is to just look at A's individual feerate (but that will risk overstating A's desirability -- which is ok if our goal is to merely ensure that any replacements that do happen are sensible, and not to ensure that all replacements that "should" happen are able to take place, but this would make pinning much worse of a problem).

Also, I think it's beneficial to avoid looking at all the ancestors of the transactions being removed, because that could be a large number of ancestors to traverse.

For multiple parents, I think you'd want to sort them by their ancestors scores (smallest to largest), and continue until your accumulated score is lower than next parent's ancestor score. That will have some false negatives still, but I think it would avoid false positives (allowing replacements that are less appealing to miners) and might not be too complicated?

I think this might work (when applied to the replacement transaction only) -- will try this approach. I think the easiest way to do this would be to ignore that the topology of the ancestors may imply that this is more of a "worst case" analysis than is always warranted, but if we're only worried about avoiding incentive incompatible replacements, then I think this might suffice.

@ajtowns
Contributor

ajtowns commented Nov 8, 2022

I don't think this works, because ...

Ah, yeah, I wondered why it wasn't comparing max(..) < min(..) when I was writing it.

The symmetric comparison seems easier to reason about, in that we can switch the sides around from an ordering perspective, and you get identical results, i.e. not seeming to introduce pinning vector.

If the goal is just "relay tx replacements if and only if they're obviously sensible", I think preventing pinning vectors is out of scope: all you need to do to create a pinning vector is have your transactions be "non-obvious". To prevent them, I think you need a different goal, eg "relay tx replacements in all circumstances, picking a consistent winner even when it's not obvious that's necessarily sensible". That is, there's a big difference between designing something that just tries to maximise the "obvious" cases and one that prevents attackers from being able to find and take advantage of a "non-obvious" case.

@instagibbs
Member

@ajtowns it's a two part optimization, ideally.

Of course in this circumstance, we have outright prohibition of new unconfirmed inputs today, so it's no worse at least. I am trying to decide if opening this door actually helps any real usage patterns.

@ajtowns
Contributor

ajtowns commented Nov 8, 2022

@ajtowns it's a two part optimization, ideally.

Of course in this circumstance, we have outright prohibition of new unconfirmed inputs today, so it's no worse at least. I am trying to decide if opening this door actually helps any real usage patterns.

You were CPFPing one thing, and now you want to CPFP two things?

@instagibbs
Member

requiring that any new transaction's mining score (defined as the minimum of its individual feerate and its feerate including ancestors) be greater than the individual feerates of all transactions that would be evicted.

FWIW this part doesn't seem to make anything particularly worse for pinning, just the normal issues we have today. Would be curious if @0xB10C has some data on use cases impacted.

@sdaftuar
Member Author

sdaftuar commented Nov 11, 2022

requiring that any new transaction's mining score (defined as the minimum of its individual feerate and its feerate including ancestors) be greater than the individual feerates of all transactions that would be evicted.

FWIW this part doesn't seem to make anything particularly worse for pinning, just the normal issues we have today. Would be curious if @0xB10C has some data on use cases impacted.

I would expect that the heuristic I propose here would certainly make pinning worse if people wanted to take advantage of this. For instance, imagine two conflicting transactions A, A'. A is in the mempool and has two children, B and C.

Suppose B is large -- 49kvb, and has very low feerate (could be 0 feerate, once package relay is in place). C is relatively small, say 100vb[1], but pays a very large feerate, say 49.1k sats. Then under the policy I've proposed here, someone wanting to get A' to replace A would not only need to pay more than 49.1k sats more than whatever fee A has on it, the feerate would also need to be at least 491sat/vbyte, even though B+C together only pay 1 sat/vbyte.

Under the old rules, the person creating transaction A' may have had some incentive to just make their transaction bigger, so that the increased total fee that needs to be paid can be put to use by at least consolidating UTXOs, which otherwise would cost something on their own (this would be reasonable to do if the feerate on the replacement transaction wouldn't need to be very large compared with the direct conflicts). But with the new rule, doing that would be disastrous as a bigger transaction would also be at a much higher feerate than before.

[1] I can't remember right now what the smallest reasonable transaction size is that I should include here, so replace this number with whatever is achievable (I assume 100 is pretty close to the size of a 1-in, 1-out segwit transaction nowadays).
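The arithmetic of this pin, as a quick sketch (the sizes of B and C and C's fee are from the example above; A's fee of 1,000 sats and B's zero fee are illustrative assumptions):

```python
fee_A, vsize_A = 1_000, 100     # A's fee is assumed for illustration
fee_B, vsize_B = 0, 49_000      # large, zero-fee child of A
fee_C, vsize_C = 49_100, 100    # small child of A at a very high feerate

# Rule 3: A' must pay more total fee than everything it evicts, so at least
# B's and C's fees on top of A's:
min_extra_fee = fee_B + fee_C   # 49,100 sats more than A paid

# Proposed rule: the replacement's mining score must beat every individual
# conflicting feerate, and C sets a very high bar...
required_feerate = max(f / v for f, v in
                       [(fee_A, vsize_A), (fee_B, vsize_B), (fee_C, vsize_C)])
# ...even though the package {B, C} only pays 1 sat/vb overall:
package_feerate = (fee_B + fee_C) / (vsize_B + vsize_C)

assert required_feerate == 491.0
assert package_feerate == 1.0
```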

@instagibbs
Member

Under the old rules, the person creating transaction A' may have had some incentive to just make their transaction bigger

This would require the wallet to know the specific pin to overcome it in a specific way, I think?

@0xB10C
Contributor

0xB10C commented Nov 14, 2022

I've searched through my RBF data specifically for replacements with negative feerate deltas. I've looked at data from 2022-02-11 to 2022-10-31, captured with a Bitcoin Core node with a patched ZMQ interface running with default mempool and policy parameters. During this period my node saw 1.65M transactions being replaced in 363k RBF-sequences (multiple replacements related by conflicting inputs). Out of these RBF-sequences, only 223 included a replacement where the replaced transaction's feerate was higher than the replacement transaction's feerate (i.e. a negative feerate delta).

Initially, I thought there might be a single entity with a clear pattern doing these negative-feerate-delta replacements on purpose. However, categorizing by transaction version and transaction locktime (set or unset) showed various combinations of transactions being replaced. Additionally, the transactions are a mix of simple payment transactions (few inputs, few outputs), consolidation transactions (high input-to-output ratio), and payout transactions (high output-to-input ratio). I don't think they come from the same source, given the different wallet fingerprints. My thesis is that most of these are probably cases where unconfirmed outputs are spent by a different party and the original sender (maybe unaware of the child transactions) replaces and "cancels" them.

Given that there's no real pattern, I've picked out some examples.


Example 1: This graph shows the RBF-sequence of a transaction confirming as 03a86cf6.. (tx2 here). First, tx0 (version=2, inputs=4, outputs=2, likely a payment, locktime set) was broadcast with a feerate of 2.92 sat/vByte. About 25 min later, tx1 (version=1, inputs/outputs=2, likely also a payment, locktime unset) was broadcast at 15.9 sat/vByte, spending an unconfirmed UTXO of tx0. Half an hour later, tx2 (version=2, inputs=4, outputs=2, likely a payment, locktime set) was broadcast at 12.37 sat/vByte, which replaced both tx0 and tx1. The inputs and output scripts of tx0 and tx2 are equal; the only thing that changed was the fee: this is a simple fee bump. However, the outputs of tx1 are not contained in tx2. This transaction was effectively removed from the network before it could confirm (i.e. cancelled). The feerate delta between tx1 and tx2 is negative. This is possible because we don't check the feerate of children in our RBF rule 6 checks. With the proposed change in this PR, tx2 would have had to pay more than tx1's feerate of 15.9 sat/vByte.

image

Example 2: This example shows a fee bump from 1 sat/vByte to 5 sat/vByte of a consolidation (tx0 and tx3; circle size grows with tx vsize). In between, tx1 as child of tx0 and tx2 as child of tx1 were broadcast with a higher feerate. Here, the fingerprints of tx1 and tx2 are the same as tx0 and tx3, so the sender of these might be the same (also, you usually consolidate to your own wallet). We can't tell if tx1 and tx2 were cancelled on purpose (double-spend attempt?) or if that wasn't what the party expected to do by replacing tx0. Also interesting to note is that tx1 has a locktime lower than tx0. This could mean that this transaction came from a Bitcoin Core or an Electrum wallet and it just happened to hit the 10% chance of setting the locktime to currentHeight - rand(1, 100) (Bitcoin Core, Electrum). A few potential learnings (will create issues for this in Bitcoin Core and Electrum):

image

Example 3: This is an example where tx1 could have somewhat cheaply pinned the consolidation transaction tx0. With the proposed change, the fee-bump tx2 would have needed to pay more than 225 sat/vByte. However, as this is a consolidation and the transactions share a fingerprint, these transactions might be from the same party. Not necessarily a malicious pin, but maybe one that happened by accident.

A few more complex examples with less commentary:

image
Example 4: Consolidation and payout transaction where (probably) multiple parties already spent their unconfirmed UTXOs before tx0 was replaced. tx6

image
Example 5: tx4

image
Example 6: tx8

I don't have a strong opinion on this change yet. I think it's a trade-off between (accidental and malicious) pinning and incentive compatibility. It doesn't seem like implementing this change would impact too many current RBF sequences (223 out of 363k), based on the data from this year. Happy to take another look to answer follow-up questions.

@glozow
Member

glozow commented Dec 2, 2022

In my view the goals for adding a "miner score" rule are: Primarily, prevent using RBF to "knock down" a transaction's mining priority, either intentionally by an attacker or unintentionally by a wallet not accurately considering unconfirmed inputs' fees. Most notably, currently attackers can delay an ACP tx's confirmation by replacing it with something that spends from a large, low feerate ancestor. Also, it allows removing the "no new unconfirmed inputs" rule with more confidence it won't cause problems, thus allowing users to use new unconfirmed inputs to fund their RBFs.

I'll note that, in the V3 world, min(ancestor feerate, feerate) == the miner score, since there can't be anything else bumping anybody. If we have a V3 replacing a V3, this should work fine. In the non-V3 world, min(ancestor feerate, feerate) can underestimate. Maybe applying this check when all directly-conflicting transactions are V3 makes sense? That achieves the first goal since the replacement (even if non-V3) must have a higher miner score. This would save ACP users if they switch to V3?
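The heuristic under discussion can be sketched as follows (a hypothetical helper with made-up numbers, not the actual Bitcoin Core implementation):

```python
def miner_score(fee, vsize, anc_fee, anc_vsize):
    """Heuristic "miner score": min(individual feerate, ancestor feerate).
    anc_fee/anc_vsize are totals over the tx plus all in-mempool ancestors."""
    return min(fee / vsize, anc_fee / anc_vsize)

# A child paying 10 sat/vB on its own, dragged down by a 1 sat/vB parent:
parent_fee, parent_vsize = 200, 200    # 1 sat/vB
child_fee, child_vsize = 2000, 200     # 10 sat/vB

score = miner_score(child_fee, child_vsize,
                    child_fee + parent_fee, child_vsize + parent_vsize)
print(score)  # 5.5 -- the parent must be mined too, so the package rate governs
```

In a V3 world nothing else can bump either transaction, so this value is exact; outside V3 it is only an estimate, as discussed below.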

Any underestimation of to-be-replaced mempool transactions' miner scores is a pinning attack, so I'm not completely convinced it's a good idea to use this particular heuristic as a "miner score" check on every RBF. For an accurate "miner score," I think the best way is to run the block template algorithm. I implemented this in #26152, which runs the algo on a transaction cluster rather than the whole mempool, but I'm still not sure if the computational complexity is appropriate for use in mempool validation.

Previously, it was possible for a transaction to have a lower feerate than a
transaction being evicted from the mempool, because we only were comparing an
incoming transaction's feerate against the feerates of the transactions it
directly conflicts with (and not any children).  This could result in
transactions that would be mined in the next block being evicted from the
mempool in favor of transactions which may not be mined for a long time.
Eliminate this behavior to ensure all replacements are incentive compatible.

At the same time, we also eliminate the restriction against including new
unconfirmed inputs in a replacement transaction, because we now consider the
ancestor feerate of the new transaction and compare that with the feerates of
the transactions being evicted.
@sdaftuar
Member Author

sdaftuar commented Dec 7, 2022

I updated this PR with @ajtowns' suggestion:

For multiple parents, I think you'd want to sort them by their ancestors scores (smallest to largest), and continue until your accumulated score is lower than next parent's ancestor score.

I think this is very conservative, but I believe now it should be impossible for non-incentive-compatible replacements to take place.
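My loose reading of the quoted suggestion, as a sketch (illustrative code and numbers, not the actual patch in this PR):

```python
def conservative_mining_score(tx_fee, tx_vsize, parents):
    """Lower-bound a transaction's mining score given its parents.

    parents: list of (ancestor_fee, ancestor_vsize) totals, one per parent,
    each covering that parent plus its own in-mempool ancestors."""
    # Walk parents from lowest to highest ancestor score.
    parents = sorted(parents, key=lambda p: p[0] / p[1])
    fee, vsize = tx_fee, tx_vsize
    for p_fee, p_vsize in parents:
        # Once the accumulated score drops below the next parent's ancestor
        # score, that parent (and everything after it) would be mined on its
        # own before this package, so stop accumulating.
        if fee / vsize < p_fee / p_vsize:
            break
        fee += p_fee
        vsize += p_vsize
    return fee / vsize

# A 10 sat/vB tx with a 1 sat/vB parent and a 20 sat/vB parent: the low
# parent is folded in (2200 sat / 400 vB), the high one is skipped.
print(conservative_mining_score(2000, 200, [(200, 200), (2000, 100)]))  # 5.5
```

Low-score parents drag the package down and must be accounted for; high-score parents would confirm on their own first and are excluded.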

In the non-V3 world, min(ancestor feerate, feerate) can underestimate.

This can both underestimate and overestimate, essentially for the reason that the earlier draft I had of this PR was broken. If a transaction has an ancestor that, say, depends on nothing else and has a very high feerate, then the minimum of ancestor feerate and feerate might just be the feerate of the transaction itself. However, that high-feerate parent can be selected first on its own, and then the true mining score of the transaction becomes the feerate of the remaining package (the transaction together with its other parents), which might be lower than the transaction's own feerate. So this shows that we might overestimate a transaction's mining score with this calculation.
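A worked example of the overestimate (all numbers hypothetical):

```python
# tx has two parents: A (very high feerate) and B (very low feerate).
fee_A, vsize_A = 10_000, 100   # 100 sat/vB
fee_B, vsize_B = 200, 200      # 1 sat/vB
fee_tx, vsize_tx = 2_000, 200  # 10 sat/vB on its own

# Heuristic: min(ancestor feerate, own feerate). A inflates the ancestor
# feerate, so the heuristic collapses to the tx's own feerate.
anc_rate = (fee_A + fee_B + fee_tx) / (vsize_A + vsize_B + vsize_tx)  # 24.4
heuristic = min(anc_rate, fee_tx / vsize_tx)                          # 10.0

# True mining score: A is selected on its own first (100 sat/vB), leaving
# the package {tx, B} at a lower rate than tx's own feerate.
true_score = (fee_tx + fee_B) / (vsize_tx + vsize_B)                  # 5.5

print(heuristic, true_score)  # 10.0 5.5 -- the heuristic overestimates
```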

Any underestimation of to-be-replaced mempool transactions' miner scores is a pinning attack, so I'm not completely convinced it's a good idea to use this particular heuristic as a "miner score" check on every RBF. For an accurate "miner score," I think the best way is to run the block template algorithm.

I'm not sure I understand the first part of this sentence: I think that underestimating the replacement transaction's score might create a pinning attack, and overestimating the score of the to-be-replaced transactions can also create a pinning attack. And I think this PR in its current form does both.

I am also not convinced what I've done here is a good idea, and I think comparing this approach to some kind of transaction selection algorithm that operates on a relevant transaction subgraph is worth thinking about. Some issues to consider:

  • How important is it that the RBF replacement rules be understandable to wallets? I think it's fair to say that most wallets aren't able to inspect the mempool anyway, so the BIP 125 rules already aren't something most wallets use to calculate needed fees. Instead I think the approach I've heard from the lightning community, of firing off transactions and bumping them if they haven't confirmed in a while, seems much more doable. And in that world, having an opaque calculator that governs when a transaction's feerate is high enough to warrant replacement may be good enough?
  • How important is it that we accept replacement transactions as soon as they are "better" to mine, rather than be conservative and impose higher fee requirements (as this PR would do)? In a world where v3 transactions can opt out of pinning, is it ok for non-v3 transactions to face worse pinning behaviors? (In that vein, does anyone even worry about pinning anymore now that v3 is being proposed to solve this problem?)
  • On the flip side, is there a risk that if we make pinning worse for non-v3 transactions, that we'll drive use towards v3 transactions, and some of those use cases might not be incentive compatible when applied to the rules that govern v3?

@instagibbs
Member

instagibbs commented Dec 8, 2022

How important is it that the RBF replacement rules be understandable to wallets? ... And in that world, having an opaque calculator that governs when a transaction's feerate is high enough to warrant replacement may be good enough?

That's how I understand it. Idealized functionality is that there is a "going feerate" and that the proposed transaction package should be able to get in at that rate, or close to it. Wallets don't know the going rate since it's actually a stochastic variable, so as long as we're approximating this behavior it is likely fine.

(In that vein, does anyone even worry about pinning anymore now that v3 is being proposed to solve this problem?)

Long term, probably not much, but the transition period of currently deployed systems will likely be years even if things were merged today.

On the flip side, is there a risk that if we make pinning worse for non-v3 transactions, that we'll drive use towards v3 transactions, and some of those use cases might not be incentive compatible when applied to the rules that govern v3?

The risk here is that we make V2 txn topology support so bad that people hand-roll their own policies or directly connect to miners; I think focusing on V3 in that context is nearly orthogonal. i.e. if we merged a V2 tightening that was overly broad, and did not move forward with an alternative policy, this risk already exists.

From a sheer numeric perspective, it looks like this PR's change (at least before the latest modification) doesn't affect much: #26451 (comment). Is that a strike for the change or against it? I don't know!

@glozow
Member

glozow commented Dec 12, 2022

Any underestimation of to-be-replaced mempool transactions' miner scores is a pinning attack, so I'm not completely convinced it's a good idea to use this particular heuristic as a "miner score" check on every RBF. For an accurate "miner score," I think the best way is to run the block template algorithm.

I'm not sure I understand the first part of this sentence: I think that underestimating the replacement transaction's score might create a pinning attack, and overestimating the score of the to-be-replaced transactions can also create a pinning attack. And I think this PR in its current form does both.

I am thinking about two types of "pinning" (1) the original tx is made very hard to replace and (2) a replacement can knock the fee down instead of up. To clarify, here, I meant (2): perhaps we underestimate the to-be-replaced transaction's score, thinking it is lower than that of the replacement tx when it isn't. Maybe "pinning" isn't the right word for it, but I think it's definitely a scenario we want to prevent. It's attackable, and I don't think a user ever wants to pay more fees but make their tx take longer to confirm (?).

On the flip side, is there a risk that if we make pinning worse for non-v3 transactions, that we'll drive use towards v3 transactions, and some of those use cases might not be incentive compatible when applied to the rules that govern v3?

The risk here is that we make V2 txn topology support so bad that people hand-roll their own policies or directly connect to miners; I think focusing on V3 in that context is nearly orthogonal. i.e. if we merged a V2 tightening that was overly broad, and did not move forward with an alternative policy, this risk already exists.

I agree with a philosophy of attempting to design v3 to be "unpinnable" and also improve the heuristics that make v2 break for some of the topologies it permits. AFAIU there are a number of use cases that wouldn't be compatible with v3.

@instagibbs
Member

To clarify, here, I meant (2): perhaps we underestimate the to-be-replaced transaction's score, thinking it is lower than that of the replacement tx when it isn't. Maybe "pinning" isn't the right word for it, but I think it's definitely a scenario we want to prevent. It's attackable, and I don't think a user ever wants to pay more fees but make their tx take longer to confirm (?).

I think it depends on how much more the user/wallet would have to pay to re-replace that replacement. It definitely could result in what I consider a pin if there is asymmetry.

@DrahtBot
Contributor

DrahtBot commented Nov 8, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Contributor

DrahtBot commented Feb 6, 2024

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@sdaftuar
Member Author

This work has obviously been superseded by the cluster mempool proposal, which solves these problems in a much better way than anything contemplated here.

@sdaftuar sdaftuar closed this Feb 11, 2024