Add a backwards-compatible proposal for future-proofing the type-bits #19

Open
wants to merge 1 commit into
base: master

Steve132
Contributor

The current CashAddr specification includes a version byte that has a 4-bit field to represent the 'type' of the address. Although 16 possible address types seems like a lot considering we currently only have two, it could become necessary to extend the CashAddr format quickly if we ever need more than that number. Considering that BCH is re-enabling opcodes, we might need more than 16 address types sooner rather than later. In that hypothetical, it would be better to have extended it now, while there are fewer implementers, because the later we extend it, the more CashAddr implementations will exist in the ecosystem and the harder extending it will be.

Any such extension would need to also be backwards-compatible with the current behavior for all currently supported version bits, in order to not break existing code. Furthermore, it would be better to re-use existing methods that are tested in existing node/wallet code. This proposal means that wallets won't be pressured to upgrade existing code until we start to get close to 16 types.

This pull request against the spec implements one possible way to extend the 'type' space while meeting these goals (backwards compatibility, simplicity of implementation, no breakage for existing wallets). It changes the version 'byte' into a version "VarInt" which incidentally parses exactly the same for existing types 0-15, while allowing an essentially infinite number of types in the future. It also uses code that is likely to already exist and be well tested in any environment where CashAddr is implemented.

It is referenced by bitcoincashorg/spec#49, but is not solely useful in that context.

If necessary, I can also write test-vectors and a few implementations.

@Steve132
Contributor Author

I implemented this proposal as a pull request, which I recognize might not be the proper channel. I wanted to start a conversation about it and give an example, as a changelist, of how it could be specified. It's open to edits and discussion here, obviously.

2. A hash.
3. A 40 bits checksum.

#### Version byte
#### Version VarInt
Contributor

This is implied by the reservation of the leftmost bit being 0, but it may be good to make this explicit. @deadalnix

Also we should update the version number.

Contributor Author

Simply saying that the leftmost bit is reserved is not the same thing as implying that it is to be interpreted as a 0xFD 0xFE 0xFF Bitcoin VarInt. :) I don't know of any implementations that do that. If that's the case, we should make it explicit now (like I've done in the pull request).

Actually saying that if the first byte is 0xFD then two more bytes of version follow, 0xFE means 4 more bytes, and 0xFF means 8 more bytes, like in the rest of the Bitcoin code, would be nice.
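
For illustration, the VarInt rule described above can be sketched in Python. This is a minimal sketch and not text from the pull request: it assumes the version field uses Bitcoin's usual little-endian CompactSize encoding with the 0xFD, 0xFE and 0xFF prefixes, and the function name `read_version_varint` is hypothetical. It does not check for non-canonical (over-long) encodings.

```python
from typing import Tuple


def read_version_varint(payload: bytes) -> Tuple[int, int]:
    """Return (version, bytes_consumed) for a CompactSize-style version field.

    Assumption: little-endian CompactSize as used elsewhere in Bitcoin.
    """
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    if first < 0xFD:
        # Single-byte case. Every version byte that is valid under the
        # current spec (0x00..0x7F, MSB clear) takes this path unchanged.
        return first, 1
    # 0xFD -> 2 more bytes, 0xFE -> 4 more bytes, 0xFF -> 8 more bytes.
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    if len(payload) < 1 + width:
        raise ValueError("truncated version VarInt")
    value = int.from_bytes(payload[1:1 + width], "little")
    return value, 1 + width
```

With this rule, a payload starting with the single byte 0x08 still yields version 8 and consumes one byte, while the three bytes 0xFD 0x00 0x40 yield version 0x4000.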

Contributor

@avl42 avl42 May 22, 2018

Slightly confused... are we turning the whole version byte into a VarInt, or just the type part of it? If it's about the full version byte, then we'd also need to define how many of the LSBits will then be the payload-size part for each of the VarInt-sizes.

Contributor Author

@Steve132 Steve132 May 22, 2018

The whole version byte becomes a varint. That's necessary in order to maintain backcompat with existing spec for MSB=0. It also simplifies the implementation quite a bit.

This doesn't extend the potential sizes of the payload, because that's beyond the scope of this spec, which is designed to be simple. Under this change, the sizes are still the 8 sizes defined in the spec.

However, we could later define a region of the larger 64-bit type space for which the sizes are extended if we wanted. That's impossible, though, unless we first extend the type space.

Side note: is there a reason that CashAddr doesn't simply do what bech32 does and actually use a serialization of the scriptPubKey? That would allow us to skip a large headache with all of these different address types and stuff completely and then people would be able to automatically see what kind of address it was (if there's any weird non-standard scriptPubKeys) from the serialization.

Maybe a separate discussion should be added for implementing that as a type (like type 04?) to head off much of the need for weird new types at the pass? Pros: new weird smart contracts with weird scriptPubKeys don't need a new type for each kind of smart contract, because the whole thing is in the address. Cons: it would be possible to construct two valid addresses for a p2sh or p2pkh address (one the 'standard' way and one by encoding the scriptPubKey).

Contributor

If we first extended the type-range, it would otoh be impossible to later define different payload-sizes for certain sub-ranges of types...

Contributor Author

@Steve132 Steve132 May 24, 2018

> If we first extended the type-range, it would otoh be impossible to later define different payload-sizes for certain sub-ranges of types...

That's not the case. In fact, it's the opposite. What we're really doing here is allowing the entire version byte to grow into a larger version int.

For example, later extensions could say that if the version int is between 0x4000 and 0x8000, then in that case the bottom 8 bits define the size instead of the bottom 3. Parsers that don't recognize 0x4000 as a valid type will then simply fail to parse and explain why (unsupported version) versus the status quo.

My proposal allows for backwards compatibility with existing code while still allowing forwards extensions like you're proposing.
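
To make the 0x4000 example concrete, here is a hedged sketch of how a parser could split a version integer. The extended range is purely hypothetical and comes only from the example in this comment, not from any spec. Treating the range as half-open, and treating the bits above the low 8 as the type, are assumptions made for illustration, as are the names `split_version` and `UnsupportedVersion`.

```python
class UnsupportedVersion(ValueError):
    """Raised when a version integer is not understood by this parser."""


def split_version(version, accept_extended=False):
    """Return (type_bits, size_bits) for a version integer.

    Current spec layout (one byte, MSB = 0): 4 type bits, then 3 size bits.
    Hypothetical extension: for 0x4000 <= version < 0x8000 the bottom
    8 bits carry the size instead of the bottom 3.
    """
    if 0 <= version < 0x80:
        return (version >> 3) & 0x0F, version & 0x07
    if accept_extended and 0x4000 <= version < 0x8000:
        # Assumed layout: everything above the low 8 bits is the type.
        return version >> 8, version & 0xFF
    # A parser that does not know the range fails with an explicit reason.
    raise UnsupportedVersion("unsupported version: 0x%X" % version)
```

A parser written against today's spec corresponds to `accept_extended=False`: it keeps handling versions 0x00 to 0x7F as before and rejects 0x4000 with a clear "unsupported version" error.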

@avl42
Contributor

avl42 commented May 22, 2018

As I'm arguing in bitcoincashorg/spec#49 (comment), turning the version byte into a VarInt hinders extensibility. What happens with a larger range of values can be seen in TCP/IP protocol's port numbers: a registry for a couple of values, many of which lose their significance over time, and a chaos of people just picking "nice" values for their new uses...

Maybe we'd better go for alphabetic types once we exceed the still manageable number of 16 type values. Making it a VarInt now blocks any alternatives, for no immediate need.

@Steve132
Contributor Author

> Maybe we'd better go for alphabetic types once we exceed the still manageable number of 16 type values. Making it a VarInt now blocks any alternatives, for no immediate need.

I see that argument, and I can understand why a different proposal might be better... but I wanted something simple to implement and simple to change that can be easily implemented now without introducing a lot of headaches with backcompat. I disagree strongly that the need isn't immediate. If the blocksize debate has taught me anything, it's that just because we aren't out of space now doesn't mean that extending it is premature ;). (I keed I keed, I know it's slightly different. Still though).

My point is that with code that needs to be in frontends and backends and database interfaces and payment gateways and exchanges and wallets, the more people integrate it for adoption the harder changes will be.

@avl42
Contributor

avl42 commented May 23, 2018

> I see that argument, and I can understand why a different proposal might be better....

At least some agreement :-)

> but I wanted something simple to implement ...

I certainly see how this change is simple and easy to implement, but:

  • Allowing users to pick random ~64bit values for the type doesn't solve anything. A "solution" in desperate search of its problem.
  • by doing that, it preempts simple solutions of other issues that may really come up.
  • each type-value would necessarily go with a code change (even just for accepting and knowing how to handle it), not only on one's own client, but also on all clients that one would ever want to share their enhanced base32 token with. That should make it obvious, how ridiculous a >60-bit sized type-specifier is.
  • Even if we consider special clients with special features, it's not bloody likely that they'd choose true random type numbers of the whole ~64bit range and be collision-free for the next billion years. Instead, most of them would likely pick one-digit values and users of more than one such client would then enjoy the collisions.
  • If we wanted users to add user-defined attributes to a base32 token, we would need a way to add nested structures, like xml (yuck!) or ASN.1 (yuck, too) or JSON. That could mean a dummy version-byte with an MSB=1, a VarInt for the exact payload-size (as it likely wouldn't stick to the standard hash sizes anymore) and then a container mechanism for a collection of free attributes and (if not already embedded as attributes) finally the actual address-data. Less simple, but at least with a remote chance of ever being useful.

@Steve132
Contributor Author

> Allowing users to pick random ~64bit values for the type doesn't solve anything. A "solution" in desperate search of its problem.

The problem is that some method will have to be designed to handle the MSB. Sooner rather than later.

If you want to propose "if the msb is set, then interpret the low bits as an integer K which represents a string length. Then, the next K bytes are a string representing the transaction type" then fine, but that seems to be inconsistent with the design philosophy of BCH and also it seems like it would waste a ton of space and still have the other problems you are proposing.

> each type-value would necessarily go with a code change (even just for accepting and knowing how to handle it), not only on one's own client, but also on all clients that one would ever want to share their enhanced base32 token with. That should make it obvious, how ridiculous a >60-bit sized type-specifier is.

This is not necessarily the case. If your code doesn't support the version, then your code can simply ignore it. This is not something we can do without the relevant extension.

Furthermore, as you've pointed out, the additional bit space in the version can hold other data, like sizes or tags.

> Even if we consider special clients with special features, it's not bloody likely that they'd choose true random type numbers of the whole ~64bit range and be collision-free for the next billion years. Instead, most of them would likely pick one-digit values and users of more than one such client would then enjoy the collisions.

I'm not proposing that clients be allowed to arbitrarily choose version types and allocate them for themselves. I'm proposing that the 'registry' of types remains owned by the spec and only specced values are allowed. If we ever want to go beyond 16 types or beyond 512 bits, we need to have this space.

Addresses that use version integers not registered in our spec are still to be considered invalid. This prevents the growth you are describing.

> If we wanted users to add user-defined attributes to a base32 token,

As I said above, we don't, and that's not the point of my proposal. The point of my proposal is not to allow users to allocate arbitrary tokens to the version integer. The point of my proposal is to future-proof space for specified extensions to exist in a wider bitspace with enough lead time for implementations to at least be able to implement correct set-MSB error handling code.
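
The "registry owned by the spec" idea above could be enforced along these lines. This is only a sketch under stated assumptions: the registry holds the two types the current spec defines (type 0 for P2KH, type 1 for P2SH) and the 8 hash sizes it lists, and the names `REGISTERED_TYPES`, `HASH_SIZE_BITS` and `validate_version` are hypothetical, not taken from any existing implementation.

```python
# Types registered by the current CashAddr spec (type bits -> name).
REGISTERED_TYPES = {0: "P2KH", 1: "P2SH"}

# Hash sizes in bits, indexed by the 3 size bits of the version byte.
HASH_SIZE_BITS = (160, 192, 224, 256, 320, 384, 448, 512)


def validate_version(version, hash_len_bytes):
    """Return the registered type name, or raise ValueError.

    Any version integer the spec has not registered is rejected, so a
    client cannot allocate a value for itself.
    """
    if not 0 <= version < 0x80:
        # Assumption: no multi-byte version integers are registered yet.
        raise ValueError("unregistered version integer: 0x%X" % version)
    type_bits = version >> 3
    if type_bits not in REGISTERED_TYPES:
        raise ValueError("unregistered address type: %d" % type_bits)
    expected_len = HASH_SIZE_BITS[version & 0x07] // 8
    if hash_len_bytes != expected_len:
        raise ValueError("hash length does not match size bits")
    return REGISTERED_TYPES[type_bits]
```

Under these assumptions a 20-byte hash with version 0 or 8 is accepted, while a version such as 0x10 (type 2) or 0x4000 is treated as invalid until the spec registers it.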
