Processor design RFCs #125

dzhelezov · 2020-11-20T18:43:34Z

This PR adds three RFCs for the new Hydra processor:

Layout for the manifest file (/docs/rfc/manifest-file.md)
Proposal for generating type-safe mappings for events (type-safe-mappings.md)
Virtual events (user-generated events not originating from the chain) (virtual-events.md)

bedeho

Layout for the manifest file

What part of this has to be created by hand vs. created via tooling? In particular all the information about the handlers.
Why does the signature information need to be specified in the handlers? As long as you have a fully specified name (module + module level name), then there can be only one event or extrinsic, as there is no overloading. The full signature should then be available from some external resource (the chain metadata file?) This part seems like it can get really rough to keep in synch with chain if 100% manual.
The notion of a mapping script, Mappings API and virtual events are not really defined anywhere, despite being referenced quite a bit. I think some sort of glossary or defintion secton would be a benefit.
If EventHandler.virtual is true, then it appears the described format for EventHandler.event. Perhaps there needs to be a distinct VirtualEventHandler?
If I care about what extrinsic is causing my event, by using the EventHandler.extrinsic, then don't I almost certainly need to look at its parameter values as well in my event mapper? In which case isn't this manifest level filtering going to be redundant pre-filtering?
My assumption is that if I want to get a Hydra instance going for a given runtime, where the mappings and manifest has been prepared by someone else, then I should not really edit that file. The manifest, as I have understood it, be about coordinating the inherent components of a Subgraph, not about a particular deployment of it. However, this format does not follow that principle, as I need to edit it to change the Hydra Indexer Source. If my premise is correct, then not only should the endpoint not be in the manifest, but even the blocks possibly should not, as a particular manifest is really just locked to a runtime or runtime module, which itself could exist in different chains in different block intervals. Ultimately, it depends on what we see the role of the manifest being.
dataSources, along with its description, indicates multiple sources, but the type does not, should the type perhaps be [DataSource] or something, signalling a vector?
Why is entities actually needed in Mapping, is there some benefit to respecifying it here, as the dev will need to keep it in synch with the input file.
The filter property seems to me to be a risky proposition with unclear upside. By splitting the rule for what extrinsic invocations actually impact processor state across the mapping code and the manifest file, it seems its easy to lose track of what the effective rule is over time.
What is the purpose of ExtrinsicHandler.exports?
BlockHandler probably needs to have an option for whether it runs before or after all other mappings, as this would correspond to on_initialize and on_finalize in the runtime. The handler property description referenes an event, I assume this is a copy&paste error.

Proposal for generating type-safe mappings for events

I seem to recall that some feature called typegen needed to be part of this to makethings properly automatic, this is what Leszek said, is that still the case here?
This appears to not allow extrinsic only type safe mappings? Why is that? I suspect this to be used 99% of the time, as our current event based approach degenerates into working back into the extrinsic payload. The only reason to process an event is if the side-effect is triggered by an inherent (on_finalized, on_initialized) or it's triggered by multiple extrinsics, and you would like to avoid dealing with them separately, AND (for both cases) the event has sufficient information to compute the full side-effect. This latter constraint is almost never satisfied in our runtime.
Once we have transactional handlers, I cannot think of a single use case off the top of my head where we want to lock in pairs of extrinsics and events. I also think the example would get really messy with you had lots of pairs like this, you would have Mod1Extrinsc2BalancesBalancesSetEvent or something like that to capture combos of different extrinsics causing balance setting. If we do not have this, the example just boils down to what The Graph would have, which would be a single type Balances.BalanceSetEvent which would have the layout of current BalanceSet__Params.
I did not understand when we would end up in a situation like BalanceSet__Params, where there are no names for params, doesn't the metadata file have names for all extrinsics and events? Seems so https://whisperd.tech/post/substrate_metadata/.
In the example, the module name for an event is prefixing the name of the type extending SubstrateEvent, e.g. BalancesBalanceSetEvent. Could we use proper typescript scoping/namespacing to increase readability here, so it would be Balances.BalanceSetEvent?

Virtual events

I suggest the "Goals and motivations" section only discussed the problem/limitations being addressed in an approach without virtual events. Right now its simultanously describing the problem & solution. A high level description introducing the virtual events high levle "solution" to the stated problems deserves to live in it's own section perhaps.
Isn't just having transaction handlers an equally suitable solution to the first point in the motivations?
In the context of EVM support, I sort of interpret these virtual events as our current pre-handlers, with the main difference being that pre-handles directly call what method, while this is lifted out into the manifest file here. Is that correct? I presume the main value-add of this is that you can just grab someone elses virtual event generator off-the shelf, without needing to mess with changing what they call. If this is correct, then it would seem that perhaps virtual event generators should be a distinct kind of extrinsicshandler which is pure, i.e. it does not access the database. Because you really dont want to have to audit exactly what some off-the shelf pre-handler does or does not do.
I don't actually entirely see a smooth way of getting virtual event generators from the outside world and integrating them into my setup. It would seem to require quite a lot of manual stitching, copying, editing of manifest files, etc.
If the virtual event generators change, then you will probably need to do a bit of brittle housekeeping? I assume this can be made smoother.
If the prior point is correct, then the main new value unlocked is presumably the EVM handling, but for this specific problem I think it seems we are some ways off from the full experience. Lets say I want to write a query node for some new smart contracts. With the virtual events approach it seems I would need to sit down and dig into how to handle extrinsics to the EVM pallet, which is very complex, and I would have to hand-craft types that mirror the different parameters and extrinsics I care about. Once I had all this, I could write my actual handlers for these. At this point, if someone wanted to write another query node for the same smart contract, they could get my event emitters, and map events differently depending on what queries they wanted to support. Given that its often going to be one canonical query set of interest to a given system, the value of the reusability unlocked is relatively low. What I really want to be able to do is what we are thinking I can do in the type safe Substrate case. I want to provide the ABI of my contracts to the codegen tool, and I want to tell it that I care about methods x,y,z, and it gen generate all the required types I want so that I can just jump into writing handlers. In principle the codegen tool could generate virtual event generators, but that is only a technicality at that point.

Lezek123 · 2020-12-02T11:42:07Z

I seem to recall that some feature called typegen needed to be part of this to makethings properly automatic, this is what Leszek said, is that still the case here?

I just mentioned @polkadot/typegen (https://polkadot.js.org/docs/api/examples/promise/typegen/) as an example of a library that already does generate typesafe extrinsic args based on chian metadata for the purpose of api.tx decoriation (see: Joystream api.tx augmentation). I don't think we should use it directly in Hydra, but perhaps it can be a good reference.

Comments to Strongly typed mappings proposal

It looks really interesting and useful, I don't think I can add much to what @bedeho already mentioned, but I'll just mention a couple of things that came to my mind:

The types and interfaces must be available for import from ./types.ts. The standard polkadot types should be re-exported together with additional custom and manually defined types and interfaces.

So I imagine this to be a file like the types.ts file that can be generated from json typedefs with already mentioned @polkadot/typegen tool, or it can be a smiliar file that exports custom class-types instread of autogenerated interfaces (in @joystream/types we have a file like that here)

By default those files do not re-export any interfaces/classes from @polkadot/types. In order to do this we would need to re-export from multiple @polkadot/types locations, ie. types like AccountId or Balance will come from @polkadot/types/interfaces/runtime, u32, u64 from @polkadot/types/primitive, Vec from @polkadot/types/codec etc.

Lookup the definitions and argument types in the metadata file, both for the event and the extrinsic.

Is the metadata file just the metadata queried from the chain via api.rpc.state.getMetadata()?
Note that there are different versions of the metadata, ie MetadataV3, MetadataV4 etc.

There is a MetadataVersioned type in @polkadot/api that seems to be able to do some conversions between different versions and it seems like @polkadot/typegen uses metadata.asLatest to get the metadata in the latest version format, but I'm not sure if this will work for much older versions of the metadata.

Joystream chain currently uses V11 so in that case it shouldn't require any special handling.

// typeRegistry is declared there but must be defined elsewhere
import { typeRegistry } from '@dzlzv/hydra-common'

So if I understand correctly we will have a separate file like initializeRegistry.ts or do it directly inside a file like query-node/generated/indexer/index.ts after running hydra-cli codegen? It seems reasonable and it should allow us to use completely custom types (ie. classes) that may also include some additional getters etc. like ContentId

I was just worrying about possible @polkadot/api / @polkadot/types version clashes when the indexer relies on different version of @polkadot/api than is assumed by the types lib (currently we're relying on 1.26 in @joystream/types).

In such case:

we need to provide additional entries to typedefs that we provide into the indexer type registry (ie. the CompactAssignments: 'CompactAssignmentsTo257' example from @polkadot/api changelog), this is not that much of a problem and all such cases can be easily looked up in @polkadot/api changelog
we may run into some other incompatibilities (ie. the types we register may no longer be handled as expected by newer versions of the api), but I suspect this would only happen if there are some breaking changes in @polkadot/api.

We currently sidestep those issues along with some monorepo-setup related ones by having a -legacy.1.26.1 version of hydra-indexer-lib. I don't think we would necessarily run into any new issues with this new approach and it's probably not worth trying to add any special support for older api versions unless it turns out really necessary.

The property names will be derived using the name property (if present), otherwise by lowercasing the type. If there are several properties of the same type, prefix with a number.

Is it possible for substrate events to have named parameters? (I don't think I've seen a case like that)

I think that for unnamed parameters something like event.parmas[0] would work just fine, since the param is already strongly typed so we know what we're getting. I'm not sure whether it helps to have some type-based names like event.params.balance0, it seems like it may require some additional work and looks kind of unnecessary and less clean compared to just standard 0-indexed params.

Define all (event, extrinsic) pairs that are going to be handled according to the manifest file. Extrinsic may be optional

Here I was thinking about events that can be emitted by multiple different extrinsics and therefore having different callArgs, because in current Joystream runtime this is the case currently with both contentDirectory.create_entity and contentDirectory.transaction emitting EntityCreated event.

As already mentioned by @bedeho in that case in order to make callArgs typesafe we would need separate event definitions like ContentDirectoryEntityCreated_EmittedByCreateEntityTx and ContentDirectoryEntityCreated_EmittedByTransactionTx, unless we assume there's always a 1-to-1 relationship between events that we want to handle and extrinsics that emit them.

If we do want to handle events that can be possibly emitted by multiple different extrinsics that would probably require more consideration.

* fix: json list codegen issues affects: @subsquid/hydra-cli, hydra-e2e-tests * style: fix lint errors affects: @subsquid/hydra-cli, hydra-e2e-tests

dzhelezov added 3 commits November 20, 2020 16:20

add manifest file RFC

3b23911

type safe mappings RFC

efecff9

add virtual events RFC

053b8bd

dzhelezov requested review from bedeho, metmirr and Lezek123 November 20, 2020 18:43

dzhelezov changed the base branch from master to hydra_v1 November 20, 2020 18:43

dzhelezov changed the base branch from hydra_v1 to master November 20, 2020 18:44

dzhelezov mentioned this pull request Nov 25, 2020

Design pipeling for the processor #110

Closed

bedeho reviewed Nov 28, 2020

View reviewed changes

address comments + review.md

e850532

dzhelezov force-pushed the master branch 9 times, most recently from c769c56 to 54509f2 Compare March 5, 2021 16:51

dzhelezov force-pushed the master branch from 5303b18 to 45096b1 Compare March 11, 2021 18:48

dzhelezov force-pushed the master branch from 65b37a4 to 1e5aac0 Compare June 16, 2021 15:43

dmtrjsg added the low-prio label Aug 17, 2022

metmirr approved these changes Nov 30, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Processor design RFCs #125

Processor design RFCs #125

dzhelezov commented Nov 20, 2020

bedeho left a comment

Lezek123 commented Dec 2, 2020

Processor design RFCs #125

Are you sure you want to change the base?

Processor design RFCs #125

Conversation

dzhelezov commented Nov 20, 2020

bedeho left a comment

Choose a reason for hiding this comment

Layout for the manifest file

Proposal for generating type-safe mappings for events

Virtual events

Lezek123 commented Dec 2, 2020

Comments to Strongly typed mappings proposal