Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Origo] Adds support to mark storage-node/distrubtion-node under maintenance #4793

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

zeeshanakram3
Copy link
Contributor

@zeeshanakram3 zeeshanakram3 commented Jun 21, 2023

Addresses #4765

@vercel
Copy link

vercel bot commented Jun 21, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated (UTC)
pioneer-testnet ⬜️ Ignored (Inspect) Jul 18, 2023 7:00am

@zeeshanakram3 zeeshanakram3 added colossus argus Argus distributor node origo Origo Release labels Jun 23, 2023
@mnaamani
Copy link
Contributor

mnaamani commented Jul 16, 2023

You need to add implicitly new dependency on inquirer-datepicker? (and maybe others) to storage node: this is why this check is failing building docker image for storage node https://github.com/Joystream/joystream/actions/runs/5561050023/jobs/10158465896?pr=4793#step:8:373

@zeeshanakram3
Copy link
Contributor Author

Integration tests will be added in #4815

@mnaamani mnaamani self-requested a review July 28, 2023 00:00
@mnaamani
Copy link
Contributor

@Zeeshan perhaps we can try to bump version of protobufjs in this PR?

see GHSA-h755-8qp9-cq85

@zeeshanakram3 zeeshanakram3 force-pushed the origo_argus_colossus_operational_status_feat branch from 2063272 to b07bbec Compare August 1, 2023 14:51
@zeeshanakram3
Copy link
Contributor Author

perhaps we can try to bump version of protobufjs in this PR?

Bumped package versions and updated CHANGELOG.md in d133ffb

@mnaamani
Copy link
Contributor

mnaamani commented Aug 2, 2023

perhaps we can try to bump version of protobufjs in this PR?

Bumped package versions and updated CHANGELOG.md in d133ffb

Great.
I think QN package also needs a version bump.

@zeeshanakram3 zeeshanakram3 force-pushed the origo_argus_colossus_operational_status_feat branch from 5cf5bd6 to d133ffb Compare August 2, 2023 13:40
@zeeshanakram3
Copy link
Contributor Author

I think QN package also needs a version bump.

Done in ba24cfe

@kdembler kdembler self-assigned this Dec 21, 2023
Copy link
Member

@kdembler kdembler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @zeeshanakram3 and sorry for doing this review so late 🙇 I think one additional feature we could introduce is for distribution nodes to only query active storage operators when doing pings/asset checks. But we can do that as a followup later, this in itself is quite valuable.
Edit: I see that issue for that is already created and scoped: #4816

}

// Node's Operational status to set
optional OperationalStatus status = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the status be required here? Full NodeOperationalStatusMetadata is already optional in DistributionBucketOperatorMetadata

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeeshanakram3 still relevant

Copy link
Contributor

@mnaamani mnaamani Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go further and suggest that we should probably have multiple messages, corresponding with the options in the commands for setting the metadata.

message NodeOperationalStatusNormal {
  optional string rationale = 1;
  required string bucketId = 2;
}
message NodeOperationalStatusNoService {
   optional string rationale = 1;
   required string bucketId = 2;
}
NodeOperationalStatusNoServiceFrom {
  optional string rationale = 1;
  required string from = 2;  // date
  required string bucketId = 3;
}
NodeOperationalStatusNoServiceUntil {
  optional string rationale = 1;
  optional string from = 2  // date
  required string until = 3;  // date
  required string bucketId = 4;
}

Doing this I think will help address a couple issues raised in other places in the review as well as simplifying implementation in the commands and squid mappings I presume.

Comment on lines 616 to 618
if ((await getWorker(store, workingGroup.name, workerId)) === undefined) {
return invalidMetadata(`The worker ${workerId} does not exist in the ${workingGroup.name} working group`)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering, do we need to bother with worker for storage buckets? Those only have a single operator so I guess the worker ID doesn't make any difference?

Comment on lines 97 to 98
status.from = new Date(meta.noServiceFrom)
status.to = new Date(meta.noServiceTo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those raw dates validated anywhere? If not, we probably should do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this in storage squid PR Joystream/storage-squid#19

### SPECIFIC DATA ###

"Storage Bucket"
storageBucket: StorageBucket
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be a required field?

storage-node/src/commands/operator/set-metadata.ts Outdated Show resolved Hide resolved
Comment on lines 92 to 101
metadata = {
...input,
operationalStatus: {
...input.operationalStatus,
status:
input.operationalStatus?.status === 'Normal'
? NodeOperationalStatusMetadata.OperationalStatus.NORMAL
: NodeOperationalStatusMetadata.OperationalStatus.NO_SERVICE,
},
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if somebody will try to set metadata without operational status then this will set it to NO_SERVICE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6ff4531

Comment on lines 79 to 87
metadata = {
...params,
operationalStatus: {
...params.operationalStatus,
status:
params.operationalStatus?.status === 'Normal'
? NodeOperationalStatusMetadata.OperationalStatus.NORMAL
: NodeOperationalStatusMetadata.OperationalStatus.NO_SERVICE,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if somebody will try to set metadata without operational status then this will set it to NO_SERVICE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 6ff4531

@@ -957,6 +962,9 @@ type DistributionBucketOperatorMetadata implements BaseGraphQLObject {
nodeLocation: NodeLocationMetadata
nodeLocationId: String

"""Optional node operational status"""
nodeOperationalStatus: NodeOperationalStatus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be hard to query because of QN nested filtering limitations. I think we need to design a schema in a way that I can easily fetch all the active operators. If I'm not mistaken with the current schema the query would have to be (paraphrasing, I don't remember the query syntax for checking specific union type):

query {
  distributionBucketOperators(
    where: { metadata: { nodeOperationalStatus: { status: "OKAY" } } }
  ) {
    id
  }
}

And that filter would be too deep for QN to execute. Same applies to storage buckets

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it be possible to make operational status required part of metadata? Because empty operational status currently would also mean "Okay" but it would be hard to query it I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you would not have that problem with storage squid's graphql server

@zeeshanakram3
Copy link
Contributor Author

@kdembler I have addressed the requested changes. Also, removed the QN mappings needed for this feature as I think having the code at two places makes it difficult to review and maintain

Also, did changes as described in #4816 for both Argus/Colossus

const leadKey = await this.getDistributorLeadKey()

let operationalStatus: INodeOperationalStatusMetadata = {}
switch (statusType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

statusType input is not required, should we default to Normal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to make it required, the command name doesn't imply what the default value should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to make it required, the command name doesn't imply what the default value should be.

Done

const workerKey = await this.getDistributorWorkerRoleKey(workerId)

let operationalStatus: INodeOperationalStatusMetadata = {}
switch (statusType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about default status here

? NodeOperationalStatusMetadata.OperationalStatus.NORMAL
: NodeOperationalStatusMetadata.OperationalStatus.NO_SERVICE,
}
: {},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this override the current status? Or skip the update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should skip the update but previously it was overriding, I have fixed it

operatorMetadata: {
OR: [
{ nodeOperationalStatus_isNull: true }
{ nodeOperationalStatus: { isTypeOf_eq: "NodeOperationalStatusNormal" } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This filter would skip any nodes with future maintenance as well. We probably should get all the operators and later only filter those that are under maintenance currently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0537ee7

}

// Node's Operational status to set
optional OperationalStatus status = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeeshanakram3 still relevant

Comment on lines 25 to 30
operatorMetadata: {
OR: [
{ nodeOperationalStatus_isNull: true }
{ nodeOperationalStatus: { isTypeOf_eq: "NodeOperationalStatusNormal" } }
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as with the query in distribution node

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

Copy link
Contributor

@mnaamani mnaamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting to make multiple messages rather than a single one. Feels like a better design, but this will require a bit of refactoring here and in the storage-squid mappings.

const leadKey = await this.getDistributorLeadKey()

let operationalStatus: INodeOperationalStatusMetadata = {}
switch (statusType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to make it required, the command name doesn't imply what the default value should be.


message SetNodeOperationalStatus {
optional NodeOperationalStatusMetadata operational_status = 1; // Node's operational status to set
optional string worker_id = 2; // Storage/Distribution Worker ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue only the bucket_id is strictly necessary in the message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct for storage nodes, but for distributor nodes we also need to know about the worker_id, since a bucket can be assigned to multiple operators?

bucketId: flags.bucketId({
required: true,
}),
workerId: flags.integer({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested that worker id is not needed in metadata. If we agree on that, then this workerId can be dropped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #4793 (comment)

}

// Node's Operational status to set
optional OperationalStatus status = 1;
Copy link
Contributor

@mnaamani mnaamani Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would go further and suggest that we should probably have multiple messages, corresponding with the options in the commands for setting the metadata.

message NodeOperationalStatusNormal {
  optional string rationale = 1;
  required string bucketId = 2;
}
message NodeOperationalStatusNoService {
   optional string rationale = 1;
   required string bucketId = 2;
}
NodeOperationalStatusNoServiceFrom {
  optional string rationale = 1;
  required string from = 2;  // date
  required string bucketId = 3;
}
NodeOperationalStatusNoServiceUntil {
  optional string rationale = 1;
  optional string from = 2  // date
  required string until = 3;  // date
  required string bucketId = 4;
}

Doing this I think will help address a couple issues raised in other places in the review as well as simplifying implementation in the commands and squid mappings I presume.

type: 'datepicker',
name: 'result',
clearable: true,
default: new Date('2017-09-28 17:36:05').toISOString(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried the date picker, works really nicely in the terminal!

Perhaps a more reasonable default would be the current date new Date().toISOString(), so less fiddling with inital year and month for the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zeeshanakram3
Copy link
Contributor Author

@mnaamani @kdembler I have addressed the requested changes. Also, re-designed the protobuf message schemas as requested.

If the new protobuf messages are acceptable, then please publish the npm package so I would update the storage-squid mappings as well

@mnaamani
Copy link
Contributor

mnaamani commented Apr 8, 2024

@mnaamani @kdembler I have addressed the requested changes. Also, re-designed the protobuf message schemas as requested.

If the new protobuf messages are acceptable, then please publish the npm package so I would update the storage-squid mappings as well

published protobuf and joystream/js packages (for joystream/js I bumped the dependency on protobuf package)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
argus Argus distributor node colossus origo Origo Release
Projects
Status: Review
Development

Successfully merging this pull request may close these issues.

None yet

3 participants