-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add autonomy controller and autonomy manager #2033
base: master
Are you sure you want to change the base?
Conversation
41bcee9
to
eb0e566
Compare
@vie-serendipity Thanks for posting this pull request. it is ready for reviewing or not? |
@rambohe-ch I think it's ready for review, I just need to modify and add more tests. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2033 +/- ##
==========================================
+ Coverage 56.09% 56.15% +0.05%
==========================================
Files 186 188 +2
Lines 18092 18344 +252
==========================================
+ Hits 10149 10301 +152
- Misses 6910 6995 +85
- Partials 1033 1048 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
/rerun |
0abbe6a
to
9534a73
Compare
/rerun |
cm.Lock() | ||
cm.errorKeys[keyBuildInfo] = err | ||
cm.Unlock() | ||
cm.deltaFIFO <- Delta{Key: keyBuildInfo, Err: err, Type: Added} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worried about the overflow of this channel if there are too many error happened. should we need to consider this corner case?
// ConsistencyValidate check the data consistency between cache manager and api server | ||
func (am *AutonomyManager) ConsistencyValidate(client kubernetes.Interface) { | ||
klog.Info("start to check the data consistency between cache manager and api server") | ||
node, err := client.CoreV1().Nodes().Get(context.TODO(), am.nodeName, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a very high cost for Yurthub to get node data for every 20s, because cloud-edge traffic will rise a lot.
} | ||
} | ||
|
||
func EnsureAutonomyCondition(client kubernetes.Interface, node *v1.Node, oldConditionStatus []v1.ConditionStatus, expectedConditionStatus v1.ConditionStatus, reason, message string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a good idea to update Node status by Yurthub, how about intercept the node status update request from kubelet and modify the node status condition of this request?
return | ||
} | ||
key := errorKey.(storage.KeyBuildInfo) | ||
unstructuredObj, err := dynamicClient.Resource(schema.GroupVersionResource{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to make sure that dynamicClient
has the correct rbac right to get resource from kube-apiserver.
if node.Labels == nil { | ||
node.Labels = make(map[string]string) | ||
} | ||
node.Labels[projectinfo.GetAutonomyStatusLabel()] = "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetAutonomyStatusLabel is not located in node annotations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the way, we also need to change GetAutonomyStatusLabel
to false when node condition is false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetAutonomyAnnotation is in node annotations and it is used by users to open node autonomy. And the real autonomy status is reflected in the label GetAutonomyStatusLabel
.
23a6968
to
ed7e038
Compare
d07fbaf
to
5f99b84
Compare
feat: persist error keys
fc0ba21
to
b31d834
Compare
Quality Gate passedIssues Measures |
chore: gci write
chore: add parameter
fix: run autonomy manager asynchronously
fix: disable autonomy manager if it's a cloud node
What type of PR is this?
/kind feature
What this PR does / why we need it:
Supplement autonomy ability.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
other Note