feat: Watch and Read cilium network policies from static directory path #32599

Open: wants to merge 1 commit into main

@tamilmani1989 (Contributor) commented May 17, 2024

Cilium reads CNP files from a directory if a path is configured via the static-cnp-path field in the Cilium config. It watches the directory for changes, reads the files, converts them to CNP objects, and adds them to the policy engine. This allows an admin to configure policies that deny traffic to certain endpoints without those policies showing up as policy resources in Kubernetes. This is implemented based on this discussion: #30060 (comment)

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #issue-number

policy: Add support to watch and read CNP files from directory

@tamilmani1989 requested review from a team as code owners May 17, 2024 05:11
@maintainer-s-little-helper (bot) added the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) May 17, 2024
@github-actions (bot) added the sig/policy label (Impacts whether traffic is allowed or denied based on user-defined policies.) May 17, 2024
@tamilmani1989 changed the title from "feat: Configure static cilium network policy" to "feat: Watch and Read cilium network policies from static directory path" May 17, 2024
@gandro self-requested a review May 21, 2024 15:27

@gandro (Member) left a comment

Thanks for the PR! I think the motivation of the feature makes sense overall, but I have some feedback on the implementation.

pkg/policy/k8s/watcher.go (outdated, resolved)
}

cnp := &cilium_v2.CiliumNetworkPolicy{}
err = json.Unmarshal(jsonData, cnp)
Member

I also think we should consider what we want to do with the CNP fields that don't make sense when loaded from a file, e.g. the K8s name and namespace. In particular, if there are two files with the same policy name in the object metadata, then I think the current system is non-deterministic in which policy gets applied.

Instead of parsing the full cilium_v2.CiliumNetworkPolicy schema, we could consider only reading in api.Rule and overwriting the labels of each rule to contain the filename or something similar.
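A minimal sketch of why keying by file name avoids the non-determinism described above: each file owns exactly one entry, so two files that declare the same `metadata.name` cannot overwrite each other, and each rule is tagged with a label naming its source file. `rule`, `fileStore`, and the label string format `directory:name=<file>` are hypothetical stand-ins, not Cilium APIs.

```go
package main

import "sort"

// rule is a hypothetical, trimmed stand-in for api.Rule.
type rule struct {
	Selector string
	Labels   []string
}

// fileStore indexes rules by the file they were read from. File names are
// unique within the watched directory, so each file owns exactly one entry.
type fileStore struct {
	byFile map[string][]rule
}

func newFileStore() *fileStore {
	return &fileStore{byFile: make(map[string][]rule)}
}

// upsert replaces everything previously loaded from fileName. The store is
// keyed by file name only, and each rule's labels are overwritten with its source.
func (s *fileStore) upsert(fileName string, rules []rule) {
	tagged := make([]rule, len(rules))
	for i, r := range rules {
		r.Labels = []string{"directory:name=" + fileName}
		tagged[i] = r
	}
	s.byFile[fileName] = tagged
}

// remove drops all rules that came from fileName.
func (s *fileStore) remove(fileName string) {
	delete(s.byFile, fileName)
}

// files returns the currently loaded file names in sorted order.
func (s *fileStore) files() []string {
	names := make([]string, 0, len(s.byFile))
	for name := range s.byFile {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```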

Contributor Author

If there are 2 policies with same name, then it would merge both. If the policies are identical, the duplicate rule is ignored; if they differ, the rules from both policies are merged. Wouldn't namespace be significant if user wants to apply policy for specific namespace? If we skip name and namespace, then it supports only clusterwide policies.

Member

> If there are 2 policies with same name, then it would merge both.

Yes, I think this could be a good solution. This is not yet implemented in the current version, right?

I am still unsure, though, whether we even want these policies to have names. The names don't serve any purpose, and I think the file name is a much better identifier (since it is actually unique). Maybe it would be better if we just ignored the name completely.

> Wouldn't namespace be significant if user wants to apply policy for specific namespace?

Ah, I missed this bit. The way this is implemented is that, for anything read from a CNP, we simply attach the namespace to the endpoint selector when converting it to low-level api.Rules:

```go
if namespace != "" {
	userNamespace, present := r.EndpointSelector.GetMatch(podPrefixLbl)
	if present && !namespacesAreValid(namespace, userNamespace) {
		log.WithFields(logrus.Fields{
			logfields.K8sNamespace:              namespace,
			logfields.CiliumNetworkPolicyName:   name,
			logfields.K8sNamespace + ".illegal": userNamespace,
		}).Warn("CiliumNetworkPolicy contains illegal namespace match in EndpointSelector." +
			" EndpointSelector always applies in namespace of the policy resource, removing illegal namespace match'.")
	}
	retRule.EndpointSelector.AddMatch(podPrefixLbl, namespace)
}
```

> If we skip name and namespace, then it supports only clusterwide policies.

Sort of, but not necessarily: a clusterwide policy can always be translated into a namespaced policy by modifying the endpoint selector. So reading api.Rules does not take away any capabilities.

But I do see an argument that the CNP/CCNP format might be easier for most Cilium users to understand, since this is how most users interact with policies these days. It does have the downside that the "K8s" name field in the CNP metadata causes some confusion, since it is completely unused.
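The point that a clusterwide rule can be turned into a namespaced one by tightening its endpoint selector can be sketched with a plain map standing in for `api.EndpointSelector`. The key `k8s:io.kubernetes.pod.namespace` is assumed to be what `podPrefixLbl` expands to, and the "different namespace" check is a simplification of `namespacesAreValid`. As in the quoted code, a conflicting user-supplied namespace match is overwritten.

```go
package main

// podNamespaceKey is assumed to match what podPrefixLbl expands to in Cilium.
const podNamespaceKey = "k8s:io.kubernetes.pod.namespace"

// scopeToNamespace returns a copy of selector restricted to namespace, plus
// whether a different, user-supplied namespace match had to be replaced
// (the case the quoted code warns about). An empty namespace means the rule
// stays clusterwide and the selector is returned unchanged.
func scopeToNamespace(selector map[string]string, namespace string) (map[string]string, bool) {
	out := make(map[string]string, len(selector)+1)
	for k, v := range selector {
		out[k] = v
	}
	if namespace == "" {
		return out, false
	}
	prev, present := out[podNamespaceKey]
	replaced := present && prev != namespace
	out[podNamespaceKey] = namespace
	return out, replaced
}
```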

Contributor Author

Yep, correct. Instead of the name, I can use the filename, but I'd still like to keep the namespace. WDYT?

Member

Ah, I only saw this after I re-reviewed. Yes, I think keeping the namespace and forcing the name to be the filename seems like an elegant solution.

@derailed (Contributor) left a comment

@tamilmani1989 Nice work! Thank you for this PR. Might be out of context here but I have a few concerns as well regarding this feature.

}
}
if event.Op&fsnotify.Remove == fsnotify.Remove {
p.log.WithField("file", event.Name).Debug("CNP file removed from directory")
Contributor

We should move this to a method, e.g. deleteFromPolicyEngine or similar, so we can test the functionality outside the watcher.

Contributor

I agree, this should be moved to its own function. Why has this been marked as resolved?

Contributor Author

Sorry, it was not my intention to resolve it. Agreed, I will move it into a separate function, like the add path.
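Following the suggestion above, a minimal sketch of the split: the fsnotify loop would only dispatch, while `addToPolicyEngine` and `deleteFromPolicyEngine` can be unit-tested against a fake. The `policyManager` interface, its `Add`/`Delete` methods, and `fakeManager` are hypothetical simplifications rather than Cilium's actual `PolicyManager` API.

```go
package main

// policyManager is a hypothetical, trimmed-down policy engine interface.
type policyManager interface {
	Add(source string, rules []string) error
	Delete(source string) error
}

// dirWatcher holds the engine; the fsnotify event loop would call its methods.
type dirWatcher struct {
	pm policyManager
}

func (w *dirWatcher) addToPolicyEngine(file string, rules []string) error {
	return w.pm.Add(file, rules)
}

func (w *dirWatcher) deleteFromPolicyEngine(file string) error {
	return w.pm.Delete(file)
}

// fakeManager records state so the handlers can be tested without a watcher.
type fakeManager struct {
	rules map[string][]string
}

func (f *fakeManager) Add(source string, rules []string) error {
	f.rules[source] = rules
	return nil
}

func (f *fakeManager) Delete(source string) error {
	delete(f.rules, source)
	return nil
}
```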

pkg/policy/k8s/watcher.go (outdated, resolved)
@tamilmani1989 (Contributor Author)

Thanks for reviewing this, @gandro and @derailed. Initially I thought about having a separate watcher for loading from file, but that seemed to duplicate several APIs and structs (PolicyResourcesWatcher, policyWatcher, newPolicyWatcher, newPolicyResourcesWatcher, WatchK8sPolicyResources, etc.), so I combined it with the K8s policy watcher to keep the changes minimal. But I agree with your concern that not differentiating these policies from K8s policies makes debugging difficult. I can separate this out into its own watcher and also set labels on these policies so that cilium-dbg or Hubble can identify their source.

@gandro (Member) commented May 28, 2024

Thank you for the feedback! Overall, I think one design decision to make is whether we want ToServices and CIDRGroupRef to work for policies loaded from file. Those rules refer to resources in K8s and therefore might not make sense in the context of a policy loaded from disk; it really depends on the use case. The majority of the logic in pkg/policy/k8s exists to deal with those. If we don't want or need to support those K8s references, then I think the separate watcher can be much simpler.

CIDRGroupRef and ToServices currently don't work for policies imported via the API (cilium policy import), so I think there is precedent for not supporting all fields when a policy is not loaded from K8s.

@tamilmani1989 requested review from a team as code owners June 1, 2024 09:13
@tamilmani1989 force-pushed the tamilmani/staticPolicies branch 3 times, most recently from fcd8a43 to 02bc933 June 1, 2024 09:23
@tamilmani1989 (Contributor Author) commented Jun 1, 2024

@gandro @derailed I updated this based on your suggestions: I separated out the directory watcher and start it from a different cell. I also removed CIDRGroupRef and ToServices, which are not relevant for policies loaded from file, and added labels to the rules to differentiate them from policies created via K8s. Please let me know if this approach looks OK.

@tamilmani1989 force-pushed the tamilmani/staticPolicies branch 2 times, most recently from 0601118 to f2066e0 June 1, 2024 09:34
@dylandreimerink (Member) left a comment

Ideally, we want to avoid adding more logic to the daemon initialization. I understand the concern regarding the circular dependency on Daemon. I took a quick look at this, and it seems we can move the "policy manager" part into its own type.
I made a draft PR to do just that: #32847.

I suggest we merge this PR in its current state; once it merges, I can rebase and update this watcher to use the separate policy manager so a lifecycle can be used here.

@joestringer (Member) commented Jun 4, 2024

@tamilmani1989 please adjust the ```release-note ...``` in the issue description. The text in that part of the description will appear verbatim in the actual release notes after this PR is merged, so the current text there doesn't make sense as a description of this field.

I'll also note that the release freeze for v1.16 is coming up quick, so you'll need to coordinate with sig-policy folks about timing and whether you think it's viable to land in the next week or so.

Additionally, the checkboxes in the issue description are for you to fill out to ensure the PR follows the guidelines for merging. Please check through them and check them off to indicate the status of the PR. Thanks.

@joestringer added the release-note/major label (This PR introduces major new functionality to Cilium.) Jun 4, 2024
@maintainer-s-little-helper (bot) removed the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Jun 4, 2024
@joestringer (Member)

/test

@learnitall (Contributor) left a comment

API changes look good to go, but I have a couple of questions regarding the implementation. Thanks!

// Listen for file add, update and delete
for {
select {
case event := <-watcher.Events:
Contributor

Is there a reason why the Rename and Write events are not handled? Not handling these events could cause policy leaks and/or the state in the directory to not match the state in Cilium's policy engine. For example:

  • If a user modifies an existing policy on disk, that change will not be pushed to the policy engine.
  • If a user moves an existing policy on disk to a different directory, that policy will not be removed from the policy engine.

Contributor Author

Yes, I have to handle rename and write events. I first wanted to make sure the approach was right and planned to add this later. I have addressed it now.

for {
select {
case event := <-watcher.Events:
if !(event.Op&fsnotify.Create == fsnotify.Create ||
Contributor

I think it would be safer and easier to read if event.Has was used instead of event.Op&, as this is the recommended usage in the documentation.
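Combining the two suggestions above, the event loop could classify each event with `Has`, treating `Write` like `Create` (re-read the file) and `Rename` like `Remove` (drop the file's rules). fsnotify's documentation describes a rename as being reported with the old path, with a separate `Create` for the new name if it is still in a watched directory. To stay standard-library only, the sketch defines a local `Op` type mirroring fsnotify's flags; real code would call `event.Has(fsnotify.Write)` and friends directly. `classify` and the `action` values are hypothetical.

```go
package main

// Op mirrors fsnotify's Create/Write/Remove/Rename/Chmod bit flags.
type Op uint32

const (
	Create Op = 1 << iota
	Write
	Remove
	Rename
	Chmod
)

// Has reports whether flag h is set in o, like fsnotify's Op.Has.
func (o Op) Has(h Op) bool { return o&h != 0 }

// action is what the directory watcher should do for an event.
type action int

const (
	ignore action = iota // e.g. Chmod only
	upsert               // (re)read the file and replace its rules
	drop                 // remove the file's rules from the policy engine
)

// classify maps an event's Op to an action. If several flags are set,
// Remove/Rename take precedence so stale rules are never kept.
func classify(op Op) action {
	switch {
	case op.Has(Remove), op.Has(Rename):
		return drop
	case op.Has(Create), op.Has(Write):
		return upsert
	default:
		return ignore
	}
}
```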

}
reportCNPChangeMetrics(err)
case err := <-watcher.Errors:
p.log.Error("Error:", err)
Contributor

Could we expand on the error handling here to provide more context for this message? Specifically, it would be useful if we could add a prefix like "unknown error from fsnotify while watching policy directory", and even handle specific errors such as ErrEventOverflow.

Contributor Author

Can you please clarify how we should handle ErrEventOverflow? I saw that increasing the buffer size is one option to recover automatically, but the suggested method works only on Windows:

> This only has effect on Windows systems, and is a no-op for other backends.

I would rather keep it simple and return the error instead of increasing the buffer size, which may have other complications.

Contributor

Ah OK, I see what you mean. Can we just change the "Error:" string to something like "Unexpected error thrown by fsnotify watcher when watching policy directory" then?

Member

I'd generally expect the form p.log.WithError(err).Error(...) to ensure proper structured logging, too.

Contributor Author

Yep, agreed. I addressed that.

Comment on lines 62 to 63
labels.NewLabel("name", name, labels.LabelSourceDirectory),
labels.NewLabel("path", filePath, labels.LabelSourceDirectory),
Contributor

Kubernetes has restrictions on the characters that can be put into labels, as well as the label's format (see here). The filePath argument needs to undergo input validation to ensure it fits within these guidelines. For example, the forward-slashes in the path should probably be replaced with a different character, such as an underscore, since the forward-slash character has a specific meaning. Additionally, Kubernetes labels are restricted to ASCII characters, but file paths can contain UTF-8 characters.

Member

These are not K8s labels; they are Cilium labels. Slashes should be fine: to my knowledge, they are not exposed to K8s anywhere, so I don't think we need to be overly restrictive with the label format. But I'm not 100% certain where labels on policies are used; the only place I know of is Hubble policy correlation.

Still, it would probably be worth sanitizing any non-UTF-8 characters.

Contributor

Ah interesting that's good to know, TIL. Do we have the format of Cilium labels documented somewhere?

Contributor Author

Instead of the absolute path, I'm planning to add only the name of the file:

  1. The file name will be unique under the directory anyway, and the directory watcher listens on only one directory.
  2. It avoids long file paths.
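A minimal sketch of the label value agreed on above: keep only the file's base name, which is unique within the single watched directory, and replace any invalid UTF-8 bytes, as suggested earlier in the thread. `fileLabelValue` is a hypothetical helper; `filepath.Base` and `strings.ToValidUTF8` are standard library functions.

```go
package main

import (
	"path/filepath"
	"strings"
)

// fileLabelValue returns the base name of path with each run of invalid
// UTF-8 bytes replaced by U+FFFD, for use as a policy label value.
func fileLabelValue(path string) string {
	return strings.ToValidUTF8(filepath.Base(path), "\uFFFD")
}
```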

@gandro (Member) left a comment

Awesome, thank you! I like this version much better.

I have focused mainly on questions around policy ingestion and policy lifetime. I have not yet focused on the details of the file watcher itself, but I see Ryan has looked at that.

I think there are still some questions around the lifecycle of policies, and in particular the meaning of the name of a policy.

@@ -760,6 +760,12 @@ func newDaemon(ctx context.Context, cleaner *daemonCleanup, params *daemonParams
}
}

if option.Config.StaticCiliumNetworkPolicyPath != "" {
Member

Instead of having a global option, we could move this flag into the directory watcher cell. newDirectoryPolicyResourcesWatcher would then return nil if no path is set, thereby simplifying the code a bit and only requiring the nil check here.

pkg/policy/directory/cell.go (resolved)
Comment on lines 89 to 94
resourceID := ipcacheTypes.NewResourceID(
ipcacheTypes.ResourceKindCNP,
cnp.ObjectMeta.Namespace,
cnp.ObjectMeta.Name,
)
Member

This is incorrect. We must not use the K8s name and namespace for tracking the lifecycle of IPCache entries owned by this policy. This ID is used, e.g., when the policy contains a ToCIDR rule, which requires us to insert a CIDR prefix into IPCache. If we use the K8s name here, and you have a real K8s policy and a non-K8s policy read from disk with the same name and namespace, both resources would claim ownership over the IPCache CIDR entry, causing conflicting updates for those IPCache entries.

Instead, I would suggest introducing a new ipcacheTypes.ResourceKind here (e.g. ResourceKind("file")), leave the namespace empty and use the filename as the name. Then the IPCache entry is always owned by the file (and thus updated when the file is updated and removed when the file is removed, which is exactly what we want).
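A minimal sketch of the ownership conflict and the proposed fix. `resourceID` only mimics the idea behind `ipcacheTypes.NewResourceID` (kind, namespace, and name combined into one owner key); the `kind/namespace/name` string format and the kind values are assumptions for illustration. With the same kind, a K8s CNP and a file-based policy sharing a name and namespace produce the same owner key, while a separate `file` kind keeps them apart.

```go
package main

import "strings"

type resourceKind string

const (
	kindCNP  resourceKind = "cnp"
	kindFile resourceKind = "file" // proposed kind for policies read from disk
)

// resourceID is a hypothetical stand-in for ipcacheTypes.NewResourceID.
func resourceID(kind resourceKind, namespace, name string) string {
	return strings.Join([]string{string(kind), namespace, name}, "/")
}
```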

Contributor Author

Agreed on using the file name as the name, but I'm still thinking of keeping the namespace for the reasons I gave in the other comment (to restrict the policy to a specific namespace).

pkg/policy/directory/watcher.go (outdated, resolved)
pkg/policy/directory/watcher.go (outdated, resolved)
pkg/policy/directory/watcher.go (outdated, resolved)
pkg/policy/k8s/watcher.go (outdated, resolved)
@tamilmani1989 force-pushed the tamilmani/staticPolicies branch 2 times, most recently from f91b5a5 to f8670e7 June 11, 2024 07:56
@tamilmani1989 (Contributor Author)

Thanks for your wonderful suggestions, @gandro @learnitall. I'm fairly new to this code, so apologies if I'm missing basic stuff. I appreciate your help in reviewing this and guiding me toward the right set of changes. I addressed most of your comments and replied to the ones I didn't address. I'm happy to address any further comments that make this code better.

@tamilmani1989 (Contributor Author)

@joestringer Yep, I synced with the sig-policy folks in the last sig-policy meeting to get this into the 1.16 release.

Cilium reads CNP YAML files if `static-cnp-path` is specified in the Cilium
config. It converts them to rules and adds those rules to the policy
engine. This allows an admin to configure policies that deny traffic from
pods running in the cloud to certain secure infrastructure endpoints.

Signed-off-by: tamanoha <tamanoha@microsoft.com>

@gandro (Member) left a comment

This is starting to look really, really good. I still have some feedback, but most of it is rather minor.

@@ -44,6 +44,7 @@ var (
ResourceKindCCNP = ResourceKind("ccnp")
ResourceKindDaemon = ResourceKind("daemon")
ResourceKindEndpoint = ResourceKind("ep")
ResourceKindFile = ResourceKind("File")
Member

nit

Suggested change:
- ResourceKindFile = ResourceKind("File")
+ ResourceKindFile = ResourceKind("file")


// WatchDirectoryPolicyResources starts watching Cilium Network policy files created under a directory.
func (p *PolicyResourcesWatcher) WatchDirectoryPolicyResources(ctx context.Context, policyManager PolicyManager,
readStatus DirectoryWatcherReadStatus) {
Member

nit/suggestion:

Since DirectoryWatcherReadStatus is constructed in the hive, we could make it a field in PolicyWatcherParams instead of passing it in here.

This would also allow you to call close(readStatus) in newDirectoryPolicyResourcesWatcher if the watcher functionality is disabled.
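A minimal sketch of this suggestion, assuming `DirectoryWatcherReadStatus` is a channel closed once the initial read of the directory has completed (the type definition, `policyWatcherParams`, and `directoryWatcher` below are hypothetical). When no path is configured, the constructor closes the channel right away so nothing waiting on it blocks, and returns nil so callers only need a nil check, as also suggested earlier for `newDirectoryPolicyResourcesWatcher`.

```go
package main

// DirectoryWatcherReadStatus is assumed to be closed once initial loading is done.
type DirectoryWatcherReadStatus chan struct{}

// policyWatcherParams is a hypothetical, trimmed version of PolicyWatcherParams.
type policyWatcherParams struct {
	StaticCNPPath string
	ReadStatus    DirectoryWatcherReadStatus
}

type directoryWatcher struct {
	dir        string
	readStatus DirectoryWatcherReadStatus
}

// newDirectoryPolicyResourcesWatcher returns nil when the feature is disabled.
func newDirectoryPolicyResourcesWatcher(p policyWatcherParams) *directoryWatcher {
	if p.StaticCNPPath == "" {
		close(p.ReadStatus) // nothing to read: unblock waiters immediately
		return nil
	}
	return &directoryWatcher{dir: p.StaticCNPPath, readStatus: p.ReadStatus}
}
```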

return err
}

//update labels
Member

micro nit:

Suggested change:
- //update labels
+ // update labels


resourceID := ipcacheTypes.NewResourceID(
ipcacheTypes.ResourceKindFile,
cnp.ObjectMeta.Namespace,
@gandro (Member) Jun 11, 2024

Same as below, if we end up changing the resource ID for addToPolicyEngine.

Comment on lines +90 to +94
resourceID := ipcacheTypes.NewResourceID(
ipcacheTypes.ResourceKindFile,
cnp.ObjectMeta.Namespace,
cnp.ObjectMeta.Name,
)
Member

suggestion:

I tripped up on this, thinking you were using the CNP name instead of the filename here (which would be wrong). I think it would be easier to read if you assigned the filename into a separate variable to make sure this expression here looks the same as deleteFromPolicyEngine.

In addition, I think instead of using the namespace I'd use the base directory. That way, if we end up having other code in another package using ResourceKindFile, there is no accidental conflict because we're only using the base name here.

Suggested change:
- resourceID := ipcacheTypes.NewResourceID(
- ipcacheTypes.ResourceKindFile,
- cnp.ObjectMeta.Namespace,
- cnp.ObjectMeta.Name,
- )
+ resourceID := ipcacheTypes.NewResourceID(
+ ipcacheTypes.ResourceKindFile,
+ option.Config.StaticCiliumNetworkPolicyPath,
+ filepath.Base(cnpFilePath),
+ )


type policyWatcher struct {
log logrus.FieldLogger
config *option.DaemonConfig
Member

suggestion: Using DaemonConfig for flags is no longer recommended, unless the Daemon needs to know the flag.

In this case here, I don't think it does (since we simply check if the policy watcher is nil or not). Therefore I suggest we instead add the option.Config.StaticCiliumNetworkPolicyPath as a cell flag like this:

func (cfg Config) Flags(flags *pflag.FlagSet) {

dir := p.config.StaticCiliumNetworkPolicyPath
err = watcher.Add(dir)
if err != nil {
p.log.WithFields(logrus.Fields{"Dir": dir, "err": err}).Error("Directory Watcher Add failed")
Member

nit:

We're using "Dir" here but "dir" below; that's an inconsistency. I'd recommend using logfields.Path as the key instead. Also, please use WithError(err):

Suggested change:
- p.log.WithFields(logrus.Fields{"Dir": dir, "err": err}).Error("Directory Watcher Add failed")
+ p.log.WithError(err).WithField(logfields.Path, dir).Error("Failed to watch policy directory. Policies will not be loaded from disk")

Comment on lines +171 to +173
} else {

}
Member

Empty else

for {
select {
case event := <-watcher.Events:
if !utf8.ValidString(event.Name) {
Member

Given that we now use the filename as the CNP name, I wonder if we want to enforce that the filename is a valid CNP name. We can always relax later, but that way we can be sure there is no breaking change if somehow cnp.Parse() becomes more restrictive in the future.

To validate whether a filename is a valid CNP name, you can use validation.IsDNS1123Subdomain(event.Name).
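For illustration, a standard-library approximation of the format `validation.IsDNS1123Subdomain` checks: lowercase alphanumerics, `-` and `.`, starting and ending with an alphanumeric, at most 253 characters. Real code should call the apimachinery function (which returns a list of error strings rather than a bool); `isDNS1123Subdomain` below is hypothetical. Since fsnotify's `Event.Name` is the full path of the file, the check would presumably need to run on `filepath.Base(event.Name)`.

```go
package main

import "regexp"

// dns1123Subdomain approximates the Kubernetes DNS-1123 subdomain format.
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// isDNS1123Subdomain reports whether name is a valid DNS-1123 subdomain.
func isDNS1123Subdomain(name string) bool {
	return len(name) <= 253 && dns1123Subdomain.MatchString(name)
}
```

Under this rule, `deny-imds.yaml` passes while `Deny_IMDS.yaml` does not.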

@@ -48,6 +48,8 @@ const (
// by the previous agent instance. Can be overwritten by all other
// sources (except for unspec).
Restored Source = "restored"

Directory Source = "Directory"
Member

Please add a Go doc comment here. Also:

Suggested change:
- Directory Source = "Directory"
+ Directory Source = "directory"

Comment on lines +146 to +150
watcher, err := fsnotify.NewWatcher()
if err != nil {
p.log.WithField("Err", err).Error("Initializing NewWatcher failed")
return
}
Contributor

In the future, it would be great if we could automate retrying the watcher if it fails here, with a specific backoff. That way if some kind of one-off error occurs, the watcher can recover without requiring a Cilium agent restart.

Labels: release-note/major (This PR introduces major new functionality to Cilium.), sig/policy (Impacts whether traffic is allowed or denied based on user-defined policies.)
Projects: None yet
Development: Successfully merging this pull request may close these issues: None yet
6 participants