aws: fix assertion failure in debug due to timer access outside of main thread #34138

Merged (31 commits) on May 29, 2024

Conversation

nbaws
Contributor

@nbaws nbaws commented May 14, 2024

Commit Message: aws: fix assertion failure in debug due to timer access outside of main thread

Additional Description:

Patch to ensure async credential providers do not access main thread timers when credentials are requested. This was a larger fix than originally expected, due to substantial test case rewrites, changes to refresh logic and changes to the cluster initialisation sequence.

This patch performs the following:

  • Changes async providers to kick off the credential refresh process via onClusterAddOrUpdate, replacing the init handler
  • Changes async providers to be truly async. They now refresh credentials on a timer, set either to the expiration time returned in the credential payload or to the default cache duration of 1 hour. Async providers no longer trigger credential refresh via the data plane (which was the cause of the original bug). Initial refresh attempts start with a 2 second timer that doubles up to 32 seconds (or until a refresh succeeds), avoiding load on STS or IMDS.
  • Fixes the cache duration calculation to be actually 1 hour, rather than 1 hour * 60 * 60 :)
  • Fixes a reported bug that also caused assertion failure in route specific configuration
  • Async providers now honour any expiration provided from their credential source
  • Async providers now have statistics that capture the number of successful and failed refreshes; these are used in test cases
  • The Webidentity credential provider now handles the (unlikely) case where more than one region is present in the configuration, such as multiple route specific configs. Webidentity STS clusters will have the region name appended.
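
The timer behaviour described in the list above can be sketched as follows. This is a minimal illustration, not the code in this patch: the names `nextRetryDelay`, `refreshDelay` and the `k...` constants are hypothetical, and both capping the retry delay at 32 seconds and falling back to the default when the expiration is missing or non-positive are assumptions made for the sketch.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

// Hypothetical names throughout; these are not the identifiers used in the patch.
constexpr std::chrono::seconds kInitialRetryDelay{2};
constexpr std::chrono::seconds kMaxRetryDelay{32};
// Writing the default as std::chrono::hours keeps the unit explicit, which
// avoids mistakes such as multiplying an hour by 60 * 60 a second time.
constexpr std::chrono::seconds kDefaultCacheDuration = std::chrono::hours(1);

// Delay to use after a failed refresh: double the previous delay.
// Assumption: the delay is capped at 32 seconds rather than retries stopping.
constexpr std::chrono::seconds nextRetryDelay(std::chrono::seconds previous) {
  return std::min<std::chrono::seconds>(previous * 2, kMaxRetryDelay);
}

// Delay until the next refresh after a successful fetch: honour the time to
// expiration from the credential source when present, else use the default.
// Assumption: a non-positive time to expiration also falls back to the default.
constexpr std::chrono::seconds
refreshDelay(std::optional<std::chrono::seconds> time_to_expiration) {
  if (time_to_expiration.has_value() &&
      time_to_expiration.value() > std::chrono::seconds::zero()) {
    return time_to_expiration.value();
  }
  return kDefaultCacheDuration;
}
```

Starting from `kInitialRetryDelay`, repeated calls to `nextRetryDelay` yield 4, 8, 16 and then 32 seconds, after which the delay stays at 32 seconds.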

Risk Level: Low
Testing: Unit
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue] #33962
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: Nigel Brittain <nbaws@amazon.com>
nbaws added 3 commits May 15, 2024 01:27
nbaws added 2 commits May 16, 2024 01:21
@nbaws nbaws closed this May 16, 2024
nbaws added 4 commits May 17, 2024 07:51
@nbaws nbaws reopened this May 18, 2024
Contributor Author

nbaws commented May 18, 2024

@suniltheta

@nbaws nbaws closed this May 18, 2024
@nbaws nbaws reopened this May 19, 2024
Contributor Author

nbaws commented May 19, 2024

A note on this PR: the majority of the change implements a handler for onClusterAddOrUpdate. That handler simply calls refresh() once the cluster is ready to handle requests and is never used again. If there is a shorter and/or cleaner way to do this, please let me know. I've used code from dynamic forward proxy for most of the onClusterAddOrUpdate handling.
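
For readers unfamiliar with the pattern described above, a minimal self-contained sketch follows. It does not use Envoy's real types: `ClusterCallbacks`, `OneShotRefreshTrigger` and the cluster name used in the usage note are hypothetical stand-ins, and the real handler in this PR may differ in how it detects that the cluster is ready.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a cluster-update callback interface.
// This is not Envoy's actual interface.
class ClusterCallbacks {
public:
  virtual ~ClusterCallbacks() = default;
  virtual void onClusterAddOrUpdate(const std::string& cluster_name) = 0;
};

// Calls refresh() the first time the watched cluster is reported,
// then ignores every later update.
class OneShotRefreshTrigger : public ClusterCallbacks {
public:
  OneShotRefreshTrigger(std::string watched_cluster, std::function<void()> refresh)
      : watched_cluster_(std::move(watched_cluster)), refresh_(std::move(refresh)) {}

  void onClusterAddOrUpdate(const std::string& cluster_name) override {
    // Skip updates for other clusters and any update after the first trigger.
    if (triggered_ || cluster_name != watched_cluster_) {
      return;
    }
    triggered_ = true;
    refresh_();
  }

  bool triggered() const { return triggered_; }

private:
  const std::string watched_cluster_;
  const std::function<void()> refresh_;
  bool triggered_{false};
};
```

In this sketch, a trigger constructed for a cluster named, say, `"sts_cluster"` invokes its callback on the first matching update only; from then on the provider's own timer would drive refreshes.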

nbaws added 7 commits May 19, 2024 11:23
@mattklein123
Member

@suniltheta for first pass, thanks.

/wait

Contributor

@suniltheta suniltheta left a comment

Thanks for making this change. I am still in the process of reviewing the tests, but I will leave feedback as I have it.

Contributor Author

nbaws commented May 21, 2024

/retest

nbaws and others added 9 commits May 25, 2024 11:21
…metadata_async

Contributor Author

nbaws commented May 27, 2024

@suniltheta some minor refactoring: I found an initialization issue with the upstream filter, which I've now addressed.

3 participants