Gateway API and EKS (recent regression): upstream connection timeout #32616
Comments
Hi there, I'm having the same problem with AWS EKS 1.27. I created an issue for this a few days ago: "Gateway API backend PODs intermittent timeouts in hostNetwork mode". On my side, if the backend webserver pod and the Envoy receiving the HTTP request from the ALB are on the same worker node, requests succeed 100% of the time. Otherwise, they fail 100% of the time. Bye
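For anyone who wants to check whether they see the same same-node versus cross-node split, here is a minimal sketch that tallies probe results by node locality. The record keys (`envoy_node`, `backend_node`, `ok`) and the sample data are hypothetical and not taken from this issue; how the probes are collected is left out.

```python
from collections import defaultdict


def summarize_by_locality(probes):
    """Group probe outcomes by whether Envoy and the backend pod share a node.

    Each probe is a dict with hypothetical keys:
      envoy_node, backend_node (node names) and ok (True if the request succeeded).
    Returns {"same-node": (successes, total), "cross-node": (successes, total)}
    for whichever groups are present in the input.
    """
    counts = defaultdict(lambda: [0, 0])
    for p in probes:
        key = "same-node" if p["envoy_node"] == p["backend_node"] else "cross-node"
        counts[key][0] += int(p["ok"])  # successes
        counts[key][1] += 1             # total probes in this group
    return {k: tuple(v) for k, v in counts.items()}


# Made-up sample records, for illustration only (not measurements from this issue).
sample = [
    {"envoy_node": "node-a", "backend_node": "node-a", "ok": True},
    {"envoy_node": "node-a", "backend_node": "node-b", "ok": False},
    {"envoy_node": "node-b", "backend_node": "node-b", "ok": True},
    {"envoy_node": "node-b", "backend_node": "node-a", "ok": False},
]
print(summarize_by_locality(sample))
```

A result where the same-node group always succeeds and the cross-node group always fails would match the pattern described above and point at node-to-node datapath problems rather than at the Gateway API resources.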
@Smana Thanks for your issue. We don't have any test coverage for Bottlerocket right now (similar issue #32610); just curious whether you are facing the same issue with Amazon Linux. @audacioustux @bhm-kyndryl It's hard to tell if the issue is the same; however, a couple of fixes were merged into main recently, and we'd appreciate it if you could test with the main branch. Again, thanks a lot for your issue and comment.
I've moved from the Bottlerocket image to AL2023, and that somehow fixed it completely. Hopefully I'll try out the latest changes with Bottlerocket soon.
Same here: switching to AL2023 did indeed fix my issues. I'm running a few additional tests before closing. Is there an issue to follow for Bottlerocket support?
I still have networking issues with AL2023, but that's probably a separate issue (maybe related to this one). I'm going to try again with AL2.
I don't think we have an issue tracking the work to support Bottlerocket, though.
We were upgrading cilium from We switched from Bottlerocket to AL2023 and it worked for us; not ideal, but it will do for now.
It seems the underlying issue is due to Bottlerocket and is not related to the Ingress/Gateway API implementation. I am closing this issue, as we already have a couple of Bottlerocket-related issues (e.g. #32610). Feel free to re-open if you think otherwise. Thanks all.
Yes, thank you @sayboras. I'm also working on figuring out why AL2023 isn't working properly. But indeed, this is not related, and I'll open an issue if necessary.
Is there an existing issue for this?
What happened?
Hey,
Recently Gateway API stopped working on EKS with this error:
At first I thought this was caused by recent changes in my demo repo, so I tried a branch that I had already used for demo purposes (with Gateway API working perfectly). Unfortunately, even without any changes in the code, there's a regression. This is probably on the AWS side, but I haven't found the culprit so far:
Note that the traffic reaches the envoy service and there are no TLS issues, but envoy returns a 503.
Everything seems ok from the Gateway API resources perspective
httproute
gateway
Of course, I checked obvious things such as the service being reachable using port-forward.
Regards,
Smana
Cilium Version
Tested with
v1.15.5
v1.15.3
v1.15.0
Kernel Version
The one provided in the AMI
bottlerocket-aws-k8s-1.29-x86_64-v1.20.0-fcf71a47
Kubernetes Version
v1.29.4
Regression
This seems to be a regression, but not one directly related to Cilium changes.
Indeed, a branch that had already been used for Gateway API (GAPI) purposes does not work anymore (same behavior).
Sysdump
cilium-sysdump-20240519-112239.zip
Relevant log output
No response
Anything else?
I've searched for similar issues, but they are pretty old:
#23906
#20942