Batch pool can resize continuously if nodes never become usable #574

Open
MattMcL4475 opened this issue Jan 23, 2024 · 0 comments
Assignees: BMurri
Labels: bug (Something isn't working); TES Priority: P2 (Groomed to a Priority 2 issue)

MattMcL4475 commented Jan 23, 2024

Describe the bug
If the Batch pool is never able to scale up successfully, TES may keep attempting to resize it indefinitely.

Steps to Reproduce
Create a Network Security Group (NSG) for the Batch node subnet that blocks all outbound traffic. The Azure Batch node agent then cannot call home to the Batch service, so the nodes become unusable.

Expected behavior
TES should stop attempting to scale the pool back up. TBD: it could also stop scheduling tasks and fail Create Task API requests with a 503 error, including in the response a sentence such as "Permanent Azure Batch scaling issue - please contact your system administrator."
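
The expected behavior could be sketched roughly as a small circuit breaker around pool resizing. This is only an illustration, not TES code: `PoolScaleGuard`, `record_resize`, `max_failed_resizes`, and the threshold of 3 are all hypothetical names and values chosen for the example, and the rule for deciding that a resize "failed" (no usable nodes after the attempt) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Message text taken from the expected behavior described in this issue.
PERMANENT_FAILURE_MESSAGE = (
    "Permanent Azure Batch scaling issue - please contact your system administrator."
)


@dataclass
class PoolScaleGuard:
    """Hypothetical guard that stops resize retries after repeated failures."""

    max_failed_resizes: int = 3  # assumed threshold, not specified in the issue
    failed_resizes: int = 0      # consecutive resize attempts with no usable nodes

    @property
    def tripped(self) -> bool:
        """True once the consecutive-failure threshold has been reached."""
        return self.failed_resizes >= self.max_failed_resizes

    def record_resize(self, usable_nodes: int) -> None:
        """Record the outcome of one resize attempt."""
        if usable_nodes > 0:
            self.failed_resizes = 0  # the pool produced usable nodes, so reset
        else:
            self.failed_resizes += 1

    def should_resize(self) -> bool:
        """Whether another resize attempt should be made."""
        return not self.tripped

    def create_task_response(self) -> tuple[int, str]:
        """Status code and message a Create Task request would receive."""
        if self.tripped:
            return 503, PERMANENT_FAILURE_MESSAGE
        return 200, "OK"
```

With this shape, the resize loop would check `should_resize()` before each attempt, and the Create Task handler would return whatever `create_task_response()` yields. How the guard gets reset after an administrator fixes the network is left open here, as it is in the issue.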

MattMcL4475 added the bug (Something isn't working) label Jan 23, 2024
BMurri self-assigned this Jan 23, 2024
MattMcL4475 added the TES Priority: P2 (Groomed to a Priority 2 issue) label Jan 24, 2024