Add TES idempotency feature #707

Open
MattMcL4475 opened this issue May 10, 2024 · 0 comments
Labels
enhancement (New feature or request); Performance (Enable users can run task as cheap and as fast as possible); Scalability (Enable users can scale TES workloads); TES Priority: P2 (Groomed to a Priority 2 issue)

Collaborator

MattMcL4475 commented May 10, 2024

  • Add a system-level enum setting that makes TES idempotent. Values are "Disabled", "Enabled", and "EnabledWithOutputCopying"; the default is "Disabled".
  • Add a TesTask-level setting that makes an individual task idempotent, with the same values and the same default of "Disabled". If it is set at the task level, it shall override the system setting.
  • For the sake of idempotency, a TES task shall be considered identical to a previous TES task if both have:
  1. The exact same set of Inputs (same URLs)
  2. The exact same values for Executors
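
The override rule and the identity check above could be sketched roughly as follows. This is an illustration in Python rather than the C# the feature would be written in, and every name in it (`IdempotencyMode`, `Task`, `effective_mode`, `identity_key`) is hypothetical. It assumes that "not set at the task level" is modeled as `None`, that Inputs are compared as an unordered set of URLs, and that Executors are compared in order.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IdempotencyMode(Enum):
    DISABLED = "Disabled"
    ENABLED = "Enabled"
    ENABLED_WITH_OUTPUT_COPYING = "EnabledWithOutputCopying"


@dataclass
class Task:
    input_urls: list
    executors: list  # plain dicts, e.g. {"image": ..., "command": [...]}
    # Assumption: None means the task did not set a task-level value.
    idempotency: Optional[IdempotencyMode] = None


def effective_mode(system_mode: IdempotencyMode, task: Task) -> IdempotencyMode:
    """A task-level setting, when present, overrides the system setting."""
    return task.idempotency if task.idempotency is not None else system_mode


def identity_key(task: Task) -> str:
    """Hash of the input URL set and the ordered executor list."""
    payload = {
        "inputs": sorted(set(task.input_urls)),  # order-insensitive set of URLs
        "executors": task.executors,             # order-sensitive
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two tasks with the same key would be treated as identical. Storing the key alongside each completed task would make the lookup of a "previous" task a single indexed query instead of a comparison against every stored task.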

If the setting is EnabledWithOutputCopying, TES shall use Azure server-side blob copy to copy the previous task's outputs to the output location(s) specified by the current task. Before the copy starts, the task state shall be set to RUNNING and the work item shall be added to an in-memory queue. The copying shall not block the main task status checking loop, so as not to slow down overall task throughput: a task can have thousands of files that need to be copied, and even though each copy is done server-side, calling that API 1000 times will take a while.

There shall be two separate long-running C# HostedServices (created in startup.cs). One should check whether a blob copy is already in progress for each file and, if not, start the copy. The other should periodically check whether all of the copies are complete, and then set the task state to COMPLETE. If TES crashes, it should be able to pick up where it left off by looping through all RUNNING tasks and resuming each one that is currently copying outputs.
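
As a rough illustration of the two-worker flow described above, here is a minimal Python sketch (the real feature would be C# hosted services). `FakeBlobStore` is a hypothetical stand-in for server-side copy, not the Azure SDK, and `CopyTask`, `OutputCopier`, and their methods are likewise invented for this example. Each worker is shown as a single pass that a long-running service would call repeatedly.

```python
from collections import deque
from dataclasses import dataclass


class FakeBlobStore:
    """Hypothetical stand-in for server-side blob copy; not a real Azure API."""

    def __init__(self):
        self.status = {}      # destination -> "pending" | "success"
        self.start_calls = 0  # how many copies were actually started

    def start_copy(self, source, destination):
        self.start_calls += 1
        self.status[destination] = "pending"

    def copy_status(self, destination):
        return self.status.get(destination)  # None: no copy was ever started

    def finish_all(self):
        for destination in self.status:
            self.status[destination] = "success"


@dataclass
class CopyTask:
    task_id: str
    copies: list  # (source, destination) pairs
    state: str = "QUEUED"


class OutputCopier:
    def __init__(self, store):
        self.store = store
        self.queue = deque()  # in-memory work queue
        self.copying = {}     # task_id -> task whose copies have been started

    def enqueue(self, task):
        task.state = "RUNNING"  # set before any copy starts
        self.queue.append(task)

    def start_pending_copies(self):
        """One pass of the first worker: start copies not already in progress."""
        while self.queue:
            task = self.queue.popleft()
            for source, destination in task.copies:
                if self.store.copy_status(destination) is None:
                    self.store.start_copy(source, destination)
            self.copying[task.task_id] = task

    def complete_finished_tasks(self):
        """One pass of the second worker: mark tasks whose copies all finished."""
        for task_id, task in list(self.copying.items()):
            if all(self.store.copy_status(d) == "success" for _, d in task.copies):
                task.state = "COMPLETE"
                del self.copying[task_id]

    def recover(self, running_tasks):
        """After a restart, re-queue RUNNING tasks; started copies are skipped."""
        for task in running_tasks:
            self.queue.append(task)
```

Because the first worker checks the copy status before starting anything, re-queuing a RUNNING task after a crash does not start a second copy of a file that is already in flight, which is what makes the recovery loop safe to run unconditionally at startup.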

@MattMcL4475 MattMcL4475 added enhancement New feature or request Performance Enable users can run task as cheap and as fast as possible Scalability Enable users can scale TES workloads labels May 16, 2024
@BMurri BMurri added the TES Priority: P2 Groomed to a Priority 2 issue label May 16, 2024