Use blobfuse2 for streamable TesInputs #692

Open
MattMcL4475 opened this issue Apr 26, 2024 · 0 comments
Labels: enhancement, Performance, Scalability

MattMcL4475 commented Apr 26, 2024

Problem:
Customers need to perform random reads on large genomics reference files through a file system, without downloading the entire file. Downloading the whole file costs more and puts pressure on the storage account.

Solution:

  • If any TesInput.Streamable is set to true, the TES runner should download and install blobfuse2
  • It should aggregate the container mounts required by all streamable inputs and run blobfuse2 mount only for the minimum set of containers needed
  • It should ensure that the path specified in TesInput.path resolves to the streamed file
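
The aggregation step could be sketched as follows. This is a minimal illustration, not the TES runner's implementation: unique_containers is a hypothetical helper name, the example URLs are made up, and it assumes blob URLs of the form https://<account>.blob.core.windows.net/<container>/<blob path>, optionally followed by a SAS query string.

```shell
#!/bin/bash
# Hypothetical helper: reduce a list of blob URLs to the unique
# <account>/<container> pairs that would each need one blobfuse2 mount.
unique_containers() {
  local url host rest
  for url in "$@"; do
    url="${url%%\?*}"        # drop any SAS query string
    rest="${url#https://}"   # strip the scheme
    host="${rest%%/*}"       # <account>.blob.core.windows.net
    rest="${rest#*/}"        # <container>/<blob path>
    printf '%s/%s\n' "${host%%.*}" "${rest%%/*}"
  done | sort -u
}

# Example with made-up URLs: three inputs, but only two containers to mount.
unique_containers \
  "https://acct1.blob.core.windows.net/refs/hg38/genome.fa" \
  "https://acct1.blob.core.windows.net/refs/hg38/genome.fa.fai?sv=placeholder" \
  "https://acct2.blob.core.windows.net/inputs/sample.fq.gz"
# prints:
# acct1/refs
# acct2/inputs
```

Each line of output would correspond to one mount, so inputs that share a container also share a single mount point.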

I confirmed that random reads in blobfuse2 work as expected:
blobfuse2 mount /ref --config-file=./b2.yaml
dd if=stLFR.split_read.1.fq.gz skip=50000000000 bs=1 count=128 iflag=skip_bytes 2>/dev/null | xxd
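
The same dd read pattern can be tried on a small local file before testing against a mount. The sketch below assumes GNU coreutils dd (for iflag=skip_bytes); read_range is a hypothetical helper name, and the temporary file stands in for the mounted blob.

```shell
#!/bin/bash
# read_range FILE OFFSET LENGTH
# Print LENGTH bytes of FILE starting at zero-based byte OFFSET.
# Hypothetical helper; uses the same dd flags as the command above.
read_range() {
  dd if="$1" skip="$2" bs=1 count="$3" iflag=skip_bytes 2>/dev/null
}

# Small local stand-in for the mounted file.
tmpfile="$(mktemp)"
printf 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' > "$tmpfile"

read_range "$tmpfile" 10 4   # prints KLMN
echo

rm -f "$tmpfile"
```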

#!/bin/bash
set -euo pipefail

# Azure Blob URL. NOTE: the SAS token has been removed
blob_url="https://mattmcl.blob.core.windows.net/inputs/stLFR.split_read.1.fq.gz"

# Inclusive byte range to read: 128 bytes, from 50000000000 to 50000000127
range_start=50000000000
range_end=50000000127

# Download only the requested byte range with an HTTP Range request
# (-f makes curl fail on an HTTP error instead of saving the error body)
curl -sf -o downloaded_bytes.bin -H "Range: bytes=${range_start}-${range_end}" "$blob_url"
echo "From REST:"
# Display the downloaded bytes in hex for comparison
xxd downloaded_bytes.bin
echo "From blobfuse:"
# Read the same byte range from the blobfuse2 mount at /ref and display it in hex
dd if=/ref/stLFR.split_read.1.fq.gz skip="$range_start" bs=1 count=$((range_end - range_start + 1)) iflag=skip_bytes,count_bytes 2>/dev/null | xxd
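
Instead of comparing the two hex dumps by eye, the check could be automated with cmp. The sketch below is an assumption-laden illustration rather than part of the original test: matches_range is a hypothetical helper name, it relies on GNU coreutils dd, and the demo uses small temporary files in place of downloaded_bytes.bin and the mounted blob.

```shell
#!/bin/bash
# matches_range RANGE_FILE FULL_FILE OFFSET
# Succeeds if RANGE_FILE is byte-identical to the bytes of FULL_FILE
# starting at zero-based byte OFFSET. Hypothetical helper.
matches_range() {
  local range_file="$1" full_file="$2" offset="$3" length
  length=$(( $(wc -c < "$range_file") ))   # arithmetic strips any padding from wc
  dd if="$full_file" skip="$offset" bs=1 count="$length" iflag=skip_bytes 2>/dev/null \
    | cmp -s - "$range_file"
}

# Local demo: tmp_range plays the role of the bytes fetched over REST,
# tmp_full the role of the file read through the mount.
tmp_full="$(mktemp)"
tmp_range="$(mktemp)"
printf 'ABCDEFGHIJ' > "$tmp_full"
printf 'DEF' > "$tmp_range"

matches_range "$tmp_range" "$tmp_full" 3 && echo "match"   # prints match

rm -f "$tmp_full" "$tmp_range"
```

Against the real files, the call would take downloaded_bytes.bin, the path under /ref, and the starting offset as its three arguments.
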
MattMcL4475 added the enhancement, Performance, and Scalability labels on May 16, 2024