feat(AzBatchService): Include cpu-shares and memory in docker run command #5799


Merged: 7 commits merged into master on Feb 19, 2025

Conversation

adamrtalbot
Collaborator

@adamrtalbot adamrtalbot commented Feb 18, 2025

The Azure Batch task did not include the --cpu-shares or --memory options of the docker run command, meaning resource allocation was only controlled by slots on the node. This PR introduces the additional options to the container run command, including --cpu-shares and --memory. The behaviour will now be more similar to AWS Batch which includes a soft limit on CPUs and hard limit on memory.
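As a rough illustration (not the actual Nextflow/Groovy source), the mapping from a task's resource requests to the added `docker run` flags might look like the following; the variable names and values are hypothetical:

```shell
# Hypothetical task resources: 4 CPUs, 8 GiB of memory.
CPUS=4
MEM_MB=8192

# --cpu-shares is a relative (soft) CPU weight, conventionally 1024 per CPU,
# matching AWS Batch behaviour; --memory is a hard cap enforced by the kernel.
OPTS="--cpu-shares $(( CPUS * 1024 )) --memory ${MEM_MB}m"
echo "docker run ${OPTS} <task-image>"
```

With these flags, CPU is only throttled relative to other containers under contention, while exceeding the memory cap gets the container OOM-killed.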

…mand

The Azure Batch task did not include the --cpu-shares or --memory options of the docker run command, meaning resource allocation was only controlled by slots on the node. This PR introduces the additional options to the container run command, including --cpu-shares and --memory. The behaviour will now be more similar to AWS Batch which includes a soft limit on CPUs and hard limit on memory. It also refactors the container options to use a StringBuilder class instead of string concatenation.

Signed-off-by: adamrtalbot <[email protected]>

netlify bot commented Feb 18, 2025

Deploy Preview for nextflow-docs-staging ready!

🔨 Latest commit 549bb8f
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67b6497fce5d2500083634ce
😎 Deploy Preview https://deploy-preview-5799--nextflow-docs-staging.netlify.app

Member

@pditommaso pditommaso left a comment


Nice. I'd prefer readability in this code; the benefit of StringBuilder seems negligible in this scenario.

@adamrtalbot
Collaborator Author

I quite like the StringBuilder 😢 but I can revert it quickly enough.

@adamrtalbot
Collaborator Author

Nice. I'd prefer readability in this code; the benefit of StringBuilder seems negligible in this scenario.

Reverted in e0daedb

Member

@pditommaso pditommaso left a comment


Nice @adamrtalbot, well done

@pditommaso pditommaso merged commit f9c0cbf into master Feb 19, 2025
5 checks passed
@pditommaso pditommaso deleted the azure_include_container_options_in_task_run branch February 19, 2025 21:15
@KH-NN

KH-NN commented Jun 2, 2025

Hi,

Just fyi. This PR has implications on AZCOPY. If you set memory to e.g. 4GB azcopy will crash when transferring multiple large files (50GB+). For large files we found that a minimum of 16GB is required to copy the files to the VM.

Due to the code in link, it will then try to generate a folder, which also crashes. It took a bit of time to find this bug :)

@adamrtalbot
Collaborator Author

Just fyi. This PR has implications on AZCOPY. If you set memory to e.g. 4GB azcopy will crash when transferring multiple large files (50GB+). For large files we found that a minimum of 16GB is required to copy the files to the VM.

Is this because azcopy requires more than 4GB? This is now a hard limit, so if azcopy tries to use more than 4GB it will cause issues. In theory, it should cause fewer issues when packing a large number of tasks onto a single node.

I'm not massively familiar with azcopy memory control, but we could see if we can limit azcopy's memory usage, e.g. make transfers take longer but stay under the limit.

@adamrtalbot
Collaborator Author

Due to the code in link, it will then try to generate a folder, which also crashes. It took a bit of time to find this bug :)

I've created issue #6158 for this; please drop a reproducible example and a note on the impact this has on your workflow there.

@adamrtalbot
Collaborator Author

adamrtalbot commented Jun 4, 2025

Looks like we should be able to fix this by setting AZCOPY_BUFFER_GB: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-optimize#optimize-memory-use

#6161
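A minimal sketch of that workaround, assuming `AZCOPY_BUFFER_GB` behaves as described in the linked Microsoft docs (cap azcopy's buffer safely below the container's `--memory` hard limit):

```shell
# Keep azcopy's buffer well under the container's hard memory cap,
# e.g. a 2 GB buffer inside a container run with --memory 4g.
export AZCOPY_BUFFER_GB=2
echo "AZCOPY_BUFFER_GB=${AZCOPY_BUFFER_GB}"
```

Transfers may take longer, but azcopy should stay under the limit instead of being OOM-killed.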
