Description
Description of the bug
I am trying to run pgsc_calc on Google Cloud Batch to score chromosome files that I imputed from an ancestry.com report. Most of the pipeline is running successfully, but it fails on APPLY_SCORE:
Process `PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (895a1bc3-77e6-4a28-858d-fc5d38c877e9 chromosome 12 effect type additive 0)` terminated with an error exit status (6)
The issue seems to be caused by the .psam files that the pipeline generates not being accessible:
INFO: Error: No samples in GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12.psam.
Fusion Info: fusion_version: 2.4.11-8ead802 clone_namespace: false kernel_version: 6.6 disk_cache_size: 368Gb max_open_files: 1048576
This is the executed command that causes the error:
INFO: plink2 --threads 2 --memory 8192 --seed 31 --extract 895a1bc3-77e6-4a28-858d-fc5d38c877e9_12_additive_0.scorefile.gz --allow-extra-chr --score 895a1bc3-77e6-4a28-858d-fc5d38c877e9_12_additive_0.scorefile.gz zs header-read cols=+scoresums,+denom,+fid list-variants no-mean-imputation --error-on-freq-calc --score-col-nums 3-6 --pfile vzs GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12 --out 895a1bc3-77e6-4a28-858d-fc5d38c877e9_12_additive_0
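With the vzs modifier, the --pfile prefix in the command above makes plink2 look for three files: .pgen, .pvar.zst and .psam. One quick way to narrow the failure down is to check, inside the task work directory, which of them are missing or empty. The Python sketch below is not part of pgsc_calc or plink2. The helper name missing_pfile_parts is hypothetical, and the prefix in the usage note is copied from the command above.

```python
from pathlib import Path


def missing_pfile_parts(prefix, vzs=True):
    """Return (file name, problem) pairs for pfile parts that are missing or empty.

    `prefix` is the value passed to plink2 --pfile. With the vzs modifier,
    plink2 reads <prefix>.pvar.zst instead of <prefix>.pvar.
    """
    pvar_suffix = ".pvar.zst" if vzs else ".pvar"
    problems = []
    for suffix in (".pgen", pvar_suffix, ".psam"):
        part = Path(f"{prefix}{suffix}")
        if not part.is_file():
            # is_file() follows symlinks, so a dangling staged link also lands here
            problems.append((part.name, "missing"))
        elif part.stat().st_size == 0:
            problems.append((part.name, "empty"))
    return problems
```

Running missing_pfile_parts("GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12") from the failing task's work directory returns an empty list when all three files are present and non-empty. A "missing" .psam would match the exit status 3 error, while an "empty" one would be consistent with the "No samples" message.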
Some things I have tried:
- I thought the issue might be with Fusion, so I tried disabling it and rerunning the pipeline. I was met with a similar error:
  Error: Failed to open GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12.psam : No such file or directory
  However, in this case the command exit status was 3 instead of 6.
- I experimented with the format of my original .psam files. I tried these two formats, and neither seems to make a difference:
Format 1:
#IID SEX
SAMPLE 1
Format 2:
#FID IID SEX
895a1bc3-77e6-4a28-858d-fc5d38c877e9 SAMPLE 1
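To check a .psam file independently of the pipeline, a sketch like the one below counts its sample rows. The helper name count_psam_samples is hypothetical. It assumes the plink2 convention that the column header is the line starting with #FID or #IID, that other lines starting with # are comments, and that every remaining non-blank line describes one sample. Splitting on any whitespace is a simplification.

```python
def count_psam_samples(path):
    """Return (header_columns, n_samples) for a plink2 .psam file.

    header_columns is None when no #FID/#IID header line is found.
    """
    header = None
    n_samples = 0
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith("#"):
                if line.startswith("#FID") or line.startswith("#IID"):
                    header = line[1:].split()  # drop the leading '#'
                continue  # header or comment, not a sample row
            n_samples += 1
    return header, n_samples
```

A result with n_samples equal to 0 for the .psam in the task work directory would match the "No samples" error, even if the original input file on the local machine contains a sample row.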
I understand pgsc_calc may run into issues with imputed chromosome files due to lack of WGS support, but I am able to run the pipeline successfully on the same chromosome files on a local Linux machine. So the issue seems to come from the cloud executor rather than from my imputed chromosome files.
Any help would be appreciated, thanks.
Command used and terminal output
nextflow run pgscatalog/pgsc_calc \
    -profile docker \
    -c nextflow.config \
    --input "$samplesheet_path" \
    --target_build GRCh37 \
    --pgs_id "$pgs_ids" \
    -work-dir "$work_dir" \
    --format json
Relevant files
System information
nextflow.config:
// Google Cloud Batch configuration for Nextflow
process {
    // Define the executor
    executor = 'google-batch'
    // Define the container image using an environment variable
    // Fallback to a generic gcloud image if not set
    container = System.getenv('CONTAINER_IMAGE') ?: 'gcr.io/google-containers/google-cloud-cli:latest'
    cpus = 7
    memory = '28.GB'
    time = '24.h'
    // Error strategy for potential preemptions (exit code 50001 for GCE Spot VM preemption via Batch)
    errorStrategy = { task.exitStatus == 50001 ? 'retry' : 'terminate' }
    maxRetries = 3
}
// Google Cloud specific settings
google {
    // Project ID and Location (Region) obtained from environment variables
    project = System.getenv('PROJECT_ID')
    location = System.getenv('GCP_REGION')
    batch.spot = false
}
// Enable Fusion
fusion.enabled = true
// Enable Wave container service
wave.enabled = true
// Enable Tower
tower.accessToken = System.getenv('TOWER_ACCESS_TOKEN')
// Enable Docker, required for container execution
docker.enabled = true
// Scope for Nextflow execution reports
report.enabled = true
timeline.enabled = true
trace.enabled = true
// Manifest info (optional)
manifest {
    name = 'pgscatalog/pgsc_calc'
    description = 'PGS Catalog Score Calculation pipeline'
    mainScript = 'main.nf'
}
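
For readers less familiar with Nextflow closures, the errorStrategy rule in the config above can be read as the following decision function. This is a simplified Python model rather than Nextflow code: the function name error_strategy is hypothetical, and the maxRetries limit is deliberately not modeled.

```python
PREEMPTION_EXIT_STATUS = 50001  # exit code the config treats as a Spot VM preemption


def error_strategy(exit_status):
    """Mirror the config closure: retry on preemption, otherwise terminate."""
    if exit_status == PREEMPTION_EXIT_STATUS:
        return "retry"
    return "terminate"
```

Under this rule both plink2 failures reported above (exit status 6 with Fusion, exit status 3 without it) map to "terminate", which is why the run stops at APPLY_SCORE instead of retrying the task.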