
APPLY_SCORE Failing on Google Cloud Batch #422

@alex1craig


Description of the bug

I am trying to run pgsc_calc on Google Cloud Batch to score chromosome files that I imputed from an Ancestry.com report. Most of the pipeline runs successfully, but it fails at APPLY_SCORE:

Process `PGSCATALOG_PGSCCALC:PGSCCALC:APPLY_SCORE:PLINK2_SCORE (895a1bc3-77e6-4a28-858d-fc5d38c877e9 chromosome 12 effect type additive 0)` terminated with an error exit status (6)

The error seems to occur because the .psam files that the pipeline generates are not accessible:

INFO:   Error: No samples in GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12.psam.
Fusion Info:
    fusion_version: 2.4.11-8ead802
    clone_namespace: false
    kernel_version: 6.6
    disk_cache_size: 368Gb
    max_open_files: 1048576
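
One way to narrow this down is to check what the task actually sees in its working directory: whether the .psam is absent, present but zero bytes, or present with a header and no sample rows. The following is a minimal sketch, not part of pgsc_calc or plink2. The function name `psam_status` and its return labels are hypothetical, and it assumes that every non-blank line not starting with `#` is a sample row.

```python
import tempfile
from pathlib import Path


def psam_status(path):
    """Classify a .psam file as 'missing', 'empty', 'no_samples', or 'ok'.

    Hypothetical helper for debugging; the labels are made up for this sketch.
    """
    p = Path(path)
    # is_file() follows symlinks, so a dangling symlink also counts as missing.
    if not p.is_file():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    with p.open() as handle:
        # Assumption: lines starting with '#' are header lines and every
        # other non-blank line describes one sample.
        n_samples = sum(
            1 for line in handle if line.strip() and not line.startswith("#")
        )
    return "ok" if n_samples > 0 else "no_samples"


# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    header_only = Path(tmp) / "header_only.psam"
    header_only.write_text("#IID\tSEX\n")
    one_sample = Path(tmp) / "one_sample.psam"
    one_sample.write_text("#IID\tSEX\nSAMPLE\t1\n")
    results = {
        "absent": psam_status(Path(tmp) / "absent.psam"),
        "header_only": psam_status(header_only),
        "one_sample": psam_status(one_sample),
    }

print(results)
```

Run against the staged `GRCh37_..._12.psam` inside the failing task's work directory, a result of `empty` or `no_samples` would point at how the file was written or staged rather than at plink2 itself.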

This is the executed command that causes the error:

INFO:   plink2 \
            --threads 2 \
            --memory 8192 \
            --seed 31 \
            --extract 895a1bc3-77e6-4a28-858d-fc5d38c877e9_12_additive_0.scorefile.gz \
            --allow-extra-chr \
            --score 895a1bc3-77e6-4a28-858d-fc5d38c877e9_12_additive_0.scorefile.gz zs header-read cols=+scoresums,+denom,+fid list-variants no-mean-imputation \
            --error-on-freq-calc \
            --score-col-nums 3-6 \
            --pfile vzs GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12 \
            --out 895a1bc3-77e6-4a28-858d-fc5d38c877e9_12_additive_0

Some things I have tried:

  • I thought the issue might be with Fusion, so I disabled it and reran the pipeline. I got a similar error: Error: Failed to open GRCh37_895a1bc3-77e6-4a28-858d-fc5d38c877e9_12.psam : No such file or directory. In this case, however, the command exit status was 3 instead of 6.

  • I tried changing the format of my original .psam files. I tried the following two formats, and neither makes a difference:

    Format 1:

    #IID	SEX
    SAMPLE	1
    

    Format 2:

    #FID	IID	SEX
    895a1bc3-77e6-4a28-858d-fc5d38c877e9	SAMPLE	1
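

As a sanity check on the two formats above, the sketch below parses each one and lists the samples it finds. It is a simplified reader, not plink2's parser. The function name `read_psam_samples` is hypothetical, and it assumes tab-delimited columns and a header line beginning with `#FID` or `#IID`. It does not handle headerless files.

```python
def read_psam_samples(text):
    """Return (FID, IID) pairs from .psam text; FID is None if absent.

    Hypothetical helper: assumes tab-delimited columns and a header line
    that starts with '#FID' or '#IID'.
    """
    columns = None
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if line.startswith("#FID") or line.startswith("#IID"):
            # Column header: drop the leading '#' from the first name.
            columns = [fields[0].lstrip("#")] + fields[1:]
            continue
        if line.startswith("#"):
            continue  # any other header or comment line
        if columns is None:
            raise ValueError("no #FID/#IID header line found")
        row = dict(zip(columns, fields))
        samples.append((row.get("FID"), row["IID"]))
    return samples


# The two formats from the report, written with explicit tab separators.
format_1 = "#IID\tSEX\nSAMPLE\t1\n"
format_2 = "#FID\tIID\tSEX\n895a1bc3-77e6-4a28-858d-fc5d38c877e9\tSAMPLE\t1\n"

samples_1 = read_psam_samples(format_1)
samples_2 = read_psam_samples(format_2)
print(samples_1)
print(samples_2)
```

Under these assumptions both formats yield one sample with IID `SAMPLE`, which is consistent with the header layout not being the cause of the failure.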
    

I understand pgsc_calc may run into issues with imputed chromosome files due to lack of WGS support, but I can run the pipeline successfully on the same chromosome files on a local Linux machine. So the problem seems to lie with the cloud executor rather than with my imputed chromosome files.

Any help would be appreciated, thanks.

Command used and terminal output

nextflow run pgscatalog/pgsc_calc \
    -profile docker \
    -c nextflow.config \
    --input "$samplesheet_path" \
    --target_build GRCh37 \
    --pgs_id "$pgs_ids" \
    -work-dir "$work_dir" \
    --format json

Relevant files

batch-logs.json

samplesheet.json

nextflow.log

System information

nextflow.config:

// Google Cloud Batch configuration for Nextflow
process {
    // Define the executor
    executor = 'google-batch'

    // Define the container image using an environment variable
    // Fallback to a generic gcloud image if not set
    container = System.getenv('CONTAINER_IMAGE') ?: 'gcr.io/google-containers/google-cloud-cli:latest'

    cpus = 7
    memory = '28.GB'
    time = '24.h'

    // Error strategy for potential preemptions (exit code 50001 for GCE Spot VM preemption via Batch)
    errorStrategy = { task.exitStatus == 50001 ? 'retry' : 'terminate' }
    maxRetries = 3
}

// Google Cloud specific settings
google {
    // Project ID and Location (Region) obtained from environment variables
    project = System.getenv('PROJECT_ID')
    location = System.getenv('GCP_REGION')

    batch.spot = false
}

// Enable Fusion
fusion.enabled = true
// Enable Wave container service
wave.enabled = true
// Enable Tower
tower.accessToken = System.getenv('TOWER_ACCESS_TOKEN')

// Enable Docker, required for container execution
docker.enabled = true

// Scope for Nextflow execution reports
report.enabled = true
timeline.enabled = true
trace.enabled = true

// Manifest info (optional)
manifest {
    name = 'pgscatalog/pgsc_calc'
    description = 'PGS Catalog Score Calculation pipeline'
    mainScript = 'main.nf'
} 
