bfile works but pfile errors out #394

@batzler

Description of the bug

I can run a pgsc_calc job successfully using the bfile format (PLINK bed). However, when I use the pfile format (PLINK pgen), it fails for reasons I can't determine (I can't interpret the error).

The input files are as follows:

-rw-r----- 1 batzler bsi 23377581643 Dec 4 10:53 pg22.bed
-rw-r----- 1 batzler bsi 30106514 Dec 4 10:53 pg22.bim
-rw-r----- 1 batzler bsi 8074325 Dec 4 10:53 pg22.fam
-rw-r----- 1 batzler bsi 1396 Dec 4 10:53 pg22.log
-rw-r----- 1 batzler bsi 1294 Dec 4 13:34 pg_imputed22.log
-rw-r----- 1 batzler bsi 23982440039 Dec 4 13:34 pg_imputed22.pgen
-rw-r----- 1 batzler bsi 7259937 Dec 4 13:34 pg_imputed22.psam
-rw-r----- 1 batzler bsi 78753551 Dec 4 13:34 pg_imputed22.pvar

The PLINK bed (binary) files were created from the pgen files:

$PLINK2 --pfile pg_imputed22 --make-bed --out pg$CHR

When I run the pipeline with the bfile format, everything executes properly.

When I run it with the pfile format and the pgen files, the error traceback is as follows:

Traceback (most recent call last):
  File "/app/pgscatalog.utils/.venv/bin/pgscatalog-match", line 8, in <module>
    sys.exit(run_match())
             ^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 87, in run_match
    ipc_path = get_match_candidates(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 124, in get_match_candidates
    with variants as target_df:
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/variantframe.py", line 54, in __enter__
    self.arrowpaths = loose(self.variants, tmpdir=self._tmpdir)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 94, in _
    return batch_read(reader, tmpdir=tmpdir, cols_keep=cols_keep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 102, in batch_read
    batches = reader.next_batches(batch_size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/io/csv/batched_reader.py", line 134, in next_batches
    batches = self._reader.next_batches(n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: found more fields than defined in 'Schema'

Consider setting 'truncate_ragged_lines=True'.
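
The polars message indicates that some row in a delimited text file has more fields than the header defines. The traceback does not name the file, so the following is only a diagnostic sketch, assuming the culprit is the tab-delimited pg_imputed22.pvar with `##` meta lines followed by a `#CHROM` header row. The helper name `field_count_report` and the sample rows are hypothetical and not part of pgscatalog-match.

```python
import io


def field_count_report(lines, sep="\t"):
    """Return (expected_field_count, {line_number: field_count}) for ragged rows.

    Assumes '##' lines are metadata and the first remaining line is the header.
    """
    expected = None
    mismatches = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' meta lines
        n_fields = len(line.split(sep))
        if expected is None:
            expected = n_fields  # first non-meta line sets the expected width
            continue
        if n_fields != expected:
            mismatches[lineno] = n_fields
    return expected, mismatches


# Tiny made-up example: line 4 carries one field more than the header.
SAMPLE = (
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "22\t100\trs1\tA\tG\n"
    "22\t200\trs2\tC\tT\textra\n"
)
expected, mismatches = field_count_report(io.StringIO(SAMPLE))
print(expected, mismatches)
```

To check the real file, pass an open handle instead of the sample, for example `with open("pg_imputed22.pvar") as fh: print(field_count_report(fh))`. Any reported line numbers would point at rows worth inspecting by hand.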

Command used and terminal output

nextflow run pgscatalog/pgsc_calc -profile singularity --min_overlap 0.0001 --input ${samplesheet} --scorefile ${scorefile} --output ${outdir} -r ${pgsc_calc_version} -c ${project}/nxf_config.config --target_build ${target_build} --genotypes_cache $cachedir

The Nextflow command is the same for both the bfile and pfile formats. The only thing I change is the samplesheet, so that it reflects the different path_prefix and format.
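
To make the difference between the two runs concrete, here is a minimal sketch that writes the two samplesheet variants. It assumes the pgsc_calc samplesheet columns are `sampleset`, `path_prefix`, `chrom`, and `format`. The sampleset name `pg` and the chromosome value `22` are placeholders inferred from the file names above, not values taken from the actual samplesheets.

```python
import csv
import io

# Assumed samplesheet columns; check them against the pgsc_calc documentation.
COLUMNS = ["sampleset", "path_prefix", "chrom", "format"]


def samplesheet(rows):
    """Render a list of row dicts as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical rows: only path_prefix and format differ between the two runs.
bfile_sheet = samplesheet(
    [{"sampleset": "pg", "path_prefix": "pg22", "chrom": "22", "format": "bfile"}]
)
pfile_sheet = samplesheet(
    [{"sampleset": "pg", "path_prefix": "pg_imputed22", "chrom": "22", "format": "pfile"}]
)
print(bfile_sheet)
print(pfile_sheet)
```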

Relevant files

No response

System information

Nextflow version
nextflow/23.04.2

slurm executor
apptainer/singularity
linux

Labels: bug (Something isn't working), user-query (User queries & requests)