CLI

This page describes the command line interface (CLI) for PyPGx.

To get help on the CLI:

$ pypgx -h

usage: pypgx [-h] [-v] COMMAND ...

positional arguments:
  COMMAND
    call-genotypes      Call genotypes for target gene.
    call-phenotypes     Call phenotypes for target gene.
    combine-results     Combine various results for target gene.
    compare-genotypes   Calculate concordance between two genotype results.
    compute-control-statistics
                        Compute summary statistics for control gene from BAM
                        files.
    compute-copy-number
                        Compute copy number from read depth for target gene.
    compute-target-depth
                        Compute read depth for target gene from BAM files.
    create-consolidated-vcf
                        Create a consolidated VCF file.
    create-input-vcf    Call SNVs/indels from BAM files for all target genes.
    create-regions-bed  Create a BED file which contains all regions used by
                        PyPGx.
    estimate-phase-beagle
                        Estimate haplotype phase of observed variants with
                        the Beagle program.
    filter-samples      Filter Archive file for specified samples.
    import-read-depth   Import read depth data for target gene.
    import-variants     Import SNV/indel data for target gene.
    plot-bam-copy-number
                        Plot copy number profile from CovFrame[CopyNumber].
    plot-bam-read-depth
                        Plot read depth profile with BAM data.
    plot-cn-af          Plot both copy number profile and allele fraction
                        profile in one figure.
    plot-vcf-allele-fraction
                        Plot allele fraction profile with VCF data.
    plot-vcf-read-depth
                        Plot read depth profile with VCF data.
    predict-alleles     Predict candidate star alleles based on observed
                        variants.
    predict-cnv         Predict CNV from copy number data for target gene.
    prepare-depth-of-coverage
                        Prepare a depth of coverage file for all target
                        genes with SV from BAM files.
    print-data          Print the main data of specified archive.
    print-metadata      Print the metadata of specified archive.
    run-chip-pipeline   Run genotyping pipeline for chip data.
    run-long-read-pipeline
                        Run genotyping pipeline for long-read sequencing data.
    run-ngs-pipeline    Run genotyping pipeline for NGS data.
    slice-bam           Slice BAM file for all genes used by PyPGx.
    test-cnv-caller     Test CNV caller for target gene.
    train-cnv-caller    Train CNV caller for target gene.

options:
  -h, --help            Show this help message and exit.
  -v, --version         Show the version number and exit.

To get help on a specific command (e.g. call-genotypes):

$ pypgx call-genotypes -h

call-genotypes

$ pypgx call-genotypes -h
usage: pypgx call-genotypes [-h] [--alleles PATH] [--cnv-calls PATH] genotypes

Call genotypes for target gene.

Positional arguments:
  genotypes         Output archive file with the semantic type
                    SampleTable[Genotypes].

Optional arguments:
  -h, --help        Show this help message and exit.
  --alleles PATH    Input archive file with the semantic type
                    SampleTable[Alleles].
  --cnv-calls PATH  Input archive file with the semantic type
                    SampleTable[CNVCalls].

call-phenotypes

$ pypgx call-phenotypes -h
usage: pypgx call-phenotypes [-h] genotypes phenotypes

Call phenotypes for target gene.

Positional arguments:
  genotypes   Input archive file with the semantic type
              SampleTable[Genotypes].
  phenotypes  Output archive file with the semantic type
              SampleTable[Phenotypes].

Optional arguments:
  -h, --help  Show this help message and exit.

combine-results

$ pypgx combine-results -h
usage: pypgx combine-results [-h] [--genotypes PATH] [--phenotypes PATH]
                             [--alleles PATH] [--cnv-calls PATH]
                             results

Combine various results for target gene.

Positional arguments:
  results            Output archive file with the semantic type
                     SampleTable[Results].

Optional arguments:
  -h, --help         Show this help message and exit.
  --genotypes PATH   Input archive file with the semantic type
                     SampleTable[Genotypes].
  --phenotypes PATH  Input archive file with the semantic type
                     SampleTable[Phenotypes].
  --alleles PATH     Input archive file with the semantic type
                     SampleTable[Alleles].
  --cnv-calls PATH   Input archive file with the semantic type
                     SampleTable[CNVCalls].
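The three commands above are usually chained: genotypes are called from allele and CNV predictions, phenotypes from genotypes, and everything is merged with combine-results. The function below is a minimal sketch of that chain for one gene. The function name `call_and_combine`, its argument order, and the output file names are hypothetical, and it assumes both a SampleTable[Alleles] and a SampleTable[CNVCalls] archive already exist (for a gene without SV, the `--cnv-calls` arguments would presumably be dropped).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: from allele and CNV predictions to combined results.
# Usage: call_and_combine ALLELES.zip CNV_CALLS.zip OUTPUT_DIR
call_and_combine() {
  local alleles=$1 cnv_calls=$2 outdir=$3
  mkdir -p "$outdir"

  # SampleTable[Alleles] + SampleTable[CNVCalls] -> SampleTable[Genotypes]
  pypgx call-genotypes \
    --alleles "$alleles" \
    --cnv-calls "$cnv_calls" \
    "$outdir/genotypes.zip"

  # SampleTable[Genotypes] -> SampleTable[Phenotypes]
  pypgx call-phenotypes \
    "$outdir/genotypes.zip" \
    "$outdir/phenotypes.zip"

  # Merge all tables into a single SampleTable[Results]
  pypgx combine-results \
    --genotypes "$outdir/genotypes.zip" \
    --phenotypes "$outdir/phenotypes.zip" \
    --alleles "$alleles" \
    --cnv-calls "$cnv_calls" \
    "$outdir/results.zip"
}
```

For example, `call_and_combine alleles.zip cnv-calls.zip CYP2D6-results` would leave genotypes.zip, phenotypes.zip, and results.zip in CYP2D6-results.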

compare-genotypes

$ pypgx compare-genotypes -h
usage: pypgx compare-genotypes [-h] [--verbose] first second

Calculate concordance between two genotype results.

Only samples that appear in both genotype results will be used to calculate
concordance for genotype calls as well as CNV calls.

Positional arguments:
  first       First archive file with the semantic type
              SampleTable[Results].
  second      Second archive file with the semantic type
              SampleTable[Results].

Optional arguments:
  -h, --help  Show this help message and exit.
  --verbose   Whether to print the verbose version of output, including
              discordant calls.

compute-control-statistics

$ pypgx compute-control-statistics -h
usage: pypgx compute-control-statistics [-h] [--assembly TEXT] [--bed PATH]
                                        gene control-statistics bams
                                        [bams ...]

Compute summary statistics for control gene from BAM files.

Note that for the arguments gene and --bed, the 'chr' prefix in contig names
(e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to
match the input BAM's contig names.

Positional arguments:
  gene                Control gene (recommended choices: 'EGFR', 'RYR1',
                      'VDR'). Alternatively, you can provide a custom region
                      (format: chrom:start-end).
  control-statistics  Output archive file with the semantic type
                      SampleTable[Statistics].
  bams                One or more input BAM files. Alternatively, you can
                      provide a text file (.txt, .tsv, .csv, or .list)
                      containing one BAM file per line.

Optional arguments:
  -h, --help          Show this help message and exit.
  --assembly TEXT     Reference genome assembly (default: 'GRCh37')
                      (choices: 'GRCh37', 'GRCh38').
  --bed PATH          By default, the input data is assumed to be WGS. If
                      it's targeted sequencing, you must provide a BED file
                      to indicate probed regions.

[Example] For the VDR gene from WGS data:
  $ pypgx compute-control-statistics \
  VDR \
  control-statistics.zip \
  1.bam 2.bam

[Example] For a custom region from targeted sequencing data:
  $ pypgx compute-control-statistics \
  chr1:100-200 \
  control-statistics.zip \
  bam.list \
  --bed probes.bed
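Several commands accept a text file listing one BAM file per line instead of the BAM paths themselves, like the bam.list in the example above. One way to build such a file with standard tools is sketched below; the `bams/` directory is a hypothetical layout, and the empty placeholder files exist only so the snippet runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder BAM files so the sketch runs as-is; in practice they already exist.
mkdir -p bams
touch bams/1.bam bams/2.bam

# One absolute path per line, sorted so the sample order is reproducible.
find "$PWD/bams" -name '*.bam' | sort > bam.list

cat bam.list
```

Absolute paths keep the list valid regardless of the directory the command is launched from.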

compute-copy-number

$ pypgx compute-copy-number -h
usage: pypgx compute-copy-number [-h] [--samples-without-sv TEXT [TEXT ...]]
                                 read-depth control-statistics copy-number

Compute copy number from read depth for target gene.

The command will convert read depth to copy number by performing intra-sample
normalization using summary statistics from the control gene.

During copy number analysis, if the input data is targeted sequencing, the
command will apply inter-sample normalization using summary statistics across
all samples. For best results, it is recommended to specify known samples
without SV using --samples-without-sv.

Positional arguments:
  read-depth            Input archive file with the semantic type
                        CovFrame[ReadDepth].
  control-statistics    Input archive file with the semantic type
                        SampleTable[Statistics].
  copy-number           Output archive file with the semantic type
                        CovFrame[CopyNumber].

Optional arguments:
  -h, --help            Show this help message and exit.
  --samples-without-sv TEXT [TEXT ...]
                        List of known samples with no SV.
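For a gene with SV, compute-copy-number typically sits between import-read-depth and predict-cnv. The sketch below chains the three; the function name `cnv_steps`, its argument order, and the output names are hypothetical. It places `--samples-without-sv` after the positional arguments: the option takes one or more values, so with argparse-style parsing (as the help output suggests) it could otherwise absorb the archive names that follow it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read depth -> copy number -> CNV calls for one gene.
# Usage: cnv_steps GENE DEPTH_OF_COVERAGE.zip CONTROL_STATISTICS.zip OUTDIR [SAMPLE ...]
# Trailing SAMPLE arguments are passed on as known samples without SV.
cnv_steps() {
  local gene=$1 doc=$2 stats=$3 outdir=$4
  shift 4
  mkdir -p "$outdir"

  # CovFrame[DepthOfCoverage] -> CovFrame[ReadDepth] for this gene
  pypgx import-read-depth "$gene" "$doc" "$outdir/read-depth.zip"

  # CovFrame[ReadDepth] + SampleTable[Statistics] -> CovFrame[CopyNumber]
  if [ "$#" -gt 0 ]; then
    # Positional arguments first, then the multi-value option.
    pypgx compute-copy-number \
      "$outdir/read-depth.zip" "$stats" "$outdir/copy-number.zip" \
      --samples-without-sv "$@"
  else
    pypgx compute-copy-number \
      "$outdir/read-depth.zip" "$stats" "$outdir/copy-number.zip"
  fi

  # CovFrame[CopyNumber] -> SampleTable[CNVCalls]
  pypgx predict-cnv "$outdir/copy-number.zip" "$outdir/cnv-calls.zip"
}
```

For example: `cnv_steps CYP2D6 depth-of-coverage.zip control-statistics.zip CYP2D6-cnv sample01 sample02`, where the sample IDs are placeholders.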

compute-target-depth

$ pypgx compute-target-depth -h
usage: pypgx compute-target-depth [-h] [--assembly TEXT] [--bed PATH]
                                  gene read-depth bams [bams ...]

Compute read depth for target gene from BAM files.

Positional arguments:
  gene             Target gene.
  read-depth       Output archive file with the semantic type
                   CovFrame[ReadDepth].
  bams             One or more input BAM files. Alternatively, you can
                   provide a text file (.txt, .tsv, .csv, or .list)
                   containing one BAM file per line.

Optional arguments:
  -h, --help       Show this help message and exit.
  --assembly TEXT  Reference genome assembly (default: 'GRCh37')
                   (choices: 'GRCh37', 'GRCh38').
  --bed PATH       By default, the input data is assumed to be WGS. If it
                   is targeted sequencing, you must provide a BED file to
                   indicate probed regions.

[Example] For the CYP2D6 gene from WGS data:
  $ pypgx compute-target-depth \
  CYP2D6 \
  read-depth.zip \
  1.bam 2.bam

[Example] For the CYP2D6 gene from targeted sequencing data:
  $ pypgx compute-target-depth \
  CYP2D6 \
  read-depth.zip \
  bam.list \
  --bed probes.bed

create-consolidated-vcf

$ pypgx create-consolidated-vcf -h
usage: pypgx create-consolidated-vcf [-h]
                                     imported-variants phased-variants
                                     consolidated-variants

Create a consolidated VCF file.

Positional arguments:
  imported-variants     Input archive file with the semantic type
                        VcfFrame[Imported].
  phased-variants       Input archive file with the semantic type
                        VcfFrame[Phased].
  consolidated-variants
                        Output archive file with the semantic type
                        VcfFrame[Consolidated].

Optional arguments:
  -h, --help            Show this help message and exit.
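The two input archives of create-consolidated-vcf come from import-variants and estimate-phase-beagle, and its output feeds predict-alleles. A minimal sketch of that chain for one gene follows. The function name `snv_steps` and the output names are hypothetical, and it assumes an unphased VCF that is already BGZF compressed and indexed, for which import-variants returns VcfFrame[Imported]. If the sliced VCF is already fully phased, import-variants returns VcfFrame[Consolidated] and the two middle steps are not needed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: SNV/indel calls -> candidate star alleles for one gene.
# Usage: snv_steps GENE VARIANTS.vcf.gz OUTDIR
snv_steps() {
  local gene=$1 vcf=$2 outdir=$3
  mkdir -p "$outdir"

  # Slice the VCF for the gene -> VcfFrame[Imported] (input assumed unphased)
  pypgx import-variants "$gene" "$vcf" "$outdir/imported-variants.zip"

  # Statistical phasing with Beagle -> VcfFrame[Phased]
  pypgx estimate-phase-beagle \
    "$outdir/imported-variants.zip" \
    "$outdir/phased-variants.zip"

  # Combine the imported and phased archives -> VcfFrame[Consolidated]
  pypgx create-consolidated-vcf \
    "$outdir/imported-variants.zip" \
    "$outdir/phased-variants.zip" \
    "$outdir/consolidated-variants.zip"

  # Candidate star alleles -> SampleTable[Alleles]
  pypgx predict-alleles \
    "$outdir/consolidated-variants.zip" \
    "$outdir/alleles.zip"
}
```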

create-input-vcf

$ pypgx create-input-vcf -h
usage: pypgx create-input-vcf [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
                              [--exclude] [--dir-path PATH] [--max-depth INT]
                              vcf fasta bams [bams ...]

Call SNVs/indels from BAM files for all target genes.

To save computing resources, this command will call variants only for target
genes for which at least one star allele is defined by SNVs/indels. Therefore,
variants will not be called for target genes whose star alleles are defined
only by structural variation (e.g. UGT2B17).

Positional arguments:
  vcf                   Output VCF file. It must have .vcf.gz as suffix.
  fasta                 Reference FASTA file.
  bams                  One or more input BAM files. Alternatively, you can
                        provide a text file (.txt, .tsv, .csv, or .list)
                        containing one BAM file per line.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --genes TEXT [TEXT ...]
                        List of genes to include.
  --exclude             Exclude specified genes. Ignored when --genes is not
                        used.
  --dir-path PATH       By default, intermediate files (likelihoods.bcf,
                        calls.bcf, and calls.normalized.bcf) will be stored
                        in a temporary directory, which is automatically
                        deleted after creating final VCF. If you provide a
                        directory path, intermediate files will be stored
                        there.
  --max-depth INT       At a position, read maximally this number of reads
                        per input file (default: 250). If your input data is
                        from WGS (e.g. 30X), you don't need to change this
                        option. However, if it's from targeted sequencing
                        with ultra-deep coverage (e.g. 500X), then you need
                        to increase the maximum depth.
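The `--max-depth` description suggests raising the limit for ultra-deep targeted data. A minimal sketch follows; the function name `make_input_vcf`, the file names, `GRCh38`, and the value 1000 are illustrative assumptions rather than recommendations, so choose a depth above what you actually expect per site. It also keeps the intermediate BCF files via `--dir-path` and rejects an output name without the required .vcf.gz suffix before calling PyPGx.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: call SNVs/indels from ultra-deep targeted BAMs.
# Usage: make_input_vcf REFERENCE.fa BAM_LIST OUTPUT.vcf.gz
make_input_vcf() {
  local fasta=$1 bam_list=$2 out_vcf=$3

  # The output VCF must have .vcf.gz as suffix.
  case $out_vcf in
    *.vcf.gz) ;;
    *) echo "error: $out_vcf does not end in .vcf.gz" >&2; return 1 ;;
  esac

  # Keep likelihoods.bcf, calls.bcf and calls.normalized.bcf for inspection.
  mkdir -p vcf-intermediates

  # 1000 is an illustrative value above the default of 250.
  pypgx create-input-vcf \
    --assembly GRCh38 \
    --max-depth 1000 \
    --dir-path vcf-intermediates \
    "$out_vcf" "$fasta" "$bam_list"
}
```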

create-regions-bed

$ pypgx create-regions-bed -h
usage: pypgx create-regions-bed [-h] [--assembly TEXT] [--add-chr-prefix]
                                [--merge] [--target-genes] [--sv-genes]
                                [--var-genes] [--genes TEXT [TEXT ...]]
                                [--exclude]

Create a BED file which contains all regions used by PyPGx.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --add-chr-prefix      Whether to add the 'chr' string in contig names.
  --merge               Whether to merge overlapping intervals (gene names
                        will be removed too).
  --target-genes        Whether to only return target genes, excluding
                        control genes and paralogs.
  --sv-genes            Whether to only return target genes for which at
                        least one star allele is defined by structural
                        variation.
  --var-genes           Whether to only return target genes for which at
                        least one star allele is defined by SNVs/indels.
  --genes TEXT [TEXT ...]
                        List of genes to include.
  --exclude             Exclude specified genes. Ignored when --genes is not
                        used.
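`--add-chr-prefix` controls whether contig names are written as 'chr1' or '1'. If a BED file from this command has to be used with a tool that, unlike the PyPGx commands that reconcile contig names automatically, expects the other convention, a plain `sed` substitution is enough. The file name and contents below are toy values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy BED with bare contig names (coordinates are illustrative only).
printf '#chrom\tstart\tend\tname\n1\t100\t200\tREGION_A\nX\t300\t400\tREGION_B\n' > regions.bed

# Add the 'chr' prefix to every data line, leaving '#' header lines alone.
sed '/^#/!s/^/chr/' regions.bed > regions.chr.bed

# Remove it again; the pattern only matches at the start of a line.
sed 's/^chr//' regions.chr.bed > regions.nochr.bed

cat regions.chr.bed
```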

estimate-phase-beagle

$ pypgx estimate-phase-beagle -h
usage: pypgx estimate-phase-beagle [-h] [--panel PATH] [--impute]
                                   imported-variants phased-variants

Estimate haplotype phase of observed variants with the Beagle program.

Positional arguments:
  imported-variants  Input archive file with the semantic type
                     VcfFrame[Imported]. The 'chr' prefix in contig names
                     (e.g. 'chr1' vs. '1') will be automatically added or
                     removed as necessary to match the reference VCF's contig
                     names.
  phased-variants    Output archive file with the semantic type
                     VcfFrame[Phased].

Optional arguments:
  -h, --help         Show this help message and exit.
  --panel PATH       VCF file (compressed or uncompressed) corresponding to a
                     reference haplotype panel. By default, the 1KGP panel in
                     the pypgx-bundle directory will be used.
  --impute           Perform imputation of missing genotypes.

filter-samples

$ pypgx filter-samples -h
usage: pypgx filter-samples [-h] [--exclude]
                            input output samples [samples ...]

Filter Archive file for specified samples.

Positional arguments:
  input       Input archive file.
  output      Output archive file.
  samples     Specify which samples should be included for analysis
              by providing a text file (.txt, .tsv, .csv, or .list)
              containing one sample per line. Alternatively, you can
              provide a list of samples.

Optional arguments:
  -h, --help  Show this help message and exit.
  --exclude   Exclude specified samples.
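The samples can be given as a text file with one sample per line, and `--exclude` inverts the selection. The sketch below drops two samples from an archive; the sample IDs, the function name `drop_samples`, and the file names are hypothetical. `--exclude` takes no value, so it can safely precede the positional arguments, while the open-ended `samples` list comes last.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample IDs to drop, one per line.
printf '%s\n' sample07 sample12 > exclude.txt

# Hypothetical helper: write a copy of an archive without the listed samples.
# Usage: drop_samples INPUT.zip OUTPUT.zip
drop_samples() {
  pypgx filter-samples --exclude "$1" "$2" exclude.txt
}
```

For example, `drop_samples results.zip results.filtered.zip`.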

import-read-depth

$ pypgx import-read-depth -h
usage: pypgx import-read-depth [-h] [--samples TEXT [TEXT ...]] [--exclude]
                               gene depth-of-coverage read-depth

Import read depth data for target gene.

Positional arguments:
  gene                  Target gene.
  depth-of-coverage     Input archive file with the semantic type
                        CovFrame[DepthOfCoverage].
  read-depth            Output archive file with the semantic type
                        CovFrame[ReadDepth].

Optional arguments:
  -h, --help            Show this help message and exit.
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you can
                        provide a list of samples.
  --exclude             Exclude specified samples.

import-variants

$ pypgx import-variants -h
usage: pypgx import-variants [-h] [--assembly TEXT] [--platform TEXT]
                             [--samples TEXT [TEXT ...]] [--exclude]
                             gene vcf imported-variants

Import SNV/indel data for target gene.

The command will slice the input VCF for the target gene to create an archive
file with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].

Positional arguments:
  gene                  Target gene.
  vcf                   Input VCF file must be already BGZF compressed (.gz)
                        and indexed (.tbi) to allow random access.
  imported-variants     Output archive file with the semantic type
                        VcfFrame[Imported] or VcfFrame[Consolidated].

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --platform TEXT       Genotyping platform used (default: 'WGS') (choices:
                        'WGS', 'Targeted', 'Chip', 'LongRead'). When the
                        platform is 'WGS', 'Targeted', or 'Chip', the command
                        will assess whether every genotype call in the sliced
                        VCF is haplotype phased (e.g. '0|1'). If the sliced
                        VCF is fully phased, the command will return
                        VcfFrame[Consolidated] or otherwise
                        VcfFrame[Imported]. When the platform is 'LongRead',
                        the command will return VcfFrame[Consolidated] after
                        applying the phase-extension algorithm to estimate
                        haplotype phase of any variants that could not be
                        resolved by read-backed phasing.
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you
                        can provide a list of samples.
  --exclude             Exclude specified samples.
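For the 'WGS', 'Targeted', and 'Chip' platforms, the returned semantic type depends on whether every genotype call in the sliced VCF is phased. To check a VCF ahead of time, a rough equivalent with `awk` is sketched below. It is an illustration, not PyPGx's own check: the helper name `is_fully_phased` is hypothetical, it assumes GT is the first FORMAT subfield (as the VCF specification requires when GT is present), and it treats a call as phased only if it contains '|'. For a BGZF-compressed file, decompress it on the fly, e.g. `zcat variants.vcf.gz | is_fully_phased`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if every GT call in a VCF contains '|'.
# Reads the file given as $1, or standard input when no argument is given.
is_fully_phased() {
  awk -F '\t' '
    /^#/ { next }                         # skip meta-information and header lines
    {
      for (i = 10; i <= NF; i++) {        # sample columns start at column 10
        split($i, f, ":")                 # GT is the first FORMAT subfield
        if (f[1] !~ /\|/) unphased++
      }
    }
    END { exit (unphased > 0) }           # exit status 0 means fully phased
  ' "${1:--}"
}

# Toy two-sample VCF: one copy fully phased, one with an unphased call.
printf '##fileformat=VCFv4.2\n' > phased.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> phased.vcf
printf '22\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n' >> phased.vcf
sed 's/1|1/1\/1/' phased.vcf > unphased.vcf

is_fully_phased phased.vcf && echo "phased.vcf: fully phased"
is_fully_phased unphased.vcf || echo "unphased.vcf: not fully phased"
```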

plot-bam-copy-number

$ pypgx plot-bam-copy-number -h
usage: pypgx plot-bam-copy-number [-h] [--fitted] [--path PATH]
                                  [--samples TEXT [TEXT ...]] [--ymin FLOAT]
                                  [--ymax FLOAT] [--fontsize FLOAT]
                                  copy-number

Plot copy number profile from CovFrame[CopyNumber].

Positional arguments:
  copy-number           Input archive file with the semantic type
                        CovFrame[CopyNumber].

Optional arguments:
  -h, --help            Show this help message and exit.
  --fitted              Show the fitted line as well.
  --path PATH           Create plots in this directory (default: current
                        directory).
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you can
                        provide a list of samples.
  --ymin FLOAT          Y-axis bottom (default: -0.3).
  --ymax FLOAT          Y-axis top (default: 6.3).
  --fontsize FLOAT      Text fontsize (default: 25).
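A common use of `--samples` is plotting only a few samples. Because the option accepts one or more values, placing it before the positional copy-number argument could, with argparse-style parsing, also consume the archive name; the sketch below therefore puts the positional argument first. The function name `plot_subset`, the output directory, and the sample IDs are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: plot copy number for selected samples only.
# Usage: plot_subset COPY_NUMBER.zip OUTDIR SAMPLE [SAMPLE ...]
plot_subset() {
  local copy_number=$1 outdir=$2
  shift 2
  mkdir -p "$outdir"

  # Positional argument first; --samples (one or more values) goes last.
  pypgx plot-bam-copy-number "$copy_number" \
    --fitted \
    --path "$outdir" \
    --samples "$@"
}
```

For example, `plot_subset copy-number.zip plots sample01 sample02`.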

plot-bam-read-depth

$ pypgx plot-bam-read-depth -h
usage: pypgx plot-bam-read-depth [-h] [--path PATH]
                                 [--samples TEXT [TEXT ...]] [--ymin FLOAT]
                                 [--ymax FLOAT] [--fontsize FLOAT]
                                 read-depth

Plot read depth profile with BAM data.

Positional arguments:
  read-depth            Input archive file with the semantic type
                        CovFrame[ReadDepth].

Optional arguments:
  -h, --help            Show this help message and exit.
  --path PATH           Create plots in this directory (default: current
                        directory).
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you can
                        provide a list of samples.
  --ymin FLOAT          Y-axis bottom.
  --ymax FLOAT          Y-axis top.
  --fontsize FLOAT      Text fontsize (default: 25).

plot-cn-af

$ pypgx plot-cn-af -h
usage: pypgx plot-cn-af [-h] [--path PATH] [--samples TEXT [TEXT ...]]
                        [--ymin FLOAT] [--ymax FLOAT] [--fontsize FLOAT]
                        copy-number imported-variants

Plot both copy number profile and allele fraction profile in one figure.

Positional arguments:
  copy-number           Input archive file with the semantic type
                        CovFrame[CopyNumber].
  imported-variants     Input archive file with the semantic type
                        VcfFrame[Imported].

Optional arguments:
  -h, --help            Show this help message and exit.
  --path PATH           Create plots in this directory (default: current
                        directory).
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you can
                        provide a list of samples.
  --ymin FLOAT          Y-axis bottom (default: -0.3).
  --ymax FLOAT          Y-axis top (default: 6.3).
  --fontsize FLOAT      Text fontsize (default: 25).

plot-vcf-allele-fraction

$ pypgx plot-vcf-allele-fraction -h
usage: pypgx plot-vcf-allele-fraction [-h] [--path PATH]
                                      [--samples TEXT [TEXT ...]]
                                      [--fontsize FLOAT]
                                      imported-variants

Plot allele fraction profile from VcfFrame[Imported].

Positional arguments:
  imported-variants     Input archive file with the semantic type
                        VcfFrame[Imported].

Optional arguments:
  -h, --help            Show this help message and exit.
  --path PATH           Create plots in this directory (default: current
                        directory).
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you can
                        provide a list of samples.
  --fontsize FLOAT      Text fontsize (default: 25).

plot-vcf-read-depth

$ pypgx plot-vcf-read-depth -h
usage: pypgx plot-vcf-read-depth [-h] [--assembly TEXT] [--path PATH]
                                 [--samples TEXT [TEXT ...]] [--ymin FLOAT]
                                 [--ymax FLOAT]
                                 gene vcf

Plot read depth profile with VCF data.

Positional arguments:
  gene                  Target gene.
  vcf                   Input VCF file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --path PATH           Create plots in this directory (default: current
                        directory).
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you can
                        provide a list of samples.
  --ymin FLOAT          Y-axis bottom.
  --ymax FLOAT          Y-axis top.

predict-alleles

$ pypgx predict-alleles -h
usage: pypgx predict-alleles [-h] consolidated-variants alleles

Predict candidate star alleles based on observed variants.

Positional arguments:
  consolidated-variants
                        Input archive file with the semantic type
                        VcfFrame[Consolidated].
  alleles               Output archive file with the semantic type
                        SampleTable[Alleles].

Optional arguments:
  -h, --help            Show this help message and exit.

predict-cnv

$ pypgx predict-cnv -h
usage: pypgx predict-cnv [-h] [--cnv-caller PATH] copy-number cnv-calls

Predict CNV from copy number data for target gene.

Genomic positions that are missing copy number (for example, because the
input data is targeted sequencing) will be imputed with forward filling.

Positional arguments:
  copy-number        Input archive file with the semantic type
                     CovFrame[CopyNumber].
  cnv-calls          Output archive file with the semantic type
                     SampleTable[CNVCalls].

Optional arguments:
  -h, --help         Show this help message and exit.
  --cnv-caller PATH  Archive file with the semantic type Model[CNV]. By
                     default, a pre-trained CNV caller in the pypgx-bundle
                     directory will be used.
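Forward filling means that a position without a copy-number value takes the last value observed before it along the gene; a missing value before the first observed one stays missing, since nothing precedes it. The toy table and `awk` program below only illustrate the idea and are not PyPGx's implementation, which operates on CovFrame[CopyNumber] archives; the file names and the "NaN" marker are assumptions of the sketch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy copy-number track (position, copy number); "NaN" marks positions with no value.
printf 'pos\tcn\n100\tNaN\n101\t2.0\n102\tNaN\n103\tNaN\n104\t1.1\n105\tNaN\n' > toy_cn.tsv

# Forward filling: each missing value takes the last value seen above it.
awk -F '\t' -v OFS='\t' '
  NR == 1 { print; next }               # keep the header row unchanged
  $2 == "NaN" && have { $2 = last }     # fill a gap once a value has been seen
  $2 != "NaN" { last = $2; have = 1 }   # remember the most recent observed value
  { print }
' toy_cn.tsv > toy_cn.filled.tsv

cut -f 2 toy_cn.filled.tsv | paste -sd ' ' -
# Prints: cn NaN 2.0 2.0 2.0 1.1 1.1
```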

prepare-depth-of-coverage

$ pypgx prepare-depth-of-coverage -h
usage: pypgx prepare-depth-of-coverage [-h] [--assembly TEXT] [--bed PATH]
                                       [--genes TEXT [TEXT ...]] [--exclude]
                                       depth-of-coverage bams [bams ...]

Prepare a depth of coverage file for all target genes with SV from BAM files.

To save computing resources, this command will count read depth only for
target genes for which at least one star allele is defined by structural
variation. Therefore, read depth will not be computed for target genes whose
star alleles are defined only by SNVs/indels (e.g. CYP3A5).

Positional arguments:
  depth-of-coverage     Output archive file with the semantic type
                        CovFrame[DepthOfCoverage].
  bams                  One or more input BAM files. Alternatively, you can
                        provide a text file (.txt, .tsv, .csv, or .list)
                        containing one BAM file per line.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --bed PATH            By default, the input data is assumed to be WGS. If
                        it's targeted sequencing, you must provide a BED file
                        to indicate probed regions. Note that the 'chr' prefix
                        in contig names (e.g. 'chr1' vs. '1') will be
                        automatically added or removed as necessary to match
                        the input BAM's contig names.
  --genes TEXT [TEXT ...]
                        List of genes to include.
  --exclude             Exclude specified genes. Ignored when --genes is not
                        used.

[Example] From WGS data:
  $ pypgx prepare-depth-of-coverage \
  depth-of-coverage.zip \
  1.bam 2.bam

[Example] From targeted sequencing data:
  $ pypgx prepare-depth-of-coverage \
  depth-of-coverage.zip \
  bam.list \
  --bed probes.bed

print-data

$ pypgx print-data -h
usage: pypgx print-data [-h] input

Print the main data of specified archive.

Positional arguments:
  input       Input archive file.

Optional arguments:
  -h, --help  Show this help message and exit.

print-metadata

$ pypgx print-metadata -h
usage: pypgx print-metadata [-h] input

Print the metadata of specified archive.

Positional arguments:
  input       Input archive file.

Optional arguments:
  -h, --help  Show this help message and exit.

run-chip-pipeline

$ pypgx run-chip-pipeline -h
usage: pypgx run-chip-pipeline [-h] [--assembly TEXT] [--panel PATH]
                               [--impute] [--force]
                               [--samples TEXT [TEXT ...]] [--exclude]
                               gene output variants

Run genotyping pipeline for chip data.

Positional arguments:
  gene                  Target gene.
  output                Output directory.
  variants              Input VCF file must be already BGZF compressed (.gz)
                        and indexed (.tbi) to allow random access.
                        Statistical haplotype phasing will be skipped if
                        input VCF is already fully phased.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --panel PATH          VCF file corresponding to a reference haplotype panel
                        (compressed or uncompressed). By default, the 1KGP
                        panel in the pypgx-bundle directory will be used.
  --impute              Perform imputation of missing genotypes.
  --force               Overwrite output directory if it already exists.
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you
                        can provide a list of samples.
  --exclude             Exclude specified samples.

[Example] To genotype the CYP3A5 gene from chip data:
  $ pypgx run-chip-pipeline \
  CYP3A5 \
  CYP3A5-pipeline \
  variants.vcf.gz

run-long-read-pipeline

$ pypgx run-long-read-pipeline -h
usage: pypgx run-long-read-pipeline [-h] [--assembly TEXT] [--force]
                                    [--samples TEXT [TEXT ...]] [--exclude]
                                    gene output variants

Run genotyping pipeline for long-read sequencing data.

Positional arguments:
  gene                  Target gene.
  output                Output directory.
  variants              Input VCF file must be already BGZF compressed (.gz)
                        and indexed (.tbi) to allow random access.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --force               Overwrite output directory if it already exists.
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you
                        can provide a list of samples.
  --exclude             Exclude specified samples.

[Example] To genotype the CYP3A5 gene from long-read sequencing data:
  $ pypgx run-long-read-pipeline \
  CYP3A5 \
  CYP3A5-pipeline \
  variants.vcf.gz

run-ngs-pipeline

$ pypgx run-ngs-pipeline -h
usage: pypgx run-ngs-pipeline [-h] [--variants PATH]
                              [--depth-of-coverage PATH]
                              [--control-statistics PATH] [--platform TEXT]
                              [--assembly TEXT] [--panel PATH] [--force]
                              [--samples TEXT [TEXT ...]] [--exclude]
                              [--samples-without-sv TEXT [TEXT ...]]
                              [--do-not-plot-copy-number]
                              [--do-not-plot-allele-fraction]
                              [--cnv-caller PATH]
                              gene output

Run genotyping pipeline for NGS data.

During copy number analysis, if the input data is targeted sequencing, the
command will apply inter-sample normalization using summary statistics across
all samples. For best results, it is recommended to specify known samples
without SV using --samples-without-sv.

Positional arguments:
  gene                  Target gene.
  output                Output directory.

Optional arguments:
  -h, --help            Show this help message and exit.
  --variants PATH       Input VCF file must be already BGZF compressed (.gz)
                        and indexed (.tbi) to allow random access.
                        Statistical haplotype phasing will be skipped if
                        input VCF is already fully phased.
  --depth-of-coverage PATH
                        Archive file with the semantic type
                        CovFrame[DepthOfCoverage].
  --control-statistics PATH
                        Archive file with the semantic type
                        SampleTable[Statistics].
  --platform TEXT       Genotyping platform (default: 'WGS') (choices: 'WGS',
                        'Targeted')
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --panel PATH          VCF file corresponding to a reference haplotype panel
                        (compressed or uncompressed). By default, the 1KGP panel
                        in the pypgx-bundle directory will be used.
  --force               Overwrite output directory if it already exists.
  --samples TEXT [TEXT ...]
                        Specify which samples should be included for analysis
                        by providing a text file (.txt, .tsv, .csv, or .list)
                        containing one sample per line. Alternatively, you
                        can provide a list of samples.
  --exclude             Exclude specified samples.
  --samples-without-sv TEXT [TEXT ...]
                        List of known samples without SV.
  --do-not-plot-copy-number
                        Do not plot copy number profile.
  --do-not-plot-allele-fraction
                        Do not plot allele fraction profile.
  --cnv-caller PATH     Archive file with the semantic type Model[CNV]. By
                        default, a pre-trained CNV caller in the pypgx-bundle
                        directory will be used.

[Example] To genotype the CYP3A5 gene, which does not have SV, from WGS data:
  $ pypgx run-ngs-pipeline \
  CYP3A5 \
  CYP3A5-pipeline \
  --variants variants.vcf.gz

[Example] To genotype the CYP2D6 gene, which does have SV, from WGS data:
  $ pypgx run-ngs-pipeline \
  CYP2D6 \
  CYP2D6-pipeline \
  --variants variants.vcf.gz \
  --depth-of-coverage depth-of-coverage.zip \
  --control-statistics control-statistics-VDR.zip

[Example] To genotype the CYP2D6 gene from targeted sequencing data:
  $ pypgx run-ngs-pipeline \
  CYP2D6 \
  CYP2D6-pipeline \
  --variants variants.vcf.gz \
  --depth-of-coverage depth-of-coverage.zip \
  --control-statistics control-statistics-VDR.zip \
  --platform Targeted
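The inter-sample normalization applied to targeted data can be pictured with a simplified sketch. The code below assumes median-based normalization: read depth is first scaled to copy number using each sample's control-gene mean depth (so a diploid region sits near 2), and then each position is rescaled by the median over a set of reference samples. This illustrates the general idea only and is not PyPGx's exact algorithm; `normalize_copy_number` and its inputs are hypothetical. It also shows why `--samples-without-sv` matters: if the reference set is dominated by SV carriers, the median shifts and normal samples can look altered.

```python
import math
from statistics import median


def normalize_copy_number(depths, control_means, samples_without_sv=None):
    """Simplified sketch: read depth -> copy number with median normalization.

    depths: {sample: [read depth at each target position]}
    control_means: {sample: mean read depth over a control gene}
    samples_without_sv: samples assumed to be diploid everywhere; when None,
        every sample is used as the reference (hypothetical parameter).
    """
    # Step 1: scale by each sample's control-gene depth so diploid is about 2.
    copy_number = {
        s: [2 * d / control_means[s] for d in values] for s, values in depths.items()
    }
    reference = samples_without_sv or list(copy_number)
    n_positions = len(next(iter(copy_number.values())))
    # Step 2: per position, rescale by the reference samples' median so that
    # bias shared by all samples (e.g. uneven capture) cancels out.
    medians = [median(copy_number[s][i] for s in reference) for i in range(n_positions)]
    return {
        s: [2 * cn / m if m > 0 else math.nan for cn, m in zip(values, medians)]
        for s, values in copy_number.items()
    }
```

For instance, with three samples at depths 10, 10, and 20 (control mean 10 each), using all samples as the reference keeps the first two at copy number 2.0; using only the depth-20 sample as the reference drops them to 1.0.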

slice-bam

$ pypgx slice-bam -h
usage: pypgx slice-bam [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
                       [--exclude]
                       input output

Slice BAM file for all genes used by PyPGx.

Positional arguments:
  input                 Input BAM file. It must be already indexed to allow
                        random access.
  output                Output BAM file.

Optional arguments:
  -h, --help            Show this help message and exit.
  --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                        (choices: 'GRCh37', 'GRCh38').
  --genes TEXT [TEXT ...]
                        List of genes to include.
  --exclude             Exclude specified genes. Ignored when --genes is not
                        used.

test-cnv-caller

$ pypgx test-cnv-caller -h
usage: pypgx test-cnv-caller [-h] [--confusion-matrix PATH]
                             [--comparison-table PATH]
                             cnv-caller copy-number cnv-calls

Test CNV caller for target gene.

Positional arguments:
  cnv-caller            Input archive file with the semantic type Model[CNV].
  copy-number           Input archive file with the semantic type
                        CovFrame[CopyNumber].
  cnv-calls             Input archive file with the semantic type
                        SampleTable[CNVCalls].

Optional arguments:
  -h, --help            Show this help message and exit.
  --confusion-matrix PATH
                        Write the confusion matrix as a CSV file where rows
                        indicate actual class and columns indicate prediction
                        class.
  --comparison-table PATH
                        Write a CSV file comparing actual vs. predicted CNV
                        calls for each sample.
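The `--confusion-matrix` layout (rows are actual classes, columns are predicted classes) and the per-sample `--comparison-table` can be illustrated with the standard library alone. The sketch below is hypothetical: the column names (`Sample`, `Actual`, `Predicted`, `Match`) and the CNV labels used in the example are assumptions, not PyPGx's exact output format.

```python
import csv
import io


def confusion_matrix_csv(actual, predicted):
    """Confusion matrix as CSV text: rows = actual class, columns = predicted class.

    actual, predicted: {sample: CNV call}; only samples present in both count.
    """
    samples = [s for s in actual if s in predicted]
    labels = sorted({actual[s] for s in samples} | {predicted[s] for s in samples})
    counts = {(a, p): 0 for a in labels for p in labels}
    for s in samples:
        counts[(actual[s], predicted[s])] += 1
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + labels)  # header row: predicted classes
    for a in labels:
        writer.writerow([a] + [counts[(a, p)] for p in labels])
    return out.getvalue()


def comparison_table_csv(actual, predicted):
    """One CSV row per sample with the actual call, predicted call, and agreement."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Sample", "Actual", "Predicted", "Match"])
    for s in actual:
        if s in predicted:
            writer.writerow([s, actual[s], predicted[s], actual[s] == predicted[s]])
    return out.getvalue()
```

Off-diagonal cells of the matrix count misclassified samples, and the comparison table identifies which samples they are.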

train-cnv-caller

$ pypgx train-cnv-caller -h
usage: pypgx train-cnv-caller [-h] [--confusion-matrix PATH]
                              [--comparison-table PATH]
                              copy-number cnv-calls cnv-caller

Train CNV caller for target gene.

This command will return a SVM-based multiclass classifier that makes CNV
calls using the one-vs-rest strategy.

Positional arguments:
  copy-number           Input archive file with the semantic type
                        CovFrame[CopyNumber].
  cnv-calls             Input archive file with the semantic type
                        SampleTable[CNVCalls].
  cnv-caller            Output archive file with the semantic type Model[CNV].

Optional arguments:
  -h, --help            Show this help message and exit.
  --confusion-matrix PATH
                        Write the confusion matrix as a CSV file where rows
                        indicate actual class and columns indicate prediction
                        class.
  --comparison-table PATH
                        Write a CSV file comparing actual vs. predicted CNV
                        calls for each sample.
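The help text describes the CNV caller as an SVM-based multiclass classifier that uses the one-vs-rest strategy. One binary classifier is trained per CNV class (that class versus all others), and a sample is assigned the class whose classifier gives the highest decision score. The pure-Python sketch below shows that strategy with a linear SVM fit by batch subgradient descent on the hinge loss. It stands in for whatever SVM implementation PyPGx actually uses, and the 2-D features and class labels are made up for illustration.

```python
def train_linear_svm(X, y, epochs=500, lr=0.1, reg=0.01):
    """Binary linear SVM (labels +1/-1) fit by batch subgradient descent.

    Minimizes reg/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Point violates the margin: the hinge term contributes -y * x.
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(model, x):
    """Signed score of a binary model; larger means more confidently positive."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_one_vs_rest(X, labels, **svm_options):
    """Train one binary SVM per class: that class (+1) versus all others (-1)."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **svm_options)
        for c in sorted(set(labels))
    }


def predict_one_vs_rest(models, x):
    """Assign the class whose binary classifier gives the highest score."""
    return max(models, key=lambda c: decision(models[c], x))
```

In PyPGx the features presumably come from the CovFrame[CopyNumber] input; the sketch only makes the one-vs-rest decision rule concrete.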