..
   This file was automatically generated by docs/create.py.


CLI
***

This page describes the command line interface (CLI) for PyPGx.

To get help on the CLI:

.. code-block:: text

   $ pypgx -h

   usage: pypgx [-h] [-v] COMMAND ...
   
   positional arguments:
     COMMAND
       call-genotypes      Call genotypes for target gene.
       call-phenotypes     Call phenotypes for target gene.
       combine-results     Combine various results for target gene.
       compare-genotypes   Calculate concordance between two genotype results.
       compute-control-statistics
                           Compute summary statistics for control gene from BAM
                           files.
       compute-copy-number
                           Compute copy number from read depth for target gene.
       compute-target-depth
                           Compute read depth for target gene from BAM files.
       create-consolidated-vcf
                           Create a consolidated VCF file.
       create-input-vcf    Call SNVs/indels from BAM files for all target genes.
       create-regions-bed  Create a BED file which contains all regions used by
                           PyPGx.
       estimate-phase-beagle
                           Estimate haplotype phase of observed variants with
                           the Beagle program.
       filter-samples      Filter Archive file for specified samples.
       import-read-depth   Import read depth data for target gene.
       import-variants     Import SNV/indel data for target gene.
       plot-bam-copy-number
                           Plot copy number profile from CovFrame[CopyNumber].
       plot-bam-read-depth
                           Plot read depth profile with BAM data.
       plot-cn-af          Plot both copy number profile and allele fraction
                           profile in one figure.
       plot-vcf-allele-fraction
                           Plot allele fraction profile with VCF data.
       plot-vcf-read-depth
                           Plot read depth profile with VCF data.
       predict-alleles     Predict candidate star alleles based on observed
                           variants.
       predict-cnv         Predict CNV from copy number data for target gene.
       prepare-depth-of-coverage
                           Prepare a depth of coverage file for all target
                           genes with SV from BAM files.
       print-data          Print the main data of specified archive.
       print-metadata      Print the metadata of specified archive.
       run-chip-pipeline   Run genotyping pipeline for chip data.
       run-long-read-pipeline
                           Run genotyping pipeline for long-read sequencing data.
       run-ngs-pipeline    Run genotyping pipeline for NGS data.
       slice-bam           Slice BAM file for all genes used by PyPGx.
       test-cnv-caller     Test CNV caller for target gene.
       train-cnv-caller    Train CNV caller for target gene.
   
   options:
     -h, --help            Show this help message and exit.
     -v, --version         Show the version number and exit.

To get help on a specific command (e.g. call-genotypes):

.. code-block:: text

   $ pypgx call-genotypes -h

call-genotypes
==============

.. code-block:: text

   $ pypgx call-genotypes -h
   usage: pypgx call-genotypes [-h] [--alleles PATH] [--cnv-calls PATH] genotypes
   
   Call genotypes for target gene.
   
   Positional arguments:
     genotypes         Output archive file with the semantic type
                       SampleTable[Genotypes].
   
   Optional arguments:
     -h, --help        Show this help message and exit.
     --alleles PATH    Input archive file with the semantic type
                       SampleTable[Alleles].
     --cnv-calls PATH  Input archive file with the semantic type
                       SampleTable[CNVCalls].

call-phenotypes
===============

.. code-block:: text

   $ pypgx call-phenotypes -h
   usage: pypgx call-phenotypes [-h] genotypes phenotypes
   
   Call phenotypes for target gene.
   
   Positional arguments:
     genotypes   Input archive file with the semantic type
                 SampleTable[Genotypes].
     phenotypes  Output archive file with the semantic type
                 SampleTable[Phenotypes].
   
   Optional arguments:
     -h, --help  Show this help message and exit.

combine-results
===============

.. code-block:: text

   $ pypgx combine-results -h
   usage: pypgx combine-results [-h] [--genotypes PATH] [--phenotypes PATH]
                                [--alleles PATH] [--cnv-calls PATH]
                                results
   
   Combine various results for target gene.
   
   Positional arguments:
     results            Output archive file with the semantic type
                        SampleTable[Results].
   
   Optional arguments:
     -h, --help         Show this help message and exit.
     --genotypes PATH   Input archive file with the semantic type
                        SampleTable[Genotypes].
     --phenotypes PATH  Input archive file with the semantic type
                        SampleTable[Phenotypes].
     --alleles PATH     Input archive file with the semantic type
                        SampleTable[Alleles].
     --cnv-calls PATH   Input archive file with the semantic type
                        SampleTable[CNVCalls].

compare-genotypes
=================

.. code-block:: text

   $ pypgx compare-genotypes -h
   usage: pypgx compare-genotypes [-h] [--verbose] first second
   
   Calculate concordance between two genotype results.
   
   Only samples that appear in both genotype results will be used to calculate
   concordance for genotype calls as well as CNV calls.
   
   Positional arguments:
     first       First archive file with the semantic type
                 SampleTable[Results].
     second      Second archive file with the semantic type
                 SampleTable[Results].
   
   Optional arguments:
     -h, --help  Show this help message and exit.
     --verbose   Whether to print the verbose version of output, including
                 discordant calls.
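
The shared-sample rule described above can be sketched in a few lines of Python. This is only an illustration, not the PyPGx implementation: the ``concordance`` function and the plain dictionaries mapping sample names to genotype strings are hypothetical stand-ins for ``SampleTable[Results]`` archives.

.. code-block:: python

   def concordance(first, second):
       """Return (number of shared samples, fraction of identical calls)."""
       # Only samples present in both results are compared.
       shared = sorted(set(first) & set(second))
       if not shared:
           return 0, None
       matches = sum(first[s] == second[s] for s in shared)
       return len(shared), matches / len(shared)

   # Hypothetical genotype calls keyed by sample name.
   first = {"A": "*1/*2", "B": "*1/*4", "C": "*2/*2"}
   second = {"A": "*1/*2", "B": "*1/*1", "D": "*1/*1"}

   n_shared, rate = concordance(first, second)
   print(n_shared, rate)

Samples ``C`` and ``D`` appear in only one input, so they do not contribute to the result.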

compute-control-statistics
==========================

.. code-block:: text

   $ pypgx compute-control-statistics -h
   usage: pypgx compute-control-statistics [-h] [--assembly TEXT] [--bed PATH]
                                           gene control-statistics bams
                                           [bams ...]
   
   Compute summary statistics for control gene from BAM files.
   
   Note that for the arguments gene and --bed, the 'chr' prefix in contig names
   (e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to
   match the input BAM's contig names.
   
   Positional arguments:
     gene                Control gene (recommended choices: 'EGFR', 'RYR1',
                         'VDR'). Alternatively, you can provide a custom region
                         (format: chrom:start-end).
     control-statistics  Output archive file with the semantic type
                         SampleTable[Statistics].
     bams                One or more input BAM files. Alternatively, you can
                         provide a text file (.txt, .tsv, .csv, or .list)
                         containing one BAM file per line.
   
   Optional arguments:
     -h, --help          Show this help message and exit.
     --assembly TEXT     Reference genome assembly (default: 'GRCh37')
                         (choices: 'GRCh37', 'GRCh38').
     --bed PATH          By default, the input data is assumed to be WGS. If
                         it's targeted sequencing, you must provide a BED file
                         to indicate probed regions.
   
   [Example] For the VDR gene from WGS data:
     $ pypgx compute-control-statistics \
     VDR \
     control-statistics.zip \
     1.bam 2.bam
   
   [Example] For a custom region from targeted sequencing data:
     $ pypgx compute-control-statistics \
     chr1:100-200 \
     control-statistics.zip \
     bam.list \
     --bed probes.bed
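
The 'chr' prefix handling mentioned in the help text can be pictured with the following sketch. It assumes a hypothetical helper, ``match_chr_prefix``, that receives a region string and the contig names found in a BAM header; how PyPGx actually performs the adjustment may differ.

.. code-block:: python

   def match_chr_prefix(region, bam_contigs):
       """Add or remove the 'chr' prefix so the region matches the BAM contigs."""
       bam_has_prefix = any(c.startswith("chr") for c in bam_contigs)
       contig, sep, coords = region.partition(":")
       if bam_has_prefix and not contig.startswith("chr"):
           contig = "chr" + contig
       elif not bam_has_prefix and contig.startswith("chr"):
           contig = contig[3:]
       return contig + sep + coords

   # The BAM uses '1', '2', ... so the prefix is removed from the region.
   print(match_chr_prefix("chr1:100-200", ["1", "2"]))

   # The BAM uses 'chr1', ... so the prefix is added to the region.
   print(match_chr_prefix("1:100-200", ["chr1", "chr2"]))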

compute-copy-number
===================

.. code-block:: text

   $ pypgx compute-copy-number -h
   usage: pypgx compute-copy-number [-h] [--samples-without-sv TEXT [TEXT ...]]
                                    read-depth control-statistics copy-number
   
   Compute copy number from read depth for target gene.
   
   The command will convert read depth to copy number by performing intra-sample
   normalization using summary statistics from the control gene.
   
   During copy number analysis, if the input data is targeted sequencing, the
   command will apply inter-sample normalization using summary statistics across
   all samples. For best results, it is recommended to specify known samples
   without SV using --samples-without-sv.
   
   Positional arguments:
     read-depth            Input archive file with the semantic type
                           CovFrame[ReadDepth].
     control-statistics    Input archive file with the semantic type
                           SampleTable[Statistics].
     copy-number           Output archive file with the semantic type
                           CovFrame[CopyNumber].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --samples-without-sv TEXT [TEXT ...]
                           List of known samples with no SV.
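
As a rough illustration of intra-sample normalization, the sketch below scales each read depth value by the median depth of the control gene and multiplies by two. The formula, the assumed diploid baseline, and the ``depth_to_copy_number`` name are assumptions made for this example; they are not taken from the PyPGx source, and the inter-sample step for targeted sequencing is not shown.

.. code-block:: python

   from statistics import median

   def depth_to_copy_number(target_depth, control_depth, ploidy=2):
       """Scale per-position depth by the control gene's median depth."""
       # Assumption: the control gene is present in `ploidy` copies.
       control_median = median(control_depth)
       return [ploidy * d / control_median for d in target_depth]

   # Hypothetical per-position read depths for one sample.
   target = [30, 30, 15, 15, 45]
   control = [28, 30, 32, 30, 30]

   copy_number = depth_to_copy_number(target, control)
   print(copy_number)

With a control median of 30, a target depth of 15 maps to one copy and a depth of 45 maps to three copies.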

compute-target-depth
====================

.. code-block:: text

   $ pypgx compute-target-depth -h
   usage: pypgx compute-target-depth [-h] [--assembly TEXT] [--bed PATH]
                                     gene read-depth bams [bams ...]
   
   Compute read depth for target gene from BAM files.
   
   Positional arguments:
     gene             Target gene.
     read-depth       Output archive file with the semantic type
                      CovFrame[ReadDepth].
     bams             One or more input BAM files. Alternatively, you can
                      provide a text file (.txt, .tsv, .csv, or .list)
                      containing one BAM file per line.
   
   Optional arguments:
     -h, --help       Show this help message and exit.
     --assembly TEXT  Reference genome assembly (default: 'GRCh37')
                      (choices: 'GRCh37', 'GRCh38').
     --bed PATH       By default, the input data is assumed to be WGS. If it
                      is targeted sequencing, you must provide a BED file to
                      indicate probed regions.
   
   [Example] For the CYP2D6 gene from WGS data:
     $ pypgx compute-target-depth \
     CYP2D6 \
     read-depth.zip \
     1.bam 2.bam
   
   [Example] For the CYP2D6 gene from targeted sequencing data:
     $ pypgx compute-target-depth \
     CYP2D6 \
     read-depth.zip \
     bam.list \
     --bed probes.bed
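
The two ways of supplying ``bams`` (paths given directly, or a text file listing one BAM per line) could be resolved along the following lines. ``resolve_bams`` is a hypothetical helper written for this page; the real command may treat edge cases differently.

.. code-block:: python

   from pathlib import Path

   # File extensions that the help text says are read as a list of BAM files.
   LIST_SUFFIXES = {".txt", ".tsv", ".csv", ".list"}

   def resolve_bams(args):
       """Return BAM paths from direct arguments or from a one-per-line file."""
       if len(args) == 1 and Path(args[0]).suffix in LIST_SUFFIXES:
           lines = Path(args[0]).read_text().splitlines()
           # Skip blank lines and strip surrounding whitespace.
           return [line.strip() for line in lines if line.strip()]
       return list(args)

   print(resolve_bams(["1.bam", "2.bam"]))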

create-consolidated-vcf
=======================

.. code-block:: text

   $ pypgx create-consolidated-vcf -h
   usage: pypgx create-consolidated-vcf [-h]
                                        imported-variants phased-variants
                                        consolidated-variants
   
   Create a consolidated VCF file.
   
   Positional arguments:
     imported-variants     Input archive file with the semantic type
                           VcfFrame[Imported].
     phased-variants       Input archive file with the semantic type
                           VcfFrame[Phased].
     consolidated-variants
                           Output archive file with the semantic type
                           VcfFrame[Consolidated].
   
   Optional arguments:
     -h, --help            Show this help message and exit.

create-input-vcf
================

.. code-block:: text

   $ pypgx create-input-vcf -h
   usage: pypgx create-input-vcf [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
                                 [--exclude] [--dir-path PATH] [--max-depth INT]
                                 vcf fasta bams [bams ...]
   
   Call SNVs/indels from BAM files for all target genes.
   
   To save computing resources, this method will call variants only for target
   genes whose at least one star allele is defined by SNVs/indels. Therefore,
   variants will not be called for target genes that have star alleles defined
   only by structural variation (e.g. UGT2B17).
   
   Positional arguments:
     vcf                   Output VCF file. It must have .vcf.gz as suffix.
     fasta                 Reference FASTA file.
     bams                  One or more input BAM files. Alternatively, you can
                           provide a text file (.txt, .tsv, .csv, or .list)
                           containing one BAM file per line.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --genes TEXT [TEXT ...]
                           List of genes to include.
     --exclude             Exclude specified genes. Ignored when --genes is not
                           used.
     --dir-path PATH       By default, intermediate files (likelihoods.bcf,
                           calls.bcf, and calls.normalized.bcf) will be stored
                           in a temporary directory, which is automatically
                           deleted after creating final VCF. If you provide a
                           directory path, intermediate files will be stored
                           there.
     --max-depth INT       At a position, read maximally this number of reads
                           per input file (default: 250). If your input data is
                           from WGS (e.g. 30X), you don't need to change this
                           option. However, if it's from targeted sequencing
                           with ultra-deep coverage (e.g. 500X), then you need
                           to increase the maximum depth.

create-regions-bed
==================

.. code-block:: text

   $ pypgx create-regions-bed -h
   usage: pypgx create-regions-bed [-h] [--assembly TEXT] [--add-chr-prefix]
                                   [--merge] [--target-genes] [--sv-genes]
                                   [--var-genes] [--genes TEXT [TEXT ...]]
                                   [--exclude]
   
   Create a BED file which contains all regions used by PyPGx.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --add-chr-prefix      Whether to add the 'chr' string in contig names.
     --merge               Whether to merge overlapping intervals (gene names
                           will be removed too).
     --target-genes        Whether to only return target genes, excluding
                           control genes and paralogs.
     --sv-genes            Whether to only return target genes whose at least
                           one star allele is defined by structural variation.
     --var-genes           Whether to only return target genes whose at least
                           one star allele is defined by SNVs/indels.
     --genes TEXT [TEXT ...]
                           List of genes to include.
     --exclude             Exclude specified genes. Ignored when --genes is not
                           used.
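
To show what merging overlapping intervals means for ``--merge``, here is a minimal sketch of the usual sort-and-sweep approach. The ``merge_intervals`` function and the tuple layout ``(chrom, start, end)`` are illustrative assumptions rather than the PyPGx code. Gene names are not carried along, which mirrors the note that they are removed.

.. code-block:: python

   def merge_intervals(intervals):
       """Merge overlapping (chrom, start, end) intervals on each contig."""
       merged = []
       for chrom, start, end in sorted(intervals):
           if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
               # Overlaps (or touches) the previous interval: extend it.
               merged[-1][2] = max(merged[-1][2], end)
           else:
               merged.append([chrom, start, end])
       return [tuple(x) for x in merged]

   # Hypothetical regions; the first two overlap on chr1.
   regions = [("chr1", 100, 200), ("chr1", 150, 300),
              ("chr2", 50, 60), ("chr1", 400, 500)]
   print(merge_intervals(regions))

This sketch also merges intervals that merely touch (``start == previous end``); whether the command does the same is not stated in the help text.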

estimate-phase-beagle
=====================

.. code-block:: text

   $ pypgx estimate-phase-beagle -h
   usage: pypgx estimate-phase-beagle [-h] [--panel PATH] [--impute]
                                      imported-variants phased-variants
   
   Estimate haplotype phase of observed variants with the Beagle program.
   
   Positional arguments:
     imported-variants  Input archive file with the semantic type
                        VcfFrame[Imported]. The 'chr' prefix in contig names
                        (e.g. 'chr1' vs. '1') will be automatically added or
                        removed as necessary to match the reference VCF's contig
                        names.
     phased-variants    Output archive file with the semantic type
                        VcfFrame[Phased].
   
   Optional arguments:
     -h, --help         Show this help message and exit.
     --panel PATH       VCF file (compressed or uncompressed) corresponding to a
                        reference haplotype panel. By default, the 1KGP panel in
                        the pypgx-bundle directory will be used.
     --impute           Perform imputation of missing genotypes.

filter-samples
==============

.. code-block:: text

   $ pypgx filter-samples -h
   usage: pypgx filter-samples [-h] [--exclude]
                               input output samples [samples ...]
   
   Filter Archive file for specified samples.
   
   Positional arguments:
     input       Input archive file.
     output      Output archive file.
     samples     Specify which samples should be included for analysis
                 by providing a text file (.txt, .tsv, .csv, or .list)
                 containing one sample per line. Alternatively, you can
                 provide a list of samples.
   
   Optional arguments:
     -h, --help  Show this help message and exit.
     --exclude   Exclude specified samples.

import-read-depth
=================

.. code-block:: text

   $ pypgx import-read-depth -h
   usage: pypgx import-read-depth [-h] [--samples TEXT [TEXT ...]] [--exclude]
                                  gene depth-of-coverage read-depth
   
   Import read depth data for target gene.
   
   Positional arguments:
     gene                  Target gene.
     depth-of-coverage     Input archive file with the semantic type
                           CovFrame[DepthOfCoverage].
     read-depth            Output archive file with the semantic type
                           CovFrame[ReadDepth].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you can
                           provide a list of samples.
     --exclude             Exclude specified samples.

import-variants
===============

.. code-block:: text

   $ pypgx import-variants -h
   usage: pypgx import-variants [-h] [--assembly TEXT] [--platform TEXT]
                                [--samples TEXT [TEXT ...]] [--exclude]
                                gene vcf imported-variants
   
   Import SNV/indel data for target gene.
   
   The command will slice the input VCF for the target gene to create an archive
   file with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].
   
   Positional arguments:
     gene                  Target gene.
     vcf                   Input VCF file must be already BGZF compressed (.gz)
                           and indexed (.tbi) to allow random access.
     imported-variants     Output archive file with the semantic type
                           VcfFrame[Imported] or VcfFrame[Consolidated].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --platform TEXT       Genotyping platform used (default: 'WGS') (choices:
                           'WGS', 'Targeted', 'Chip', 'LongRead'). When the
                           platform is 'WGS', 'Targeted', or 'Chip', the command
                           will assess whether every genotype call in the sliced
                           VCF is haplotype phased (e.g. '0|1'). If the sliced
                           VCF is fully phased, the command will return
                           VcfFrame[Consolidated] or otherwise
                           VcfFrame[Imported]. When the platform is 'LongRead',
                           the command will return VcfFrame[Consolidated] after
                           applying the phase-extension algorithm to estimate
                           haplotype phase of any variants that could not be
                           resolved by read-backed phasing.
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you
                           can provide a list of samples.
     --exclude             Exclude specified samples.
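
The phasing check described for ``--platform`` can be illustrated as follows. The sketch relies on the VCF convention that ``|`` separates phased alleles and ``/`` separates unphased ones. The function names are hypothetical, and cases such as missing or haploid calls are deliberately ignored.

.. code-block:: python

   def is_fully_phased(genotypes):
       """Return True if every genotype call uses the phased separator '|'."""
       return all("|" in gt for gt in genotypes)

   def output_semantic_type(genotypes):
       """Pick the semantic type the way the help text describes it."""
       if is_fully_phased(genotypes):
           return "VcfFrame[Consolidated]"
       return "VcfFrame[Imported]"

   print(output_semantic_type(["0|1", "1|1", "0|0"]))
   print(output_semantic_type(["0|1", "0/1", "0|0"]))

A single unphased call such as ``0/1`` is enough for the second example to fall back to ``VcfFrame[Imported]``.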

plot-bam-copy-number
====================

.. code-block:: text

   $ pypgx plot-bam-copy-number -h
   usage: pypgx plot-bam-copy-number [-h] [--fitted] [--path PATH]
                                     [--samples TEXT [TEXT ...]] [--ymin FLOAT]
                                     [--ymax FLOAT] [--fontsize FLOAT]
                                     copy-number
   
   Plot copy number profile from CovFrame[CopyNumber].
   
   Positional arguments:
     copy-number           Input archive file with the semantic type
                           CovFrame[CopyNumber].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --fitted              Show the fitted line as well.
     --path PATH           Create plots in this directory (default: current
                           directory).
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you can
                           provide a list of samples.
     --ymin FLOAT          Y-axis bottom (default: -0.3).
     --ymax FLOAT          Y-axis top (default: 6.3).
     --fontsize FLOAT      Text fontsize (default: 25).

plot-bam-read-depth
===================

.. code-block:: text

   $ pypgx plot-bam-read-depth -h
   usage: pypgx plot-bam-read-depth [-h] [--path PATH]
                                    [--samples TEXT [TEXT ...]] [--ymin FLOAT]
                                    [--ymax FLOAT] [--fontsize FLOAT]
                                    read-depth
   
   Plot read depth profile with BAM data.
   
   Positional arguments:
     read-depth            Input archive file with the semantic type
                           CovFrame[ReadDepth].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --path PATH           Create plots in this directory (default: current
                           directory).
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you can
                           provide a list of samples.
     --ymin FLOAT          Y-axis bottom.
     --ymax FLOAT          Y-axis top.
     --fontsize FLOAT      Text fontsize (default: 25).

plot-cn-af
==========

.. code-block:: text

   $ pypgx plot-cn-af -h
   usage: pypgx plot-cn-af [-h] [--path PATH] [--samples TEXT [TEXT ...]]
                           [--ymin FLOAT] [--ymax FLOAT] [--fontsize FLOAT]
                           copy-number imported-variants
   
   Plot both copy number profile and allele fraction profile in one figure.
   
   Positional arguments:
     copy-number           Input archive file with the semantic type
                           CovFrame[CopyNumber].
     imported-variants     Input archive file with the semantic type
                           VcfFrame[Imported].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --path PATH           Create plots in this directory (default: current
                           directory).
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you can
                           provide a list of samples.
     --ymin FLOAT          Y-axis bottom (default: -0.3).
     --ymax FLOAT          Y-axis top (default: 6.3).
     --fontsize FLOAT      Text fontsize (default: 25).

plot-vcf-allele-fraction
========================

.. code-block:: text

   $ pypgx plot-vcf-allele-fraction -h
   usage: pypgx plot-vcf-allele-fraction [-h] [--path PATH]
                                         [--samples TEXT [TEXT ...]]
                                         [--fontsize FLOAT]
                                         imported-variants
   
   Plot allele fraction profile from VcfFrame[Imported].
   
   Positional arguments:
     imported-variants     Input archive file with the semantic type
                           VcfFrame[Imported].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --path PATH           Create plots in this directory (default: current
                           directory).
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you can
                           provide a list of samples.
     --fontsize FLOAT      Text fontsize (default: 25).

plot-vcf-read-depth
===================

.. code-block:: text

   $ pypgx plot-vcf-read-depth -h
   usage: pypgx plot-vcf-read-depth [-h] [--assembly TEXT] [--path PATH]
                                    [--samples TEXT [TEXT ...]] [--ymin FLOAT]
                                    [--ymax FLOAT]
                                    gene vcf
   
   Plot read depth profile with VCF data.
   
   Positional arguments:
     gene                  Target gene.
     vcf                   Input VCF file.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --path PATH           Create plots in this directory (default: current
                           directory).
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you can
                           provide a list of samples.
     --ymin FLOAT          Y-axis bottom.
     --ymax FLOAT          Y-axis top.

predict-alleles
===============

.. code-block:: text

   $ pypgx predict-alleles -h
   usage: pypgx predict-alleles [-h] consolidated-variants alleles
   
   Predict candidate star alleles based on observed variants.
   
   Positional arguments:
     consolidated-variants
                           Input archive file with the semantic type
                           VcfFrame[Consolidated].
     alleles               Output archive file with the semantic type
                           SampleTable[Alleles].
   
   Optional arguments:
     -h, --help            Show this help message and exit.

predict-cnv
===========

.. code-block:: text

   $ pypgx predict-cnv -h
   usage: pypgx predict-cnv [-h] [--cnv-caller PATH] copy-number cnv-calls
   
   Predict CNV from copy number data for target gene.
   
   Genomic positions that are missing copy number because, for example, the
   input data is targeted sequencing will be imputed with forward filling.
   
   Positional arguments:
     copy-number        Input archive file with the semantic type
                        CovFrame[CopyNumber].
     cnv-calls          Output archive file with the semantic type
                        SampleTable[CNVCalls].
   
   Optional arguments:
     -h, --help         Show this help message and exit.
     --cnv-caller PATH  Archive file with the semantic type Model[CNV]. By
                        default, a pre-trained CNV caller in the pypgx-bundle
                        directory will be used.
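
Forward filling, as mentioned in the description above, replaces each missing value with the most recent observed value. The sketch below uses ``None`` to mark positions without copy number; the ``forward_fill`` function is a hypothetical stand-in, not the routine used by PyPGx.

.. code-block:: python

   def forward_fill(values):
       """Replace each None with the closest preceding non-missing value."""
       filled = []
       last = None
       for v in values:
           if v is not None:
               last = v
           filled.append(last)
       return filled

   # Hypothetical copy number values with gaps between probed regions.
   copy_number = [2.0, None, None, 1.0, None]
   print(forward_fill(copy_number))

Positions that precede the first observed value have nothing to copy from, so this sketch leaves them as ``None``.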

prepare-depth-of-coverage
=========================

.. code-block:: text

   $ pypgx prepare-depth-of-coverage -h
   usage: pypgx prepare-depth-of-coverage [-h] [--assembly TEXT] [--bed PATH]
                                          [--genes TEXT [TEXT ...]] [--exclude]
                                          depth-of-coverage bams [bams ...]
   
   Prepare a depth of coverage file for all target genes with SV from BAM files.
   
   To save computing resources, this method will count read depth only for
   target genes whose at least one star allele is defined by structural
   variation. Therefore, read depth will not be computed for target genes that
   have star alleles defined only by SNVs/indels (e.g. CYP3A5).
   
   Positional arguments:
     depth-of-coverage     Output archive file with the semantic type
                           CovFrame[DepthOfCoverage].
     bams                  One or more input BAM files. Alternatively, you can
                           provide a text file (.txt, .tsv, .csv, or .list)
                           containing one BAM file per line.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --bed PATH            By default, the input data is assumed to be WGS. If
                           it's targeted sequencing, you must provide a BED file
                           to indicate probed regions. Note that the 'chr' prefix
                           in contig names (e.g. 'chr1' vs. '1') will be
                           automatically added or removed as necessary to match
                           the input BAM's contig names.
     --genes TEXT [TEXT ...]
                           List of genes to include.
     --exclude             Exclude specified genes. Ignored when --genes is not
                           used.
   
   [Example] From WGS data:
     $ pypgx prepare-depth-of-coverage \
     depth-of-coverage.zip \
     1.bam 2.bam
   
   [Example] From targeted sequencing data:
     $ pypgx prepare-depth-of-coverage \
     depth-of-coverage.zip \
     bam.list \
     --bed probes.bed

print-data
==========

.. code-block:: text

   $ pypgx print-data -h
   usage: pypgx print-data [-h] input
   
   Print the main data of specified archive.
   
   Positional arguments:
     input       Input archive file.
   
   Optional arguments:
     -h, --help  Show this help message and exit.

print-metadata
==============

.. code-block:: text

   $ pypgx print-metadata -h
   usage: pypgx print-metadata [-h] input
   
   Print the metadata of specified archive.
   
   Positional arguments:
     input       Input archive file.
   
   Optional arguments:
     -h, --help  Show this help message and exit.

run-chip-pipeline
=================

.. code-block:: text

   $ pypgx run-chip-pipeline -h
   usage: pypgx run-chip-pipeline [-h] [--assembly TEXT] [--panel PATH]
                                  [--impute] [--force]
                                  [--samples TEXT [TEXT ...]] [--exclude]
                                  gene output variants
   
   Run genotyping pipeline for chip data.
   
   Positional arguments:
     gene                  Target gene.
     output                Output directory.
     variants              Input VCF file must be already BGZF compressed (.gz)
                           and indexed (.tbi) to allow random access.
                           Statistical haplotype phasing will be skipped if
                           input VCF is already fully phased.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --panel PATH          VCF file corresponding to a reference haplotype panel
                           (compressed or uncompressed). By default, the 1KGP
                           panel in the pypgx-bundle directory will be used.
     --impute              Perform imputation of missing genotypes.
     --force               Overwrite output directory if it already exists.
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you
                           can provide a list of samples.
     --exclude             Exclude specified samples.
   
   [Example] To genotype the CYP3A5 gene from chip data:
     $ pypgx run-chip-pipeline \
     CYP3A5 \
     CYP3A5-pipeline \
     variants.vcf.gz
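
The ``--samples`` option above accepts a text file with one sample per line. The following is a minimal Python sketch of writing such a file and assembling the command line, using only options documented above. The helper names (``write_sample_list``, ``chip_pipeline_args``), the sample names, and the file names are hypothetical. The command is only printed, not run. Positional arguments are placed first because ``--samples`` takes one or more values, so positionals placed after it could be read as sample names.

.. code-block:: python

   import shlex
   import tempfile
   from pathlib import Path


   def write_sample_list(path, samples):
       """Write one sample name per line, the layout described for --samples."""
       Path(path).write_text("".join(f"{name}\n" for name in samples))


   def chip_pipeline_args(gene, output, variants, samples=None, impute=False):
       """Assemble an argument list for run-chip-pipeline (hypothetical helper)."""
       # Positionals go first so that --samples cannot swallow them.
       args = ["pypgx", "run-chip-pipeline", gene, output, variants]
       if impute:
           args.append("--impute")
       if samples is not None:
           args += ["--samples", str(samples)]
       return args


   # Write a sample list to a temporary directory and read it back.
   with tempfile.TemporaryDirectory() as tmp:
       sample_file = Path(tmp) / "samples.list"
       write_sample_list(sample_file, ["SAMPLE_A", "SAMPLE_B"])  # hypothetical names
       listed = sample_file.read_text().splitlines()

   args = chip_pipeline_args(
       "CYP3A5", "CYP3A5-pipeline", "variants.vcf.gz",
       samples="samples.list", impute=True,
   )
   print(shlex.join(args))

The resulting list can be passed to ``subprocess.run(args, check=True)`` on a system where PyPGx is installed.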

run-long-read-pipeline
======================

.. code-block:: text

   $ pypgx run-long-read-pipeline -h
   usage: pypgx run-long-read-pipeline [-h] [--assembly TEXT] [--force]
                                       [--samples TEXT [TEXT ...]] [--exclude]
                                       gene output variants
   
   Run genotyping pipeline for long-read sequencing data.
   
   Positional arguments:
     gene                  Target gene.
     output                Output directory.
     variants              Input VCF file must be already BGZF compressed (.gz)
                           and indexed (.tbi) to allow random access.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --force               Overwrite output directory if it already exists.
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you
                           can provide a list of samples.
     --exclude             Exclude specified samples.
   
   [Example] To genotype the CYP3A5 gene from long-read sequencing data:
     $ pypgx run-long-read-pipeline \
     CYP3A5 \
     CYP3A5-pipeline \
     variants.vcf.gz
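
The pipelines above require the input VCF to be BGZF compressed (.gz) and indexed (.tbi). The following is a minimal Python sketch of a pre-flight check that could be run before starting a pipeline. It is not part of PyPGx, and the function names are hypothetical. It assumes the index sits next to the VCF as ``<file>.tbi`` and that the BGZF subfield is the first entry in the gzip extra field, which is the usual layout but is not guaranteed.

.. code-block:: python

   import gzip
   import tempfile
   from pathlib import Path

   # The 28-byte empty BGZF block used as an end-of-file marker.
   BGZF_EOF = bytes.fromhex(
       "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
       "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
   )


   def looks_like_bgzf(path):
       """Return True if the file starts with a BGZF block header."""
       with open(path, "rb") as handle:
           header = handle.read(14)
       return (
           len(header) == 14
           and header[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA set
           and header[12:14] == b"BC"  # BGZF extra subfield identifier
       )


   def check_vcf_input(path):
       """Collect problems that would prevent random access to the VCF."""
       path = Path(path)
       problems = []
       if path.suffix != ".gz":
           problems.append("file name does not end in .gz")
       if not path.exists():
           problems.append("file does not exist")
       elif not looks_like_bgzf(path):
           problems.append("file is not BGZF compressed")
       if not Path(str(path) + ".tbi").exists():
           problems.append("index (.tbi) not found next to the file")
       return problems


   # Demonstration with placeholder files, not real VCF data.
   with tempfile.TemporaryDirectory() as tmp:
       good = Path(tmp) / "good.vcf.gz"
       good.write_bytes(BGZF_EOF)
       Path(str(good) + ".tbi").write_bytes(b"")  # empty placeholder index
       plain = Path(tmp) / "plain.vcf.gz"
       plain.write_bytes(gzip.compress(b"##fileformat=VCFv4.2\n"))  # gzip, not BGZF
       good_problems = check_vcf_input(good)
       plain_problems = check_vcf_input(plain)

   print(good_problems)
   print(plain_problems)

The check only inspects the first block header and the presence of the index file. It does not validate the VCF content or the index itself.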

run-ngs-pipeline
================

.. code-block:: text

   $ pypgx run-ngs-pipeline -h
   usage: pypgx run-ngs-pipeline [-h] [--variants PATH]
                                 [--depth-of-coverage PATH]
                                 [--control-statistics PATH] [--platform TEXT]
                                 [--assembly TEXT] [--panel PATH] [--force]
                                 [--samples TEXT [TEXT ...]] [--exclude]
                                 [--samples-without-sv TEXT [TEXT ...]]
                                 [--do-not-plot-copy-number]
                                 [--do-not-plot-allele-fraction]
                                 [--cnv-caller PATH]
                                 gene output
   
   Run genotyping pipeline for NGS data.
   
   During copy number analysis, if the input data is targeted sequencing, the
   command will apply inter-sample normalization using summary statistics across
   all samples. For best results, it is recommended to specify known samples
   without SV using --samples-without-sv.
   
   Positional arguments:
     gene                  Target gene.
     output                Output directory.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --variants PATH       Input VCF file must be already BGZF compressed (.gz)
                           and indexed (.tbi) to allow random access.
                           Statistical haplotype phasing will be skipped if
                           input VCF is already fully phased.
     --depth-of-coverage PATH
                           Archive file with the semantic type
                           CovFrame[DepthOfCoverage].
     --control-statistics PATH
                           Archive file with the semantic type
                           SampleTable[Statistics].
     --platform TEXT       Genotyping platform (default: 'WGS') (choices: 'WGS',
                           'Targeted').
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --panel PATH          VCF file corresponding to a reference haplotype panel
                           (compressed or uncompressed). By default, the 1KGP panel
                           in the pypgx-bundle directory will be used.
     --force               Overwrite output directory if it already exists.
     --samples TEXT [TEXT ...]
                           Specify which samples should be included for analysis
                           by providing a text file (.txt, .tsv, .csv, or .list)
                           containing one sample per line. Alternatively, you
                           can provide a list of samples.
     --exclude             Exclude specified samples.
     --samples-without-sv TEXT [TEXT ...]
                           List of known samples without SV.
     --do-not-plot-copy-number
                           Do not plot copy number profile.
     --do-not-plot-allele-fraction
                           Do not plot allele fraction profile.
     --cnv-caller PATH     Archive file with the semantic type Model[CNV]. By
                           default, a pre-trained CNV caller in the pypgx-bundle
                           directory will be used.
   
   [Example] To genotype the CYP3A5 gene, which does not have SV, from WGS data:
     $ pypgx run-ngs-pipeline \
     CYP3A5 \
     CYP3A5-pipeline \
     --variants variants.vcf.gz
   
   [Example] To genotype the CYP2D6 gene, which does have SV, from WGS data:
     $ pypgx run-ngs-pipeline \
     CYP2D6 \
     CYP2D6-pipeline \
     --variants variants.vcf.gz \
     --depth-of-coverage depth-of-coverage.zip \
     --control-statistics control-statistics-VDR.zip
   
   [Example] To genotype the CYP2D6 gene from targeted sequencing data:
     $ pypgx run-ngs-pipeline \
     CYP2D6 \
     CYP2D6-pipeline \
     --variants variants.vcf.gz \
     --depth-of-coverage depth-of-coverage.zip \
     --control-statistics control-statistics-VDR.zip \
     --platform Targeted

slice-bam
=========

.. code-block:: text

   $ pypgx slice-bam -h
   usage: pypgx slice-bam [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
                          [--exclude]
                          input output
   
   Slice BAM file for all genes used by PyPGx.
   
   Positional arguments:
     input                 Input BAM file. It must be already indexed to allow
                           random access.
     output                Output BAM file.
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --assembly TEXT       Reference genome assembly (default: 'GRCh37')
                           (choices: 'GRCh37', 'GRCh38').
     --genes TEXT [TEXT ...]
                           List of genes to include.
     --exclude             Exclude specified genes. Ignored when --genes is not
                           used.

test-cnv-caller
===============

.. code-block:: text

   $ pypgx test-cnv-caller -h
   usage: pypgx test-cnv-caller [-h] [--confusion-matrix PATH]
                                [--comparison-table PATH]
                                cnv-caller copy-number cnv-calls
   
   Test CNV caller for target gene.
   
   Positional arguments:
     cnv-caller            Input archive file with the semantic type Model[CNV].
     copy-number           Input archive file with the semantic type
                           CovFrame[CopyNumber].
     cnv-calls             Input archive file with the semantic type
                           SampleTable[CNVCalls].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --confusion-matrix PATH
                           Write the confusion matrix as a CSV file where rows
                           indicate actual class and columns indicate predicted
                           class.
     --comparison-table PATH
                           Write a CSV file comparing actual vs. predicted CNV
                           calls for each sample.
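
The confusion matrix written by ``--confusion-matrix`` has actual classes in rows and predicted classes in columns. The following is a minimal Python sketch of summarizing such a matrix. The CSV layout (a header row of predicted labels and a first column of actual labels), the class labels, and the counts are all assumptions for illustration; the file produced by PyPGx may differ.

.. code-block:: python

   import csv
   import io

   # Hypothetical file content; real labels, counts, and layout may differ.
   EXAMPLE_CSV = """\
   ,Normal,Deletion,Duplication
   Normal,40,1,0
   Deletion,2,7,0
   Duplication,0,1,9
   """


   def read_confusion_matrix(handle):
       """Return {actual: {predicted: count}} from the assumed CSV layout."""
       rows = [[cell.strip() for cell in row] for row in csv.reader(handle)]
       predicted = rows[0][1:]
       return {
           row[0]: dict(zip(predicted, map(int, row[1:])))
           for row in rows[1:]
       }


   def accuracy(matrix):
       """Fraction of samples on the diagonal (actual == predicted)."""
       correct = sum(counts.get(actual, 0) for actual, counts in matrix.items())
       total = sum(sum(counts.values()) for counts in matrix.values())
       return correct / total


   def recall(matrix, label):
       """Fraction of samples with this actual class that were predicted as it."""
       counts = matrix[label]
       return counts.get(label, 0) / sum(counts.values())


   matrix = read_confusion_matrix(io.StringIO(EXAMPLE_CSV))
   print(f"accuracy: {accuracy(matrix):.3f}")
   print(f"Deletion recall: {recall(matrix, 'Deletion'):.3f}")

For a real file, replace ``io.StringIO(EXAMPLE_CSV)`` with ``open(path, newline="")`` after confirming the layout matches.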

train-cnv-caller
================

.. code-block:: text

   $ pypgx train-cnv-caller -h
   usage: pypgx train-cnv-caller [-h] [--confusion-matrix PATH]
                                 [--comparison-table PATH]
                                 copy-number cnv-calls cnv-caller
   
   Train CNV caller for target gene.
   
   This command will return an SVM-based multiclass classifier that makes CNV
   calls using the one-vs-rest strategy.
   
   Positional arguments:
     copy-number           Input archive file with the semantic type
                           CovFrame[CopyNumber].
     cnv-calls             Input archive file with the semantic type
                           SampleTable[CNVCalls].
     cnv-caller            Output archive file with the semantic type Model[CNV].
   
   Optional arguments:
     -h, --help            Show this help message and exit.
     --confusion-matrix PATH
                           Write the confusion matrix as a CSV file where rows
                           indicate actual class and columns indicate predicted
                           class.
     --comparison-table PATH
                           Write a CSV file comparing actual vs. predicted CNV
                           calls for each sample.