CLI
This page describes the command line interface (CLI) for PyPGx.
To get help on the CLI:
$ pypgx -h
usage: pypgx [-h] [-v] COMMAND ...
positional arguments:
COMMAND
call-genotypes Call genotypes for target gene.
call-phenotypes Call phenotypes for target gene.
combine-results Combine various results for target gene.
compare-genotypes Calculate concordance between two genotype results.
compute-control-statistics
Compute summary statistics for control gene from BAM
files.
compute-copy-number
Compute copy number from read depth for target gene.
compute-target-depth
Compute read depth for target gene from BAM files.
create-consolidated-vcf
Create a consolidated VCF file.
create-input-vcf Call SNVs/indels from BAM files for all target genes.
create-regions-bed Create a BED file which contains all regions used by
PyPGx.
estimate-phase-beagle
Estimate haplotype phase of observed variants with
the Beagle program.
filter-samples Filter Archive file for specified samples.
import-read-depth Import read depth data for target gene.
import-variants Import SNV/indel data for target gene.
plot-bam-copy-number
Plot copy number profile from CovFrame[CopyNumber].
plot-bam-read-depth
Plot read depth profile with BAM data.
plot-cn-af Plot both copy number profile and allele fraction
profile in one figure.
plot-vcf-allele-fraction
Plot allele fraction profile with VCF data.
plot-vcf-read-depth
Plot read depth profile with VCF data.
predict-alleles Predict candidate star alleles based on observed
variants.
predict-cnv Predict CNV from copy number data for target gene.
prepare-depth-of-coverage
Prepare a depth of coverage file for all target
genes with SV from BAM files.
print-data Print the main data of specified archive.
print-metadata Print the metadata of specified archive.
run-chip-pipeline Run genotyping pipeline for chip data.
run-long-read-pipeline
Run genotyping pipeline for long-read sequencing data.
run-ngs-pipeline Run genotyping pipeline for NGS data.
slice-bam Slice BAM file for all genes used by PyPGx.
test-cnv-caller Test CNV caller for target gene.
train-cnv-caller Train CNV caller for target gene.
options:
-h, --help Show this help message and exit.
-v, --version Show the version number and exit.
To get help on a specific command (e.g. call-genotypes):
$ pypgx call-genotypes -h
call-genotypes
$ pypgx call-genotypes -h
usage: pypgx call-genotypes [-h] [--alleles PATH] [--cnv-calls PATH] genotypes
Call genotypes for target gene.
Positional arguments:
genotypes Output archive file with the semantic type
SampleTable[Genotypes].
Optional arguments:
-h, --help Show this help message and exit.
--alleles PATH Input archive file with the semantic type
SampleTable[Alleles].
--cnv-calls PATH Input archive file with the semantic type
SampleTable[CNVCalls].
call-phenotypes
$ pypgx call-phenotypes -h
usage: pypgx call-phenotypes [-h] genotypes phenotypes
Call phenotypes for target gene.
Positional arguments:
genotypes Input archive file with the semantic type
SampleTable[Genotypes].
phenotypes Output archive file with the semantic type
SampleTable[Phenotypes].
Optional arguments:
-h, --help Show this help message and exit.
combine-results
$ pypgx combine-results -h
usage: pypgx combine-results [-h] [--genotypes PATH] [--phenotypes PATH]
[--alleles PATH] [--cnv-calls PATH]
results
Combine various results for target gene.
Positional arguments:
results Output archive file with the semantic type
SampleTable[Results].
Optional arguments:
-h, --help Show this help message and exit.
--genotypes PATH Input archive file with the semantic type
SampleTable[Genotypes].
--phenotypes PATH Input archive file with the semantic type
SampleTable[Phenotypes].
--alleles PATH Input archive file with the semantic type
SampleTable[Alleles].
--cnv-calls PATH Input archive file with the semantic type
SampleTable[CNVCalls].
compare-genotypes
$ pypgx compare-genotypes -h
usage: pypgx compare-genotypes [-h] [--verbose] first second
Calculate concordance between two genotype results.
Only samples that appear in both genotype results will be used to calculate
concordance for genotype calls as well as CNV calls.
Positional arguments:
first First archive file with the semantic type
SampleTable[Results].
second Second archive file with the semantic type
SampleTable[Results].
Optional arguments:
-h, --help Show this help message and exit.
--verbose Whether to print the verbose version of output, including
discordant calls.
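The concordance rule above compares only samples that appear in both results. A minimal Python sketch of that rule follows. The function `genotype_concordance` is hypothetical and not part of PyPGx. It assumes each result has already been reduced to a mapping from sample name to a diplotype string such as '*1/*2', and it treats allele order as irrelevant. It ignores CNV calls, which the real command also compares.

```python
from typing import Dict, Tuple


def _normalize(diplotype: str) -> str:
    """Treat '*2/*1' and '*1/*2' as the same call (an assumption of this sketch)."""
    return "/".join(sorted(diplotype.split("/")))


def genotype_concordance(first: Dict[str, str],
                         second: Dict[str, str]) -> Tuple[int, int, float]:
    """Return (concordant, compared, rate) over samples present in both results."""
    shared = sorted(set(first) & set(second))  # samples missing from either side are skipped
    if not shared:
        return 0, 0, float("nan")
    concordant = sum(_normalize(first[s]) == _normalize(second[s]) for s in shared)
    return concordant, len(shared), concordant / len(shared)


first = {"S1": "*1/*2", "S2": "*1/*4", "S3": "*1/*1"}
second = {"S1": "*2/*1", "S2": "*4/*4", "S4": "*2/*2"}
print(genotype_concordance(first, second))  # (1, 2, 0.5)
```

Samples S3 and S4 each appear in only one result, so only S1 (concordant) and S2 (discordant) are counted.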
compute-control-statistics
$ pypgx compute-control-statistics -h
usage: pypgx compute-control-statistics [-h] [--assembly TEXT] [--bed PATH]
gene control-statistics bams
[bams ...]
Compute summary statistics for control gene from BAM files.
Note that for the arguments gene and --bed, the 'chr' prefix in contig names
(e.g. 'chr1' vs. '1') will be automatically added or removed as necessary to
match the input BAM's contig names.
Positional arguments:
gene Control gene (recommended choices: 'EGFR', 'RYR1',
'VDR'). Alternatively, you can provide a custom region
(format: chrom:start-end).
control-statistics Output archive file with the semantic type
SampleTable[Statistics].
bams One or more input BAM files. Alternatively, you can
provide a text file (.txt, .tsv, .csv, or .list)
containing one BAM file per line.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--bed PATH By default, the input data is assumed to be WGS. If
it's targeted sequencing, you must provide a BED file
to indicate probed regions.
[Example] For the VDR gene from WGS data:
$ pypgx compute-control-statistics \
VDR \
control-statistics.zip \
1.bam 2.bam
[Example] For a custom region from targeted sequencing data:
$ pypgx compute-control-statistics \
chr1:100-200 \
control-statistics.zip \
bam.list \
--bed probes.bed
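The custom region format (chrom:start-end) and the automatic 'chr' prefix matching can be sketched in Python as shown below. Both `parse_region` and `match_chr_prefix` are hypothetical helpers written for illustration. The sketch assumes a BAM follows the 'chr' naming style if any of its contig names start with 'chr'. PyPGx may decide this differently internally.

```python
import re
from typing import Iterable, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a custom region such as 'chr1:100-200' into (chrom, start, end)."""
    match = re.fullmatch(r"([^:]+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"expected chrom:start-end, got {region!r}")
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


def match_chr_prefix(chrom: str, bam_contigs: Iterable[str]) -> str:
    """Add or strip the 'chr' prefix so that `chrom` follows the BAM's naming style."""
    bam_uses_chr = any(contig.startswith("chr") for contig in bam_contigs)
    if bam_uses_chr and not chrom.startswith("chr"):
        return "chr" + chrom
    if not bam_uses_chr and chrom.startswith("chr"):
        return chrom[len("chr"):]
    return chrom


chrom, start, end = parse_region("chr1:100-200")
print(match_chr_prefix(chrom, ["1", "2", "X"]), start, end)  # 1 100 200
```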
compute-copy-number
$ pypgx compute-copy-number -h
usage: pypgx compute-copy-number [-h] [--samples-without-sv TEXT [TEXT ...]]
read-depth control-statistics copy-number
Compute copy number from read depth for target gene.
The command will convert read depth to copy number by performing intra-sample
normalization using summary statistics from the control gene.
During copy number analysis, if the input data is targeted sequencing, the
command will apply inter-sample normalization using summary statistics across
all samples. For best results, it is recommended to specify known samples
without SV using --samples-without-sv.
Positional arguments:
read-depth Input archive file with the semantic type
CovFrame[ReadDepth].
control-statistics Input archive file with the semantic type
SampleTable[Statistics].
copy-number Output archive file with the semantic type
CovFrame[CopyNumber].
Optional arguments:
-h, --help Show this help message and exit.
--samples-without-sv TEXT [TEXT ...]
List of known samples with no SV.
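Intra-sample normalization can be illustrated with a minimal sketch. It assumes the control gene is diploid in every sample, so copy number is estimated as 2 * (target depth / mean control depth). The helper `depth_to_copy_number` is hypothetical, and the summary statistics that PyPGx actually stores and uses may differ. The sketch does not cover inter-sample normalization for targeted sequencing.

```python
from statistics import mean
from typing import Dict, List


def depth_to_copy_number(target_depth: Dict[str, List[float]],
                         control_depth: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale each sample's target depth by that sample's own control-gene mean."""
    copy_number = {}
    for sample, depths in target_depth.items():
        control_mean = mean(control_depth[sample])  # per-sample summary statistic
        # Assumption: the control gene has two copies in every sample.
        copy_number[sample] = [2 * d / control_mean for d in depths]
    return copy_number


target = {"S1": [30.0, 15.0, 45.0], "S2": [60.0, 60.0, 30.0]}
control = {"S1": [29.0, 31.0, 30.0], "S2": [58.0, 62.0, 60.0]}
print(depth_to_copy_number(target, control))
# {'S1': [2.0, 1.0, 3.0], 'S2': [2.0, 2.0, 1.0]}
```

At the first position, S2 has twice the read depth of S1 but receives the same copy number. This is because each sample is scaled by its own control depth.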
compute-target-depth
$ pypgx compute-target-depth -h
usage: pypgx compute-target-depth [-h] [--assembly TEXT] [--bed PATH]
gene read-depth bams [bams ...]
Compute read depth for target gene from BAM files.
Positional arguments:
gene Target gene.
read-depth Output archive file with the semantic type
CovFrame[ReadDepth].
bams One or more input BAM files. Alternatively, you can
provide a text file (.txt, .tsv, .csv, or .list)
containing one BAM file per line.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--bed PATH By default, the input data is assumed to be WGS. If it
is targeted sequencing, you must provide a BED file to
indicate probed regions.
[Example] For the CYP2D6 gene from WGS data:
$ pypgx compute-target-depth \
CYP2D6 \
read-depth.zip \
1.bam 2.bam
[Example] For the CYP2D6 gene from targeted sequencing data:
$ pypgx compute-target-depth \
CYP2D6 \
read-depth.zip \
bam.list \
--bed probes.bed
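The bams argument accepts either BAM paths or a single text file that lists one BAM per line. That convention can be sketched as shown below. `resolve_bams` is a hypothetical helper, not PyPGx's own argument parser. It assumes a list file is recognized by its suffix only when it is the sole argument, and that blank lines are skipped.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

LIST_SUFFIXES = {".txt", ".tsv", ".csv", ".list"}


def resolve_bams(args: Sequence[str]) -> List[str]:
    """Expand a single list file into BAM paths; otherwise return the paths as given."""
    if len(args) == 1 and Path(args[0]).suffix in LIST_SUFFIXES:
        lines = Path(args[0]).read_text().splitlines()
        return [line.strip() for line in lines if line.strip()]  # one BAM per line
    return list(args)


with tempfile.TemporaryDirectory() as tmp:
    listing = Path(tmp) / "bam.list"
    listing.write_text("1.bam\n2.bam\n\n")
    from_file = resolve_bams([str(listing)])

print(from_file)                         # ['1.bam', '2.bam']
print(resolve_bams(["1.bam", "2.bam"]))  # ['1.bam', '2.bam']
```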
create-consolidated-vcf
$ pypgx create-consolidated-vcf -h
usage: pypgx create-consolidated-vcf [-h]
imported-variants phased-variants
consolidated-variants
Create a consolidated VCF file.
Positional arguments:
imported-variants Input archive file with the semantic type
VcfFrame[Imported].
phased-variants Input archive file with the semantic type
VcfFrame[Phased].
consolidated-variants
Output archive file with the semantic type
VcfFrame[Consolidated].
Optional arguments:
-h, --help Show this help message and exit.
create-input-vcf
$ pypgx create-input-vcf -h
usage: pypgx create-input-vcf [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
[--exclude] [--dir-path PATH] [--max-depth INT]
vcf fasta bams [bams ...]
Call SNVs/indels from BAM files for all target genes.
To save computing resources, this command calls variants only for target
genes with at least one star allele defined by SNVs/indels. Variants will
therefore not be called for target genes whose star alleles are defined only
by structural variation (e.g. UGT2B17).
Positional arguments:
vcf Output VCF file. It must have .vcf.gz as suffix.
fasta Reference FASTA file.
bams One or more input BAM files. Alternatively, you can
provide a text file (.txt, .tsv, .csv, or .list)
containing one BAM file per line.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--genes TEXT [TEXT ...]
List of genes to include.
--exclude Exclude specified genes. Ignored when --genes is not
used.
--dir-path PATH By default, intermediate files (likelihoods.bcf,
calls.bcf, and calls.normalized.bcf) will be stored
in a temporary directory, which is automatically
deleted after creating final VCF. If you provide a
directory path, intermediate files will be stored
there.
--max-depth INT At a position, read at most this number of reads per
input file (default: 250). If your input data is from
WGS (e.g. 30X), you don't need to change this option.
However, if it's from targeted sequencing with
ultra-deep coverage (e.g. 500X), you need to increase
the maximum depth.
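Several commands share the same gene-selection rule. create-input-vcf uses it here. prepare-depth-of-coverage mirrors it for structural variation, and create-regions-bed exposes it through the --var-genes and --sv-genes options. The toy table `ALLELE_TYPES` and the function `partition_genes` below are hypothetical. PyPGx reads its own allele definitions, which are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy table: gene -> kinds of variation that define its star alleles.
ALLELE_TYPES: Dict[str, Set[str]] = {
    "CYP2D6": {"SNV/indel", "SV"},  # e.g. *4 (SNV) and *5 (whole-gene deletion)
    "CYP3A5": {"SNV/indel"},        # star alleles defined only by SNVs/indels
    "UGT2B17": {"SV"},              # star alleles defined only by structural variation
}


def partition_genes(allele_types: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Return (genes needing variant calls, genes needing read depth)."""
    var_genes = sorted(g for g, kinds in allele_types.items() if "SNV/indel" in kinds)
    sv_genes = sorted(g for g, kinds in allele_types.items() if "SV" in kinds)
    return var_genes, sv_genes


var_genes, sv_genes = partition_genes(ALLELE_TYPES)
print(var_genes)  # ['CYP2D6', 'CYP3A5']
print(sv_genes)   # ['CYP2D6', 'UGT2B17']
```

A gene such as CYP2D6 falls into both groups, so it needs both variant calls and read depth.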
create-regions-bed
$ pypgx create-regions-bed -h
usage: pypgx create-regions-bed [-h] [--assembly TEXT] [--add-chr-prefix]
[--merge] [--target-genes] [--sv-genes]
[--var-genes] [--genes TEXT [TEXT ...]]
[--exclude]
Create a BED file which contains all regions used by PyPGx.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--add-chr-prefix Whether to add the 'chr' string in contig names.
--merge Whether to merge overlapping intervals (gene names
will be removed too).
--target-genes Whether to only return target genes, excluding
control genes and paralogs.
--sv-genes Whether to only return target genes with at least
one star allele defined by structural variation.
--var-genes Whether to only return target genes with at least
one star allele defined by SNVs/indels.
--genes TEXT [TEXT ...]
List of genes to include.
--exclude Exclude specified genes. Ignored when --genes is not
used.
estimate-phase-beagle
$ pypgx estimate-phase-beagle -h
usage: pypgx estimate-phase-beagle [-h] [--panel PATH] [--impute]
imported-variants phased-variants
Estimate haplotype phase of observed variants with the Beagle program.
Positional arguments:
imported-variants Input archive file with the semantic type
VcfFrame[Imported]. The 'chr' prefix in contig names
(e.g. 'chr1' vs. '1') will be automatically added or
removed as necessary to match the reference VCF's contig
names.
phased-variants Output archive file with the semantic type
VcfFrame[Phased].
Optional arguments:
-h, --help Show this help message and exit.
--panel PATH VCF file (compressed or uncompressed) corresponding to a
reference haplotype panel. By default, the 1KGP panel in
the pypgx-bundle directory will be used.
--impute Perform imputation of missing genotypes.
filter-samples
$ pypgx filter-samples -h
usage: pypgx filter-samples [-h] [--exclude]
input output samples [samples ...]
Filter Archive file for specified samples.
Positional arguments:
input Input archive file.
output Output archive file.
samples Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
Optional arguments:
-h, --help Show this help message and exit.
--exclude Exclude specified samples.
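The include/exclude semantics of the samples argument and --exclude can be sketched as a simple filter over sample names. `select_samples` is hypothetical. It assumes the archive's original sample order is preserved and that names absent from the archive are silently ignored. Either assumption may differ from PyPGx's behavior.

```python
from typing import List, Sequence


def select_samples(all_samples: Sequence[str], samples: Sequence[str],
                   exclude: bool = False) -> List[str]:
    """Keep (or, with exclude=True, drop) the given samples in archive order."""
    wanted = set(samples)
    if exclude:
        return [s for s in all_samples if s not in wanted]
    return [s for s in all_samples if s in wanted]


columns = ["NA12878", "NA18519", "NA19239", "HG00276"]
print(select_samples(columns, ["NA19239", "NA12878"]))                # ['NA12878', 'NA19239']
print(select_samples(columns, ["NA19239", "NA12878"], exclude=True))  # ['NA18519', 'HG00276']
```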
import-read-depth
$ pypgx import-read-depth -h
usage: pypgx import-read-depth [-h] [--samples TEXT [TEXT ...]] [--exclude]
gene depth-of-coverage read-depth
Import read depth data for target gene.
Positional arguments:
gene Target gene.
depth-of-coverage Input archive file with the semantic type
CovFrame[DepthOfCoverage].
read-depth Output archive file with the semantic type
CovFrame[ReadDepth].
Optional arguments:
-h, --help Show this help message and exit.
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
--exclude Exclude specified samples.
import-variants
$ pypgx import-variants -h
usage: pypgx import-variants [-h] [--assembly TEXT] [--platform TEXT]
[--samples TEXT [TEXT ...]] [--exclude]
gene vcf imported-variants
Import SNV/indel data for target gene.
The command will slice the input VCF for the target gene to create an archive
file with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].
Positional arguments:
gene Target gene.
vcf Input VCF file. It must already be BGZF-compressed (.gz)
and indexed (.tbi) to allow random access.
imported-variants Output archive file with the semantic type
VcfFrame[Imported] or VcfFrame[Consolidated].
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--platform TEXT Genotyping platform used (default: 'WGS') (choices:
'WGS', 'Targeted', 'Chip', 'LongRead'). When the
platform is 'WGS', 'Targeted', or 'Chip', the command
will assess whether every genotype call in the sliced
VCF is haplotype phased (e.g. '0|1'). If the sliced
VCF is fully phased, the command will return
VcfFrame[Consolidated] or otherwise
VcfFrame[Imported]. When the platform is 'LongRead',
the command will return VcfFrame[Consolidated] after
applying the phase-extension algorithm to estimate
haplotype phase of any variants that could not be
resolved by read-backed phasing.
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you
can provide a list of samples.
--exclude Exclude specified samples.
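The --platform description determines which semantic type the command returns. That decision can be sketched as shown below. `is_phased` and `vcf_frame_type` are hypothetical helpers. They assume diploid GT values taken from the first colon-separated field of each sample entry (e.g. '0|1:30'). They model only the choice of output type, not the phase-extension algorithm itself.

```python
from typing import Sequence


def is_phased(gt: str) -> bool:
    """True for a phased diploid GT such as '0|1'; False for '0/1'."""
    return "|" in gt and "/" not in gt


def vcf_frame_type(sample_entries: Sequence[str], platform: str = "WGS") -> str:
    """Return the semantic type described for the given platform."""
    if platform == "LongRead":
        # Phase extension resolves the remaining variants, so the output is consolidated.
        return "VcfFrame[Consolidated]"
    # For 'WGS', 'Targeted', or 'Chip', every GT must already be phased.
    genotypes = [entry.split(":")[0] for entry in sample_entries]
    if all(is_phased(gt) for gt in genotypes):
        return "VcfFrame[Consolidated]"
    return "VcfFrame[Imported]"


print(vcf_frame_type(["0|1:30", "1|1:28"]))             # VcfFrame[Consolidated]
print(vcf_frame_type(["0|1:30", "0/1:25"]))             # VcfFrame[Imported]
print(vcf_frame_type(["0/1:25"], platform="LongRead"))  # VcfFrame[Consolidated]
```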
plot-bam-copy-number
$ pypgx plot-bam-copy-number -h
usage: pypgx plot-bam-copy-number [-h] [--fitted] [--path PATH]
[--samples TEXT [TEXT ...]] [--ymin FLOAT]
[--ymax FLOAT] [--fontsize FLOAT]
copy-number
Plot copy number profile from CovFrame[CopyNumber].
Positional arguments:
copy-number Input archive file with the semantic type
CovFrame[CopyNumber].
Optional arguments:
-h, --help Show this help message and exit.
--fitted Show the fitted line as well.
--path PATH Create plots in this directory (default: current
directory).
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
--ymin FLOAT Y-axis bottom (default: -0.3).
--ymax FLOAT Y-axis top (default: 6.3).
--fontsize FLOAT Text fontsize (default: 25).
plot-bam-read-depth
$ pypgx plot-bam-read-depth -h
usage: pypgx plot-bam-read-depth [-h] [--path PATH]
[--samples TEXT [TEXT ...]] [--ymin FLOAT]
[--ymax FLOAT] [--fontsize FLOAT]
read-depth
Plot read depth profile with BAM data.
Positional arguments:
read-depth Input archive file with the semantic type
CovFrame[ReadDepth].
Optional arguments:
-h, --help Show this help message and exit.
--path PATH Create plots in this directory (default: current
directory).
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
--ymin FLOAT Y-axis bottom.
--ymax FLOAT Y-axis top.
--fontsize FLOAT Text fontsize (default: 25).
plot-cn-af
$ pypgx plot-cn-af -h
usage: pypgx plot-cn-af [-h] [--path PATH] [--samples TEXT [TEXT ...]]
[--ymin FLOAT] [--ymax FLOAT] [--fontsize FLOAT]
copy-number imported-variants
Plot both copy number profile and allele fraction profile in one figure.
Positional arguments:
copy-number Input archive file with the semantic type
CovFrame[CopyNumber].
imported-variants Input archive file with the semantic type
VcfFrame[Imported].
Optional arguments:
-h, --help Show this help message and exit.
--path PATH Create plots in this directory (default: current
directory).
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
--ymin FLOAT Y-axis bottom (default: -0.3).
--ymax FLOAT Y-axis top (default: 6.3).
--fontsize FLOAT Text fontsize (default: 25).
plot-vcf-allele-fraction
$ pypgx plot-vcf-allele-fraction -h
usage: pypgx plot-vcf-allele-fraction [-h] [--path PATH]
[--samples TEXT [TEXT ...]]
[--fontsize FLOAT]
imported-variants
Plot allele fraction profile from VcfFrame[Imported].
Positional arguments:
imported-variants Input archive file with the semantic type
VcfFrame[Imported].
Optional arguments:
-h, --help Show this help message and exit.
--path PATH Create plots in this directory (default: current
directory).
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
--fontsize FLOAT Text fontsize (default: 25).
plot-vcf-read-depth
$ pypgx plot-vcf-read-depth -h
usage: pypgx plot-vcf-read-depth [-h] [--assembly TEXT] [--path PATH]
[--samples TEXT [TEXT ...]] [--ymin FLOAT]
[--ymax FLOAT]
gene vcf
Plot read depth profile with VCF data.
Positional arguments:
gene Target gene.
vcf Input VCF file.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--path PATH Create plots in this directory (default: current
directory).
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you can
provide a list of samples.
--ymin FLOAT Y-axis bottom.
--ymax FLOAT Y-axis top.
predict-alleles
$ pypgx predict-alleles -h
usage: pypgx predict-alleles [-h] consolidated-variants alleles
Predict candidate star alleles based on observed variants.
Positional arguments:
consolidated-variants
Input archive file with the semantic type
VcfFrame[Consolidated].
alleles Output archive file with the semantic type
SampleTable[Alleles].
Optional arguments:
-h, --help Show this help message and exit.
predict-cnv
$ pypgx predict-cnv -h
usage: pypgx predict-cnv [-h] [--cnv-caller PATH] copy-number cnv-calls
Predict CNV from copy number data for target gene.
Genomic positions with missing copy number (e.g. because the input data is
targeted sequencing) will be imputed by forward filling.
Positional arguments:
copy-number Input archive file with the semantic type
CovFrame[CopyNumber].
cnv-calls Output archive file with the semantic type
SampleTable[CNVCalls].
Optional arguments:
-h, --help Show this help message and exit.
--cnv-caller PATH Archive file with the semantic type Model[CNV]. By
default, a pre-trained CNV caller in the pypgx-bundle
directory will be used.
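Forward filling replaces each missing value with the most recent observed value before it. The sketch below applies this to one sample's copy number profile, using `None` for missing positions. `forward_fill` is a hypothetical helper that performs the same operation as pandas' `ffill`. Positions before the first observed value remain missing.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the last observed value; leading Nones stay None."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is None:
            filled.append(last)  # reuse the most recent observed copy number
        else:
            last = value
            filled.append(value)
    return filled


profile = [2.1, None, None, 1.0, None, 0.9]
print(forward_fill(profile))  # [2.1, 2.1, 2.1, 1.0, 1.0, 0.9]
```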
prepare-depth-of-coverage
$ pypgx prepare-depth-of-coverage -h
usage: pypgx prepare-depth-of-coverage [-h] [--assembly TEXT] [--bed PATH]
[--genes TEXT [TEXT ...]] [--exclude]
depth-of-coverage bams [bams ...]
Prepare a depth of coverage file for all target genes with SV from BAM files.
To save computing resources, this command counts read depth only for target
genes with at least one star allele defined by structural variation. Read
depth will therefore not be computed for target genes whose star alleles are
defined only by SNVs/indels (e.g. CYP3A5).
Positional arguments:
depth-of-coverage Output archive file with the semantic type
CovFrame[DepthOfCoverage].
bams One or more input BAM files. Alternatively, you can
provide a text file (.txt, .tsv, .csv, or .list)
containing one BAM file per line.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--bed PATH By default, the input data is assumed to be WGS. If
it's targeted sequencing, you must provide a BED file
to indicate probed regions. Note that the 'chr' prefix
in contig names (e.g. 'chr1' vs. '1') will be
automatically added or removed as necessary to match
the input BAM's contig names.
--genes TEXT [TEXT ...]
List of genes to include.
--exclude Exclude specified genes. Ignored when --genes is not
used.
[Example] From WGS data:
$ pypgx prepare-depth-of-coverage \
depth-of-coverage.zip \
1.bam 2.bam
[Example] From targeted sequencing data:
$ pypgx prepare-depth-of-coverage \
depth-of-coverage.zip \
bam.list \
--bed probes.bed
print-data
$ pypgx print-data -h
usage: pypgx print-data [-h] input
Print the main data of specified archive.
Positional arguments:
input Input archive file.
Optional arguments:
-h, --help Show this help message and exit.
print-metadata
$ pypgx print-metadata -h
usage: pypgx print-metadata [-h] input
Print the metadata of specified archive.
Positional arguments:
input Input archive file.
Optional arguments:
-h, --help Show this help message and exit.
run-chip-pipeline
$ pypgx run-chip-pipeline -h
usage: pypgx run-chip-pipeline [-h] [--assembly TEXT] [--panel PATH]
[--impute] [--force]
[--samples TEXT [TEXT ...]] [--exclude]
gene output variants
Run genotyping pipeline for chip data.
Positional arguments:
gene Target gene.
output Output directory.
variants Input VCF file. It must already be BGZF-compressed (.gz)
and indexed (.tbi) to allow random access.
Statistical haplotype phasing will be skipped if the
input VCF is already fully phased.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--panel PATH VCF file corresponding to a reference haplotype panel
(compressed or uncompressed). By default, the 1KGP
panel in the pypgx-bundle directory will be used.
--impute Perform imputation of missing genotypes.
--force Overwrite output directory if it already exists.
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you
can provide a list of samples.
--exclude Exclude specified samples.
[Example] To genotype the CYP3A5 gene from chip data:
$ pypgx run-chip-pipeline \
CYP3A5 \
CYP3A5-pipeline \
variants.vcf.gz
run-long-read-pipeline
$ pypgx run-long-read-pipeline -h
usage: pypgx run-long-read-pipeline [-h] [--assembly TEXT] [--force]
[--samples TEXT [TEXT ...]] [--exclude]
gene output variants
Run genotyping pipeline for long-read sequencing data.
Positional arguments:
gene Target gene.
output Output directory.
variants Input VCF file. It must already be BGZF-compressed (.gz)
and indexed (.tbi) to allow random access.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--force Overwrite output directory if it already exists.
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you
can provide a list of samples.
--exclude Exclude specified samples.
[Example] To genotype the CYP3A5 gene from long-read sequencing data:
$ pypgx run-long-read-pipeline \
CYP3A5 \
CYP3A5-pipeline \
variants.vcf.gz
run-ngs-pipeline
$ pypgx run-ngs-pipeline -h
usage: pypgx run-ngs-pipeline [-h] [--variants PATH]
[--depth-of-coverage PATH]
[--control-statistics PATH] [--platform TEXT]
[--assembly TEXT] [--panel PATH] [--force]
[--samples TEXT [TEXT ...]] [--exclude]
[--samples-without-sv TEXT [TEXT ...]]
[--do-not-plot-copy-number]
[--do-not-plot-allele-fraction]
[--cnv-caller PATH]
gene output
Run genotyping pipeline for NGS data.
During copy number analysis, if the input data is targeted sequencing, the
command will apply inter-sample normalization using summary statistics across
all samples. For best results, it is recommended to specify known samples
without SV using --samples-without-sv.
Positional arguments:
gene Target gene.
output Output directory.
Optional arguments:
-h, --help Show this help message and exit.
--variants PATH Input VCF file. It must already be BGZF-compressed (.gz)
and indexed (.tbi) to allow random access.
Statistical haplotype phasing will be skipped if the
input VCF is already fully phased.
--depth-of-coverage PATH
Archive file with the semantic type
CovFrame[DepthOfCoverage].
--control-statistics PATH
Archive file with the semantic type
SampleTable[Statistics].
--platform TEXT Genotyping platform (default: 'WGS') (choices: 'WGS',
'Targeted').
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--panel PATH VCF file corresponding to a reference haplotype panel
(compressed or uncompressed). By default, the 1KGP panel
in the pypgx-bundle directory will be used.
--force Overwrite output directory if it already exists.
--samples TEXT [TEXT ...]
Specify which samples should be included for analysis
by providing a text file (.txt, .tsv, .csv, or .list)
containing one sample per line. Alternatively, you
can provide a list of samples.
--exclude Exclude specified samples.
--samples-without-sv TEXT [TEXT ...]
List of known samples without SV.
--do-not-plot-copy-number
Do not plot copy number profile.
--do-not-plot-allele-fraction
Do not plot allele fraction profile.
--cnv-caller PATH Archive file with the semantic type Model[CNV]. By
default, a pre-trained CNV caller in the pypgx-bundle
directory will be used.
[Example] To genotype the CYP3A5 gene, which does not have SV, from WGS data:
$ pypgx run-ngs-pipeline \
CYP3A5 \
CYP3A5-pipeline \
--variants variants.vcf.gz
[Example] To genotype the CYP2D6 gene, which does have SV, from WGS data:
$ pypgx run-ngs-pipeline \
CYP2D6 \
CYP2D6-pipeline \
--variants variants.vcf.gz \
--depth-of-coverage depth-of-coverage.zip \
--control-statistics control-statistics-VDR.zip
[Example] To genotype the CYP2D6 gene from targeted sequencing data:
$ pypgx run-ngs-pipeline \
CYP2D6 \
CYP2D6-pipeline \
--variants variants.vcf.gz \
--depth-of-coverage depth-of-coverage.zip \
--control-statistics control-statistics-VDR.zip \
--platform Targeted
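The examples above can also be generated programmatically when scripting the pipeline. `ngs_pipeline_command` is a hypothetical helper that only assembles the argument list from the options documented here. It requires --depth-of-coverage and --control-statistics to be given together. That requirement is an assumption drawn from the examples, not a documented rule.

```python
import shlex
from typing import List, Optional


def ngs_pipeline_command(
    gene: str,
    output: str,
    variants: Optional[str] = None,
    depth_of_coverage: Optional[str] = None,
    control_statistics: Optional[str] = None,
    platform: str = "WGS",
) -> List[str]:
    """Assemble a `pypgx run-ngs-pipeline` argument list (hypothetical helper)."""
    # Assumption taken from the examples: SV analysis uses both archives together.
    if (depth_of_coverage is None) != (control_statistics is None):
        raise ValueError("provide --depth-of-coverage and --control-statistics together")
    cmd = ["pypgx", "run-ngs-pipeline", gene, output]
    if variants is not None:
        cmd += ["--variants", variants]
    if depth_of_coverage is not None and control_statistics is not None:
        cmd += ["--depth-of-coverage", depth_of_coverage,
                "--control-statistics", control_statistics]
    if platform != "WGS":  # 'WGS' is the documented default
        cmd += ["--platform", platform]
    return cmd


cmd = ngs_pipeline_command(
    "CYP2D6", "CYP2D6-pipeline",
    variants="variants.vcf.gz",
    depth_of_coverage="depth-of-coverage.zip",
    control_statistics="control-statistics-VDR.zip",
    platform="Targeted",
)
print(shlex.join(cmd))
```

To run the assembled command, pass the list to Python's `subprocess.run(cmd, check=True)`.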
slice-bam
$ pypgx slice-bam -h
usage: pypgx slice-bam [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
[--exclude]
input output
Slice BAM file for all genes used by PyPGx.
Positional arguments:
input Input BAM file. It must be already indexed to allow
random access.
output Output BAM file.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--genes TEXT [TEXT ...]
List of genes to include.
--exclude Exclude specified genes. Ignored when --genes is not
used.
test-cnv-caller
$ pypgx test-cnv-caller -h
usage: pypgx test-cnv-caller [-h] [--confusion-matrix PATH]
[--comparison-table PATH]
cnv-caller copy-number cnv-calls
Test CNV caller for target gene.
Positional arguments:
cnv-caller Input archive file with the semantic type Model[CNV].
copy-number Input archive file with the semantic type
CovFrame[CopyNumber].
cnv-calls Input archive file with the semantic type
SampleTable[CNVCalls].
Optional arguments:
-h, --help Show this help message and exit.
--confusion-matrix PATH
Write the confusion matrix as a CSV file where rows
indicate the actual class and columns indicate the
predicted class.
--comparison-table PATH
Write a CSV file comparing actual vs. predicted CNV
calls for each sample.
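The confusion matrix layout uses rows for the actual class and columns for the predicted class. It can be sketched with Python's csv module. `confusion_matrix_csv` and the class labels below are hypothetical. PyPGx's own CNV labels and CSV formatting may differ.

```python
import csv
import io
from typing import Dict


def confusion_matrix_csv(actual: Dict[str, str], predicted: Dict[str, str]) -> str:
    """Count (actual, predicted) pairs per sample and render them as CSV text."""
    labels = sorted(set(actual.values()) | set(predicted.values()))
    counts = {(a, p): 0 for a in labels for p in labels}
    for sample in sorted(actual):  # assumes both mappings cover the same samples
        counts[(actual[sample], predicted[sample])] += 1
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([""] + labels)  # header row: predicted classes
    for a in labels:
        writer.writerow([a] + [counts[(a, p)] for p in labels])  # one row per actual class
    return buffer.getvalue()


actual = {"S1": "Normal", "S2": "Deletion", "S3": "Normal", "S4": "Duplication"}
predicted = {"S1": "Normal", "S2": "Deletion", "S3": "Duplication", "S4": "Duplication"}
print(confusion_matrix_csv(actual, predicted))
# ,Deletion,Duplication,Normal
# Deletion,1,0,0
# Duplication,0,1,0
# Normal,0,1,1
```

The off-diagonal count in the Normal row records sample S3, whose actual class is Normal but whose predicted class is Duplication.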
train-cnv-caller
$ pypgx train-cnv-caller -h
usage: pypgx train-cnv-caller [-h] [--confusion-matrix PATH]
[--comparison-table PATH]
copy-number cnv-calls cnv-caller
Train CNV caller for target gene.
This command will return an SVM-based multiclass classifier that makes CNV
calls using the one-vs-rest strategy.
Positional arguments:
copy-number Input archive file with the semantic type
CovFrame[CopyNumber].
cnv-calls Input archive file with the semantic type
SampleTable[CNVCalls].
cnv-caller Output archive file with the semantic type Model[CNV].
Optional arguments:
-h, --help Show this help message and exit.
--confusion-matrix PATH
Write the confusion matrix as a CSV file where rows
indicate the actual class and columns indicate the
predicted class.
--comparison-table PATH
Write a CSV file comparing actual vs. predicted CNV
calls for each sample.