API
This page describes the application programming interface (API) for PyPGx.
Below is the list of submodules available in the API:
core : The core submodule is the main suite of tools for PGx research.
genotype : The genotype submodule is primarily used to make final diplotype calls by interpreting candidate star alleles and/or detected structural variants.
pipeline : The pipeline submodule is used to provide convenient methods that combine multiple PyPGx actions and automatically handle semantic types.
plot : The plot submodule is used to plot various kinds of profiles such as read depth, copy number, and allele fraction.
utils : The utils submodule contains the main actions of PyPGx.
To get help on a specific submodule (e.g. utils):
from pypgx.api import utils
help(utils)
core
The core submodule is the main suite of tools for PGx research.
Functions:
build_definition_table : Build the definition table of star alleles for specified gene.
collapse_alleles : Collapse redundant candidate star alleles.
get_default_allele : Get the default allele of specified gene.
get_exon_ends : Get exon ends for specified gene.
get_exon_starts : Get exon starts for specified gene.
get_function : Get matched function from the allele table.
get_paralog : Get the paralog of specified gene.
get_priority : Get matched priority from the phenotype table.
get_recommendation : Get recommendation for specified drug-phenotype combination.
get_ref_allele : Get the reference allele for target gene.
get_region : Get matched region from the gene table.
get_score : Get matched score from the allele table.
get_strand : Get DNA strand (‘+’ or ‘-’) for specified gene.
get_variant_impact : Get variant impact from the variant table.
get_variant_synonyms : Get variant synonyms.
has_phenotype : Return True if specified gene has phenotype data.
has_score : Return True if specified gene has activity score.
has_sv : Return True if specified gene or allele has SV.
is_legit_allele : Return True if specified allele exists in the allele table.
is_target_gene : Return True if specified gene is one of the target genes.
list_alleles : List all star alleles present in the allele table.
list_functions : List all functions present in the allele table.
list_genes : List genes in the gene table.
list_phenotypes : List all phenotypes present in the phenotype table.
list_variants : List variants that are used to define star alleles.
load_allele_table : Load the allele table.
load_cnv_table : Load the CNV table.
load_cpic_table : Load the CPIC table.
load_diplotype_table : Load the diplotype table.
load_equation_table : Load the phenotype equation table.
load_gene_table : Load the gene table.
load_phenotype_table : Load the phenotype table.
load_recommendation_table : Load the recommendation table.
load_variant_table : Load the variant table.
predict_phenotype : Predict phenotype based on two haplotype calls.
predict_score : Predict activity score based on haplotype call.
sort_alleles : Sort star alleles by either priority or name.
- pypgx.api.core.build_definition_table(gene, assembly='GRCh37')
Build the definition table of star alleles for specified gene.
The table will only contain star alleles that are defined by SNVs and/or indels. It will not include alleles with SV (e.g. CYP2D6*5) or alleles with no variants (e.g. CYP2D6*2 for GRCh37 and CYP2D6*1 for GRCh38).
- Parameters
gene (str) – Target gene.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Definition table.
- Return type
fuc.api.pyvcf.VcfFrame
Examples
>>> import pypgx
>>> vf = pypgx.build_definition_table('CYP4F2')
>>> vf.df
  CHROM       POS         ID REF ALT QUAL FILTER      INFO FORMAT  *2  *3
0    19  15990431  rs2108622   C   T    .      .  VI=V433M     GT   0   1
1    19  16008388  rs3093105   A   C    .      .   VI=W12G     GT   1   0
>>> vf = pypgx.build_definition_table('CYP4F2', assembly='GRCh38')
>>> vf.df
  CHROM       POS         ID REF ALT QUAL FILTER      INFO FORMAT  *2  *3
0    19  15879621  rs2108622   C   T    .      .  VI=V433M     GT   0   1
1    19  15897578  rs3093105   A   C    .      .   VI=W12G     GT   1   0
- pypgx.api.core.collapse_alleles(gene, alleles, assembly='GRCh37')
Collapse redundant candidate star alleles.
Note that this method only considers core variants for collapsing.
- Parameters
gene (str) – Gene name.
alleles (list) – List of alleles.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Collapsed list of alleles.
- Return type
list
Examples
>>> import pypgx
>>> pypgx.list_variants('CYP2B6', alleles='*6', mode='core')
['19-41512841-G-T', '19-41515263-A-G']
>>> pypgx.list_variants('CYP2B6', alleles='*7', mode='core')
['19-41512841-G-T', '19-41515263-A-G', '19-41522715-C-T']
>>> pypgx.collapse_alleles('CYP2B6', ['*6', '*7'])
['*7']
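The collapsing idea can be sketched in plain Python. This is a minimal illustration, assuming an allele counts as redundant when its core variants form a proper subset of another candidate's core variants; the `collapse` helper and the `CORE_VARIANTS` dictionary are hypothetical and are not the PyPGx implementation.

```python
# Core variants copied from the list_variants() output for CYP2B6.
CORE_VARIANTS = {
    '*6': {'19-41512841-G-T', '19-41515263-A-G'},
    '*7': {'19-41512841-G-T', '19-41515263-A-G', '19-41522715-C-T'},
}

def collapse(alleles, core_variants):
    """Drop alleles whose core variants are a proper subset of another's.

    Hypothetical helper for illustration; not the PyPGx implementation.
    """
    kept = []
    for a in alleles:
        redundant = any(
            core_variants[a] < core_variants[b]  # proper-subset test
            for b in alleles if b != a
        )
        if not redundant:
            kept.append(a)
    return kept

result = collapse(['*6', '*7'], CORE_VARIANTS)
print(result)
```

Under this assumption, `*6` is dropped because both of its core variants also belong to `*7`.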
- pypgx.api.core.get_default_allele(gene, assembly='GRCh37')
Get the default allele of specified gene.
- Parameters
gene (str) – Gene name.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Default allele.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.get_default_allele('CYP2D6')
'*2'
>>> pypgx.get_default_allele('CYP2D6', assembly='GRCh38')
'*1'
- pypgx.api.core.get_exon_ends(gene, assembly='GRCh37')
Get exon ends for specified gene.
- Parameters
gene (str) – Gene name.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
List of end positions.
- Return type
list
See also
get_exon_starts
Get exon starts for specified gene.
Examples
>>> import pypgx
>>> pypgx.get_exon_ends('CYP2D6')
[42522754, 42522994, 42523636, 42523985, 42524352, 42524946, 42525187, 42525911, 42526883]
>>> pypgx.get_exon_ends('CYP2D6', assembly='GRCh38')
[42126752, 42126992, 42127634, 42127983, 42128350, 42128944, 42129185, 42129909, 42130810]
- pypgx.api.core.get_exon_starts(gene, assembly='GRCh37')
Get exon starts for specified gene.
- Parameters
gene (str) – Gene name.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
List of start positions.
- Return type
list
See also
get_exon_ends
Get exon ends for specified gene.
Examples
>>> import pypgx
>>> pypgx.get_exon_starts('CYP2D6')
[42522500, 42522852, 42523448, 42523843, 42524175, 42524785, 42525034, 42525739, 42526613]
>>> pypgx.get_exon_starts('CYP2D6', assembly='GRCh38')
[42126498, 42126850, 42127446, 42127841, 42128173, 42128783, 42129032, 42129737, 42130611]
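The two lists returned by get_exon_starts and get_exon_ends are parallel, so they can be paired into per-exon intervals. The sketch below uses only the GRCh37 CYP2D6 coordinates shown in the examples; it makes no assumption about whether the coordinates are 0-based or 1-based.

```python
# Coordinates copied from the CYP2D6 GRCh37 examples.
starts = [42522500, 42522852, 42523448, 42523843, 42524175,
          42524785, 42525034, 42525739, 42526613]
ends = [42522754, 42522994, 42523636, 42523985, 42524352,
        42524946, 42525187, 42525911, 42526883]

# Pair the i-th start with the i-th end to get one interval per exon.
exons = list(zip(starts, ends))

# Sanity check: every exon must start before it ends.
assert all(s < e for s, e in exons)

print(len(exons))
print(exons[0])
```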
- pypgx.api.core.get_function(gene, allele)
Get matched function from the allele table.
- Parameters
gene (str) – Gene name.
allele (str) – Star allele.
- Returns
Function status.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.get_function('CYP2D6', '*1')
'Normal Function'
>>> pypgx.get_function('CYP2D6', '*4')
'No Function'
>>> pypgx.get_function('CYP2D6', '*22')
'Uncertain Function'
>>> pypgx.get_function('UGT1A1', '*80+*37')
'Decreased Function'
>>> pypgx.get_function('CYP2D6', '*140')
nan
- pypgx.api.core.get_paralog(gene)
Get the paralog of specified gene.
- Parameters
gene (str) – Gene name.
- Returns
Paralog gene. Empty string if none exists.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.get_paralog('CYP2D6')
'CYP2D7'
>>> pypgx.get_paralog('CYP2D7')
'CYP2D6'
>>> pypgx.get_paralog('CYP2B6')
'CYP2B7'
>>> pypgx.get_paralog('CYP2E1')
''
- pypgx.api.core.get_priority(gene, phenotype)
Get matched priority from the phenotype table.
- Parameters
gene (str) – Gene name.
phenotype (str) – Phenotype name.
- Returns
EHR priority.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.get_priority('CYP2D6', 'Normal Metabolizer')
'Normal/Routine/Low Risk'
>>> pypgx.get_priority('CYP2D6', 'Ultrarapid Metabolizer')
'Abnormal/Priority/High Risk'
>>> pypgx.get_priority('CYP3A5', 'Normal Metabolizer')
'Abnormal/Priority/High Risk'
>>> pypgx.get_priority('CYP3A5', 'Poor Metabolizer')
'Normal/Routine/Low Risk'
- pypgx.api.core.get_recommendation(drug, gene1, phenotype1, gene2=None, phenotype2=None)
Get recommendation for specified drug-phenotype combination.
- Parameters
drug (str) – Drug name.
gene1 (str) – Gene name.
phenotype1 (str) – Phenotype name.
gene2 (str, optional) – Second gene name.
phenotype2 (str, optional) – Second phenotype name.
- Returns
Drug recommendation.
- Return type
str
Examples
>>> import pypgx
>>> # Codeine, an opiate and prodrug of morphine, is metabolized by CYP2D6
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Normal Metabolizer')
'Use codeine label recommended age- or weight-specific dosing.'
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Ultrarapid Metabolizer')
'Avoid codeine use because of potential for serious toxicity. If opioid use is warranted, consider a non-tramadol opioid.'
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Poor Metabolizer')
'Avoid codeine use because of possibility of diminished analgesia. If opioid use is warranted, consider a non-tramadol opioid.'
>>> pypgx.get_recommendation('codeine', 'CYP2D6', 'Indeterminate')
'None'
>>> # It's possible to have an altered recommendation for Normal Metabolizer
>>> pypgx.get_recommendation('tacrolimus', 'CYP3A5', 'Normal Metabolizer')
'Increase starting dose 1.5 to 2 times recommended starting dose. Total starting dose should not exceed 0.3 mg/kg/day. Use therapeutic drug monitoring to guide dose adjustments.'
>>> # Some recommendations are determined by multiple genes (the order doesn't matter)
>>> pypgx.get_recommendation('fluvastatin', 'CYP2C9', 'Normal Metabolizer')
/Users/sbslee/Desktop/pypgx/pypgx/api/core.py:633: UserWarning: Recommendations for fluvastatin are determined by multiple genes (CYP2C9, SLCO1B1); for best results, specify phenotype for each gene
  warnings.warn(message)
'Prescribe desired starting dose and adjust doses of fluvastatin based on disease-specific guidelines.'
>>> pypgx.get_recommendation('fluvastatin', 'SLCO1B1', 'Normal Function')
/Users/sbslee/Desktop/pypgx/pypgx/api/core.py:633: UserWarning: Recommendations for fluvastatin are determined by multiple genes (CYP2C9, SLCO1B1); for best results, specify phenotype for each gene
  warnings.warn(message)
'Prescribe desired starting dose and adjust doses of fluvastatin based on disease-specific guidelines.'
>>> pypgx.get_recommendation('fluvastatin', 'CYP2C9', 'Normal Metabolizer', 'SLCO1B1', 'Normal Function')
'Prescribe desired starting dose and adjust doses of fluvastatin based on disease-specific guidelines.'
>>> pypgx.get_recommendation('fluvastatin', 'SLCO1B1', 'Normal Function', 'CYP2C9', 'Normal Metabolizer')
'Prescribe desired starting dose and adjust doses of fluvastatin based on disease-specific guidelines.'
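One way to make a two-gene lookup independent of argument order is to key the table on an unordered set of (gene, phenotype) pairs. The sketch below is only an illustration of that idea: the `TABLE` dictionary and the `recommend` helper are hypothetical, and PyPGx may implement the lookup differently.

```python
# Recommendation text copied from the fluvastatin examples.
TEXT = ('Prescribe desired starting dose and adjust doses of fluvastatin '
        'based on disease-specific guidelines.')

# Keying on a frozenset makes the gene order irrelevant.
TABLE = {
    ('fluvastatin', frozenset({('CYP2C9', 'Normal Metabolizer'),
                               ('SLCO1B1', 'Normal Function')})): TEXT,
}

def recommend(drug, gene1, phenotype1, gene2, phenotype2):
    """Hypothetical two-gene lookup; not the PyPGx implementation."""
    key = (drug, frozenset({(gene1, phenotype1), (gene2, phenotype2)}))
    return TABLE.get(key)

forward = recommend('fluvastatin', 'CYP2C9', 'Normal Metabolizer',
                    'SLCO1B1', 'Normal Function')
reverse = recommend('fluvastatin', 'SLCO1B1', 'Normal Function',
                    'CYP2C9', 'Normal Metabolizer')
print(forward == reverse)
```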
- pypgx.api.core.get_ref_allele(gene)
Get the reference allele for target gene.
- Parameters
gene (str) – Target gene.
- Returns
Reference allele.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.get_ref_allele('CYP2D6')
'*1'
>>> pypgx.get_ref_allele('NAT1')
'*4'
- pypgx.api.core.get_region(gene, assembly='GRCh37')
Get matched region from the gene table.
- Parameters
gene (str) – Gene name.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Requested region.
- Return type
str
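The gene table shown under load_gene_table stores regions as strings such as '15:75038183-75051941' (the GRCh37 region of CYP1A2). Assuming get_region returns a string in that same CHROM:START-END form, it could be split into its parts as sketched below; `parse_region` is a hypothetical helper and is not part of PyPGx.

```python
def parse_region(region):
    """Split 'CHROM:START-END' into (chrom, start, end).

    Hypothetical helper for illustration; not part of PyPGx.
    """
    chrom, _, coords = region.partition(':')
    start, _, end = coords.partition('-')
    return chrom, int(start), int(end)

# Region string copied from the CYP1A2 row of the gene table.
chrom, start, end = parse_region('15:75038183-75051941')
print(chrom, start, end)
```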
- pypgx.api.core.get_score(gene, allele)
Get matched score from the allele table.
- Parameters
gene (str) – Gene name.
allele (str) – Star allele.
- Returns
Activity score.
- Return type
float
See also
predict_score
Predict activity score based on haplotype call.
Examples
>>> import pypgx
>>> pypgx.get_score('CYP2D6', '*1') # Allele with normal function
1.0
>>> pypgx.get_score('CYP2D6', '*4') # Allele with no function
0.0
>>> pypgx.get_score('CYP2D6', '*22') # Allele with uncertain function
nan
>>> pypgx.get_score('CYP2B6', '*1') # CYP2B6 does not have activity score
nan
- pypgx.api.core.get_strand(gene)
Get DNA strand (‘+’ or ‘-’) for specified gene.
- Parameters
gene (str) – Gene name.
- Returns
‘+’ or ‘-’.
- Return type
str
- pypgx.api.core.get_variant_impact(variant)
Get variant impact from the variant table.
- Parameters
variant (str) – Variant name.
- Returns
Variant impact.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.get_variant_impact('22-42522580-C-T') # Missense variant
'R497H'
>>> pypgx.get_variant_impact('10-96541756-T-A') # Splice variant
'Splice Defect'
>>> pypgx.get_variant_impact('22-42524435-T-A') # Intron variant
''
>>> pypgx.get_variant_impact('22-42524435-T-C')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sbslee/Desktop/pypgx/pypgx/api/core.py", line 588, in get_variant_impact
    raise sdk.utils.VariantNotFoundError(variant)
pypgx.sdk.utils.VariantNotFoundError: 22-42524435-T-C
- pypgx.api.core.get_variant_synonyms(gene, assembly='GRCh37')
Get variant synonyms.
- Parameters
gene (str) – Target gene.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Variant synonyms.
- Return type
dict
Examples
>>> import pypgx
>>> pypgx.get_variant_synonyms('UGT1A1')
{'2-234668879-CAT-CATAT': '2-234668879-C-CAT', '2-234668879-CAT-CATATAT': '2-234668879-C-CATAT'}
>>> pypgx.get_variant_synonyms('CYP2D6')
{}
- pypgx.api.core.has_phenotype(gene)
Return True if specified gene has phenotype data.
- Parameters
gene (str) – Gene name.
- Returns
Whether phenotype is supported.
- Return type
bool
Examples
>>> import pypgx
>>> pypgx.has_phenotype('CYP2D6')
True
>>> pypgx.has_phenotype('CYP4F2')
False
- pypgx.api.core.has_score(gene)
Return True if specified gene has activity score.
- Parameters
gene (str) – Gene name.
- Returns
Whether activity score is supported.
- Return type
bool
Examples
>>> import pypgx
>>> pypgx.has_score('CYP2D6')
True
>>> pypgx.has_score('CYP2B6')
False
- pypgx.api.core.has_sv(gene, allele=None)
Return True if specified gene or allele has SV.
The method will return False regardless of the specified allele if the target gene is not in the list of genes with SV. Additionally, it will return False if the specified allele is 'Indeterminate'.
- Parameters
gene (str) – Target gene.
allele (str, optional) – Allele to be tested.
- Returns
True if the allele has SV.
- Return type
bool
Examples
>>> import pypgx
>>> pypgx.has_sv('CYP2D6') # PyPGx has SV data for CYP2D6
True
>>> pypgx.has_sv('CYP3A5') # PyPGx does not have SV data for CYP3A5
False
>>> pypgx.has_sv('CYP2D6', '*1') # No SV
False
>>> pypgx.has_sv('CYP2D6', '*5') # Gene deletion
True
>>> pypgx.has_sv('CYP2D6', '*2x2') # Gene duplication
True
>>> pypgx.has_sv('CYP2D6', '*36+*10') # Tandem arrangement
True
>>> pypgx.has_sv('CYP3A5', '*1x2+*2') # Imaginary SV
/Users/sbslee/Desktop/pypgx/pypgx/api/core.py:289: UserWarning: PyPGx currently has no SV data available for CYP3A5. For more details, please visit the Genes section (https://pypgx.readthedocs.io/en/latest/genes.html) in the Read the Docs.
  warnings.warn(f"PyPGx currently has no SV data available for {gene}. "
False
>>> pypgx.has_sv('CYP2D6', 'Indeterminate')
False
- pypgx.api.core.is_legit_allele(gene, allele)
Return True if specified allele exists in the allele table.
- Parameters
gene (str) – Target gene.
allele (str) – Allele to be tested.
- Returns
True if the allele is legit.
- Return type
bool
- pypgx.api.core.is_target_gene(gene)
Return True if specified gene is one of the target genes.
- Parameters
gene (str) – Gene name.
- Returns
True if specified gene is one of the target genes.
- Return type
bool
Examples
>>> import pypgx
>>> pypgx.is_target_gene('CYP2D6')
True
>>> pypgx.is_target_gene('CYP2D7')
False
- pypgx.api.core.list_alleles(gene, variants=None, assembly='GRCh37')
List all star alleles present in the allele table.
- Parameters
gene (str) – Target gene.
variants (str or list, optional) – Only list alleles carrying specified variant(s) as a part of definition.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Requested alleles.
- Return type
list
Examples
>>> import pypgx
>>> pypgx.list_alleles('CYP4F2')
['*1', '*2', '*3']
>>> pypgx.list_alleles('CYP2B6', variants=['19-41515263-A-G'], assembly='GRCh37')
['*4', '*6', '*7', '*13', '*19', '*20', '*26', '*34', '*36', '*37', '*38']
- pypgx.api.core.list_functions(gene=None)
List all functions present in the allele table.
- Parameters
gene (str, optional) – Return only functions belonging to this gene.
- Returns
Available functions.
- Return type
list
Examples
>>> import pypgx
>>> pypgx.list_functions()
[nan, 'Normal Function', 'Uncertain Function', 'Increased Function', 'Decreased Function', 'No Function', 'Unknown Function', 'Possible Decreased Function', 'Possible Increased Function']
>>> pypgx.list_functions(gene='CYP2D6')
['Normal Function', 'No Function', 'Decreased Function', 'Uncertain Function', 'Unknown Function', nan]
- pypgx.api.core.list_genes(mode='target')
List genes in the gene table.
- Parameters
mode ({‘target’, ‘control’, ‘all’}, default: ‘target’) – Specify which gene set to return.
- Returns
Gene set.
- Return type
list
Examples
>>> import pypgx
>>> pypgx.list_genes(mode='target')[:5] # First five target genes
['CACNA1S', 'CFTR', 'CYP1A2', 'CYP2A6', 'CYP2A13']
>>> pypgx.list_genes(mode='control')
['EGFR', 'RYR1', 'VDR']
>>> pypgx.list_genes(mode='all')[:5] # Includes pseudogenes
['CACNA1S', 'CFTR', 'CYP1A2', 'CYP2A6', 'CYP2A7']
- pypgx.api.core.list_phenotypes(gene=None)
List all phenotypes present in the phenotype table.
- Parameters
gene (str, optional) – Return only phenotypes belonging to this gene.
- Returns
Available phenotypes.
- Return type
list
Examples
>>> import pypgx
>>> pypgx.list_phenotypes()
['Intermediate Metabolizer', 'Normal Metabolizer', 'Poor Metabolizer', 'Rapid Metabolizer', 'Ultrarapid Metabolizer', 'Likely Intermediate Metabolizer', 'Likely Poor Metabolizer', 'Possible Intermediate Metabolizer']
>>> pypgx.list_phenotypes(gene='CYP2D6')
['Ultrarapid Metabolizer', 'Normal Metabolizer', 'Intermediate Metabolizer', 'Poor Metabolizer']
- pypgx.api.core.list_variants(gene, alleles=None, mode='all', assembly='GRCh37')
List variants that are used to define star alleles.
Some alleles, such as reference alleles, may return an empty list because they do not contain any variants.
- Parameters
gene (str) – Target gene.
alleles (str or list, optional) – Allele name or list of alleles.
mode ({‘all’, ‘core’, ‘tag’}, default: ‘all’) – Whether to return all variants, core variants only, or tag variants only.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
- Returns
Coordinate sorted list of variants.
- Return type
list
Examples
>>> import pypgx
>>> pypgx.list_variants('CYP4F2')
['19-15990431-C-T', '19-16008388-A-C']
>>> pypgx.list_variants('CYP4F2', alleles=['*2'])
['19-16008388-A-C']
>>> pypgx.list_variants('CYP4F2', alleles=['*2', '*3'])
['19-15990431-C-T', '19-16008388-A-C']
>>> pypgx.list_variants('CYP4F2', alleles=['*2'], assembly='GRCh38')
['19-15897578-A-C']
>>> pypgx.list_variants('CYP4F2', alleles=['*1'])
[]
>>> pypgx.list_variants('CYP2B6', alleles=['*6'], mode='all')
['19-41495755-T-C', '19-41496461-T-C', '19-41512841-G-T', '19-41515263-A-G']
>>> pypgx.list_variants('CYP2B6', alleles=['*6'], mode='core')
['19-41512841-G-T', '19-41515263-A-G']
>>> pypgx.list_variants('CYP2B6', alleles=['*6'], mode='tag')
['19-41495755-T-C', '19-41496461-T-C']
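Variant names in these lists follow a CHROM-POS-REF-ALT pattern. The sketch below splits such a name into typed fields and sorts two CYP4F2 variants by position; `Variant` and `parse_variant` are hypothetical names introduced here for illustration, not PyPGx API.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def parse_variant(name):
    """Parse 'CHROM-POS-REF-ALT' (hypothetical helper, not PyPGx API)."""
    chrom, pos, ref, alt = name.split('-')
    return Variant(chrom, int(pos), ref, alt)

# Names copied from the CYP4F2 examples, deliberately out of order.
names = ['19-16008388-A-C', '19-15990431-C-T']

# Sort by numeric position; all variants here share one chromosome.
variants = sorted((parse_variant(n) for n in names), key=lambda v: v.pos)
print([v.pos for v in variants])
```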
- pypgx.api.core.load_allele_table()
Load the allele table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_allele_table()
>>> df.head()
      Gene  StarAllele  ActivityScore  Function  GRCh37Core  GRCh37Tag  GRCh38Core  GRCh38Tag  SV
0  ABCB1  *1  NaN  Normal Function  7-87138645-A-G,7-87160618-A-C,7-87179601-A-G  NaN  7-87509329-A-G,7-87531302-A-C,7-87550285-A-G  NaN  False
1  ABCB1  *2  NaN  Increased Function  NaN  NaN  NaN  NaN  False
2  CACNA1S  Reference  NaN  Normal Function  NaN  NaN  NaN  NaN  False
3  CACNA1S  c.520C>T  NaN  Malignant Hyperthermia Associated  1-201061121-G-A  NaN  1-201091993-G-A  NaN  False
4  CACNA1S  c.3257G>A  NaN  Malignant Hyperthermia Associated  1-201029943-C-T  NaN  1-201060815-C-T  NaN  False
- pypgx.api.core.load_cnv_table()
Load the CNV table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_cnv_table()
>>> df.head()
     Gene          Name
0  CYP2A6        Normal
1  CYP2A6  Deletion1Het
2  CYP2A6  Deletion1Hom
3  CYP2A6  Deletion2Het
4  CYP2A6  Deletion3Het
- pypgx.api.core.load_cpic_table()
Load the CPIC table.
The copy of the CPIC table in PyPGx is current as of August 19, 2023. To obtain the latest CPIC table, you can visit the Genes-Drugs page on the CPIC website.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_cpic_table()
>>> df.head()
      Gene  Drug  RxNorm  ATC  Guideline  CPICLevel  CPICLevelStatus  PharmGKBLevel  FDALabel  PMID
0  HLA-B  abacavir  190521.0  J05AF06, J05AR02, J05AR13, J05AR04  https://cpicpgx.org/guidelines/guideline-for-a...  A  Final  1A  Testing required  24561393;22378157
1  HLA-B  allopurinol  519.0  M04AA01, M04AA51  https://cpicpgx.org/guidelines/guideline-for-a...  A  Final  1A  Testing recommended  23232549;26094938
2  MT-RNR1  amikacin  641.0  D06AX12, J01GB06, S01AA21  https://cpicpgx.org/guidelines/cpic-guideline-...  A  Final  1A  NaN  34032273
3  CYP2C19  amitriptyline  704.0  N06AA09, N06CA01  https://cpicpgx.org/guidelines/guideline-for-t...  A  Final  1A  NaN  23486447;27997040
4  CYP2D6  amitriptyline  704.0  N06AA09, N06CA01  https://cpicpgx.org/guidelines/guideline-for-t...  A  Final  1A  Actionable PGx  23486447;27997040
- pypgx.api.core.load_diplotype_table()
Load the diplotype table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_diplotype_table()
>>> df.head()
      Gene            Diplotype                              Phenotype
0  CACNA1S  Reference/Reference               Uncertain Susceptibility
1  CACNA1S   Reference/c.520C>T  Malignant Hyperthermia Susceptibility
2  CACNA1S  Reference/c.3257G>A  Malignant Hyperthermia Susceptibility
3  CACNA1S    c.520C>T/c.520C>T  Malignant Hyperthermia Susceptibility
4  CACNA1S   c.520C>T/c.3257G>A  Malignant Hyperthermia Susceptibility
- pypgx.api.core.load_equation_table()
Load the phenotype equation table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_equation_table()
>>> df.head()
     Gene                 Phenotype              Equation
0  CYP2C9          Poor Metabolizer          0 <= score < 1
1  CYP2C9  Intermediate Metabolizer          1 <= score < 2
2  CYP2C9        Normal Metabolizer              2 == score
3  CYP2D6          Poor Metabolizer       0 <= score < 0.25
4  CYP2D6  Intermediate Metabolizer    0.25 <= score < 1.25
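Each row of the equation table pairs a phenotype with a chained comparison on the activity score. The sketch below shows one way such an equation string could be evaluated without `eval()`; `satisfies` and `lookup` are hypothetical helpers, and only the two CYP2D6 rows visible in the `df.head()` output are used, so scores outside those ranges return None here.

```python
import operator

OPS = {'<': operator.lt, '<=': operator.le, '==': operator.eq,
       '>': operator.gt, '>=': operator.ge}

def satisfies(equation, score):
    """Evaluate a chained comparison such as '0.25 <= score < 1.25'.

    Hypothetical helper: it tokenizes on whitespace instead of calling eval().
    """
    tokens = equation.split()
    # Even positions are operands, odd positions are operators.
    values = [score if t == 'score' else float(t) for t in tokens[0::2]]
    ops = [OPS[t] for t in tokens[1::2]]
    return all(op(a, b) for op, a, b in zip(ops, values, values[1:]))

# CYP2D6 rows copied from the df.head() output.
CYP2D6_ROWS = [
    ('Poor Metabolizer', '0 <= score < 0.25'),
    ('Intermediate Metabolizer', '0.25 <= score < 1.25'),
]

def lookup(rows, score):
    """Return the first phenotype whose equation holds, else None."""
    for phenotype, equation in rows:
        if satisfies(equation, score):
            return phenotype
    return None

print(lookup(CYP2D6_ROWS, 0.0))
print(lookup(CYP2D6_ROWS, 1.0))
```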
- pypgx.api.core.load_gene_table()
Load the gene table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_gene_table()
>>> df.head()
      Gene  Target  Control  Paralog  Variants  SV  PhenotypeMethod  RefAllele  GRCh37Default  GRCh38Default  Strand  GRCh37Region  GRCh38Region  GRCh37ExonStarts  GRCh37ExonEnds  GRCh38ExonStarts  GRCh38ExonEnds
0  ABCB1  True  False  NaN  True  False  NaN  *1  *2  *2  -  7:87130178-87345639  7:87500862-87716323  87133178,87135212,87138590,87144546,87145824,8...  87133765,87135359,87138797,87144744,87145981,8...  87503862,87505896,87509274,87515230,87516508,8...  87504449,87506043,87509481,87515428,87516665,8...
1  CACNA1S  True  False  NaN  True  False  Diplotype  Reference  Reference  Reference  -  1:201005639-201084694  1:201036511-201115426  201008639,201009358,201009749,201010631,201012...  201009210,201009502,201009841,201010717,201012...  201039511,201040230,201040621,201041503,201043...  201040082,201040374,201040713,201041589,201043...
2  CFTR  True  False  NaN  True  False  Diplotype  Reference  Reference  Reference  +  7:117117016-117311719  7:117477024-117671665  117120016,117144306,117149087,117170952,117174...  117120201,117144417,117149196,117171168,117174...  117480024,117504252,117509033,117530898,117534...  117480147,117504363,117509142,117531114,117534...
3  CYP1A1  True  False  NaN  True  False  NaN  *1  *1  *1  -  15:75008882-75020951  15:74716541-74728528  75011882,75013307,75013539,75013754,75013931,7...  75013115,75013394,75013663,75013844,75014058,7...  74719541,74720966,74721198,74721413,74721590,7...  74720774,74721053,74721322,74721503,74721717,7...
4  CYP1A2  True  False  NaN  True  False  NaN  *1A  *1A  *1A  +  15:75038183-75051941  15:74745844-74759607  75041183,75042070,75043529,75044105,75044464,7...  75041238,75042910,75043650,75044195,75044588,7...  74748844,74749729,74751188,74751764,74752123,7...  74748897,74750569,74751309,74751854,74752247,7...
- pypgx.api.core.load_phenotype_table()
Load the phenotype table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_phenotype_table()
>>> df.head()
      Gene                              Phenotype                     Priority
0  CACNA1S               Uncertain Susceptibility                  Normal Risk
1  CACNA1S  Malignant Hyperthermia Susceptibility  Abnormal/Priority/High Risk
2     CFTR                     Favorable Response                         None
3     CFTR                   Unfavorable Response                         None
4     CFTR                          Indeterminate                         None
- pypgx.api.core.load_recommendation_table()
Load the recommendation table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_recommendation_table()
>>> df.head()
         Drug   Gene1  Phenotype1  Gene2  Phenotype2  Recommendation
0  tacrolimus  CYP3A5  Normal Metabolizer  None  None  Increase starting dose 1.5 to 2 times recommen...
1  tacrolimus  CYP3A5  Intermediate Metabolizer  None  None  Increase starting dose 1.5 to 2 times recommen...
2  tacrolimus  CYP3A5  Possible Intermediate Metabolizer  None  None  None
3  tacrolimus  CYP3A5  Poor Metabolizer  None  None  Initiate therapy with standard recommended dos...
4  tacrolimus  CYP3A5  Indeterminate  None  None  None
- pypgx.api.core.load_variant_table()
Load the variant table.
- Returns
Requested table.
- Return type
pandas.DataFrame
Examples
>>> import pypgx
>>> df = pypgx.load_variant_table()
>>> df.head()
- pypgx.api.core.predict_phenotype(gene, a, b)
Predict phenotype based on two haplotype calls.
The method can handle star alleles with structural variation including gene deletion, duplication, and tandem arrangement.
For detailed implementation, please see the Phenotype prediction section.
- Parameters
gene (str) – Target gene.
a, b (str) – Star allele for each haplotype. The order of alleles does not matter.
- Returns
Phenotype prediction.
- Return type
str
Examples
>>> import pypgx
>>> pypgx.predict_phenotype('CYP2D6', '*4', '*5') # Both alleles have no function
'Poor Metabolizer'
>>> pypgx.predict_phenotype('CYP2D6', '*5', '*4') # The order of alleles does not matter
'Poor Metabolizer'
>>> pypgx.predict_phenotype('CYP2D6', '*1', '*22') # *22 has uncertain function
'Indeterminate'
>>> pypgx.predict_phenotype('CYP2D6', '*1', '*1x2') # Gene duplication
'Ultrarapid Metabolizer'
>>> pypgx.predict_phenotype('CYP2B6', '*1', '*4') # *4 has increased function
'Rapid Metabolizer'
- pypgx.api.core.predict_score(gene, allele)
Predict activity score based on haplotype call.
The method can handle star alleles with structural variation including gene deletion, duplication, and tandem arrangement.
Note that the method will return NaN for alleles with uncertain function as well as for alleles from a gene that does not use the activity score system.
For detailed implementation, please see the Phenotype prediction section.
- Parameters
gene (str) – Gene name.
allele (str) – Star allele.
- Returns
Activity score.
- Return type
float
See also
get_score
Get matched score from the allele table.
Examples
Here are some examples for the CYP2D6 gene:
>>> import pypgx
>>> pypgx.predict_score('CYP2D6', '*1') # Allele with normal function
1.0
>>> pypgx.predict_score('CYP2D6', '*1x2') # Gene duplication of *1
2.0
>>> pypgx.predict_score('CYP2D6', '*1x4') # Gene multiplication of *1
4.0
>>> pypgx.predict_score('CYP2D6', '*4') # Allele with no function
0.0
>>> pypgx.predict_score('CYP2D6', '*4x2') # Gene duplication of *4
0.0
>>> pypgx.predict_score('CYP2D6', '*22') # Allele with uncertain function
nan
>>> pypgx.predict_score('CYP2D6', '*22x2') # Gene duplication of *22
nan
>>> pypgx.predict_score('CYP2D6', '*36+*10') # Tandem arrangement
0.25
>>> pypgx.predict_score('CYP2D6', '*1x2+*4x2+*10') # Complex event
2.25
We can also predict activity score for the DPYD gene:
>>> pypgx.predict_score('DPYD', 'Reference')
1.0
>>> pypgx.predict_score('DPYD', 'c.1905+1G>A (*2A)')
0.0
>>> pypgx.predict_score('DPYD', 'c.295_298delTCAT (*7)')
0.0
>>> pypgx.predict_score('DPYD', 'c.703C>T (*8)')
0.0
All of the CYP2B6 alleles will return NaN because the gene does not have activity score:
>>> pypgx.predict_score('CYP2B6', '*1')
nan
>>> pypgx.predict_score('CYP2B6', '*2')
nan
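The CYP2D6 outputs above are consistent with summing per-copy scores across the parts of a haplotype string. The sketch below illustrates that reading only; `sum_scores` is a hypothetical helper rather than the PyPGx implementation, and the scores for '*10' and '*36' are assumptions inferred from the '*36+*10' and '*1x2+*4x2+*10' results.

```python
import math
import re

# '*1', '*4' and '*22' match the get_score() examples; '*10' and '*36' are
# assumptions inferred from the predict_score() outputs.
SCORES = {'*1': 1.0, '*4': 0.0, '*10': 0.25, '*36': 0.0, '*22': math.nan}

def sum_scores(haplotype, scores):
    """Sum activity scores over 'ALLELExN' parts joined by '+'.

    Hypothetical helper for illustration; not the PyPGx implementation.
    """
    total = 0.0
    for part in haplotype.split('+'):
        match = re.fullmatch(r'(\*\d+)(?:x(\d+))?', part)
        if match is None:
            raise ValueError(f'unrecognized allele: {part}')
        allele, copies = match.group(1), int(match.group(2) or 1)
        total += scores[allele] * copies  # NaN propagates through the sum
    return total

print(sum_scores('*1x2+*4x2+*10', SCORES))
print(sum_scores('*36+*10', SCORES))
```

Because any arithmetic with NaN yields NaN, an allele with uncertain function makes the whole sum NaN in this sketch, matching the '*22x2' example.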
- pypgx.api.core.sort_alleles(alleles, by='priority', gene=None, assembly='GRCh37')
Sort star alleles by either priority or name.
By default (by='priority') the method reports high priority alleles first. Alleles are sorted by the following criteria, in order:
1. allele function (e.g. ‘No Function’ > ‘Normal Function’),
2. number of core variants (e.g. three SNVs > one SNV),
3. number of core variants that impact protein coding (e.g. two missense variants > one missense variant plus one intron variant), and
4. reference allele status (e.g. non-reference allele with two SNVs > reference allele with two SNVs such that CYP2D6*46 > CYP2D6*1 in GRCh37).
Note that the priority of allele function decreases in the following order: ‘No Function’, ‘Decreased Function’, ‘Possible Decreased Function’, ‘Increased Function’, ‘Possible Increased Function’, ‘Uncertain Function’, ‘Unknown Function’, ‘Normal Function’.
When by='name' the method will report alleles with a smaller number first. This means, for example, ‘*4’ will come before ‘*10’ whereas lexicographic sorting would produce the opposite result. This is particularly useful when forming a diplotype (e.g. ‘*4/*10’ vs. ‘*10/*4’).
- Parameters
alleles (list) – List of alleles.
by ({‘priority’, ‘name’}, default: ‘priority’) – Determines which method to use for sorting alleles:
‘priority’: Report high priority alleles first.
‘name’: Report alleles with a smaller number first.
gene (str) – Target gene. Only required when by='priority'.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly. Only relevant when by='priority'.
- Returns
Sorted list of alleles.
- Return type
list
Examples
Assume we have following alleles for the CYP2D6 gene:
>>> alleles = ['*1', '*2', '*4', '*10']
We can sort the alleles by their priority with by='priority':
>>> import pypgx
>>> alleles = pypgx.sort_alleles(alleles, by='priority', gene='CYP2D6', assembly='GRCh37')
>>> alleles
['*4', '*10', '*1', '*2']
We can restore the original order by sorting again with by='name':
>>> alleles = pypgx.sort_alleles(alleles, by='name')
>>> alleles
['*1', '*2', '*4', '*10']
Note that we can also sort alleles by name for genes that do not use the star allele nomenclature (e.g. the DPYD gene):
>>> alleles = ['c.557A>G', 'c.2194G>A (*6)', 'c.496A>G', 'Reference', 'c.1627A>G (*5)']
>>> pypgx.sort_alleles(alleles, by='name')
['Reference', 'c.496A>G', 'c.557A>G', 'c.1627A>G (*5)', 'c.2194G>A (*6)']
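The difference between name-based and lexicographic ordering can be reproduced with a sort key that compares the first run of digits as a number. The `name_key` function below is a hypothetical stand-in that only mimics the documented ordering; it is not how PyPGx implements by='name'.

```python
import re

def name_key(allele):
    """Sort key that compares the first run of digits numerically.

    Hypothetical helper; it only mimics the '*4' before '*10' behavior.
    """
    match = re.search(r'\d+', allele)
    number = int(match.group()) if match else -1  # names without digits first
    return (number, allele)

alleles = ['*10', '*4', '*2', '*1']
lexicographic = sorted(alleles)
by_number = sorted(alleles, key=name_key)
print(lexicographic)
print(by_number)
```

Plain `sorted()` places '*10' before '*2' because it compares character by character, which is the behavior the by='name' option avoids.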
genotype
The genotype submodule is primarily used to make final diplotype calls by interpreting candidate star alleles and/or detected structural variants.
Classes:
Genotyper for CYP2A6.
Genotyper for CYP2B6.
Genotyper for CYP2D6.
Genotyper for CYP2E1.
Genotyper for CYP4F2.
Genotyper for G6PD.
Genotyper for GSTM1.
Genotyper for GSTT1.
Genotyper for SLC22A2.
Genotyper for SULT1A1.
Genotyper for genes without SV.
Genotyper for UGT1A4.
Genotyper for UGT2B15.
Genotyper for UGT2B17.
Functions:
call_genotypes : Call genotypes for target gene.
- class pypgx.api.genotype.SimpleGenotyper(df, gene, assembly)
Genotyper for genes without SV.
- pypgx.api.genotype.call_genotypes(alleles=None, cnv_calls=None)
Call genotypes for target gene.
- Parameters
alleles (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[Alleles].
cnv_calls (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[CNVCalls].
- Returns
Archive object with the semantic type SampleTable[Genotypes].
- Return type
pypgx.Archive
pipeline
The pipeline submodule is used to provide convenient methods that combine multiple PyPGx actions and automatically handle semantic types.
Functions:
Run genotyping pipeline for chip data.
Run genotyping pipeline for long-read sequencing data.
Run genotyping pipeline for NGS data.
- pypgx.api.pipeline.run_chip_pipeline(gene, output, variants, assembly='GRCh37', panel=None, impute=False, force=False, samples=None, exclude=False)[source]¶
Run genotyping pipeline for chip data.
- Parameters
gene (str) – Target gene.
output (str) – Output directory.
variants (str) – Input VCF file must be already BGZF compressed (.gz) and indexed (.tbi) to allow random access. Statistical haplotype phasing will be skipped if input VCF is already fully phased.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
panel (str, optional) – VCF file corresponding to a reference haplotype panel (compressed or uncompressed). By default, the 1KGP panel in the pypgx-bundle directory will be used.
impute (bool, default: False) – If True, perform imputation of missing genotypes.
force (bool, default: False) – Overwrite output directory if it already exists.
samples (str or list, optional) – Subset the VCF for specified samples. This can be a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
exclude (bool, default: False) – If True, exclude specified samples.
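Several functions on this page accept samples either as a list or as a text file with one sample per line. A minimal sketch of how such an argument could be normalized into a list is given below. The helper resolve_samples and the sample names are hypothetical; PyPGx's own handling is not shown in this page.

```python
import os
import tempfile


def resolve_samples(samples):
    """Return sample names from a list or a one-sample-per-line text file.

    Hypothetical helper for illustration only; not a PyPGx function.
    """
    if isinstance(samples, str):
        # A string is treated as a path to a text file.
        with open(samples) as handle:
            return [line.strip() for line in handle if line.strip()]
    return list(samples)


# Write a small sample list to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "samples.list")
    with open(path, "w") as handle:
        handle.write("SampleA\nSampleB\n")
    from_file = resolve_samples(path)

from_list = resolve_samples(["SampleA", "SampleB"])
print(from_file, from_list)
```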
- pypgx.api.pipeline.run_long_read_pipeline(gene, output, variants, assembly='GRCh37', force=False, samples=None, exclude=False)[source]¶
Run genotyping pipeline for long-read sequencing data.
- Parameters
gene (str) – Target gene.
output (str) – Output directory.
variants (str) – Input VCF file must be already BGZF compressed (.gz) and indexed (.tbi) to allow random access.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
force (bool, default: False) – Overwrite output directory if it already exists.
samples (str or list, optional) – Subset the VCF for specified samples. This can be a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
exclude (bool, default: False) – If True, exclude specified samples.
- pypgx.api.pipeline.run_ngs_pipeline(gene, output, variants=None, depth_of_coverage=None, control_statistics=None, platform='WGS', assembly='GRCh37', panel=None, force=False, samples=None, exclude=False, samples_without_sv=None, do_not_plot_copy_number=False, do_not_plot_allele_fraction=False, cnv_caller=None)[source]¶
Run genotyping pipeline for NGS data.
During copy number analysis, if the input data is targeted sequencing, the method will apply inter-sample normalization using summary statistics across all samples. For best results, it is recommended to specify known samples without SV using samples_without_sv.
- Parameters
gene (str) – Target gene.
output (str) – Output directory.
variants (str, optional) – Input VCF file must be already BGZF compressed (.gz) and indexed (.tbi) to allow random access. Statistical haplotype phasing will be skipped if input VCF is already fully phased.
depth_of_coverage (str, optional) – Archive file or object with the semantic type CovFrame[DepthOfCoverage].
control_statistics (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[Statistics].
platform ({‘WGS’, ‘Targeted’}, default: ‘WGS’) – Genotyping platform.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
panel (str, optional) – VCF file corresponding to a reference haplotype panel (compressed or uncompressed). By default, the 1KGP panel in the pypgx-bundle directory will be used.
force (bool, default: False) – Overwrite output directory if it already exists.
samples (str or list, optional) – Subset the VCF for specified samples. This can be a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
exclude (bool, default: False) – If True, exclude specified samples.
samples_without_sv (list, optional) – List of known samples without SV.
do_not_plot_copy_number (bool, default: False) – Do not plot copy number profile.
do_not_plot_allele_fraction (bool, default: False) – Do not plot allele fraction profile.
cnv_caller (str or pypgx.Archive, optional) – Archive file or object with the semantic type Model[CNV]. By default, a pre-trained CNV caller in the pypgx-bundle directory will be used.
plot¶
The plot submodule is used to plot various kinds of profiles such as read depth, copy number, and allele fraction.
Functions:
Plot copy number profile from CovFrame[CopyNumber].
Plot read depth profile with BAM data.
Plot both copy number profile and allele fraction profile in one figure.
Plot allele fraction profile with VCF data.
Plot read depth profile with VCF data.
- pypgx.api.plot.plot_bam_copy_number(copy_number, fitted=False, path=None, samples=None, ymin=-0.3, ymax=6.3, fontsize=25)[source]¶
Plot copy number profile from CovFrame[CopyNumber].
- Parameters
copy_number (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[CopyNumber].
fitted (bool, default: False) – If True, show the fitted line as well.
path (str, optional) – Create plots in this directory (default: current directory). Use path='-' to return a list of matplotlib.figure.Figure objects instead of writing files.
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
ymin (float, default: -0.3) – Y-axis bottom.
ymax (float, default: 6.3) – Y-axis top.
fontsize (float, default: 25) – Text fontsize.
- Returns
Output type depends on path.
- Return type
None or list
- pypgx.api.plot.plot_bam_read_depth(read_depth, path=None, samples=None, ymin=None, ymax=None, fontsize=25)[source]¶
Plot read depth profile with BAM data.
- Parameters
read_depth (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[ReadDepth].
path (str, optional) – Create plots in this directory (default: current directory). Use path='-' to return a list of matplotlib.figure.Figure objects instead of writing files.
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
ymin (float, optional) – Y-axis bottom.
ymax (float, optional) – Y-axis top.
fontsize (float, default: 25) – Text fontsize.
- Returns
Output type depends on path.
- Return type
None or list
- pypgx.api.plot.plot_cn_af(copy_number, imported_variants, path=None, samples=None, ymin=-0.3, ymax=6.3, fontsize=25)[source]¶
Plot both copy number profile and allele fraction profile in one figure.
- Parameters
copy_number (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[CopyNumber].
imported_variants (str or pypgx.Archive) – Archive file or object with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].
path (str, optional) – Create plots in this directory (default: current directory). Use path='-' to return a list of matplotlib.figure.Figure objects instead of writing files.
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
ymin (float, default: -0.3) – Y-axis bottom.
ymax (float, default: 6.3) – Y-axis top.
fontsize (float, default: 25) – Text fontsize.
- Returns
Output type depends on path.
- Return type
None or list
- pypgx.api.plot.plot_vcf_allele_fraction(imported_variants, path=None, samples=None, fontsize=25)[source]¶
Plot allele fraction profile with VCF data.
- Parameters
imported_variants (str or pypgx.Archive) – Archive file or object with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].
path (str, optional) – Create plots in this directory (default: current directory). Use path='-' to return a list of matplotlib.figure.Figure objects instead of writing files.
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
fontsize (float, default: 25) – Text fontsize.
- Returns
Output type depends on path.
- Return type
None or list
- pypgx.api.plot.plot_vcf_read_depth(gene, vcf, assembly='GRCh37', path=None, samples=None, ymin=None, ymax=None)[source]¶
Plot read depth profile with VCF data.
- Parameters
gene (str) – Target gene.
vcf (str) – VCF file.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
path (str, optional) – Create plots in this directory (default: current directory). Use path='-' to return a list of matplotlib.figure.Figure objects instead of writing files.
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
ymin (float, optional) – Y-axis bottom.
ymax (float, optional) – Y-axis top.
- Returns
Output type depends on path.
- Return type
None or list
utils¶
The utils submodule contains main actions of PyPGx.
Functions:
Call phenotypes for target gene.
Combine various results for the target gene.
Calculate concordance between two genotype results.
Compute summary statistics for control gene from BAM files.
Compute copy number from read depth for target gene.
Compute read depth for target gene from BAM files.
Count star alleles from genotype calls.
Create a consolidated VCF file.
Call SNVs/indels from BAM files for all target genes.
Create a BED file which contains all regions used by PyPGx.
Estimate haplotype phase of observed variants with the Beagle program.
Filter Archive for specified samples.
Import read depth data for target gene.
Import SNV/indel data for target gene.
Predict candidate star alleles based on observed SNVs and indels.
Predict CNV from copy number data for target gene.
Prepare a depth of coverage file for all target genes with SV from BAM files.
Print the main data of specified archive.
Print the metadata of specified archive.
Slice BAM file for all genes used by PyPGx.
Test a CNV caller for the target gene.
Train a CNV caller for the target gene.
- pypgx.api.utils.call_phenotypes(genotypes)[source]¶
Call phenotypes for target gene.
- Parameters
genotypes (str or pypgx.Archive) – Archive file or object with the semantic type SampleTable[Genotypes].
- Returns
Archive object with the semantic type SampleTable[Phenotypes].
- Return type
pypgx.Archive
- pypgx.api.utils.combine_results(genotypes=None, phenotypes=None, alleles=None, cnv_calls=None)[source]¶
Combine various results for the target gene.
- Parameters
genotypes (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[Genotypes].
phenotypes (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[Phenotypes].
alleles (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[Alleles].
cnv_calls (str or pypgx.Archive, optional) – Archive file or object with the semantic type SampleTable[CNVCalls].
- Returns
Archive object with the semantic type SampleTable[Results].
- Return type
pypgx.Archive
- pypgx.api.utils.compare_genotypes(first, second, verbose=False)[source]¶
Calculate concordance between two genotype results.
Only samples that appear in both genotype results will be used to calculate concordance for genotype calls as well as CNV calls.
- Parameters
first (str or pypgx.Archive) – First archive file or object with the semantic type SampleTable[Results].
second (str or pypgx.Archive) – Second archive file or object with the semantic type SampleTable[Results].
verbose (bool, default: False) – If True, print the verbose version of output, including discordant calls.
Examples
>>> import pypgx
>>> pypgx.compare_genotypes('results-1.zip', 'results-2.zip')
# Genotype
Total: 100
Compared: 100
Concordance: 1.000 (100/100)
# CNV
Total: 100
Compared: 100
Concordance: 1.000 (100/100)
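The concordance figure reported above is the fraction of shared samples whose calls agree. A minimal sketch of that calculation on plain dictionaries follows. The function concordance, the sample names, and the exact-string comparison of diplotypes are assumptions for illustration; this is not PyPGx's implementation.

```python
def concordance(first, second):
    """Compare calls for samples present in both results.

    Hypothetical helper for illustration only. Calls are compared as exact
    strings, which is an assumption of this sketch.
    """
    shared = sorted(set(first) & set(second))
    matches = sum(first[sample] == second[sample] for sample in shared)
    rate = matches / len(shared) if shared else float("nan")
    return len(shared), matches, rate


# Two made-up genotype results keyed by sample name.
first = {"S1": "*1/*2", "S2": "*1/*4", "S3": "*2/*2"}
second = {"S1": "*1/*2", "S2": "*4/*4", "S4": "*1/*1"}

compared, matches, rate = concordance(first, second)
print(f"Compared: {compared}")
print(f"Concordance: {rate:.3f} ({matches}/{compared})")
```

Samples S3 and S4 appear in only one result each, so they do not enter the comparison.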
- pypgx.api.utils.compute_control_statistics(gene, bams, assembly='GRCh37', bed=None)[source]¶
Compute summary statistics for control gene from BAM files.
Note that for the arguments gene and bed, the ‘chr’ prefix in contig names (e.g. ‘chr1’ vs. ‘1’) will be automatically added or removed as necessary to match the input BAM’s contig names.
- Parameters
gene (str) – Control gene (recommended choices: ‘EGFR’, ‘RYR1’, ‘VDR’). Alternatively, you can provide a custom region (format: chrom:start-end).
bams (str or list) – One or more input BAM files. Alternatively, you can provide a text file (.txt, .tsv, .csv, or .list) containing one BAM file per line.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
bed (str, optional) – By default, the input data is assumed to be WGS. If it’s targeted sequencing, you must provide a BED file to indicate probed regions.
- Returns
Archive object with the semantic type SampleTable[Statistics].
- Return type
pypgx.Archive
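The automatic handling of the ‘chr’ prefix described above can be pictured with a small sketch. The helper match_chr_prefix is hypothetical and only illustrates the idea of adapting a contig name to the naming style of a reference set of contigs; it is not PyPGx code.

```python
def match_chr_prefix(contig, reference_contigs):
    """Add or remove the 'chr' prefix so contig follows the reference style.

    Hypothetical helper for illustration only; not a PyPGx function.
    """
    uses_prefix = any(name.startswith("chr") for name in reference_contigs)
    if uses_prefix and not contig.startswith("chr"):
        return "chr" + contig
    if not uses_prefix and contig.startswith("chr"):
        return contig[len("chr"):]
    return contig


print(match_chr_prefix("1", ["chr1", "chr2"]))  # prefix is added
print(match_chr_prefix("chr7", ["1", "7"]))     # prefix is removed
print(match_chr_prefix("chr7", ["chr7"]))       # already consistent
```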
- pypgx.api.utils.compute_copy_number(read_depth, control_statistics, samples_without_sv=None)[source]¶
Compute copy number from read depth for target gene.
The method will convert read depth from target gene to copy number by performing intra-sample normalization using summary statistics from control gene.
If the input data was generated with targeted sequencing as opposed to WGS, the method will also apply inter-sample normalization using summary statistics across all samples. For best results, it is recommended to manually specify a list of known reference samples that do not have SV.
- Parameters
read_depth (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[ReadDepth].
control_statistics (str or pypgx.Archive) – Archive file or object with the semantic type SampleTable[Statistics].
samples_without_sv (list, optional) – List of known samples without SV.
- Returns
Archive object with the semantic type CovFrame[CopyNumber].
- Return type
pypgx.Archive
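To make the intra-sample normalization step more concrete, here is a minimal sketch for a single sample. It assumes a diploid baseline of 2 and uses the mean control-gene depth as the summary statistic; both are assumptions of this example, since the page does not state which statistic PyPGx uses. The function to_copy_number is hypothetical.

```python
from statistics import mean


def to_copy_number(target_depth, control_depth, ploidy=2):
    """Scale per-position target depth by the sample's control-gene depth.

    Hypothetical helper for illustration only. The mean of the control
    depth and a baseline ploidy of 2 are assumptions of this sketch.
    """
    baseline = mean(control_depth)
    return [ploidy * depth / baseline for depth in target_depth]


# Made-up depths: the control gene averages 30 reads per position.
control = [28, 30, 32]
target = [30, 15, 45]
copy_number = to_copy_number(target, control)
print(copy_number)
```

A position with half the control depth maps to one copy, and a position with 1.5 times the control depth maps to three copies.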
- pypgx.api.utils.compute_target_depth(gene, bams, assembly='GRCh37', bed=None)[source]¶
Compute read depth for target gene from BAM files.
By default, the input data is assumed to be WGS. If it’s targeted sequencing, you must provide a BED file with bed to indicate probed regions.
- Parameters
gene (str) – Target gene.
bams (str or list) – One or more input BAM files. Alternatively, you can provide a text file (.txt, .tsv, .csv, or .list) containing one BAM file per line.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
bed (str, optional) – BED file.
- Returns
Archive object with the semantic type CovFrame[ReadDepth].
- Return type
pypgx.Archive
- pypgx.api.utils.create_consolidated_vcf(imported_variants, phased_variants)[source]¶
Create a consolidated VCF file.
- Parameters
imported_variants (str or pypgx.Archive) – Archive file or object with the semantic type VcfFrame[Imported].
phased_variants (str or pypgx.Archive) – Archive file or object with the semantic type VcfFrame[Phased].
- Returns
Archive object with the semantic type VcfFrame[Consolidated].
- Return type
pypgx.Archive
- pypgx.api.utils.create_input_vcf(vcf, fasta, bams, assembly='GRCh37', genes=None, exclude=False, dir_path=None, max_depth=250)[source]¶
Call SNVs/indels from BAM files for all target genes.
To save computing resources, this method will call variants only for target genes that have at least one star allele defined by SNVs/indels. Therefore, variants will not be called for target genes that have star alleles defined only by structural variation (e.g. UGT2B17).
- Parameters
vcf (str) – Output VCF file. It must have .vcf.gz as suffix.
fasta (str) – Reference FASTA file.
bams (str or list) – One or more input BAM files. Alternatively, you can provide a text file (.txt, .tsv, .csv, or .list) containing one BAM file per line.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
genes (list, optional) – List of genes to include.
exclude (bool, default: False) – Exclude specified genes. Ignored when genes=None.
dir_path (str, optional) – By default, intermediate files (likelihoods.bcf, calls.bcf, and calls.normalized.bcf) will be stored in a temporary directory, which is automatically deleted after creating final VCF. If you provide a directory path, intermediate files will be stored there.
max_depth (int, default: 250) – At a position, read maximally this number of reads per input file. If your input data is from WGS (e.g. 30X), you don’t need to change this option. However, if it’s from targeted sequencing with ultra-deep coverage (e.g. 500X), then you need to increase the maximum depth.
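The interaction between genes and exclude, which recurs in several functions on this page, can be sketched as follows. The helper select_genes and the gene list are hypothetical and only illustrate the documented behavior that exclude is ignored when genes=None.

```python
def select_genes(available, genes=None, exclude=False):
    """Pick genes to process from the available ones.

    Hypothetical helper for illustration only; not a PyPGx function.
    """
    if genes is None:
        # No selection given, so 'exclude' has no effect.
        return list(available)
    requested = set(genes)
    if exclude:
        return [gene for gene in available if gene not in requested]
    return [gene for gene in available if gene in requested]


available = ["CYP2D6", "CYP2C19", "CYP3A5"]
print(select_genes(available))
print(select_genes(available, genes=["CYP2D6"]))
print(select_genes(available, genes=["CYP2D6"], exclude=True))
```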
- pypgx.api.utils.create_regions_bed(assembly='GRCh37', add_chr_prefix=False, merge=False, target_genes=False, sv_genes=False, var_genes=False, genes=None, exclude=False)[source]¶
Create a BED file which contains all regions used by PyPGx.
- Parameters
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
add_chr_prefix (bool, default: False) – Whether to add the ‘chr’ string in contig names.
merge (bool, default: False) – Whether to merge overlapping intervals (gene names will be removed too).
target_genes (bool, default: False) – Whether to only return target genes, excluding control genes and paralogs.
sv_genes (bool, default: False) – Whether to only return target genes that have at least one star allele defined by structural variation.
var_genes (bool, default: False) – Whether to only return target genes that have at least one star allele defined by SNVs/indels.
genes (list, optional) – List of genes to include.
exclude (bool, default: False) – Exclude specified genes. Ignored when genes=None.
- Returns
BED file.
- Return type
fuc.api.pybed.BedFrame
Examples
>>> import pypgx
>>> bf = pypgx.create_regions_bed()
>>> bf.gr.df.head()
  Chromosome      Start        End     Name
0          1  201005639  201084694  CACNA1S
1          1   60355979   60395470   CYP2J2
2          1   47391859   47410148  CYP4A11
3          1   47600112   47618399  CYP4A22
4          1   47261669   47288021   CYP4B1
>>> bf = pypgx.create_regions_bed(assembly='GRCh38')
>>> bf.gr.df.head()
  Chromosome      Start        End     Name
0          1  201036511  201115426  CACNA1S
1          1   59890307   59929773   CYP2J2
2          1   46926187   46944476  CYP4A11
3          1   47134440   47152727  CYP4A22
4          1   46796045   46822413   CYP4B1
>>> bf = pypgx.create_regions_bed(add_chr_prefix=True)
>>> bf.gr.df.head()
  Chromosome      Start        End     Name
0       chr1  201005639  201084694  CACNA1S
1       chr1   60355979   60395470   CYP2J2
2       chr1   47391859   47410148  CYP4A11
3       chr1   47600112   47618399  CYP4A22
4       chr1   47261669   47288021   CYP4B1
>>> bf = pypgx.create_regions_bed(merge=True)
>>> bf.gr.df.head()
  Chromosome     Start       End
0          1  47261669  47288021
1          1  47391859  47410148
2          1  47600112  47618399
3          1  60355979  60395470
4          1  97540298  98389615
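With merge=True, overlapping intervals are collapsed and gene names are dropped. A minimal sketch of interval merging on plain tuples is shown below. The function merge_intervals and the coordinates are made up for illustration, and treating intervals as half-open (so that merely abutting intervals stay separate) is an assumption of this sketch rather than documented PyPGx behavior.

```python
def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals.

    Hypothetical helper for illustration only. Intervals are treated as
    half-open, so abutting intervals are not merged (an assumption).
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Overlaps the previous interval: extend its end if needed.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(interval) for interval in merged]


regions = [("1", 150, 300), ("1", 100, 200), ("1", 400, 500), ("2", 100, 200)]
result = merge_intervals(regions)
print(result)
```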
- pypgx.api.utils.estimate_phase_beagle(imported_variants, panel=None, impute=False)[source]¶
Estimate haplotype phase of observed variants with the Beagle program.
- Parameters
imported_variants (str or pypgx.Archive) – Archive file or object with the semantic type VcfFrame[Imported]. The ‘chr’ prefix in contig names (e.g. ‘chr1’ vs. ‘1’) will be automatically added or removed as necessary to match the reference VCF’s contig names.
panel (str, optional) – VCF file corresponding to a reference haplotype panel (compressed or uncompressed). By default, the 1KGP panel in the pypgx-bundle directory will be used.
impute (bool, default: False) – If True, perform imputation of missing genotypes.
- Returns
Archive object with the semantic type VcfFrame[Phased].
- Return type
pypgx.Archive
- pypgx.api.utils.filter_samples(archive, samples, exclude=False)[source]¶
Filter Archive for specified samples.
- Parameters
archive (str or pypgx.Archive) – Archive file or object.
samples (str or list) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
exclude (bool, default: False) – If True, exclude specified samples.
- Returns
Filtered Archive object.
- Return type
pypgx.Archive
- pypgx.api.utils.import_read_depth(gene, depth_of_coverage, samples=None, exclude=False)[source]¶
Import read depth data for target gene.
- Parameters
gene (str) – Gene name.
depth_of_coverage (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[DepthOfCoverage].
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
exclude (bool, default: False) – If True, exclude specified samples.
- Returns
Archive object with the semantic type CovFrame[ReadDepth].
- Return type
pypgx.Archive
- pypgx.api.utils.import_variants(gene, vcf, assembly='GRCh37', platform='WGS', samples=None, exclude=False)[source]¶
Import SNV/indel data for target gene.
The method will slice the input VCF for the target gene to create an archive object with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].
- Parameters
gene (str) – Target gene.
vcf (str or fuc.api.pyvcf.VcfFrame) – Input VCF file must be already BGZF compressed (.gz) and indexed (.tbi) to allow random access. Alternatively, you can provide a VcfFrame object.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
platform ({‘WGS’, ‘Targeted’, ‘Chip’, ‘LongRead’}, default: ‘WGS’) – Genotyping platform used. When the platform is ‘WGS’, ‘Targeted’, or ‘Chip’, the method will assess whether every genotype call in the sliced VCF is haplotype phased (e.g. ‘0|1’). If the sliced VCF is fully phased, the method will return VcfFrame[Consolidated] or otherwise VcfFrame[Imported]. When the platform is ‘LongRead’, the method will return VcfFrame[Consolidated] after applying the phase-extension algorithm to estimate haplotype phase of any variants that could not be resolved by read-backed phasing.
samples (str or list, optional) – Specify which samples should be included for analysis by providing a text file (.txt, .tsv, .csv, or .list) containing one sample per line. Alternatively, you can provide a list of samples.
exclude (bool, default: False) – If True, exclude specified samples.
- Returns
Archive object with the semantic type VcfFrame[Imported] or VcfFrame[Consolidated].
- Return type
pypgx.Archive
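The phasing check described for the ‘WGS’, ‘Targeted’, and ‘Chip’ platforms can be illustrated with a small sketch over genotype strings. In VCF, a phased call uses ‘|’ as the allele separator and an unphased call uses ‘/’. The helpers is_fully_phased and semantic_type_for are hypothetical and operate on a plain list of calls; they are not PyPGx functions.

```python
def is_fully_phased(genotypes):
    """Return True if every genotype call uses the phased separator '|'.

    Hypothetical helper for illustration only. An empty list counts as
    fully phased here, which is an assumption of this sketch.
    """
    return all("|" in genotype for genotype in genotypes)


def semantic_type_for(genotypes):
    """Choose the semantic type the way the text above describes."""
    if is_fully_phased(genotypes):
        return "VcfFrame[Consolidated]"
    return "VcfFrame[Imported]"


print(semantic_type_for(["0|1", "1|1", "0|0"]))
print(semantic_type_for(["0|1", "0/1", "0|0"]))
```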
- pypgx.api.utils.predict_alleles(consolidated_variants)[source]¶
Predict candidate star alleles based on observed SNVs and indels.
- Parameters
consolidated_variants (str or pypgx.Archive) – Archive file or object with the semantic type VcfFrame[Consolidated].
- Returns
Archive object with the semantic type SampleTable[Alleles].
- Return type
pypgx.Archive
- pypgx.api.utils.predict_cnv(copy_number, cnv_caller=None)[source]¶
Predict CNV from copy number data for target gene.
Genomic positions with missing copy number (for example, because the input data is targeted sequencing) will be imputed with forward filling.
- Parameters
copy_number (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[CopyNumber].
cnv_caller (str or pypgx.Archive, optional) – Archive file or object with the semantic type Model[CNV]. By default, a pre-trained CNV caller in the pypgx-bundle directory will be used.
- Returns
Archive object with the semantic type SampleTable[CNVCalls].
- Return type
pypgx.Archive
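Forward filling replaces each missing value with the most recent observed value. A minimal sketch on a plain list follows, with None standing in for a missing copy number. The function forward_fill is hypothetical and is not how PyPGx necessarily implements the step.

```python
def forward_fill(values):
    """Replace each None with the last value seen before it.

    Hypothetical helper for illustration only. Leading missing values have
    nothing to copy from and are left as None.
    """
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


copy_number = [2.0, None, None, 1.0, None]
print(forward_fill(copy_number))
```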
- pypgx.api.utils.prepare_depth_of_coverage(bams, assembly='GRCh37', bed=None, genes=None, exclude=False)[source]¶
Prepare a depth of coverage file for all target genes with SV from BAM files.
To save computing resources, this method will count read depth only for target genes that have at least one star allele defined by structural variation. Therefore, read depth will not be computed for target genes that have star alleles defined only by SNVs/indels (e.g. CYP3A5).
- Parameters
bams (str or list) – One or more input BAM files. Alternatively, you can provide a text file (.txt, .tsv, .csv, or .list) containing one BAM file per line.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
bed (str, optional) – By default, the input data is assumed to be WGS. If it’s targeted sequencing, you must provide a BED file to indicate probed regions. Note that the ‘chr’ prefix in contig names (e.g. ‘chr1’ vs. ‘1’) will be automatically added or removed as necessary to match the input BAM’s contig names.
genes (list, optional) – List of genes to include.
exclude (bool, default: False) – Exclude specified genes. Ignored when genes=None.
- Returns
Archive object with the semantic type CovFrame[DepthOfCoverage].
- Return type
pypgx.Archive
- pypgx.api.utils.print_data(input)[source]¶
Print the main data of specified archive.
- Parameters
input (pypgx.Archive) – Archive file.
- pypgx.api.utils.print_metadata(input)[source]¶
Print the metadata of specified archive.
- Parameters
input (pypgx.Archive) – Archive file.
- pypgx.api.utils.slice_bam(input, output, assembly='GRCh37', genes=None, exclude=False)[source]¶
Slice BAM file for all genes used by PyPGx.
- Parameters
input (str) – Input BAM file. It must be already indexed to allow random access.
output (str) – Output BAM file.
assembly ({‘GRCh37’, ‘GRCh38’}, default: ‘GRCh37’) – Reference genome assembly.
genes (list, optional) – List of genes to include.
exclude (bool, default: False) – Exclude specified genes. Ignored when genes=None.
- pypgx.api.utils.test_cnv_caller(cnv_caller, copy_number, cnv_calls, confusion_matrix=None, comparison_table=None)[source]¶
Test a CNV caller for the target gene.
- Parameters
cnv_caller (str or pypgx.Archive) – Archive file or object with the semantic type Model[CNV].
copy_number (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[CopyNumber].
cnv_calls (str or pypgx.Archive) – Archive file or object with the semantic type SampleTable[CNVCalls].
confusion_matrix (str, optional) – Write the confusion matrix as a CSV file where rows indicate actual class and columns indicate prediction class.
comparison_table (str, optional) – Write a CSV file comparing actual vs. predicted CNV calls for each sample.
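The confusion matrix layout described above (rows for the actual class, columns for the predicted class) can be sketched with the standard library. The functions confusion_matrix and to_csv, as well as the CNV labels used here, are hypothetical and serve only to illustrate the layout; the exact file PyPGx writes is not shown in this page.

```python
import csv
import io


def confusion_matrix(actual, predicted):
    """Count calls with rows as actual class and columns as predicted class.

    Hypothetical helper for illustration only; not a PyPGx function.
    """
    labels = sorted(set(actual) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


def to_csv(labels, matrix):
    """Render the matrix as CSV text with a header row and row labels."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([""] + labels)
    for label, row in zip(labels, matrix):
        writer.writerow([label] + row)
    return buffer.getvalue()


# Made-up CNV calls for four samples.
actual = ["Normal", "Normal", "Deletion", "Duplication"]
predicted = ["Normal", "Deletion", "Deletion", "Duplication"]
labels, matrix = confusion_matrix(actual, predicted)
print(to_csv(labels, matrix))
```

Off-diagonal counts are the misclassified samples; here one sample that is actually Normal was predicted as Deletion.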
- pypgx.api.utils.train_cnv_caller(copy_number, cnv_calls, confusion_matrix=None, comparison_table=None)[source]¶
Train a CNV caller for the target gene.
This method will return an SVM-based multiclass classifier that makes CNV calls using the one-vs-rest strategy.
- Parameters
copy_number (str or pypgx.Archive) – Archive file or object with the semantic type CovFrame[CopyNumber].
cnv_calls (str or pypgx.Archive) – Archive file or object with the semantic type SampleTable[CNVCalls].
confusion_matrix (str, optional) – Write the confusion matrix as a CSV file where rows indicate actual class and columns indicate prediction class.
comparison_table (str, optional) – Write a CSV file comparing actual vs. predicted CNV calls for each sample.
- Returns
Archive object with the semantic type Model[CNV].
- Return type
pypgx.Archive