Guide to the columns in ANNOVAR variant-annotation results for human databases

Reposted from 鳉鲈的博客; originally from omicsclass.

Column headers in ANNOVAR annotation output:

| Column | Description |
|---|---|
| Chr | Chromosome |
| Start | Start position of the variant on the chromosome |
| End | End position of the variant on the chromosome |
| Ref | Reference allele (base(s) in the reference genome) |
| Alt | Alternate (variant) allele |
| Func.refGene | Genomic region in which the variant lies (exonic, splicing, UTR5, UTR3, intronic, ncRNA_exonic, ncRNA_intronic, ncRNA_UTR3, ncRNA_UTR5, ncRNA_splicing, upstream, downstream, intergenic) |
| Gene.refGene | Genes (transcripts) associated with the variant; only those whose function matches the Func column are listed. If Func is intergenic, the two flanking genes are listed |
| GeneDetail.refGene | Details of variants in UTR, splicing, ncRNA_splicing or intergenic regions. Empty when Func is exonic, ncRNA_exonic, intronic, ncRNA_intronic, upstream, downstream, upstream;downstream, ncRNA_UTR3 or ncRNA_UTR5. For intergenic variants the format is dist=1366;dist=22344, giving the distance from the variant to each of the two flanking genes |
| ExonicFunc.refGene | Type of exonic SNV or indel. SNV types: synonymous_SNV, missense_SNV, stopgain_SNV, stoploss_SNV and unknown. Indel types: frameshift insertion, frameshift deletion, stopgain, stoploss, nonframeshift insertion, nonframeshift deletion and unknown |
| AAChange.refGene | Amino acid change, filled only when Func is exonic or exonic;splicing, with one annotation per transcript. For example, in NADK:NM_001198995:exon10:c.1240_1241insAGG:p.G414delinsEG, NADK is the gene, NM_001198995 is the transcript ID, exon10 means the variant lies in exon 10 of that transcript, c.1240_1241insAGG means AGG is inserted between cDNA positions 1240 and 1241, and p.G414delinsEG means the Gly at protein position 414 is replaced by Glu-Gly. Likewise, FMN2:NM_020066:exon1:c.160_162del:p.54_54del means cDNA positions 160 to 162 are deleted, and p.54_54del means the amino acid at protein position 54 is deleted |
| cytoBand | Chromosome band containing the variant (as defined by Giemsa staining) |
| genomicSuperDups | Segmental duplications in the genome |
| nci60 | NCI-60 human tumor cell line panel exome sequencing allele frequency data |
| esp6500siv2_all | Alternative allele frequency across all individuals in the NHLBI Exome Sequencing Project (NHLBI-ESP); the esp6500si_all database includes SNVs, indels and chrY variants |
| ALL.sites.2015_08 | Alternative allele frequency at this site across all populations in the 1000 Genomes Project (August 2015 release) |
| EAS.sites.2015_08 | Alternative allele frequency at this site in the East Asian population of the 1000 Genomes Project (August 2015 release) |
| SAS.sites.2015_08 | Alternative allele frequency at this site in the South Asian population of the 1000 Genomes Project (August 2015 release) |
| avsnp150 | dbSNP ID of the variant |
| SIFT_score | SIFT score for the variant's effect on the protein. Lower scores are more "deleterious", i.e. the SNP is more likely to alter protein structure or function |
| SIFT_pred | D: Deleterious (sift<=0.05); T: tolerated (sift>0.05) |
| Polyphen2_HDIV_score | PolyPhen-2 prediction of the variant's effect on the protein, trained on HumDiv; intended for complex diseases. Higher scores are more "deleterious", i.e. the SNP is more likely to alter protein structure or function |
| Polyphen2_HDIV_pred | D, P or B (D: probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452)) |
| Polyphen2_HVAR_score | PolyPhen-2 prediction of the variant's effect on the protein, trained on HumVar; intended for Mendelian (single-gene) diseases. Higher scores are more "deleterious", i.e. the SNP is more likely to alter protein structure or function |
| Polyphen2_HVAR_pred | D, P or B (D: probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.909); B: benign (pp2_hvar<=0.446)) |
| LRT_score | LRT score for the variant's effect on the protein. Higher scores are more "deleterious", i.e. the SNP is more likely to alter protein structure or function |
| LRT_pred | D, N or U (D: Deleterious; N: Neutral; U: Unknown) |
| MutationTaster_score | MutationTaster score for the variant's effect on the protein. Higher scores are more "deleterious", i.e. the SNP is more likely to alter protein structure or function |
| MutationTaster_pred | A ("disease_causing_automatic"); D ("disease_causing"); N ("polymorphism"); P ("polymorphism_automatic") |
| MutationAssessor_score | Pathogenicity score predicted by MutationAssessor |
| MutationAssessor_pred | Category assigned by MutationAssessor from score thresholds: H: high confidence of being damaging; M: medium confidence; L: low confidence; N: neutral |
| FATHMM_score | Pathogenicity score predicted by FATHMM |
| FATHMM_pred | Category assigned by FATHMM from a score threshold: D: Deleterious; T: Tolerated |
| RadialSVM_score | Higher scores denote more deleterious variants |
| RadialSVM_pred | D: Deleterious; T: Tolerated |
| LR_score | Higher scores denote more deleterious variants |
| LR_pred | D: Deleterious; T: Tolerated |
| VEST3_score | Variant Effect Scoring Tool (random forest classifier); higher values are more deleterious |
| CADD_raw | CADD raw score |
| CADD_phred | CADD phred-like score; higher values are more deleterious |
| GERP++_RS | GERP++ "rejected substitutions" (RS) score; higher scores are more deleterious |
| phyloP46way_placental | Higher scores are more deleterious |
| phyloP100way_vertebrate | Higher scores are more deleterious |
| SiPhy_29way_logOdds | Higher scores are more deleterious |
| dgvMerged | Human structural variation annotation from the Database of Genomic Variants: http://dgv.tcag.ca/dgv/app/home |
| phastConsElements100way | Conserved elements predicted by phastCons from a whole-genome vertebrate alignment; "100way" means 100 species were used |
| omim_201806 | Annotation from OMIM, the database of Mendelian genetic diseases |
| cosmic70 | COSMIC database of somatic mutations in human cancer; IDs starting with COSM can be looked up at https://cancer.sanger.ac.uk/cosmic |
| CLNALLELEID | The ClinVar Allele ID |
| CLNDN | ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB |
| CLNDISDB | Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN |
| CLNREVSTAT | ClinVar review status for the Variation ID |
| CLNSIG | Clinical significance for this single variant |
| gwasCatalog | Whether the variant has been reported in previous GWAS and, if so, which diseases it is associated with; "." means no GWAS report |
| HGMD | HGMD annotation |
| Allele_frequency | Alternate allele frequency in the sample |
| QUAL | Variant quality score |
| FORMAT | Describes the fields of the sample column; usually GT:AD:DP:GQ:PL |
| sample | Sample information column |

For details, see: http://www.omicsclass.com/article/6

ANNOVAR can annotate human variants against many more databases than those listed here. The following is excerpted from https://brb.nci.nih.gov/seqtools/colexpanno.html:

We provide here a detailed description of the files output by the mutation annotators ANNOVAR and SnpEff.

ANNOVAR annotation output columns (from SIFT_score onward these are dbNSFP prediction and conservation scores; see the dbNSFP information table at the end for details):

| Column | Description |
|---|---|
| Chr | Chromosome number |
| Start | Start position |
| End | End position |
| Ref | Reference base(s) |
| Alt | Alternate non-reference alleles called on at least one of the samples |
| COSMIC ID | COSMIC ID |
| Func.refGene | Regions (e.g., exonic, intronic, non-coding RNA) that a variant hits |
| Gene.refGene | Gene name associated with a variant |
| ExonicFunc.refGene | Exonic variant function, e.g., nonsynonymous, synonymous, frameshift insertion |
| AAChange.refGene | Amino acid change. For example, SAMD11:NM_152486:exon10:c.T1027C:p.W343R gives the gene name, known RefSeq accession, region, cDNA-level change and protein-level change |
| SIFT_score | SIFT score |
| SIFT_pred | SIFT prediction |
| Polyphen2_HDIV_score | PolyPhen-2 score based on HDIV |
| Polyphen2_HDIV_pred | PolyPhen-2 prediction based on HDIV |
| Polyphen2_HVAR_score | PolyPhen-2 score based on HVAR |
| Polyphen2_HVAR_pred | PolyPhen-2 prediction based on HVAR |
| LRT_score | LRT score |
| LRT_pred | LRT prediction |
| MutationTaster_score | MutationTaster score |
| MutationTaster_pred | MutationTaster prediction |
| MutationAssessor_score | MutationAssessor score |
| MutationAssessor_pred | MutationAssessor prediction |
| FATHMM_score | FATHMM score |
| FATHMM_pred | FATHMM prediction |
| PROVEAN_score | PROVEAN score |
| PROVEAN_pred | PROVEAN prediction |
| VEST3_score | VEST v3 score |
| CADD_raw | CADD raw score |
| CADD_phred | CADD phred-like score |
| DANN_score | DANN score |
| fathmm-MKL_coding_score | fathmm-MKL score for a coding variant |
| fathmm-MKL_coding_pred | fathmm-MKL prediction for a coding variant |
| MetaSVM_score | MetaSVM score |
| MetaSVM_pred | MetaSVM prediction |
| MetaLR_score | MetaLR score |
| MetaLR_pred | MetaLR prediction |
| integrated_fitCons_score | fitCons score |
| integrated_confidence_value | fitCons confidence level |
| GERP++_RS | GERP++ "rejected substitutions" (RS) score |
| phyloP7way_vertebrate | Phylogenetic p-values for 7 vertebrate species |
| phyloP20way_mammalian | Phylogenetic p-values for 20 mammalian species |
| phastCons7way_vertebrate | phastCons score for 7 vertebrate species |
| phastCons20way_mammalian | phastCons score for 20 mammalian species |
| SiPhy_29way_logOdds | SiPhy log-odds score for 29 species |

SnpEff annotation output columns (from SIFT_score onward these are dbNSFP prediction and conservation scores; see the dbNSFP information table at the end for details):

| Column | Description |
|---|---|
| CHROM | Chromosome number |
| POS | Position |
| ID | Semicolon-separated list of unique identifiers where available; for dbSNP variants the rs number(s) should preferably be used |
| REF | Reference base(s) |
| ALT | Alternate non-reference alleles called on at least one of the samples |
| EFFECT | Functional consequence of a variant, e.g., missense_variant, synonymous_variant |
| REGION | Regions (e.g., exonic, intronic) that a variant hits |
| IMPACT | Putative impact of the variant (e.g., HIGH, MODERATE or LOW) |
| GENE | Gene name (usually HUGO) |
| GENEID | Gene ID |
| FEATURE | Type of feature given in the next field (e.g., transcript, motif, miRNA) |
| FEATUREID | Depending on the annotation: transcript ID (preferably with version number), motif ID, miRNA, ChIP-seq peak or histone mark |
| BIOTYPE | Whether the transcript is "Coding" or "Noncoding"; ENSEMBL biotypes are used whenever possible |
| HGVS_C | Variant in HGVS notation at the DNA level. For example, c.352A>G is an A-to-G substitution at nucleotide 352 |
| HGVS_P | Coding variant in HGVS notation at the protein level. For example, p.Ile118Val is the substitution of isoleucine at position 118 by valine; with one-letter amino acid codes it can also be written p.I118V |
| SIFT_score | SIFT score |
| SIFT_pred | SIFT prediction |
| Polyphen2_HDIV_score | PolyPhen-2 score based on HDIV |
| Polyphen2_HDIV_pred | PolyPhen-2 prediction based on HDIV |
| Polyphen2_HVAR_score | PolyPhen-2 score based on HVAR |
| Polyphen2_HVAR_pred | PolyPhen-2 prediction based on HVAR |
| LRT_score | LRT score |
| LRT_pred | LRT prediction |
| MutationTaster_score | MutationTaster score |
| MutationTaster_pred | MutationTaster prediction |
| MutationAssessor_score | MutationAssessor score |
| MutationAssessor_pred | MutationAssessor prediction |
| FATHMM_score | FATHMM score |
| FATHMM_pred | FATHMM prediction |
| PROVEAN_score | PROVEAN score |
| PROVEAN_pred | PROVEAN prediction |
| VEST3_score | VEST v3 score |
| CADD_raw | CADD raw score |
| CADD_phred | CADD phred-like score |
| MetaSVM_score | MetaSVM score |
| MetaSVM_pred | MetaSVM prediction |
| MetaLR_score | MetaLR score |
| MetaLR_pred | MetaLR prediction |
| GERP++_NR | GERP++ neutral rate (NR) |
| GERP++_RS | GERP++ "rejected substitutions" (RS) score |
| phyloP100way_vertebrate | Phylogenetic p-values for 100 vertebrate species |
| phastCons100way_vertebrate | phastCons score for 100 vertebrate species |
| SiPhy_29way_logOdds | SiPhy log-odds score for 29 species |

dbNSFP information table (detailed description):

| Column(s) | Tool | Full name | Method | Prediction categories | Score direction | Developer, institution |
|---|---|---|---|---|---|---|
| SIFT_pred, SIFT_score | SIFT | Sorting Intolerant From Tolerant | Probability that an amino acid at a position is tolerated, conditional on the most frequent amino acid being tolerated | D: Deleterious (sift<=0.05); T: tolerated (sift>0.05) | Lower scores are more deleterious | Pauline Ng, Fred Hutchinson Cancer Research Center, Seattle, Washington |
| Polyphen2_HDIV_pred, Polyphen2_HDIV_score | PolyPhen v2 | Polymorphism Phenotyping v2 | Probabilistic classifier; training set: HumDiv | D: Probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452) | Higher scores are more deleterious | Harvard Medical School |
| Polyphen2_HVAR_pred, Polyphen2_HVAR_score | PolyPhen v2 | Polymorphism Phenotyping v2 | Machine learning; training set: HumVar | D: Probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.909); B: benign (pp2_hvar<=0.446) | Higher scores are more deleterious | Shamil Sunyaev, Harvard Medical School |
| LRT_pred, LRT_score | LRT | Likelihood ratio test | LRT of H0: each codon evolves neutrally vs. H1: the codon evolves under negative selection | D: Deleterious; N: Neutral; U: Unknown | Lower scores are more deleterious | Sung Chung, Justin Fay, Washington University |
| MutationTaster_pred, MutationTaster_score | MutationTaster | | Bayes classifier | A: "disease_causing_automatic"; D: "disease_causing"; N: "polymorphism" (probably harmless); P: "polymorphism_automatic" (known to be harmless) | Higher values are more deleterious | Markus Schuelke, Charité - Universitätsmedizin Berlin |
| MutationAssessor_pred, MutationAssessor_score | MutationAssessor | | Entropy of multiple sequence alignment | H: high; M: medium; L: low; N: neutral. H/M means functional and L/N means non-functional | Higher values are more deleterious | Reva Boris, Computational Biology Center, Memorial Sloan Kettering Cancer Center |
| FATHMM_pred, FATHMM_score | FATHMM | Functional Analysis Through Hidden Markov Models | HMM | D: Deleterious; T: Tolerated | Lower values are more deleterious | Shihab Hashem, University of Bristol, UK |
| PROVEAN_pred, PROVEAN_score | PROVEAN | Protein Variation Effect Analyzer | Clustering of homologous sequences | D: Deleterious; N: Neutral | Lower values are more deleterious | Choi Y, J. Craig Venter Institute |
| VEST3_score | VEST v3 | Variant Effect Scoring Tool | Random forest classifier | | Higher values are more deleterious | Rachel Karchin, Johns Hopkins University |
| CADD_raw, CADD_phred | CADD | Combined Annotation Dependent Depletion | Linear kernel SVM | | Higher values are more deleterious | Jay Shendure, Xiaohui Xie, University of California - Irvine |
| DANN_score | DANN | Deleterious Annotation of genetic variants using Neural Networks | Neural network | | Higher values are more deleterious | Jay Shendure, Xiaohui Xie, University of California - Irvine |
| fathmm-MKL_coding_pred | FATHMM-MKL | Predicting the effects of both coding and non-coding variants using nucleotide-based HMMs | Classifier based on multiple kernel learning | D: Deleterious; T: Tolerated | Score >= 0.5: D; score < 0.5: T | Shihab Hashem, University of Bristol, UK |
| MetaSVM_pred, MetaSVM_score | MetaSVM | | Support vector machine | D: Deleterious; T: Tolerated | Higher scores are more deleterious | Coco Dong, USC Biostatistics Department |
| MetaLR_pred, MetaLR_score | MetaLR | | Logistic regression | D: Deleterious; T: Tolerated | Higher scores are more deleterious | Coco Dong, USC Biostatistics Department |
| integrated_fitCons_score, integrated_confidence_value | FitCons | Fitness consequences of functional annotation | Integrates functional assays such as ChIP-seq with a conservation measure of transcription factor binding sites | | Higher scores are more deleterious | Abriza, Cold Spring Harbor Lab |
| GERP++_RS, GERP++_NR | GERP++ | Genomic Evolutionary Rate Profiling ++ | Maximum likelihood estimation procedure | | Higher scores are more deleterious | Eugene Davydov, Stanford University, CS Department |
| phyloP7way_vertebrate | PhyloP | Phylogenetic p-values | Phylogenetic p-values calculated from an LRT, score-based test or GERP test; uses 7 species | | Higher scores are more deleterious | Adam Siepel, UCSC |
| phyloP20way_mammalian | PhyloP | Phylogenetic p-values | Phylogenetic p-values calculated from an LRT, score-based test or GERP test; uses 20 species | | Higher scores are more deleterious | Adam Siepel, UCSC |
| phastCons7way_vertebrate | phastCons | | Phylogenetic hidden Markov model (phylo-HMM); uses 7 species | | Higher scores are more deleterious | Adam Siepel, UCSC |
| phastCons20way_mammalian | phastCons | | Phylogenetic hidden Markov model (phylo-HMM); uses 20 species | | Higher scores are more deleterious | Adam Siepel, UCSC |
| SiPhy_29way_logOdds | SiPhy | | Probabilistic framework, HMM; uses 29 species | | Higher scores are more deleterious | |
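ANNOVAR's table output is tab-delimited, with the column headers listed above in the first row, so the standard library's `csv.DictReader` is enough to load it. The sketch below keeps only variants whose Func.refGene contains the exact token `exonic`, which also matches `exonic;splicing` but not `ncRNA_exonic`. The helper name `exonic_rows`, the file name in the comment and the small in-memory table are hypothetical stand-ins, not part of ANNOVAR.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def exonic_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield rows whose Func.refGene includes the exact token 'exonic'."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Func.refGene may hold several ';'-separated regions, e.g. 'exonic;splicing'.
        regions = row["Func.refGene"].split(";")
        if "exonic" in regions:
            yield row


# Hypothetical stand-in for a real annotation file.
SAMPLE = (
    "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\n"
    "chr1\t100\t100\tA\tG\texonic\tGENE1\n"
    "chr1\t200\t200\tC\tT\tintronic\tGENE2\n"
    "chr2\t300\t300\tG\tA\tncRNA_exonic\tGENE3\n"
    "chr3\t400\t400\tT\tC\texonic;splicing\tGENE4\n"
)

# With a real file: open("sample.hg19_multianno.txt", newline="") as handle
for row in exonic_rows(io.StringIO(SAMPLE)):
    print(row["Chr"], row["Start"], row["Gene.refGene"])
```

Matching on split tokens rather than a substring test is what keeps `ncRNA_exonic` out.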
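For intergenic variants, GeneDetail.refGene holds the two flanking distances in the form `dist=1366;dist=22344`. A minimal parser follows, assuming that any value that is not a plain integer marks a missing flanking gene; that convention, and the function name, are assumptions for illustration.

```python
from typing import List, Optional


def parse_intergenic_distances(gene_detail: str) -> List[Optional[int]]:
    """Parse 'dist=1366;dist=22344' into [1366, 22344].

    A value that is not a plain integer (assumed to mark a missing
    flanking gene) becomes None.
    """
    distances: List[Optional[int]] = []
    for item in gene_detail.split(";"):
        key, _, value = item.partition("=")
        if key != "dist":
            continue  # ignore anything that is not a dist= entry
        distances.append(int(value) if value.isdigit() else None)
    return distances


left, right = parse_intergenic_distances("dist=1366;dist=22344")
print(f"nearest flanking gene: {min(left, right)} bp")  # nearest flanking gene: 1366 bp
```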
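An AAChange.refGene value such as `NADK:NM_001198995:exon10:c.1240_1241insAGG:p.G414delinsEG` packs five colon-separated fields per transcript. The sketch below splits it into records. It assumes that several transcripts are separated by commas and that `.` means "no annotation". Entries without exactly five fields are skipped rather than guessed at. `AAChange` and `parse_aachange` are hypothetical names.

```python
from typing import List, NamedTuple


class AAChange(NamedTuple):
    gene: str
    transcript: str
    exon: str
    cdna: str     # cDNA-level HGVS change, e.g. c.1240_1241insAGG
    protein: str  # protein-level change, e.g. p.G414delinsEG


def parse_aachange(field: str) -> List[AAChange]:
    """Split an AAChange.refGene value into one record per transcript.

    Assumes comma-separated transcripts, each gene:transcript:exon:cDNA:protein.
    """
    if field in ("", "."):
        return []
    records: List[AAChange] = []
    for entry in field.split(","):
        parts = entry.split(":")
        if len(parts) != 5:
            continue  # not the five-field layout; skip instead of guessing
        records.append(AAChange(*parts))
    return records


for rec in parse_aachange("NADK:NM_001198995:exon10:c.1240_1241insAGG:p.G414delinsEG"):
    print(rec.gene, rec.transcript, rec.exon, rec.cdna, rec.protein)
```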
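The HGVS_P description notes that p.Ile118Val and p.I118V are equivalent. Here is a small converter for that case. It only handles a simple single-residue substitution and raises an error for anything else, such as delins or frameshift notation. The regex and the helper name are illustrative choices, not part of SnpEff.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U", "Ter": "*",  # selenocysteine and stop codon
}

# Matches only a simple substitution such as p.Ile118Val.
_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def to_one_letter(hgvs_p: str) -> str:
    """Convert a three-letter protein substitution to one-letter form."""
    match = _SUBSTITUTION.match(hgvs_p)
    if match is None:
        raise ValueError(f"not a simple substitution: {hgvs_p}")
    ref, position, alt = match.groups()
    return f"p.{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"


print(to_one_letter("p.Ile118Val"))  # p.I118V
```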
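The population columns (ALL.sites.2015_08, EAS.sites.2015_08 and so on) are commonly used to drop common variants. The sketch below treats `.` as "not observed in that database". The 1% cutoff and the choice of columns are illustrative assumptions, not values from the source.

```python
from typing import Mapping, Optional, Sequence


def parse_af(value: str) -> Optional[float]:
    """Return the frequency as a float, or None for a missing value ('.')."""
    return None if value in ("", ".") else float(value)


def is_rare(row: Mapping[str, str],
            columns: Sequence[str] = ("ALL.sites.2015_08", "EAS.sites.2015_08"),
            cutoff: float = 0.01) -> bool:
    """True if no listed population reaches the (assumed) cutoff frequency.

    A variant absent from a database ('.') does not count against rarity.
    """
    for column in columns:
        af = parse_af(row.get(column, "."))
        if af is not None and af >= cutoff:
            return False
    return True


print(is_rare({"ALL.sites.2015_08": "0.0004", "EAS.sites.2015_08": "."}))  # True
print(is_rare({"ALL.sites.2015_08": "0.002", "EAS.sites.2015_08": "0.03"}))  # False
```

Checking each population separately catches variants that are rare overall but common in one group.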
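The _pred columns apply fixed cutoffs to the corresponding _score columns. The following mirrors the cutoffs stated in the tables: SIFT <= 0.05, the PolyPhen-2 HDIV and HVAR ranges, and fathmm-MKL >= 0.5. The published PolyPhen-2 ranges are rounded, so they touch or leave tiny gaps. This sketch resolves those edge cases by testing D first, then B, and calling everything in between P; that tie-breaking rule is an assumption.

```python
from typing import Dict, Tuple


def sift_pred(score: float) -> str:
    """SIFT: D if score <= 0.05, else T."""
    return "D" if score <= 0.05 else "T"


# (lower bound for D, upper bound for B), taken from the tables above.
POLYPHEN2_CUTOFFS: Dict[str, Tuple[float, float]] = {
    "HDIV": (0.957, 0.452),
    "HVAR": (0.909, 0.446),
}


def polyphen2_pred(score: float, model: str = "HDIV") -> str:
    """PolyPhen-2: D (probably damaging), P (possibly damaging) or B (benign)."""
    damaging, benign = POLYPHEN2_CUTOFFS[model]
    if score >= damaging:
        return "D"
    if score <= benign:
        return "B"
    return "P"  # everything between the two cutoffs


def fathmm_mkl_pred(score: float) -> str:
    """fathmm-MKL coding: D if score >= 0.5, else T."""
    return "D" if score >= 0.5 else "T"


print(sift_pred(0.01), polyphen2_pred(0.93, "HDIV"),
      polyphen2_pred(0.93, "HVAR"), fathmm_mkl_pred(0.42))  # D P D T
```

The same score of 0.93 is "possibly damaging" under HDIV but "probably damaging" under HVAR. This is why the two PolyPhen-2 columns should not be mixed.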
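The one-letter codes in the _pred columns are tool-specific. For example, P means "possibly damaging" for PolyPhen-2 but "polymorphism_automatic" for MutationTaster. Decoding each code by column therefore avoids mistakes. The lookup table below is transcribed from the tables above. `PRED_CODES` and `decode_pred` are hypothetical names.

```python
from typing import Dict, Optional

_DT = {"D": "Deleterious", "T": "Tolerated"}
_PP2 = {"D": "Probably damaging", "P": "Possibly damaging", "B": "Benign"}

# Column name -> {code: meaning}, transcribed from the tables above.
PRED_CODES: Dict[str, Dict[str, str]] = {
    "SIFT_pred": _DT,
    "Polyphen2_HDIV_pred": _PP2,
    "Polyphen2_HVAR_pred": _PP2,
    "LRT_pred": {"D": "Deleterious", "N": "Neutral", "U": "Unknown"},
    "MutationTaster_pred": {
        "A": "disease_causing_automatic",
        "D": "disease_causing",
        "N": "polymorphism",
        "P": "polymorphism_automatic",
    },
    "MutationAssessor_pred": {"H": "High", "M": "Medium", "L": "Low", "N": "Neutral"},
    "FATHMM_pred": _DT,
    "PROVEAN_pred": {"D": "Deleterious", "N": "Neutral"},
    "fathmm-MKL_coding_pred": _DT,
    "MetaSVM_pred": _DT,
    "MetaLR_pred": _DT,
}


def decode_pred(column: str, code: str) -> Optional[str]:
    """Translate a one-letter prediction; None for '.', unknown codes or columns."""
    return PRED_CODES.get(column, {}).get(code)


print(decode_pred("Polyphen2_HDIV_pred", "P"))  # Possibly damaging
print(decode_pred("MutationTaster_pred", "P"))  # polymorphism_automatic
```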
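CADD_phred is called "phred-like" because, as the CADD documentation describes it, the raw score is ranked against all possible reference-genome SNVs and rescaled as -10·log10(fraction of variants scoring at least as high). A phred-like score of 10 therefore marks the top 10%, 20 the top 1% and 30 the top 0.1%. The two helper functions below are illustrative and not part of CADD.

```python
import math


def phred_from_fraction(fraction: float) -> float:
    """Phred-like score for a variant whose raw score ranks in the top `fraction`."""
    return -10 * math.log10(fraction)


def fraction_from_phred(phred: float) -> float:
    """Inverse: the top fraction of variants implied by a phred-like score."""
    return 10 ** (-phred / 10)


for score in (10, 20, 30):
    print(score, f"top {fraction_from_phred(score):.1%}")
# 10 top 10.0%
# 20 top 1.0%
# 30 top 0.1%
```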
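CLNDISDB holds tag-value pairs such as `OMIM:NNNNNN`. The sketch below assumes the ClinVar VCF convention of `|` between diseases and `,` between identifiers for the same disease. That separator convention is an assumption, so check it against the ClinVar version your ANNOVAR database was built from. Only the first colon splits the tag from the value, so an identifier that itself contains a colon stays whole. The example value uses placeholder IDs.

```python
from typing import List, Tuple


def parse_clndisdb(value: str) -> List[List[Tuple[str, str]]]:
    """Split CLNDISDB into per-disease lists of (database, identifier) pairs.

    Assumes '|' separates diseases and ',' separates identifiers for one disease.
    """
    if value in ("", "."):
        return []
    diseases: List[List[Tuple[str, str]]] = []
    for disease in value.split("|"):
        pairs: List[Tuple[str, str]] = []
        for tag_value in disease.split(","):
            tag, _, ident = tag_value.partition(":")  # split on the first ':' only
            pairs.append((tag, ident))
        diseases.append(pairs)
    return diseases


print(parse_clndisdb("MedGen:C0000000,OMIM:000000|MONDO:MONDO:0000000"))
```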
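The FORMAT column (usually GT:AD:DP:GQ:PL) names the colon-separated fields of each sample column. The sketch below pairs FORMAT keys with sample values and estimates an alternate-allele read fraction from AD (reference depth first, then one depth per ALT allele). The source does not say whether its Allele_frequency column is computed this way. The sample string is a hypothetical example.

```python
from typing import Dict, Mapping


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair FORMAT keys (e.g. GT:AD:DP:GQ:PL) with one sample's values.

    zip() stops at the shorter list, so trailing fields the sample omits
    are simply absent from the result.
    """
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def alt_allele_fraction(sample: Mapping[str, str]) -> float:
    """Fraction of reads supporting any ALT allele, from AD = ref,alt1,alt2,...

    Assumes AD is present and numeric.
    """
    depths = [int(x) for x in sample["AD"].split(",")]
    total = sum(depths)
    if total == 0:
        raise ValueError("AD reports no reads")
    return sum(depths[1:]) / total


sample = parse_sample("GT:AD:DP:GQ:PL", "0/1:12,9:21:99:250,0,330")
print(sample["GT"], f"{alt_allele_fraction(sample):.3f}")  # 0/1 0.429
```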
