前言:

微博参与话题 #给你四年时间你也学不会生信#

先前的富集分析教程

本文主要针对非模式物种,但是有参考基因组可用

1. R包安装及database下载

# non-model, but have the genome

> source(“https://bioconductor.org/biocLite.R”)

> biocLite(“AnnotationHub”)

> biocLite(“biomaRt”)

# load package

> library(AnnotationHub)

> library(biomaRt)

# make a orgDb

> hub

这里以桔小实蝇为例

# fruit fly = bactrocera dorsalis

> query(hub, “bactrocera”)

搜索后结果如下:

> query(hub, “bactrocera”)

AnnotationHub with 9 records

# snapshotDate(): 2018-04-30

# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/

# $species: Bactrocera (Bactrocera)_dorsalis, Bactrocera (Bactrocera)_latifrons, Bactrocera (Dacul…

# $rdataclass: OrgDb

# additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,

# rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype

# retrieve records with, e.g., ‘object[[“AH62538”]]’

title

AH62538 | org.Bactrocera_(Bactrocera)_latifrons.eg.sqlite

AH62539 | org.Bactrocera_latifrons.eg.sqlite

AH62542 | org.Bactrocera_(Daculus)_oleae.eg.sqlite

AH62543 | org.Bactrocera_(Dacus)_oleae.eg.sqlite

AH62544 | org.Bactrocera_oleae.eg.sqlite

AH62568 | org.Bactrocera_(Zeugodacus)_cucurbitae.eg.sqlite

AH62569 | org.Bactrocera_cucurbitae.eg.sqlite

AH62581 | org.Bactrocera_(Bactrocera)_dorsalis.eg.sqlite

AH62582 | org.Bactrocera_dorsalis.eg.sqlite

我们选择AH62582 | org.Bactrocera_dorsalis.eg.sqlite并下载它

> Bactrocera.OrgDb

如果报错,可能是缺少依赖的安装包,可以按照提示依次下载,两种方法

install.packages(“packages”)

2. 查看注释信息

> columns(Bactrocera.OrgDb)

[1] “ACCNUM” “ALIAS” “CHR” “ENTREZID” “EVIDENCE” “EVIDENCEALL” “GENENAME”

[8] “GID” “GO” “GOALL” “ONTOLOGY” “ONTOLOGYALL” “PMID” “REFSEQ”

[15] “SYMBOL”

> Bactrocera.OrgDb

OrgDb object:

| DBSCHEMAVERSION: 2.1

| DBSCHEMA: NOSCHEMA_DB

| ORGANISM: Bactrocera dorsalis

| SPECIES: Bactrocera dorsalis

| CENTRALID: GID

| Taxonomy ID: 27457

| Db type: OrgDb

| Supporting package: AnnotationDbi

Please see: help(‘select’) for usage information

# 查看注释信息的每一列

> head(keys(Bactrocera.OrgDb,keytype = “ALIAS”))

[1] “AAA62341.1” “AAA62342.1” “AAA62343.1” “AAA62344.1” “AAF22478.1” “AAL17758.1”

实际上,ALIAS内包含了“omitted 17518 entries”

3. GO富集分析

# 对BP(Biological process)进行富集分析

# 只需将OrgDb数据库替换为我们下载好的非模式物种库即可。

> enrich.go.BP = enrichGO(gene = DEG.gene_symbol,

OrgDb = Bactrocera.OrgDb,

keyType = ‘ENTREZID’,ont= “BP”,

pvalueCutoff = 0.01,

qvalueCutoff = 0.05,

readable = T)

> barplot(enrich.go.BP)

> dotplot(enrich.go.BP)

p_value: 富集显著性,统计显著性要去小于0.01;

q_value: 对p_value的修正,在多次统计检验时,需要有修正值;

q_value一定大于p_value

4. KEGG富集分析

# 只需将OrgDb数据库替换为我们下载好的非模式物种库即可。

> enrichKEGG(gene = DEG.gene_symbol,

OrgDb = Bactrocera.OrgDb,

keyType = ‘ENTREZID’,

ont = “DO”,

pvalueCutoff = 0.01,

qvalueCutofF = 0.05,

readable = T)

5. GO出图解读

纵轴为GO中每一term,例如Legionellosis;

横轴为GeneRatio,即输入的基因,term在整体基因中所占的百分数;

圆圈大小表示count的数目;

p.adjust:p越小,圆越大,结果越可靠;

c1348043c093

Rplot22.png