admin on 十一月 1st, 2011

R的输入输出主要有几个类型:文本文件,xls文件,二进制文件,数据库文件,R文件,其它统计软件来源的文件,

首先是最简单的,文本文件,包括csv文件。假设我们有一个文件,题为qiuworld.com.txt,内容为

#this is a sample
ArrayDataFile	SourceName	FactorValue
GSM286765.CEL	t_24h_rep1	treated_24h
GSM286759.CEL	t_12h_rep1	treated_12h
GSM286763.CEL	c_24h_rep2	control_24h
GSM286760.CEL	t_12h_rep2	treated_12h
GSM286757.CEL	c_12h_rep2	control_12h
GSM286766.CEL	t_24h_rep2	treated_24h
GSM286756.CEL	c_12h_rep1	control_12h
GSM286762.CEL	c_24h_rep1	control_24h

我们现在需要把它读入R,可以使用read.table命令。read.table还有几个快捷的形式,比如read.delim,read.delim2,read.csv,read.csv2。这几个快捷的方式帮助我们减少参数的书写。一般的,如果是csv文件,它的分隔符是逗号,字符串的两边会加上引号,可以直接使用read.csv(“文件名”)的方式读入数据。

> read.table("qiuworld.com.txt",header=TRUE,sep="\t",quote="",comment.char="#")
  ArrayDataFile SourceName FactorValue
1 GSM286765.CEL t_24h_rep1 treated_24h
2 GSM286759.CEL t_12h_rep1 treated_12h
3 GSM286763.CEL c_24h_rep2 control_24h
4 GSM286760.CEL t_12h_rep2 treated_12h
5 GSM286757.CEL c_12h_rep2 control_12h
6 GSM286766.CEL t_24h_rep2 treated_24h
7 GSM286756.CEL c_12h_rep1 control_12h
8 GSM286762.CEL c_24h_rep1 control_24h
> read.delim("qiuworld.com.txt",header=TRUE,sep="\t",quote="",comment.char="#")
  ArrayDataFile SourceName FactorValue
1 GSM286765.CEL t_24h_rep1 treated_24h
2 GSM286759.CEL t_12h_rep1 treated_12h
3 GSM286763.CEL c_24h_rep2 control_24h
4 GSM286760.CEL t_12h_rep2 treated_12h
5 GSM286757.CEL c_12h_rep2 control_12h
6 GSM286766.CEL t_24h_rep2 treated_24h
7 GSM286756.CEL c_12h_rep1 control_12h
8 GSM286762.CEL c_24h_rep1 control_24h
> read.csv("qiuworld.com.txt",header=TRUE,sep="\t",quote="",comment.char="#")
  ArrayDataFile SourceName FactorValue
1 GSM286765.CEL t_24h_rep1 treated_24h
2 GSM286759.CEL t_12h_rep1 treated_12h
3 GSM286763.CEL c_24h_rep2 control_24h
4 GSM286760.CEL t_12h_rep2 treated_12h
5 GSM286757.CEL c_12h_rep2 control_12h
6 GSM286766.CEL t_24h_rep2 treated_24h
7 GSM286756.CEL c_12h_rep1 control_12h
8 GSM286762.CEL c_24h_rep1 control_24h

有时候我们只需要读取文件的前几行,那么可以在上面的命令当中加入nrows参数,比如

> read.delim("qiuworld.com.txt",comment.char="#",nrows=5)
  ArrayDataFile SourceName FactorValue
1 GSM286765.CEL t_24h_rep1 treated_24h
2 GSM286759.CEL t_12h_rep1 treated_12h
3 GSM286763.CEL c_24h_rep2 control_24h
4 GSM286760.CEL t_12h_rep2 treated_12h
5 GSM286757.CEL c_12h_rep2 control_12h

[......]

Read more

Tags: , ,

admin on 十月 22nd, 2011

我们在分析了差异表达数据之后,经常要生成一种直观图--热图(heatmap)。这一节就以基因芯片数据为例,示例生成高品质的热图。

比如

钢蓝渐白配色的热图

首先还是从最简单的heatmap开始。

> library(ggplot2)
> library(ALL) #可以使用biocLite("ALL")安装该数据包
> data("ALL")
> library(limma)
> eset<-ALL[,ALL$mol.biol %in% c("BCR/ABL","ALL1/AF4")]
> f<-factor(as.character(eset$mol.biol))
> design<-model.matrix(~f)
> fit<-eBayes(lmFit(eset,design)) #对基因芯片数据进行分析,得到差异表达的数据
> selected  <- p.adjust(fit$p.value[, 2]) <0.001 
> esetSel <- eset[selected,] #选择其中一部分绘制热图
> dim(esetSel) #从这尺度上看,数目并不多,但也不少。如果基因数过多,可以分两次做图。
Features  Samples 
      84       47 
> library(hgu95av2.db)
> data<-exprs(esetSel)
> probes<-rownames(data)
> symbol<-mget(probes,hgu95av2SYMBOL,ifnotfound=NA)
> symbol<-do.call(rbind,symbol)
> symbol[is.na(symbol[,1]),1]<-rownames(symbol)[is.na(symbol[,1])]
> rownames(data)<-symbol[probes,1] #给每行以基因名替换探针名命名,在绘制热图时直接显示基因名。
> heatmap(data,cexRow=0.5)

使用heatmap函数默认颜色生成的热图


这个图有三个部分,样品分枝树图和基因分枝树图,以及热图本身。之所以对样品进行聚类分析排序,是因为这次的样品本身并没有分组。如果有分组的话,那么可以关闭对样品的聚类分析。对基因进行聚类分析排序,主要是为了色块好看,其实可以选择不排序,或者使用GO聚类分析排序。上面的这种热图,方便简单,效果非常不错。

接下来我们假设样品是分好组的,那么我们想用不同的颜色来把样品组标记出来,那么我们可以使用ColSideColors参数来实现。同时,我们希望变更热图的渐变填充色,可以使用col参数来实现。
[......]

Read more

Tags: , ,

admin on 十月 21st, 2011

R当中的坐标中断一般都使用plotrix库中的axis.break(), gap.plot(), gap.barplot(), gap.boxplot()等几个函数来实现,例:

gap plot

> library(plotrix)
> opar<-par(mfrow=c(3,2))
> plot(sample(5:7,20,replace=T),main="Axis break test",ylim=c(2,8))
> axis.break(axis=2,breakpos=2.5,style="gap")
> axis.break(axis=2,breakpos=3.5,style="slash")
> axis.break(axis=2,breakpos=4.5,style="zigzag")
> twogrp<-c(rnorm(5)+4,rnorm(5)+20,rnorm(5)+5,rnorm(5)+22)
> gap.plot(twogrp,gap=c(8,16,25,35),
+  xlab="X values",ylab="Y values",xlim=c(1,30),ylim=c(0,25),
+  main="Test two gap plot with the lot",xtics=seq(0,30,by=5),
+  ytics=c(4,6,18,20,22,38,40,42),
+  lty=c(rep(1,10),rep(2,10)),
+  pch=c(rep(2,10),rep(3,10)),
+  col=c(rep(2,10),rep(3,10)),
+  type="b")
> gap.plot(21:30,rnorm(10)+40,gap=c(8,16,25,35),add=TRUE,
+   lty=rep(3,10),col=rep(4,10),type="l")
> gap.barplot(twogrp,gap=c(8,16),xlab="Index",ytics=c(3,6,17,20),
+   ylab="Group values",main="Barplot with gap")
> gap.barplot(twogrp,gap=c(8,16),xlab="Index",ytics=c(3,6,17,20),
+   ylab="Group values",horiz=TRUE,main="Horizontal barplot with gap")
> twovec<-list(vec1=c(rnorm(30),-6),vec2=c(sample(1:10,40,TRUE),20))
>  gap.boxplot(twovec,gap=list(top=c(12,18),bottom=c(-5,-3)),
+  main="Show outliers separately")
> gap.boxplot(twovec,gap=list(top=c(12,18),bottom=c(-5,-3)),range=0,
+  main="Include outliers in whiskers")
> par(opar)

从图像效果上来看,这样的坐标中断只能说实现了坐标中断,但效果上是非常一般的。甚至远不如excel, openoffice当中出图效果好。为此,我们需要对plotrix库中的gap.plot做出修改,以达到满意的效果。

[......]

Read more

Tags: , ,

admin on 十月 21st, 2011

在之前的一节当中,图型名称有些混乱,从这一节开始将做如下统一(不全面):

英文名称 中文名称
bar 条形图
line 线图
area 面积图
pie 饼图
high-low 高低图
pareto 帕累托图
control 控制图
boxplot 箱线图
error bar 误差条图
scatter 散点图
P-P P-P正态概率图
Q-Q Q-Q正态概率图
sequence 序列图
ROC Curve ROC分类效果曲线图
Time Series 时间序列图

好了,言归正传。那么什么又是点柱图(dot histogram)呢?之前我又称之为蜂群图(beeswarm)。还有称之为抖点图(jitter plots)。总之无论如何,在糗世界里我都称之为点柱图吧。

我们先看点柱图效果:

点柱图

以下是代码

> require(beeswarm)
> data(breast)
> head(breast)
            ER      ESR1     ERBB2 time_survival event_survival
100.CEL.gz neg  8.372133 13.085894            39              1
103.CEL.gz pos 10.559356  9.491683            97              0
104.CEL.gz pos 12.299905  9.599574            11              1
105.CEL.gz pos 10.776632  9.681747            99              0
106.CEL.gz pos 10.505124  9.436763            40              1
107.CEL.gz neg 10.377741  8.695576            94              0
> require(plotrix)
> cluster<-cluster.overplot(breast$event_survival, breast$time_survival)
> png("dothist.png",width=1000,height=1000)
> opar<-par(mfrow=c(3,3))
> plot (breast$event_survival, breast$time_survival, main="Multiple points on coordinate",col=as.numeric(breast$ER),xaxt="n",xlim=c(-1,2))
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> plot(jitter(breast$event_survival), breast$time_survival, main="Using Jitter on x-axis",col=as.numeric(breast$ER),xaxt="n",xlim=c(-0.5,1.5))
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> plot(jitter(breast$event_survival), jitter(breast$time_survival), main="Using Jitter on x and y-axis",col=as.numeric(breast$ER),xaxt="n",xlim=c(-0.5,1.5))
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> sunflowerplot(breast$event_survival, breast$time_survival, main="Using Sunflowers",xaxt="n",xlim=c(-0.5,1.5))
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> plot(cluster, main="Using cluster.overplot",col=as.numeric(breast$ER),xaxt="n",xlim=c(-0.5,1.5))
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> count.overplot(jitter(breast$event_survival), jitter(breast$time_survival), main="Using cout.overplot",col=as.numeric(breast$ER),xaxt="n")
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> sizeplot(breast$event_survival, breast$time_survival, main="Using sizeplot",col=as.numeric(breast$ER),xaxt="n",xlim=c(-0.5,1.5))
> axis(1,at=c(0,1),labels=c("Censored","Metastasis"))
> beeswarm(time_survival ~ event_survival, data = breast,
+          method = 'swarm',
+          pch = 16, pwcol = as.numeric(ER),
+          xlab = '', ylab = 'Follow-up time (months)',
+          labels = c('Censored', 'Metastasis'))
> dev.off()
quartz
     2
> par(opar)

以下是解释[......]

Read more

Tags: , ,

admin on 十月 20th, 2011

一,布局

R绘图所占的区域,被分成两大部分,一是外围边距,一是绘图区域。

外围边距可使用par()函数中的oma来进行设置。比如oma=c(4,3,2,1),就是指外围边距分别为下边距:4行,左边距3行,上边距2行,右边距1行。很明显这个设置顺序是从x轴开始顺时针方向。这里的行是指可以显示1行普通字体。所以当我们使用mtext中的line参数时,设置的大小就应该是[0,行数)的开区间。当我们使用mtext在外围边距上书写内容时,设置mtext中的outer=TRUE即可。

绘图区域可使用par()函数中的mfrow, mfcol来进行布局。mfrow和mfcol可以使用绘图区域被区分为多个区域。默认值为mfrow(1,1)。

比如mfrow(2,3)就是指将绘图区域分成2行3列,并按行的顺序依次绘图填充;
比如mfcol(3,2)就是指将绘图区域分成3行2列,并按列的顺序依次绘图填充;

我们将每一个细分的绘图区域分为两个部分,一是绘图边距,一是主绘图。

绘图边距需要容纳的内容有坐标轴,坐标轴标签,标题。通常来讲,我们都只需要一个x轴,一个y轴,所以在设置时,一般的下边距和左边距都会大一些。如果多个x轴或者y轴,才考虑将上边距或者右边距放大一些。绘图边距可以使用par()函数中mar来设置。比如mar=c(4,3,2,1),与外围边距的设置类似,是指绘图边距分别为下边距:4行,左边距3行,上边距2行,右边距1行。很明显这个设置顺序是从x轴开始顺时针方向。行的概念与之前的相同。也可以使用mai来设置。mai与mar唯一不同之处在于mai不是以行为单位,而是以inch为单位。

SOUTH< -1; WEST<-2; NORTH<-3; EAST<-4;
 
GenericFigure <- function(ID, size1, size2)
{
  plot(0:10, 0:10, type="n", xlab="X", ylab="Y")
  text(5,5, ID, col="red", cex=size1)
  box("plot", col="red")
  mtext(paste("cex",size2,sep=""), SOUTH, line=3, adj=1.0, cex=size2, col="blue")
  title(paste("title",ID,sep=""))
}
 
MultipleFigures <- function()
{
  GenericFigure("1", 3, 0.5)
  box("figure", lty="dotted", col="blue")
 
  GenericFigure("2", 3, 1)
  box("figure", lty="dotted", col="blue")
 
  GenericFigure("3", 3, 1.5)
  box("figure", lty="dotted", col="blue")
 
  GenericFigure("4", 3, 2)
  box("figure", lty="dotted", col="blue")
}
 
par(mfrow=c(2,2),mar=c(6,4,2,1),oma=c(4,3,2,1))
 
MultipleFigures()
 
box("inner", lty="dotted", col="green")
box("outer", lty="solid", col="green")
 
mtext("Outer Margin Area (oma) of South: 6", SOUTH, line=1, cex=1, outer=TRUE)
 
plotline<-function(n,direc){
  for(i in 0:n){
    mtext(paste("line",i,sep=""), direc, line=i, cex=1, col="black", adj=1, outer=TRUE)
  }
}
plotline(4,SOUTH)

R绘图布局

UTF8_E[......]

Read more

Tags: , ,

任务:有MoEx-1_0-st-v1芯片的注释文件,MoEx-1_0-st-v1.na31.mm9.probeset.csv。现在需要为其创建一个注释包,以方便注释及使用。

MoEx-1_0-st-v1.na31.mm9.probeset.csv可以从http://www.affymetrix.com/estore/browse/products.jsp?productId=131474&a[......]

Read more

Tags: , ,

admin on 九月 28th, 2011

在安装了graphviz之后,在R当中安装Rgraphviz。命令:

[ouj@qiuworld ~]$ sudo R CMD INSTALL Rgraphviz_1.30.1.tar.gz 
[sudo] password for ouj: 
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables... 
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking for whether compiler has bool... yes
configure: No --with-graphviz option was specified. Trying to find Graphviz using other methods.
checking for pkg-config... /usr/bin/pkg-config
Package libgvc was not found in the pkg-config search path.
Perhaps you should add the directory containing `libgvc.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libgvc' found
configure: pkg-config was not able to find the Graphviz library libgvc. This either indicates that Graphviz is old or that something is wrong. Verify Graphviz is installed and that PKG_CONFIG_PATH is correct.
checking for dotneato-config... no
configure: dotneato-config not found in PATH.
configure: Using default directory /usr/local, consider specifiying --with-graphviz
configure: Found Graphviz version '2.28.0'.
configure: Graphviz major version is '2' and minor version is '28'.
configure: Using the following compilation and linking flags for Rgraphviz
configure:    PKG_CPPFLAGS=-I/usr/local/include/graphviz
configure:    PKG_LIBS=-L/usr/local/lib/graphviz -L/usr/local/lib -lgvc
configure:    GVIZ_DEFS= -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28 
configure: Setting Graphviz Build version to '2.28.0'.
configure: creating ./config.status
config.status: creating R/graphviz_build_version.R
config.status: creating src/Makevars
** libs
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c LL_funcs.c -o LL_funcs.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c Rgraphviz.c -o Rgraphviz.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c RgraphvizInit.c -o RgraphvizInit.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c agopen.c -o agopen.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c agread.c -o agread.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c agwrite.c -o agwrite.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c bezier.c -o bezier.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c buildEdgeList.c -o buildEdgeList.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c buildNodeList.c -o buildNodeList.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c doLayout.c -o doLayout.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c graphvizVersion.c -o graphvizVersion.o
gcc -std=gnu99 -I/usr/local/lib64/R/include -I/usr/local/include/graphviz  -DHAVE_STDBOOL_H=1 -DHAVE_BOOL=1 -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=28  -I/usr/local/include    -fpic  -g -O2 -c init.c -o init.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o Rgraphviz.so LL_funcs.o Rgraphviz.o RgraphvizInit.o agopen.o agread.o agwrite.o bezier.o buildEdgeList.o buildNodeList.o doLayout.o graphvizVersion.o init.o -L/usr/local/lib/graphviz -L/usr/local/lib -lgvc
installing to /usr/local/lib64/R/library/Rgraphviz/libs
** R
** inst
** preparing package for lazy loading
Creating a new generic function for "head" in "Rgraphviz"
Creating a new generic function for "tail" in "Rgraphviz"
Creating a new generic function for "lines" in "Rgraphviz"
Creating a new generic function for "plot" in "Rgraphviz"
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'Rgraphviz', details:
  call: value[[3L]](cond)
  error: unable to load shared object '/usr/local/lib64/R/library/Rgraphviz/libs/Rgraphviz.so':
  libgvc.so.6: cannot open shared object file: No such file or directory
 
  Check that (1) graphviz is installed on your system; (2) the
  installed version of graphviz matches '2.28.0'; this is the version
  used to build this Rgraphviz package; (3) graphviz is accessible to
  R, e.g., the path to the graphviz 'bin' directory is in the system
  'PATH' variable.  See additional instructions in the 'README' file of
  the Rgraphviz 'source' distribution, available at
 
  http://bioconductor.org/packages/release/bioc/html/Rgraphviz.html
 
  Ask further questions on the Bioconductor mailing list
 
  http://bioconductor.org/docs/mailList.html
 
 
Error: loading failed
Execution halted
ERROR: loading failed

不知道如何解决。在网上搜索了两天之后,并在邮件组里分问,分析认为是动态链接库的问题。解压原代码安装包,测试:

[ouj@qiuworld ~]$ R CMD ldd /usr/local/lib/libgvc.so.6
	linux-vdso.so.1 =>  (0x00007fff173fc000)
	libxdot.so.4 => /usr/local/lib/libxdot.so.4 (0x00002b5c100e0000)
	libgraph.so.5 => /usr/local/lib/libgraph.so.5 (0x00002b5c102e4000)
	libcdt.so.5 => /usr/local/lib/libcdt.so.5 (0x00002b5c104f0000)
	libpathplan.so.4 => /usr/local/lib/libpathplan.so.4 (0x00002b5c106f5000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00002b5c10914000)
	libexpat.so.0 => /lib64/libexpat.so.0 (0x00002b5c10b18000)
	libz.so.1 => /usr/lib64/libz.so.1 (0x00002b5c10d3a000)
	libm.so.6 => /lib64/libm.so.6 (0x00002b5c10f4f000)
	libc.so.6 => /lib64/libc.so.6 (0x00002b5c111d2000)
	/lib64/ld-linux-x86-64.so.2 (0x000000349cc00000)
[ouj@qiuworld ~]$ sudo R CMD ldd Rgraphviz/src/Rgraphviz.so
[sudo] password for ouj: 
	linux-vdso.so.1 =>  (0x00007fff58b40000)
	libgvc.so.6 => not found
	libc.so.6 => /lib64/libc.so.6 (0x00002aacd51ae000)
	/lib64/ld-linux-x86-64.so.2 (0x000000349cc00000)

果然。可是没有办法使用export 为sudo 输出共亨链接库,只好先使用

[ouj@qiuworld ~]$ sudo LD_LIBRARY_PATH=/usr/lib:/usr/local/lib R CMD INSTALL Rgraphviz_1.30.1.tar.gz
...

先安装上。接下来的问题是如何将LD_LIBRARY_PATH=LD_LIBRARY_PATH:/usr/local/lib写给每个用户,并在每次登录的时候都可以正常使用。否则在使用Rgraphviz包时,都需要LD_LIBRARY_PATH=/usr/lib:/usr/local/lib R来启动R。
[......]

Read more

Tags: , , ,

首先安装xps和oligo。

安装oligo比较简单。直接

 source("http://www.bioconductor.org/biocLite.R")
 biocLite("oligo")

就可以了。

而安装xps可能麻烦一点,首先要安装ROOT,其主页为http://root.cern.ch/drupal/

而后是安装xps,为了确保安装效果,请使用:

 source("http://www.bioconductor.org/biocLite.R")
 biocLite("xps",type="source")

安装成功之后,就可以使用library(xps)调用了。

好了,我们先使用oligo来分析。[......]

Read more

Tags: ,

admin on 六月 10th, 2011

在一个图上画多个平滑曲线:

d1 <- cbind(rnorm(100), rnorm(100,3,1))
d2 <- cbind(rnorm(100), rnorm(100,1,1))
plot(d1[,1], d1[,2], xlim=range(c(d1[,1], d2[,1])), ylim=range(c(d1[,2], d2[,2])), col="blue", xlab="X", ylab="Y")
points(d2[,1], d2[,2], col="red")
points(loess.smooth(d1[,1], d1[,2]), type="l", col="blue")
points(loess.smooth(d2[,1], d2[,2]), type="l", col="red")

随机取样,缩小绘图时的样本量

# take a random sample of size 50 from a dataset mydata
# sample without replacement
mysample <- mydata[sample(1:nrow(mydata), 50,
   replace=FALSE),]

在一个图上画多个丰度曲线(柱状图)

d1 <- cbind(rnorm(100), rnorm(100,3,1))
d2 <- cbind(rnorm(100), rnorm(100,1,1))
d3 <- cbind(rnorm(100), rnorm(100,5,1))
ds1<-density(d1)
ds2<-density(d2)
ds3<-density(d3)
xr<-range(c(ds1$x,ds2$x,ds3$x))
yr<-range(c(ds1$y,ds2$y,ds3$y))
plot(ds1,xlim=xr,ylim=yr,col='black')
lines(ds2,col='blue')
lines(ds3,col='red')

逐行比较两个矩阵

d49ffe5f39dc7485c05a58f[......]

Read more

Tags: ,

在《》中,我们提到:

在alternative splicing的分析中,起决定性作用的,主要是核心探针组的注释和基因水平表达定量。如果这两个不一样,就会造成结果的巨大差错。

所以在综合应用当中,我们必须明确这两个问题。

  1. 使用什么样的CDF文件?是不是越全面越好?
  2. 如何确定基因水平表达量?

我们知道探针组被区分为:核心探针组(Core probesets,被RefSeq mRNAs支持的exon),扩展探针组(Extended probesets,被ESTs或者partial mRNAs支持的exons),以及全探针组(Full probesets,基于计算机预测的exons)。对于问题1,很明显,并不是使用越全面的cdf文件就越好,应该依照自己的要求来选用相应的cdf文件。一般的,在确定基因水平的表达定量时,我们使用core probesets就可以了;在确定exon水平的表达定量时,我们可以使用更为全面的cdf文件,比如extened probesets,甚至是full probesets。但是,并不是这些外显子探针都会进入最后的外显子替换的计算,而那些与core probesets有关联的才会被计入。要记住,对alternative splicing isoform的确认,最关键的还是看具有功能的基因,而不是仅依靠计算机预测。

对于如何确定基因水平表达量,大约的方法前文已经介绍过,最基本的思路还是将针对同一个基因的所有外显子的信号都综合起来考虑。前面讲过了iterPlier的方法。因为plier会考虑到所有的exon,而有些exon是不表达的,所以iterPlier为了防止那些不表达的exon拖低实际的基因表达的水平,就使用了两次分类重新计算plier的办法来去除那些实际上是做为背景存在的exon信号。而大多数其它的文章中对于基因水平表达定量的办法也多与之类似。因为对于同一基因的探针组的数目的多少会非常影响各种分类算法的结果,从而使用背景信号的辨识出现较大的错误,所以人们一般都会规定探针组数目的多少,如果探针数总数加在一起少于11,那多半会不进行背景信号识别,而直接综合所有的探针信号,给出平均。因此,我们需要有这样一个意识,如果探针组过少,而其最后alternative splicing events的得分很高的话,其实并不一定就是真实的,很 有可能是基因表达定量的问题。

接着,我们就依照下图的实验流程来实现一次完整的外显子分析的综合应用:

外显子替换剪切分析流程

外显子替换剪切分析流程


[......]

Read more

Tags: , , , ,