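In an earlier note, "Some errors don't show up in the figures", we mentioned a data-mining paper that got the direction of its differential expression analysis backwards on the TCGA liver cancer (LIHC) RNA-seq dataset. Its methods section reads: "using the package DEG-seq2, Adj. p value < 0.05 and |logFC| > 2 were regarded as the cut-off criteria. This identified 2162 genes met the standards".

In practice the analysis is easy to rerun, but it is hard to end up with the same set of differentially expressed genes.

First, take a look at the expression matrix:

    # Magic trick: clear the workspace in one go
    rm(list = ls())
    options(stringsAsFactors = FALSE)
    library(data.table)
    # Read the counts file as a plain data.frame
    a1 = fread('input/TCGA-LIHC.htseq_counts.tsv.gz', data.table = FALSE)
    dim(a1)
    a1[1:4, 1:4]
    # Inspect the last few rows of the file
    a1[(nrow(a1) - 5):nrow(a1), 1:4]
    dim(a1)
    # all data is then log2(x+1) transformed.
    #length(unique(a1$AccID))
    #length(unique(a1$GeneName))
    # Drop the first column (gene IDs) and keep only the sample columns
    mat = a1[, 2:ncol(a1)]
    mat[1:4, 1:4]
    # Drop the trailing rows at the end of the file
    mat = mat[1:(nrow(a1) - 4), ]
    # Undo the log2(x+1) transform to get back to counts
    mat = ceiling(2^(mat) - 1)
    mat[1:4,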
………………………………
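The back-transformation used above (the file stores log2(x+1), so counts are recovered as 2^x - 1) can be sanity-checked on a toy matrix. This is only a minimal sketch in base R: the matrix `raw` and its gene and sample names are made up for illustration and are not taken from the TCGA file. It also uses `round()` instead of the `ceiling()` in the post, which is an assumption on my part about what is safer: `ceiling()` would push a value such as 120.0000001 up to 121, whereas `round()` tolerates floating-point drift in both directions.

```r
# Toy raw counts (hypothetical values, not from TCGA)
raw <- matrix(
  c(0, 1, 5, 120, 33, 7),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3"))
)

# The transform applied to the downloaded file: log2(count + 1)
logged <- log2(raw + 1)

# Invert it: 2^x - 1, rounded to the nearest integer to absorb
# tiny floating-point errors introduced by the round trip
recovered <- round(2^logged - 1)

# The round trip should give back the original counts exactly
stopifnot(all(recovered == raw))
recovered
```

If the check passes on the real matrix as well (for example, all values are non-negative whole numbers afterwards), the result can be handed to a count-based tool such as DESeq2, which expects untransformed integer counts.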