• How to handle batch effects (batch effect)

    2019-09-09

    References:
    1. How to handle batch effects (batch effect): https://www.plob.org/article/14410.html
    2. Multi-dataset analysis of the function of ANLN in cervical cancer: https://www.omicsclass.com/article/769

    1. How to handle batch effects (batch effect)


  • Paper title: Batch Effect Correction of RNA-seq Data through Sample Distance Matrix Adjustment

    Scholar citations: 0

    Pages: 25

    Published: June 2019

    Venue: preprint

    Authors: Teng Fei and Tianwei Yu

    Emory University


    Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. We present scBatch, a numerical algorithm that conducts batch effect correction on the count matrix of RNA sequencing (RNA-seq) data. Different from traditional methods, scBatch starts with establishing an ideal correction of the sample distance matrix that effectively reflect the underlying biological subgroups, without considering the actual correction of the raw count matrix itself. It then seeks an optimal linear transformation of the count matrix to approximate the established sample pattern. The benefit of such an approach is the final result is not restricted by assumptions on the mechanism of the batch effect. As a result, the method yields good clustering and gene differential expression (DE) results. We compared the new method, scBatch, with leading batch effect removal methods ComBat and mnnCorrect on simulated data, real bulk RNA-seq data, and real single-cell RNA-seq data. The comparisons demonstrated that scBatch achieved better sample clustering and DE gene detection results.

    This paper proposes a new batch-effect correction method, scBatch, and compares it against two widely accepted methods, ComBat and mnnCorrect; the new method achieves better results.


    • the proposed method, scBatch, can obtain better clustering pattern, maintain crucial marker information and detect more DE genes. The method achieves better clustering patterns, preserves crucial marker information, and detects more DE genes (though is detecting more DE genes necessarily better?).

    • The method assumes roughly balanced sample population among batches. The literature suggests this assumption is reasonable; even when it does not hold exactly, the method remains fairly robust.

    • Compared with ComBat and MNN, scBatch takes more time to reach the optimal result. The running time depends on sample size, non-linearly; even at the same sample size, it depends on the complexity of the batch effect. Although longer, the time remains acceptable.

    • There is still plenty of room for improvement, mainly in two directions: 1. the method uses the simplest linear transformation of the raw count matrix, but non-linear transformations could be tried; 2. the distance measure is the Pearson correlation matrix, chosen because it is easy to interpret and convenient for gradient computation, but other distance metrics such as Spearman correlation are also worth trying and might bring new insight. A more universal numerical gradient descent algorithm may be applied.


    • RNA-seq, a major tool for transcriptomics
    • Due to the limitation of sequencing technology and sample preparations, technical variations exist among reads from different batches of experiments — possible causes of batch effects
    • The severity of batch effects varies in different datasets
    • The correction of the batch effects can yield better clustering results

    • Johnson et al. (2007) proposed an empirical Bayes algorithm, ComBat, which continued to be a successful method in RNA-seq data. ComBat dates from 2007, yet a 2020 review still finds it among the best-performing methods — has the field been slow to improve on it?

    • ComBat's main role: to normalize the data by removing additive and multiplicative batch effects
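As a rough illustration of "removing additive and multiplicative batch effects", here is a per-gene location/scale adjustment in numpy. This is a deliberate simplification that skips ComBat's empirical-Bayes shrinkage entirely; the function name and setup are mine, not from the paper:

```python
import numpy as np

def location_scale_correct(X, batches):
    """Per-gene location/scale batch adjustment (no EB shrinkage).

    X: genes x samples expression matrix; batches: length-n array of batch ids.
    Each gene is standardized within its batch (removing the additive and
    multiplicative batch components), then rescaled to the gene's overall
    mean and standard deviation.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    grand_mean = X.mean(axis=1, keepdims=True)
    grand_sd = X.std(axis=1, keepdims=True)
    out = np.empty_like(X)
    for b in np.unique(batches):
        idx = batches == b
        mu = X[:, idx].mean(axis=1, keepdims=True)        # additive batch effect
        sd = X[:, idx].std(axis=1, keepdims=True) + 1e-8  # multiplicative batch effect
        out[:, idx] = (X[:, idx] - mu) / sd * grand_sd + grand_mean
    return out
```

After correction, every batch shares the same per-gene mean, so a constant shift applied to one batch disappears.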

    • Researchers also attempted to find and correct unknown batch effects by utilizing control genes in microarray (Gagnon-Bartsch and Speed, 2012) and RNA-seq data (Leek, 2014; Risso et al., 2014; Chen and Zhou, 2017). These control-gene methods, like ComBat, are regression-based.

    • Many strategies proposed in recent years allow for more complex batch effect mechanisms

    • To achieve better clustering performance, Fei et al. (2018) developed a non-parametric approach, named QuantNorm, to correct the sample distance matrix by quantile normalization. QuantNorm was proposed in 2018 — yet the new method is not compared against it later? (QuantNorm does not support DE detection.)

    • Haghverdi et al. (2018) utilized the mutual nearest neighbor relationships among samples from different batches to establish the MNN correction scheme.

    • The author holds that these two methods have already achieved reasonable performances in sample pattern detection, e.g. finding clusters or conducting dimension reduction.

    • But recent methods pay little attention to DE detection. QuantNorm, for example, does not support it; MNN does support DE tests, but its authors do not recommend using the corrected matrix for DE analysis.

    • This paper takes on DE detection, proposing a new method: to utilize the corrected sample distance matrix to further correct the count matrix.

    • we seek a linear transformation to the count matrix, such that the Pearson correlation matrix of the transformed matrix approximates the corrected correlation matrix obtained from QuantNorm. 

    • we propose a random block coordinate descent algorithm to conduct linear transformation on the p (genes) × n (samples) count matrix.
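The idea of the two bullets above can be caricatured in a few lines: search for a linear transformation W of the count matrix whose sample Pearson correlation matrix approaches a target corrected correlation matrix. The sketch below is only a derivative-free toy (accept-a-random-perturbation-if-the-loss-drops), not the paper's actual block coordinate descent:

```python
import numpy as np

def corr_loss(X, W, R_target):
    """Frobenius distance between corr(X @ W) and the target correlation R*."""
    Y = X @ W                          # linear transformation of the p x n count matrix
    R = np.corrcoef(Y, rowvar=False)   # n x n Pearson correlation among samples
    return np.sum((R - R_target) ** 2)

def random_coordinate_descent(X, R_target, steps=500, lr=0.05, seed=0):
    """Toy stand-in for scBatch's optimization: perturb one random entry of W
    at a time and keep the change only if the loss decreases."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = np.eye(n)                      # start from the identity (no correction)
    loss = corr_loss(X, W, R_target)
    for _ in range(steps):
        i, j = rng.integers(n, size=2)
        delta = lr * rng.standard_normal()
        W[i, j] += delta
        new_loss = corr_loss(X, W, R_target)
        if new_loss < loss:
            loss = new_loss
        else:
            W[i, j] -= delta           # revert the unhelpful perturbation
    return W, loss
```

Because only improving moves are kept, the loss is monotonically non-increasing; the real algorithm uses gradients of the Pearson correlation, which is one reason the paper prefers that metric.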

    • Simulation studies demonstrate that in terms of DE gene detection, our method corrects the count matrix better compared to ComBat and MNN, with consistently higher area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (PRAUC).
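The AUC used in these comparisons can be computed directly from ranks — it equals the probability that a random positive scores higher than a random negative (the Mann–Whitney identity) — without tracing the ROC curve. A small self-contained helper, not from the paper:

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney identity: P(random positive > random negative),
    with ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

A perfect ranking of DE genes above non-DE genes gives AUC = 1, a reversed ranking gives 0, and uninformative scores give 0.5.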

    • In real data analyses, the proposed method also shows strong performances in clustering and DE detection in a bulk RNA-seq dataset (Lin et al., 2014) and two scRNA-seq datasets (Usoskin et al., 2015; Xin et al., 2016). It was tested on several datasets; clustering and DE detection are the two main evaluations.


    1. Introduction

    2. Results

    2.1 Batch effect correction based on corrected sample correlation matrix

    2.2 scBatch achieves better DE detection in simulation

    2.3 scBatch obtained better sample patterns for bulk RNA-seq data

    2.4 scBatch shows strong performance in cell heterogeneity investigation

    2.5 Mouse neuron dataset GSE59739

    2.6 Human pancreas data GSE81608

    3. Discussion

    4. Methods

    4.1 Main algorithm

    4.2 Simulation design

    4.3 Datasets and preprocessing

    4.4 Analysis and performance evaluation scheme


    • It is not just about eyeballing figures — DE detection and clustering performance are judged with quantitative metrics.
  • Paper title: Deep learning enables accurate clustering and batch effect removal in single-cell RNA-seq analysis

    Scholar citations: 0

    Pages: 14

    Published: 2019-01-25

    Venue: preprint

    Authors: Xiangjie Li, Yafei Lyu, Jihwan Park, Jingxiao Zhang, Dwight Stambolian, Katalin Susztak, Gang Hu*, Mingyao Li*

    University of Pennsylvania Perelman School of Medicine


    Single-cell RNA sequencing (scRNA-seq) can characterize cell types and states through unsupervised clustering, but the ever increasing number of cells imposes computational challenges. We present an unsupervised deep embedding algorithm for single-cell clustering (DESC) that iteratively learns cluster-specific gene expression signatures and cluster assignment. DESC significantly improves clustering accuracy across various datasets and is capable of removing complex batch effects while maintaining true biological variations.



    • An open-source implementation of the DESC algorithm can be downloaded from https://eleozzr.github.io/desc/.

    • ScRNA-seq clustering and batch effect removal are typically addressed through separate analyses. Commonly used approaches to remove batch effect include Seurat's Canonical Correlation Analysis (CCA) or the Mutual Nearest Neighbors (MNN) approach.

    • After removing batch effect, clustering analysis is performed to identify cell clusters using methods such as Louvain's method, Infomap, graph-based clustering, shared nearest neighbor, or consensus clustering with SC3.

    • Since some cell types are more vulnerable to batch effect than others, batch effect removal should be performed jointly with clustering to achieve optimal performance.

    • However, none of the existing methods are capable of simultaneously clustering cells and removing batch effect — no such method currently exists.

    • We developed DESC, an unsupervised deep learning algorithm that iteratively learns cluster-specific gene expression representation and cluster assignments for scRNA-seq data clustering (Fig. 1a). Using a deep neural network, DESC initializes clustering obtained from an autoencoder and learns a non-linear mapping function from the original scRNA-seq data space to a low-dimensional feature space by iteratively optimizing a clustering objective function. This iterative procedure moves each cell to its nearest cluster, balances biological and technical differences between clusters, and reduces the influence of batch effect. DESC also enables soft clustering by assigning cluster-specific probabilities to each cell, facilitating the clustering of cells with high confidence. This is the core principle of DESC.
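The iterative soft-clustering objective described above follows the DEC-style self-training scheme (the summary below mentions a "self-training target distribution"): soft assignments from a Student's t kernel against cluster centroids, sharpened into a target distribution that emphasizes high-confidence cells. A rough numpy sketch; the function names and shapes are illustrative assumptions, not DESC's actual implementation:

```python
import numpy as np

def soft_assign(Z, centroids, alpha=1.0):
    """q_ij: Student's t similarity between embedded cell i and centroid j,
    normalized so each cell's assignments sum to 1 (soft clustering)."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: square q and renormalize, sharpening high-confidence assignments;
    training minimizes KL(p || q) to self-train the embedding."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)
```

Each iteration recomputes q from the current embedding, derives p, and updates the network toward p — which is how clustering and representation learning (and hence batch-effect attenuation) happen jointly rather than in separate steps.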

    • We benchmarked DESC's performance by analyzing the multi-tissue gene expression data in GTEx — the dataset used to assess the algorithm, a simulated dataset (n=11,688).

    • adjusted rand index (ARI)
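The adjusted rand index is the chance-corrected Rand index used to score a clustering against known labels (1 = perfect agreement up to label permutation, ~0 = random). A compact self-contained computation, equivalent to sklearn's `adjusted_rand_score`:

```python
import numpy as np
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (Index - Expected) / (Max - Expected) over the contingency table."""
    a = np.unique(labels_true, return_inverse=True)[1]
    b = np.unique(labels_pred, return_inverse=True)[1]
    C = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    for i, j in zip(a, b):
        C[i, j] += 1                       # contingency table of the two labelings
    sum_comb = sum(comb(n, 2) for n in C.ravel())
    sum_a = sum(comb(n, 2) for n in C.sum(axis=1))
    sum_b = sum(comb(n, 2) for n in C.sum(axis=0))
    total = comb(len(a), 2)
    expected = sum_a * sum_b / total       # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    return (sum_comb - expected) / (max_index - expected)
```

Because ARI works on the contingency table, swapping cluster names does not change the score — which is what makes it suitable for comparing unsupervised clusterings to cell-type annotations.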

    • In summary, we have developed a deep learning algorithm that clusters scRNA-seq data by iteratively optimizing a clustering objective function with a self-training target distribution.

    • DESC’s memory usage and running time increase linearly with the number of cells, thus making it scalable to large datasets (Fig. 3e). DESC can further speed up computation by GPUs.

    • We analyzed a mouse brain dataset with 1.3 million cells generated by 10X, which only took about 3.5 hours with one NVIDIA TITAN Xp GPU (Supplementary Note 6).

    • Compared to existing scRNA-seq clustering methods DESC improves clustering by iteratively learning cluster-specific gene expression features from cells clustered with high confidence.

    • This iterative clustering also removes batch effect and maintains true biological differences between clusters.

    • As single-cell studies continue to grow, DESC will be an increasingly precise tool for clustering large datasets.

  • Batch Norm

    2018-12-08

    Batch Norm

    These notes summarize the Batch Norm part of Andrew Ng's deep learning video series on optimizing deep neural networks, with some omissions.

    In TensorFlow, Batch Norm takes just one line of code: tf.nn.batch_normalization.

    Batch Normalization standardizes activations so that they fall in the linear region of the activation function; the result is larger gradients, letting the model take bolder gradient descent steps.

    Let's now look at the basic principle and computation of Batch Norm.

    Normalizing inputs to speed up learning


    Can we also apply Normalization to the activation values computed by the activation layers? The answer is yes.

    Implementing Batch Norm

    Given some intermediate values in the NN, $Z^{(1)},\ldots,Z^{(m)}$:

    $\mu=\frac{1}{m}\sum_i Z^{(i)}$

    $\sigma^2=\frac{1}{m}\sum_i (Z^{(i)}-\mu)^2$

    $Z_{norm}^{(i)}=\dfrac{Z^{(i)}-\mu}{\sqrt{\sigma^2+\epsilon}}$

    $\widetilde{Z}^{(i)}=\gamma Z_{norm}^{(i)}+\beta$
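The formulas above translate directly into code. A minimal numpy forward pass (training-mode statistics only — no running averages for inference):

```python
import numpy as np

def batch_norm_forward(Z, gamma, beta, eps=1e-8):
    """Batch Norm in three steps: batch mean/variance, normalize, scale & shift."""
    mu = Z.mean(axis=0)                      # mean over the mini-batch
    var = Z.var(axis=0)                      # variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * Z_norm + beta             # learnable scale and shift
```

Setting gamma = sqrt(var + eps) and beta = mu recovers the input exactly, matching the degenerate case discussed in the text.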

    When computing through the network, we use $\widetilde{Z}^{[l](i)}$ instead of $Z^{[l](i)}$.

    If $\gamma=\sqrt{\sigma^2+\epsilon}$ and $\beta=\mu$, then $\widetilde{Z}^{[l](i)}=Z^{[l](i)}$, and we degenerate to not normalizing the activation layers at all.

    So the difference from normalizing the input layer is that we do not want to force the hidden layers to mean 0 and variance 1. We use the two parameters $\gamma$ and $\beta$ to control the mean and variance, so that the hidden layers can have different distributions; the degenerate values above are not used in practice.



    Because we subtract the mean $\mu$, the parameter $b$ can be omitted: whatever value $b$ takes, it is subtracted away during Normalization.
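A quick numeric check of this point: any constant bias added before Batch Norm vanishes as soon as the batch mean is subtracted. The toy shapes here are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
W = rng.standard_normal((3, 5))

def pre_bn(X, W, b):
    """Linear layer output, then mean-subtraction as Batch Norm would do."""
    Z = X @ W + b
    return Z - Z.mean(axis=0)

# Any constant bias is removed by subtracting the batch mean,
# so b is redundant once the layer is followed by Batch Norm.
out_zero = pre_bn(X, W, np.zeros(5))
out_big = pre_bn(X, W, 7.5 * np.ones(5))
```

The two outputs are identical element-wise, which is why implementations drop `b` in layers followed by Batch Norm.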



    Understanding Batch Normalization from other angles

    Covariate Shift

    $x \rightarrow y$



    Batch Norm corrects covariate shift

    From the perspective of the third layer of the network, the hidden values feeding into it keep changing over time, so the network suffers from covariate shift.

    What Batch Norm does is reduce the amount by which these hidden values shift around.

    Performing Batch Norm means that, from the third layer's point of view in this example, the values from the earlier layers are constrained to the same mean and variance via the two parameters $\gamma$ and $\beta$, which limits their shift. This makes learning in the subsequent layers easier.

    Batch Norm and regularization


    • Each mini-batch is scaled by the mean/variance computed on that mini-batch.
    • This adds some noise to the values of $Z^{[l]}$. Purely in terms of adding noise, this resembles dropout.
    • But the amount of noise added is slight — only a minor side effect, not real regularization. So Batch Norm is used together with dropout when regularization is actually wanted.

    Because by adding noise to the hidden units, it’s forcing the downstream hidden units not to rely too much on any one hidden unit.
    And so similar to dropout, it adds noise to the hidden layers and therefore has a very slight regularization effect.
    Because the noise added is quite small, this is not a huge regularization effect and you might choose to use batch norm together with dropout if you want the more powerful regularization effect of dropout.



