• ## rdkit

2020-12-09 14:04:11
<div><p>not sure how to rebase yet, so just cherry-picked from another branch (rdkit).</p><p>This question comes from the open-source project: ParmEd/ParmEd</p></div>
• RDKit RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. BSD license - a business friendly license for open source Core data structures and ...
• Drawing chemical reactions with RDKit. Import libraries: from rdkit import RDConfig import unittest import random from rdkit import Chem from rdkit.Chem import Draw, AllChem from rdkit.Chem.Draw import rdMolDraw2D from rdkit ...
Drawing chemical reactions with RDKit

Import libraries

from rdkit import RDConfig
import unittest
import random
from rdkit import Chem
from rdkit.Chem import Draw, AllChem
from rdkit.Chem.Draw import rdMolDraw2D
from rdkit import Geometry
%matplotlib inline
from numpy.polynomial.polynomial import polyfit
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib
from IPython.display import SVG, display
import seaborn as sns; sns.set(color_codes=True)

Define the reaction

rxn = AllChem.ReactionFromSmarts('[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][NH2:6]>CC(O)C.[Pt]>[CH3:1][C:2](=[O:3])[NH:6][CH3:5].[OH2:4]',useSmiles=True)
d = Draw.MolDraw2DSVG(900, 300)
d.DrawReaction(rxn)
d.FinishDrawing()
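
Before drawing, it can help to check how RDKit split the reaction SMILES into reactants, agents, and products. The following is a minimal sketch that reuses the reaction string above; the variable names are illustrative and not part of the original post.

```python
from rdkit.Chem import AllChem

# Same reaction SMILES as above, laid out as reactants > agents > products
rxn_smiles = ('[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][NH2:6]'
              '>CC(O)C.[Pt]>'
              '[CH3:1][C:2](=[O:3])[NH:6][CH3:5].[OH2:4]')
rxn = AllChem.ReactionFromSmarts(rxn_smiles, useSmiles=True)

# Count the templates on each side of the reaction arrow
n_reactants = rxn.GetNumReactantTemplates()
n_agents = rxn.GetNumAgentTemplates()
n_products = rxn.GetNumProductTemplates()
print(n_reactants, n_agents, n_products)
```

The species written between the two `>` characters (isopropanol and platinum here) are stored as agents, so they are drawn over the arrow rather than as reactants.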

Draw the reaction

Define and draw the reaction with highlighting

rxn = AllChem.ReactionFromSmarts('[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][NH2:6]>CC(O)C.[Pt]>[CH3:1][C:2](=[O:3])[NH:6][CH3:5].[OH2:4]',useSmiles=True)
colors=[(0.3,0.7,0.9),(0.9,0.7,0.9),(0.6,0.9,0.3),(0.9,0.9,0.1)]
d = Draw.MolDraw2DSVG(900, 300)
d.DrawReaction(rxn,highlightByReactant=True,highlightColorsReactants=colors)
d.FinishDrawing()

svg = d.GetDrawingText()
svg2 = svg.replace('svg:','')
svg3 = SVG(svg2)
display(svg3)
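
The display call above only works inside a Jupyter notebook. As a rough sketch of the same drawing in a plain script, the SVG text can be written to a file instead; the file name `reaction.svg` is an assumption chosen for illustration.

```python
from pathlib import Path

from rdkit.Chem import AllChem
from rdkit.Chem.Draw import rdMolDraw2D

rxn = AllChem.ReactionFromSmarts(
    '[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][NH2:6]>CC(O)C.[Pt]>'
    '[CH3:1][C:2](=[O:3])[NH:6][CH3:5].[OH2:4]', useSmiles=True)

# Render the reaction to an in-memory SVG drawing
drawer = rdMolDraw2D.MolDraw2DSVG(900, 300)
drawer.DrawReaction(rxn, highlightByReactant=True)
drawer.FinishDrawing()

# Strip the 'svg:' namespace prefix (emitted by older RDKit releases) and save
svg_text = drawer.GetDrawingText().replace('svg:', '')
out_path = Path('reaction.svg')  # hypothetical output file name
out_path.write_text(svg_text, encoding='utf-8')
```

The resulting file can be opened in any browser or vector-graphics editor.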

References

1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html

2. https://nodepit.com/node/org.rdkit.knime.nodes.onecomponentreaction2.RDKitOneComponentReactionNodeFactory

3. https://www.kesci.com/home/project/5c7685191ce0af002b556cc5


• <div><p>My article uses RDKit, how do I add references?</p><p>This question comes from the open-source project: rdkit/rdkit</p></div>
• s a bunch of cool stuff in rdkit which isn't even close to being in MDA (and vice versa) so rather than reinvent wheels, it would be cool to do: <pre><code>python import MDAnalysis as mda from ...
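• Composition of a Mol object: Since a Mol object is a molecule, it is naturally made up of atoms. Molecules are formed by bonds between atoms. Import libraries ...from rdkit import rdBase, Chem from rdkit.Chem import AllChem, Draw ...rdkit versio...
Composition of a Mol object

Since a Mol object represents a molecule, it is naturally made up of atoms, and the molecule is formed by the bonds between those atoms.

Import libraries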

from rdkit import rdBase, Chem
from rdkit.Chem import AllChem, Draw
print('rdkit version: {}'.format(rdBase.rdkitVersion))

rdkit version: 2019.09.3

Load the data

suppl = Chem.SDMolSupplier('sdf_20200114172835.sdf')
mols = [x for x in suppl if x is not None]
len(mols)

Atom objects

Mol.GetAtoms()
Mol.GetAtomWithIdx(idx)

# Print the indices of all sp3 carbons in the first five molecules
for mol in mols[:5]:
    print(mol.GetProp('PRODUCT_NAME'))
    for a in mol.GetAtoms():
        if a.GetSymbol() == 'C' and str(a.GetHybridization()) == 'SP3':
            print('index for sp3 carbon: {}'.format(a.GetIdx()))
    print('###')

Methyl Acetylsalicylate
index for sp3 carbon: 9
index for sp3 carbon: 12
###
Methyl o-Anisate
index for sp3 carbon: 9
index for sp3 carbon: 11
###
Dimethyl 4-Acetoxyisophthalate
index for sp3 carbon: 12
index for sp3 carbon: 13
index for sp3 carbon: 16
###
Ethyl Acetylsalicylate
index for sp3 carbon: 10
index for sp3 carbon: 11
index for sp3 carbon: 13
###
(+)-Bicuculline
index for sp3 carbon: 2
index for sp3 carbon: 4
index for sp3 carbon: 5
index for sp3 carbon: 8
index for sp3 carbon: 10
index for sp3 carbon: 16
index for sp3 carbon: 25
###
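
The SDF file used above is not included with this post, so here is a small self-contained sketch of the same atom loop on a molecule built from SMILES. The SMILES for methyl acetylsalicylate is an assumption written by hand, and atom indices follow the SMILES order, so they differ from the SDF indices printed above. The hybridization is compared against the `Chem.HybridizationType` enum rather than its string form.

```python
from rdkit import Chem

# Methyl acetylsalicylate as SMILES (assumed structure, not read from the SDF)
mol = Chem.MolFromSmiles('COC(=O)c1ccccc1OC(C)=O')

# Collect indices of carbons whose hybridization is sp3
sp3_carbons = [
    atom.GetIdx()
    for atom in mol.GetAtoms()
    if atom.GetSymbol() == 'C'
    and atom.GetHybridization() == Chem.HybridizationType.SP3
]
print(sp3_carbons)

# A single atom can also be fetched directly by index
first_atom = mol.GetAtomWithIdx(0)
print(first_atom.GetSymbol(), first_atom.GetDegree())
```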

Bond objects

Mol.GetBonds()
Mol.GetBondWithIdx(idx)
Mol.GetBondBetweenAtoms()

# List the aromatic bonds in the last three molecules
for mol in mols[-3:]:
    print(mol.GetProp('PRODUCT_NAME'))
    for b in mol.GetBonds():
        if b.GetIsAromatic():
            print('bond between {}-{} is aromatic.'.
                  format(b.GetBeginAtomIdx(), b.GetEndAtomIdx()))
    print('###')

2-Ethylhexyl Salicylate
bond between 3-4 is aromatic.
bond between 3-8 is aromatic.
bond between 4-5 is aromatic.
bond between 5-6 is aromatic.
bond between 6-7 is aromatic.
bond between 7-8 is aromatic.
###
Propyl Salicylate
bond between 3-4 is aromatic.
bond between 3-8 is aromatic.
bond between 4-5 is aromatic.
bond between 5-6 is aromatic.
bond between 6-7 is aromatic.
bond between 7-8 is aromatic.
###
3,3,5-Trimethylcyclohexyl Salicylate (cis- and trans- mixture)
bond between 0-1 is aromatic.
bond between 1-2 is aromatic.
bond between 2-3 is aromatic.
bond between 3-4 is aromatic.
bond between 4-5 is aromatic.
bond between 0-5 is aromatic.
###
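
`Mol.GetBondBetweenAtoms()` is listed above but not demonstrated. The sketch below shows it next to the aromatic-bond loop, using a hand-written SMILES for propyl salicylate (an assumption, since the SDF file is not available here).

```python
from rdkit import Chem

# Propyl salicylate as SMILES (assumed structure, not read from the SDF)
mol = Chem.MolFromSmiles('CCCOC(=O)c1ccccc1O')

# Collect (begin, end) atom index pairs for every aromatic bond
aromatic_bonds = [
    (b.GetBeginAtomIdx(), b.GetEndAtomIdx())
    for b in mol.GetBonds()
    if b.GetIsAromatic()
]
print(len(aromatic_bonds))

# Look up the bond between two specific atoms
bond = mol.GetBondBetweenAtoms(0, 1)
print(bond.GetBondType())

# Atoms that are not bonded to each other give None
missing = mol.GetBondBetweenAtoms(0, 12)
print(missing)
```

Checking for `None` before calling methods on the returned bond avoids an `AttributeError` when the two atoms are not connected.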

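Properties

GetProp(name)
SetProp(name, value)
GetPropNames()

Arbitrary values, called properties, can be attached to a Mol object. Use SetProp to set a property and GetProp to read it back. Properties that are already defined in the source SDF are set on the object when it is created.

# Print every property stored on the first molecule
for p in mols[0].GetPropNames():
    print('{}: {}'.format(p, mols[0].GetProp(p)))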

PRODUCT_NUMBER: A0114
PRODUCT_NAME: Methyl Acetylsalicylate
MOLECULAR_FORMULA: C10H10O4
MOLECULAR_WEIGHT: 194.19
CAS_NUMBER: 580-02-9
MDL_NUMBER: MFCD00014978
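
The properties printed above come from the SDF. As a brief sketch of setting properties by hand, the example below attaches a name and a weight to a molecule built from SMILES; the values are illustrative assumptions, not data from the file.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')  # aspirin

# String properties use SetProp/GetProp; typed variants exist for numbers
mol.SetProp('PRODUCT_NAME', 'Aspirin')
mol.SetDoubleProp('MOLECULAR_WEIGHT', 180.16)  # illustrative value

name = mol.GetProp('PRODUCT_NAME')
weight = mol.GetDoubleProp('MOLECULAR_WEIGHT')
prop_names = list(mol.GetPropNames())

# HasProp avoids a KeyError when a property may be absent
has_cas = bool(mol.HasProp('CAS_NUMBER'))
print(name, weight, prop_names, has_cas)
```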

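2D structure depictions of molecules

1) Drawing a single molecule
a. Chem.Draw.MolToImage(mol)
b. Chem.Draw.MolToFile(mol, file_name)

2) Drawing multiple molecules
a. Chem.Draw.MolsToImage(mols)
b. Chem.Draw.MolsToGridImage(mols, molsPerRow, subImgSize, legends)

All molecules handled here already have atom coordinates. If the coordinates have not been initialized, compute them first with AllChem.Compute2DCoords.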
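
A molecule parsed from SMILES is a typical case without coordinates. The sketch below, using a hand-written SMILES for methyl salicylate as an assumed example, shows that no conformer exists until `AllChem.Compute2DCoords` is called.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('COC(=O)c1ccccc1O')  # methyl salicylate

# A molecule parsed from SMILES has no conformer, hence no coordinates
before = mol.GetNumConformers()

# Generate a 2D depiction; this adds one conformer to the molecule
AllChem.Compute2DCoords(mol)
after = mol.GetNumConformers()
is_3d = mol.GetConformer().Is3D()
print(before, after, is_3d)
```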

Draw.MolToImage(mols[0])

Draw.MolsToGridImage(mols[:9], molsPerRow=3, subImgSize=(300,200),
                     legends=[x.GetProp('PRODUCT_NUMBER') for x in mols[:9]])

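Aligning structures to a template

Because the compounds are oriented inconsistently in the grid image, similar structures are hard to recognize at a glance. So let us rotate the molecules using salicylic acid as the template, with GenerateDepictionMatching2DStructure from Chem.AllChem. All that is required is to create a Mol object for the template and compute its 2D coordinates.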

import pubchempy as pcp
tmp = pcp.get_compounds('salicylic acid', 'name')
tmp = tmp[0]
tmp_smiles = tmp.canonical_smiles
template = Chem.MolFromSmiles(tmp_smiles)
AllChem.Compute2DCoords(template)

# Align every molecule that contains the template substructure
for mol in mols:
    if mol.HasSubstructMatch(template):
        AllChem.GenerateDepictionMatching2DStructure(mol, template)

Draw.MolsToGridImage(mols[:9], molsPerRow=3, subImgSize=(300,200),
                     legends=[x.GetProp('PRODUCT_NUMBER') for x in mols[:9]])
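
The code above needs both the SDF file and a network call to PubChem. A self-contained sketch of the same alignment step follows, with the template and test molecules written as SMILES by hand (all of them assumptions for illustration). Molecules that do not contain the template fall back to ordinary 2D coordinates.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Salicylic acid template with 2D coordinates
template = Chem.MolFromSmiles('O=C(O)c1ccccc1O')
AllChem.Compute2DCoords(template)

# Two salicylates and ethanol, which lacks the template substructure
smiles_list = ['COC(=O)c1ccccc1O', 'CCCOC(=O)c1ccccc1O', 'CCO']
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

aligned = []
for mol in mols:
    if mol.HasSubstructMatch(template):
        # Lay out the molecule so the matched atoms follow the template
        AllChem.GenerateDepictionMatching2DStructure(mol, template)
        aligned.append(True)
    else:
        # No match: fall back to a default 2D layout
        AllChem.Compute2DCoords(mol)
        aligned.append(False)
print(aligned)
```

Guarding with `HasSubstructMatch` matters because `GenerateDepictionMatching2DStructure` raises an error by default when the template is not found in the molecule.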

Substructure handling

aspirin = pcp.get_compounds('aspirin', 'name')
aspirin = aspirin[0]
aspirin_sm = aspirin.canonical_smiles
aspirin_mol = Chem.MolFromSmiles(aspirin_sm)
AllChem.Compute2DCoords(aspirin_mol)

# Keep only the molecules that contain aspirin as a substructure
match = []
for mol in mols:
    if mol.HasSubstructMatch(aspirin_mol):
        match.append(mol)
print(len(match)) # 6
Draw.MolsToGridImage(match, molsPerRow=3, subImgSize=(300,200),
                     legends=[x.GetProp('PRODUCT_NAME') for x in match])
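
`HasSubstructMatch` only answers yes or no. To see which atoms matched, `GetSubstructMatch` returns the matching atom indices. The sketch below uses hand-written SMILES (assumed structures) instead of the SDF file and PubChem lookup.

```python
from rdkit import Chem

aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')

# Hypothetical candidates: one contains the aspirin skeleton, one does not
candidates = {
    'methyl acetylsalicylate': 'COC(=O)c1ccccc1OC(C)=O',
    'salicylic acid': 'O=C(O)c1ccccc1O',
}

matches = {}
for name, smi in candidates.items():
    mol = Chem.MolFromSmiles(smi)
    # Tuple of atom indices in mol, ordered like the query atoms; empty if no match
    matches[name] = mol.GetSubstructMatch(aspirin)

for name, idx in matches.items():
    print(name, idx)
```

The returned indices can be passed to the drawing functions (for example through the `highlightAtoms` argument of `Draw.MolToImage`) to mark the matched part.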

References

1. http://www.rdkit.org/docs/index.html

2. http://www.rdkit.org/docs/api-docs.html

DrugAI

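• Preprocessing compound data with Python and RDKit. Environment: MolVS is a library dedicated to compound preprocessing. rdkit 2020.03 molvs 0.1.1 Compound preprocessing: RDKit: SanitizeMol, Kekulization, valence checking, aromaticity...
Preprocessing compound data with Python and RDKit.

Environment

MolVS is a library dedicated to compound preprocessing.

rdkit 2020.03
molvs 0.1.1

Compound preprocessing

RDKit: SanitizeMol

Performs Kekulization, valence checking, aromaticity assignment, conjugation setup, and so on.

Reference: http://rdkit.org/docs/source/rdkit.Chem.rdmolops.html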
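
As a minimal sketch of what sanitization does, the example below parses two hand-picked SMILES with `sanitize=False` and then calls `Chem.SanitizeMol` explicitly. The input molecules are assumptions chosen for illustration.

```python
from rdkit import Chem

# Kekule-form benzene, parsed without sanitization: no aromaticity perceived yet
mol = Chem.MolFromSmiles('C1=CC=CC=C1', sanitize=False)
aromatic_before = any(a.GetIsAromatic() for a in mol.GetAtoms())

# Sanitization perceives aromaticity, among other steps
Chem.SanitizeMol(mol)
aromatic_after = all(a.GetIsAromatic() for a in mol.GetAtoms())
print(aromatic_before, aromatic_after)

# A carbon with five bonds fails the valence check; with catchErrors=True
# the failing step is returned as a flag instead of raising an exception
bad = Chem.MolFromSmiles('CC(C)(C)(C)C', sanitize=False)
failed_step = Chem.SanitizeMol(bad, catchErrors=True)
print(failed_step)
```

A return value of `Chem.SanitizeFlags.SANITIZE_NONE` means every step succeeded; any other flag names the first step that failed.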

MolVS : Normalize

Applies a series of transforms to correct common errors and standardize functional groups.

from rdkit import Chem
from molvs.normalize import Normalizer, Normalization

old_smiles = "[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1"
print("PREV:" + old_smiles)
old_mol = Chem.MolFromSmiles(old_smiles)
normalizer = Normalizer(normalizations=[Normalization('Sulfone to S(=O)(=O)', '[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])')])
new_mol = normalizer.normalize(old_mol)
new_smiles = Chem.MolToSmiles(new_mol)
print("NEW:" + new_smiles)

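The code above selectively applies only the normalization defined as 'Sulfone to S(=O)(=O)'.
As the result below shows, the charges on the sulfur and oxygen atoms have changed.
If the Normalizer is created without arguments, all normalizations predefined in MolVS are applied.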

PREV:[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1
NEW:O=C(O[Na])c1ccc(C[S](=O)=O)cc1

MolVS : TautomerCanonicalizer

Tautomers are a set of molecules that readily interconvert through the migration of a hydrogen atom.

from rdkit import Chem
from molvs.tautomer import TAUTOMER_TRANSFORMS, TAUTOMER_SCORES, MAX_TAUTOMERS, TautomerCanonicalizer, TautomerEnumerator, TautomerTransform


# Canonicalizer restricted to a single tautomer transform
tautomerCanonicalizer = TautomerCanonicalizer((
    TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]'),
))

mol = Chem.MolFromSmiles("O=C1CC=CC=C1")
print("prev:" + Chem.MolToSmiles(mol))
mol2 = tautomerCanonicalizer.canonicalize(mol)
print("after: "+ Chem.MolToSmiles(mol2))

prev:O=C1C=CC=CC1
after: Oc1ccccc1

MolVS : LargestFragmentChooser

When the input contains several fragments, it returns the largest one.

from rdkit import Chem
from molvs.fragment import LargestFragmentChooser

fragment_chooser = LargestFragmentChooser()
old_smiles = "O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1.[Br-]"
print("prev:" + old_smiles)
mol = Chem.MolFromSmiles(old_smiles)
# Calling the chooser returns the largest fragment (the bromide counterion is dropped)
mol2 = fragment_chooser(mol)
print("after:" + Chem.MolToSmiles(mol2))

prev:O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1.[Br-]
after:O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1
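
If MolVS is not installed, RDKit ships its own port of these tools in `rdkit.Chem.MolStandardize.rdMolStandardize`. The sketch below swaps in that module for the same fragment selection; it is an alternative to the MolVS call above, not the method used in the original post.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Same salt as above: an organic cation with a bromide counterion
mol = Chem.MolFromSmiles('O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1.[Br-]')

# RDKit's built-in counterpart of the MolVS LargestFragmentChooser
chooser = rdMolStandardize.LargestFragmentChooser()
largest = chooser.choose(mol)
largest_smiles = Chem.MolToSmiles(largest)
print(largest_smiles)
```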

MolVS : Uncharger

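Attempts to neutralize ionized acids and bases in a molecule.

from rdkit import Chem
from molvs.charge import Reionizer, Uncharger

uncharger = Uncharger()
mol = Chem.MolFromSmiles("c1cccc[nH+]1")
print("prev:" + Chem.MolToSmiles(mol))
# Calling the uncharger returns the neutralized molecule
mol2 = uncharger(mol)
print("after:" + Chem.MolToSmiles(mol2))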

prev:c1cc[nH+]cc1
after:c1ccncc1
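
The same neutralization can be sketched with RDKit's built-in `rdMolStandardize.Uncharger`, used here in place of MolVS. The three input SMILES are illustrative assumptions: a protonated base, a deprotonated acid, and a quaternary ammonium that has no hydrogen to remove.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

uncharger = rdMolStandardize.Uncharger()

# Pyridinium, acetate, and tetramethylammonium
results = {}
for smi in ('c1cccc[nH+]1', 'CC(=O)[O-]', 'C[N+](C)(C)C'):
    mol = Chem.MolFromSmiles(smi)
    results[smi] = Chem.MolToSmiles(uncharger.uncharge(mol))

for smi, neutral in results.items():
    print(smi, '->', neutral)
```

The quaternary nitrogen keeps its charge because neutralizing it would require breaking a bond rather than adding or removing a hydrogen.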


• No exaggeration, this is the smoothest RDKit installation I have used. Source: https://anaconda.org/rdkit/rdkit

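• RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. BSD license, a business-friendly license for open source. Core data structures and algorithms in C++. Python wrappers generated using Boost.Python. Java and C# wrappers generated with SWIG. 2D and 3D molecular operations. Descriptor and fingerprint generation for machine learning ...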
• import rdkit import pandas as pd from rdkit import Chem from rdkit.Chem import Descriptors from rdkit.ML.Descriptors import MoleculeDescriptors path='POSCAR-2' mols=[] files= os.listdir(path) for ...
• <p>Failure: ImportError (No module named rdkit) ... ERROR Failure: ImportError (No module named rdkit) ... ERROR Failure: ImportError (No module named rdkit) ... ERROR Failure: ImportError (No module ...
• Installing RDKit with pip/conda kept failing; try the following command: conda install -c conda-forge rdkit
• Converting a SMILES string to canonical SMILES with RDKit. Import libraries: from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole Convert SMILES to an RDKit Mol object: testsmi = '[H][C@@]12...
• from rdkit import RDConfig from rdkit.Chem import AllChem from rdkit import Chem fdef = AllChem.BuildFeatureFactory(os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')) print(fd...
• Environment: System: Windows 10 (x64); Python: Python 3.7; RDKit: 2019.09.3. Generating random SMILES with RDKit ...from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonCon...
• from rdkit import rdBase, Chem from rdkit.Chem import AllChem, Draw print('rdkit version: {}'.format(rdBase.rdkitVersion)) Load the data: suppl = Chem.SDMolSupplier('sdf_20191011152835.sdf') m...
• USRCAT: USRCAT is a shape-based method that runs very fast. The code is freely available, but users need to install it in order to use it. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505738/ RDKit code: ...from rdkit import Chem...
• ## Getting started with RDKit

2019-06-26 20:31:40
RDKit is a very practical Python package that connects chemistry with machine learning. It can convert between many chemical file formats such as mol2, mol, SMILES, and SDF, and can render molecules in 2D or 3D for developers to use. 1. Generating depictions: 2D molecules from rdkit...
• Compound preprocessing ...from rdkit import Chem from rdkit.Chem import MolStandardize Load the data: smis = ("c1cccc[nH+]1", "C[N+](C)(C)C", "c1ccccc1[NH3+]", "CC(=O)[O-]", "c1ccccc1[O-]", "CCS...
• Drawing molecules in black and white with RDKit. Import libraries: from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole import rdkit rdkit.__version__ 2020.03.1 Load the data: ms = ...
• RDKit has several drawing engines, and structures drawn with different methods differ in appearance. This post takes a closer look at RDKit's structure depiction and explains how to draw in SVG format, which has been available since the 2015.03 release. There may be a lot of detail, but knowing what happens behind the scenes is usually very...
• from rdkit import Chem, DataStructs, RDConfig from rdkit.Chem import AllChem from rdkit.Chem.Pharm2D import Gobbi_Pharm2D, Generate Load the data and generate the 3D structure: mol = Chem.MolFromSmiles( '...
• from rdkit import Chem from rdkit.Chem import AllChem from rdkit.Chem import rdDistGeom as molDG mol = Chem.MolFromSmiles('CCC') bm = molDG.GetMoleculeBoundsMatrix(mol) bm Out[]: array([[0. ...
• from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import IPythonConsole IPythonConsole.ipython_useSVG = True Load the data: peptide_smiles = Chem.MolToSmiles(Chem.MolFromFASTA(...
• import os import numpy as np import igraph from py2cytoscape import util from cyjupyter import Cytoscape from rdkit import Chem from rdkit.Chem import DataStructs from rdkit.Chem import AllChem fr...
• RDKit 2020.09.1, Python 3.7.9. Manipulating molecular structures with RDKit. Import libraries: import pandas as pd import numpy as np from rdkit import Chem from rdkit.Chem import AllChem, Draw Converting between Mol objects and SMILES: mol = Chem....
• Extracting 3D pharmacophore features from a molecule. Import libraries: import os from rdkit import Geometry from rdkit import RDConfig from rdkit.Chem import AllChem from rdkit.Chem import ...from rdkit.Chem.Pharm3D import Pharmacophore...
• rdkit_nim: Nim bindings for RDKit, the C++ cheminformatics toolkit
• New similarity map functions in RDKit (2019.09). Import libraries: from rdkit import Chem from rdkit.Chem import Draw from rdkit.Chem.Draw import SimilarityMaps from IPython.display import SVG import numpy as np import rdkit...

...