Manual

openabc.forcefields

class CGModel

The general class with general methods that can be inherited by any other CG model classes.

Each child class of CGModel has to set self.bonded_attr_names, which is used to append molecules and set rigid bodies.

add_plumed(plumed_script_path)

Add OpenMM plumed.

Parameters

plumed_script_pathstr

Input plumed script path.

Reference

PLUMED website: https://www.plumed.org/

add_reporters(report_interval, output_dcd='output.dcd', report_dcd=True, report_state_data=True)

Add reporters for OpenMM simulation.

Parameters

report_intervalint

Report dcd and report state interval.

output_dcdstr

Output dcd file path.

report_dcdbool

Whether to output dcd file.

report_state_databool

Whether to output simulation state and data.

append_mol(new_mol, verbose=False)

The method can append new molecules by concatenating atoms and bonded interaction information saved in dataframes.

Parameters

new_mola consistent parser object or CG model object

The object of a new molecule including atom and bonded interaction information.

verbosebool

Whether to report the appended attributes.

atoms_to_pdb(cg_pdb, reset_serial=True)

Save atoms to pdb as topology will be read from the pdb file. PDB file is required for OpenMM simulation.

Parameters

cg_pdbstr

Output path for CG PDB file.

reset_serialbool

Reset serial to 1, 2, …

create_system(top, use_pbc=True, box_a=500, box_b=500, box_c=500, remove_cmmotion=True)

Create OpenMM system for simulation. Need to further add forces to this OpenMM system.

Parameters

topOpenMM Topology

The OpenMM topology.

use_pbcbool

Whether to use periodic boundary condition (PBC).

box_afloat or int

Box length along x-axis.

box_bfloat or int

Box length along y-axis.

box_cfloat or int

Box length along z-axis.

remove_cmmotionbool

Whether to add CMMotionRemover to remove center of mass motions.

move_COM_to_box_center()

Move center of mass (COM) to simulation box center.

save_state(state_xml='state.xml')

Save simulation state to readable xml format.

Parameters

state_xmlstr

Output state xml file path.

save_system(system_xml='system.xml')

Save system to readable xml format.

Parameters

system_xmlstr

Output system xml file path.

set_rigid_bodies(rigid_coord, rigid_bodies, keep_unchanged=[])

Set rigid bodies and remove bonded interactions within the same rigid body.

Directly run createRigidBodies if the user does not want to change any bonded interactions or exclusions.

Parameters

rigid_coordQuantity, shape = (n_atoms, 3)

Atom coordinates to build rigid body restrictions.

rigid_bodieslist

A list includes many sublists. Atoms in each sublist are recognized as one rigid body.

keep_unchangedlist

A list including attribute names that user intends to keep unchanged.

set_simulation(integrator, platform_name='CPU', properties={'Precision': 'mixed'}, init_coord=None)

Set OpenMM simulation.

Parameters

integratorOpenMM Integrator

OpenMM integrator.

platform_namestr

OpenMM simulation platform name. It can be one from: Reference or CPU or CUDA or OpenCL.

propertiesdict

OpenMM simulation platform properties.

init_coordNone or array-like

Initial coordinate.

class MOFFMRGModel

Bases: CGModel

A class for MOFF+MRG model that represents a mixture of MOFF proteins and MRG DNA.

add_all_default_forces()

Add all the forces with default settings.

add_contacts(eta=Quantity(value=0.7, unit=/angstrom), r0=Quantity(value=8, unit=angstrom), cutoff=Quantity(value=2.0, unit=nanometer), alpha_protein_dna=0.0016264, alpha_dna_dna=1.678e-05, epsilon_protein_dna=0, epsilon_dna_dna=0, force_group=8)

Add nonbonded contacts for MOFF protein and MRG DNA.

For amino acids, the CA atom type indices are 0-19, and CG nucleotide atom type index is 20.

The potential expression is: alpha/r^12 - 0.5*epsilon*(1 + tanh(eta*(r0 - r)))

Parameters

etaQuantity

Parameter eta.

r0Quantity

Parameter r0.

cutoffQuantity

Cutoff distance

alpha_protein_dnafloat or int

Protein-DNA interaction parameter alpha.

alpha_dna_dnafloat or int

DNA-DNA interaction parameter alpha.

epsilon_protein_dnafloat or int

Protein-DNA interaction parameter epsilon.

epsilon_dna_dnafloat or int

DNA-DNA interaction parameter epsilon.

force_groupint

Force group.

add_dna_angles(force_group=6)

Add DNA angles.

Parameters

force_groupint

Force group.

add_dna_bonds(force_group=5)

Add DNA bonds.

Parameters

force_groupint

Force group.

add_dna_fan_bonds(force_group=7)

Add DNA fan bonds.

Parameters

force_groupint

Force group.

add_elec_switch(salt_conc=Quantity(value=150.0, unit=millimolar), temperature=Quantity(value=300.0, unit=kelvin), cutoff1=Quantity(value=1.2, unit=nanometer), cutoff2=Quantity(value=1.5, unit=nanometer), switch_coeff=[1, 0, 0, -10, 15, -6], add_native_pair_elec=True, force_group=9)

Add electrostatic interaction with switch function.

The switch function switches potential to zero within range cutoff1 < r <= cutoff2.

Parameters

salt_concQuantity

Monovalent salt concentration.

temperatureQuantity

Temperature.

cutoff1Quantity

Cutoff distance 1.

cutoff2Quantity

Cutoff distance 2.

switch_coeffsequence-like

Switch function coefficients.

add_native_pair_elecbool

Whether to add electrostatic interactions between native pairs.

force_groupint

Force group.

add_native_pairs(force_group=4)

Add native pairs.

Parameters

force_groupint

Force group.

add_protein_angles(threshold=2.2689280275926285, clip=False, force_group=2, verbose=True)

Add protein angles.

Note that the angle potential is a harmonic angle potential, which may lead to unstable simulation if harmonic potential center theta0 is too large.

Based on some tests, theta0 <= 130 degrees (130*np.pi/180 radians) can facilitate 10 fs timestep.

Parameters

thresholdfloat or int

The suggested upper limit value of theta0.

clipbool

Whether to change theta0 values larger than threshold to threshold. If True, all the theta0 values larger than threshold will be changed to threshold. If False, do not change theta0 values.

force_groupint

Force group.

verbose: bool

Whether to show warnings if theta0 is larger than the threshold.

add_protein_bonds(force_group=1)

Add protein bonds.

Parameters

force_groupint

Force group.

add_protein_dihedrals(force_group=3)

Add protein dihedrals.

Parameters

force_groupint

Force group.

class HPSModel

Bases: CGModel

The class for HPS model that represents a mixture of HPS model proteins.

This class inherits CGModel class.

add_all_default_forces()

Add all the forces with default settings.

add_contacts(hydropathy_scale='Urry', epsilon=0.8368000000000001, mu=1, delta=0.08, force_group=2)

Add nonbonded contacts.

The raw hydropathy scale is scaled and shifted by: mu*lambda - delta

Parameters

hydropathy_scalestr

Hydropathy scale, can be KR or Urry.

epsilonfloat or int

Contact strength.

mufloat or int

Hydropathy scale factor.

deltafloat or int

Hydropathy shift factor.

force_groupint

Force group.

add_dh_elec(ldby=Quantity(value=1, unit=nanometer), dielectric_water=80.0, cutoff=Quantity(value=3.5, unit=nanometer), force_group=3)

Add Debye-Huckel electrostatic interactions.

Parameters

ldbyQuantity

Debye length.

dielectric_waterfloat or int

Dielectric constant of water.

cutoffQuantity

Cutoff distance.

force_groupint

Force group.

add_protein_bonds(force_group=1)

Add protein bonds.

Parameters

force_groupint

Force group.

class MpipiModel

Bases: CGModel

The class for Mpipi protein and RNA models. This class inherits CGModel class. We follow the parameters provided in the Mpipi LAMMPS input file.

add_contacts(cutoff_to_sigma_ratio=3, force_group=3)

Add nonbonded contacts, which is of Wang-Frenkel functional form.

Parameters

add_dh_elec(ldby=Quantity(value=0.7936507936507936, unit=nanometer), dielectric_water=80.0, cutoff=Quantity(value=3.5, unit=nanometer), force_group=4)

Add Debye-Huckel electrostatic interactions.

Parameters

ldbyQuantity

Debye length.

dielectric_waterfloat or int

Dielectric constant of water.

cutoffQuantity

Cutoff distance.

force_groupint

Force group.

add_protein_bonds(force_group=1)

Add protein bonds.

Parameters

force_groupint

Force group.

add_rna_bonds(force_group=2)

Add RNA bonds.

Parameters

force_groupint

Force group.

openabc.forcefields.parsers

class MOFFParser(atomistic_pdb, ca_pdb, default_parse=True)

MOFF protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize a MOFF model protein from atomistic pdb.

Parameters

atomistic_pdbstr

Atomistic pdb file path.

ca_pdbstr

CA pdb file path.

write_TERbool

Whether to write TER between two chains.

default_parsebool

Whether to parse with default settings.

Returns

resultclass instance

A class instance.

parse_exclusions(exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True)

Parse nonbonded exclusions based on bonds, angles, dihedrals, and native pairs.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

exclude13bool

Whether to exclude nonbonded interactions between 1-3 atom pairs.

exclude14bool

Whether to exclude nonbonded interactions between 1-4 atom pairs.

exclude_native_pairsbool

Whether to exclude nonbonded interactions between native pairs.

parse_mol(get_native_pairs=True, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False, exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True, mass_dict={'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'CYS': 103.1, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'VAL': 99.07}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.25, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0}, ss=None, ordered_ss_names=['H', 'E'])

Parse molecule.

Parameters

get_native_pairsbool

Whether to get native pairs from atomistic pdb with shadow algorithm.

frameint

Frame index to compute native structure parameters for both CG and atomistic models.

radiusfloat or int

Shadow algorithm radius.

bonded_radiusfloat or int

Shadow algorithm bonded radius.

cutofffloat or int

Shadow algorithm cutoff.

box : None or np.ndarray Specify box shape. Note setting use_pbc as False is only compatible with orthogonal box. If use_pbc is False, then set box as None. If use_pbc is True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool

Whether to use periodic boundary condition (PBC) for searching native pairs.

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

exclude13bool

Whether to exclude nonbonded interactions between 1-3 atom pairs.

exclude14bool

Whether to exclude nonbonded interactions between 1-4 atom pairs.

exclude_native_pairsbool

Whether to exclude nonbonded interactions between native pairs.

mass_dictdict

Mass dictionary.

charge_dictdict

Charge dictionary.

ssNone or sequence-like

Secondary structure type of each residue. If None, no secondary structure information is provided and all the native pairs are kept. If not None, then only native pairs within continuous ordered domains are kept.

ordered_ss_namessequence-like

The names of ordered secondary structures. The default value is [‘H’, ‘E’], where ‘H’ represents alpha-helix, and ‘E’ represents beta strand.

class MRGdsDNAParser(cg_pdb, default_parse=True)

MRG dsDNA parser. Note this parser only works on one dsDNA!

classmethod from_atomistic_pdb(atomistic_pdb, cg_pdb, write_TER=False, default_parse=True)

Initialize an MRG model dsDNA from atomistic pdb.

Parameters

atomistic_pdbstr

Path for the atomistic dsDNA pdb file.

cg_pdbstr

Output path for the CG dsDNA pdb file.

write_TERbool

Whether to write TER between two chains.

default_parsebool

Whether to parse with default settings.

Returns

resultclass instance

A class instance.

parse_exclusions(exclude12=True, exclude13=True)

Parse nonbonded exclusions based on bonds and angles. Note nonbonded interactions are not excluded for atom pairs with fan bonds.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

exclude13bool

Whether to exclude nonbonded interactions between 1-3 atom pairs.

parse_mol(exclude12=True, exclude13=True, bonded_energy_scale=0.9, mass=325, charge=-1)

Parse molecule.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

exclude13bool

Whether to exclude nonbonded interactions between 1-3 atom pairs.

bonded_energy_scalefloat or int

Scale factor for all the bonded energies.

massfloat or int

Mass of CG nucleotide bead.

chargefloat or int

Charge of each CG nucleotide bead.

class HPSParser(ca_pdb, default_parse=True)

HPS protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize an HPS model protein from atomistic pdb.

Parameters

atomistic_pdbstr

Path for the atomistic pdb file.

ca_pdbstr

Output path for the CA pdb file.

write_TERbool

Whether to write TER between two chains.

default_parsebool

Whether to parse with default settings.

Returns

resultclass instance

A class instance.

parse_mol(exclude12=True, mass_dict={'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'CYS': 103.1, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'VAL': 99.07}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.5, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0})

Parse molecule.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atoms.

mass_dictdict

Mass dictionary.

charge_dictdict

Charge dictionary.

class MpipiProteinParser(ca_pdb, default_parse=True)

Mpipi protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize an Mpipi model protein from atomistic pdb.

Parameters

atomistic_pdbstr

Atomistic pdb file path.

ca_pdbstr

Output CA pdb file path.

write_TERbool

Whether to write TER between two chains.

default_parsebool

Whether to parse with default settings.

parse_mol(exclude12=True, mass_dict={'A': 329.2, 'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'C': 305.2, 'CYS': 103.1, 'G': 345.2, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'U': 306.2, 'VAL': 99.07}, charge_dict={'A': -0.75, 'ALA': 0.0, 'ARG': 0.75, 'ASN': 0.0, 'ASP': -0.75, 'C': -0.75, 'CYS': 0.0, 'G': -0.75, 'GLN': 0.0, 'GLU': -0.75, 'GLY': 0.0, 'HIS': 0.375, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 0.75, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'U': -0.75, 'VAL': 0.0})

Parse molecule.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atoms.

mass_dictdict

Mass dictionary.

charge_dictdict

Charge dictionary.

class MpipiRNAParser(cg_pdb, default_parse=True)

Mpipi RNA parser.

classmethod from_sequence(seq, cg_pdb, write_TER=False, default_parse=True)

Initialize an Mpipi model ssRNA from sequence. The initial structure is a straight chain.

Parameters

seqstr or sequence-like

Input RNA sequence.

cg_pdbstr

Output CG RNA pdb file path.

write_TERbool

Whether to write TER between two chains.

default_parsebool

Whether to parse with default settings.

Returns

resultclass instance

A class instance.

parse_mol(exclude12=True, mass_dict={'A': 329.2, 'ALA': 71.08, 'ARG': 156.2, 'ASN': 114.1, 'ASP': 115.1, 'C': 305.2, 'CYS': 103.1, 'G': 345.2, 'GLN': 128.1, 'GLU': 129.1, 'GLY': 57.05, 'HIS': 137.1, 'ILE': 113.2, 'LEU': 113.2, 'LYS': 128.2, 'MET': 131.2, 'PHE': 147.2, 'PRO': 97.12, 'SER': 87.08, 'THR': 101.1, 'TRP': 186.2, 'TYR': 163.2, 'U': 306.2, 'VAL': 99.07}, charge_dict={'A': -0.75, 'ALA': 0.0, 'ARG': 0.75, 'ASN': 0.0, 'ASP': -0.75, 'C': -0.75, 'CYS': 0.0, 'G': -0.75, 'GLN': 0.0, 'GLU': -0.75, 'GLY': 0.0, 'HIS': 0.375, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 0.75, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'U': -0.75, 'VAL': 0.0})

Parse molecule.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

mass_dictdict

Mass dictionary.

chargefloat or int

Nucleotide charge.

class SMOGParser(atomistic_pdb, ca_pdb, default_parse=True)

SMOG protein parser.

classmethod from_atomistic_pdb(atomistic_pdb, ca_pdb, write_TER=False, default_parse=True)

Initialize an HPS model protein from atomistic pdb.

Parameters

atomistic_pdbstr

Path for the input atomistic pdb file.

ca_pdbstr

Path for the output CA pdb file.

write_TERbool

Whether to write TER between two chains.

default_parsebool

Whether to parse with default settings.

parse_exclusions(exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True)

Parse nonbonded exclusions based on bonds, angles, dihedrals, and native pairs.

Parameters

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

exclude13bool

Whether to exclude nonbonded interactions between 1-3 atom pairs.

exclude14bool

Whether to exclude nonbonded interactions between 1-4 atom pairs.

exclude_native_pairsbool

Whether to exclude nonbonded interactions between native pairs.

parse_mol(get_native_pairs=True, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False, exclude12=True, exclude13=True, exclude14=True, exclude_native_pairs=True, mass_dict={'ALA': 71.0788, 'ARG': 156.1875, 'ASN': 114.1038, 'ASP': 115.0886, 'CYS': 103.1388, 'GLN': 128.1307, 'GLU': 129.1155, 'GLY': 57.0519, 'HIS': 137.1411, 'ILE': 113.1594, 'LEU': 113.1594, 'LYS': 128.1741, 'MET': 131.1926, 'PHE': 147.1766, 'PRO': 97.1167, 'SER': 87.0782, 'THR': 101.1051, 'TRP': 186.2132, 'TYR': 163.176, 'VAL': 99.1326}, charge_dict={'ALA': 0.0, 'ARG': 1.0, 'ASN': 0.0, 'ASP': -1.0, 'CYS': 0.0, 'GLN': 0.0, 'GLU': -1.0, 'GLY': 0.0, 'HIS': 0.0, 'ILE': 0.0, 'LEU': 0.0, 'LYS': 1.0, 'MET': 0.0, 'PHE': 0.0, 'PRO': 0.0, 'SER': 0.0, 'THR': 0.0, 'TRP': 0.0, 'TYR': 0.0, 'VAL': 0.0}, bonded_energy_scale=2.5)

Parse molecule.

Parameters

get_native_pairsbool

Whether to get native pairs from atomistic pdb with shadow algorithm.

frameint

Frame index to compute native configuration parameters for both CG and atomistic models.

radiusfloat or int

Shadow algorithm radius.

bonded_radiusfloat or int

Shadow algorithm bonded radius.

cutofffloat or int

Shadow algorithm cutoff.

box : None or np.ndarray Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool

Whether to use periodic boundary condition (PBC) for searching native pairs.

exclude12bool

Whether to exclude nonbonded interactions between 1-2 atom pairs.

exclude13bool

Whether to exclude nonbonded interactions between 1-3 atom pairs.

exclude14bool

Whether to exclude nonbonded interactions between 1-4 atom pairs.

exclude_native_pairsbool

Whether to exclude nonbonded interactions between native pairs.

mass_dictdict

Mass dictionary.

charge_dictdict

Charge dictionary.

bonded_energy_scalefloat or int

Bonded energy additional scale factor.

class DNA3SPN2Parser(dna_type='B_curved')

DNA 3SPN2 parser. For B-curved DNA, the parser works best for a single strand ssDNA or WC-paired dsDNA. Please ensure the parsed DNA has unique chainID for each chain.

static aa_to_cg(aa_atoms, PSB_order=True)

Convert DNA all-atom structure to CG structure. Both input and output structures are saved as pandas dataframe. Notably, this function gives each residue in the output CG model a unique resSeq (index starts from 0). Using unique resSeq can be helpful when comparing with x3dna built template as chainID may be different. This code is long and specific to 3SPN2 model, so we put it here instead of in helper_functions.py.

Parameters

aa_atomspd.DataFrame

Input all-atom structure.

PSB_orderbool

Whether to ensure CG atom order in each nucleotide is P-S-B.

Returns

cg_atomspd.DataFrame

Output CG structure.

build_x3dna_template(temp_name='dna')

Build template DNA structure with x3dna. We do not need to specify whether to use PSB order, since template atom order is aligned with self.atoms. If self.atoms is one dsDNA molecule and the target sequence is W-C paired, x3dna input sequence is the sequence of the first ssDNA. Else, use the full sequence of the DNA as x3dna input sequence.

Parameters

temp_namestr

Name for built template files.

Returns

temp_atomspd.DataFrame

X3DNA built CG DNA template structure.

static change_sequence(cg_atoms, sequence)

Change DNA sequence. Note sequence includes the sequence of all the ssDNA chains.

Parameters

cg_atomspd.DataFrame

DNA CG structure.

sequencestr

New DNA sequence.

Returns

new_cg_atomspd.DataFrame

DNA CG structure with the new sequence.

classmethod from_atomistic_pdb(atomistic_pdb, cg_pdb, PSB_order=True, new_sequence=None, dna_type='B_curved', default_parse=True, temp_from_x3dna=True, temp_name='dna', input_temp=None)

Create object from atomistic pdb file. Ensure each chain in input pdb_file has unique chainID. If a new sequence is provided, then the CG DNA topology is from input pdb file, but using the new sequence.

Parameters

atomistic_pdbstr

Path for the input atomistic pdb file.

cg_pdbstr

Path for the output CG pdb file.

PSB_orderbool

Whether to ensure CG atom order in each nucleotide is P-S-B.

new_sequenceNone or str

The new full DNA sequence. If None, keep the original sequence.

dna_typestr

DNA type.

default_parsebool

Whether to parse molecule with default settings.

temp_from_x3dnabool

Whether to use template built by x3dna. If False, use the input pdb file as the template.

temp_namestr

Template file names.

input_tempNone or pd.DataFrame

Additional input template.

get_sequence()

Get all ssDNA sequence from CG atoms as a string. Note each chain has to possess a unique chainID.

Returns

sequencestr

DNA sequence.

get_sequence_list()

Get all ssDNA sequence from self.atoms as a list. Note each chain has to possess a unique chainID.

Returns

sequence_listlist

A list including multiple strings. Each string is the sequence of one chain.

static get_sequence_list_from_cg_atoms(cg_atoms)

Get all ssDNA sequence from CG atoms as a list. Since we use groupby to group by chainID, each chain has to possess a unique chainID.

Parameters

cg_atomspd.DataFrame

CG DNA atoms.

Returns

sequence_listlist

A list including multiple strings. Each string is the sequence of one chain.

parse_mol(temp_from_x3dna=True, temp_name='dna', input_temp=None)

Parse molecule with given template.

Parameters

temp_from_x3dnabool

Whether to use x3dna template.

temp_namestr

Name for x3dna built template files.

input_tempNone or pd.DataFrame

Additional input template. Use input_temp only if (not ((self.dna_type == ‘B_curved’) and temp_from_x3dna)) and (input_temp is not None).

openabc.utils

align_protein(input_pdbfile, reference_pdbfile, output_pdbfile, reference_atoms='CA')

Align a protein with the reference protein accoding to positions of alpha carbons.

Parameters

input_pdbfilestr

Input pdb file to be aligned.

reference_pdbfilestr

Pdb file that be used as reference. Note ‘input_pdbfile’ and ‘reference_pdbfile’ must represent the same system.

output_pdbfilestr

Aligned pdb file.

reference_atomsstr, default=’CA’

Name of the atoms that used as reference atoms. Currently only support alpha carbons.

References

Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.

combine_proteins(pdbfile_lis, output_pdbflie)

Merge multiple pdb files into a single pdb file.

Parameters

pdbfile_lisstr

List that contains name of pdb files to be merged.

output_pdbfilestr

Merged pdb file.

multiple_chains_CA2AA(input_pdbfile, num_chains, num_residues, REMO_path, debug=False)

Reconstruct all-atom representation of a multiple-chain protein from its alpha carbons. If the name of input file is ‘XXX.pdb’, then output will be saved as ‘XXX_AA.pdb’ in the folder where ‘XXX.pdb’ is located.

Parameters

input_pdbfilestr

Input pdb file.

num_chainslist

Number of protein chains in the system. Each element of the list represent one type of protein.

num_residueslist

Length of protein chains in the system. Each element of the list represent one type of protein. Note num_chains and num_residues share the same length unit.

REMO_pathstr

Path where REMO program locates.

debugbool, default=False

If True, temporary files will not be deleted.

References

Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.

single_chain_CA2AA(input_pdbfile, output_pdbfile, REMO_path, debug=False)

Reconstruct all-atom representation for a single-chain protein from its alpha carbons using REMO.

Parameters

input_pdbfilestr

Input pdb file that contains one protein chain.

output_pdbfilestr

output pdb file.

REMO_pathstr

Path where REMO program locates.

debugbool, default=False

If True, temporary files will not be deleted.

References

Yunqi Li and Yang Zhang. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins, 2009, 76: 665-676. https://zhanggroup.org/REMO/.

split_protein(input_pdbfile, num_chains, num_residues, output_path)

Split a PDB file containing multiple chains into individual files, each containing one chain.

Parameters

input_pdbfilestr

Input pdb file.

num_chainslist

Number of protein chains in the system. Each element of the list represent one type of protein.

num_residueslist

Length of protein chains in the system. Each element of the list represent one type of protein. Note num_chains and num_residues share the same length unit.

output_pathstr

The folder where user wants to save output pdb files.

atomistic_pdb_to_ca_pdb(atomistic_pdb, ca_pdb, write_TER=False)

Convert atomistic pdb to protein CA pdb.

Parameters

atomistic_pdbstr

Path for the atomistic pdb file.

ca_pdbstr

Output path for the CA pdb file.

write_TERbool

Whether to write TER between two chains.

atomistic_pdb_to_nucleotide_pdb(atomistic_pdb, cg_nucleotide_pdb, write_TER=False)

Convert DNA or RNA atomistic pdb to nucleotide pdb (i.e. one CG bead per nucleotide). The position of each CG nucleotide bead is the geometric center of all the nucleotide atoms in the pdb.

Parameters

atomistic_pdbstr

Path for the atomistic pdb file.

cg_nucleotide_pdbstr

Output path for the CG pdb file.

write_TERbool

Whether to write TER between two chains.

build_straight_CA_chain(sequence, r0=0.38)

Build a straight portein CA atom chain with given sequence.

Parameters

sequencestr

Protein chain sequence (1 letter for each amino acid).

r0: float or int

Distance in unit nm between two neighboring CA atoms.

Returns

df_atomspd.DataFrame

A pandas dataframe includes atom information.

build_straight_chain(n_atoms, chainID, r0)

Build a straight chain. Each atom is viewed as one residue.

Parameters

n_atomsint

The number of atoms along the chain.

chainIDfloar or int or str

Chain ID.

r0float or int

Distance in unit nm between neighboring atoms along the chain.

Returns

df_atomspd.DataFrame

A pandas dataframe includes atom information.

compute_PE(traj, system, platform_name='Reference', properties={'Precision': 'double'}, groups=-1)

Compute the potential energy of samples in the trajectory with the given openmm system.

Parameters

trajmdtraj.Trajectory

The trajectory including all the samples.

systemopenmm.System

The openmm system.

platform_namestr

The platform name for running openmm to evaulate energy.

propertiesdict

Properties for running openmm. Only used when platform_name is CUDA or OpenCL.

groupsint or set

The force groups included when computing energy. See openmm context.getState for details about this parameter.

Returns

energynp.ndarray, shape = (traj.n_frames,)

The energy of each sample in the trajectory in unit kJ/mol.

compute_rg(coord, mass)

Compute radius of gyration.

Parameters

coordnp.ndarray, shape = (n_atoms, 3)

Input atom coordinates.

massnp.ndarray, shape = (n_atoms,)

Input atom mass.

Returns

rgfloat

Radius of gyration.

fix_pdb(pdb_file)

Fix pdb file with pdbfixer. The fixed structure is output as pandas dataframe. This function is adapted from open3SPN2.

Parameters

pdb_filestr

PDB file path.

Returns

atomspd.DataFrame

Output fixed structure.

get_WC_paired_sequence(sequence)

Get WC-paired sequence.

Parameters

sequencestr

Input sequence.

Returns

paired_sequencestr

W-C paired sequence.

make_mol_whole(coord, box_a, box_b, box_c)

Use this function to make the molecule whole. This is useful to recover the whole molecule if it is split by box boundary. The position of the first atom is kept. The other atoms are moved so that each one is in the closest periodic image relative to the previous one.

Parameters

coordnp.ndarray, shape = (n_atoms, 3)

Input atom coordinates.

box_afloat or int

Box length along x-axis.

box_bfloat or int

Box length along y-axis.

box_cfloat or int

Box length along z-axis.

Returns

new_coordnp.ndarray, shape = (n_atoms, 3)

Output atom coordinates.

move_atoms_to_closest_pbc_image(coord, ref_point, box_a, box_b, box_c)

Move the atom coordinates to the periodic image closest to ref_point.

Parameters

coordnp.ndarray, shape = (n_atoms, 3)

Input atom coordinates.

ref_pointnp.ndarray, shape = (3,)

The reference point coordinate.

box_afloat or int

Box length along x-axis.

box_bfloat or int

Box length along y-axis.

box_cfloat or int

Box length along z-axis.

Returns

new_coordnp.ndarray, shape = (n_atoms, 3)

Output atom coordinates.

parse_pdb(pdb_file)

Load pdb file as pandas dataframe. Note there should be only a single frame in the pdb file.

Parameters

pdb_filestr

Path for the pdb file.

Returns

pdb_atomspd.DataFrame

A pandas dataframe includes atom information.

write_pdb(pdb_atoms, pdb_file, write_TER=False)

Write pandas dataframe to pdb file.

Parameters

pdb_atomspd.DataFrame

A pandas dataframe includes atom information.

pdb_filestr

Output path for the pdb file.

write_TERbool

Whether to write TER between two chains.

class TemperatureReplicaExchange(backend, n_replicas, rank, positions, top, system, temperatures, integrator, platform_name='CUDA', properties={'Precision': 'mixed'})

Temperature replica exchange class for performing temperature replica exchange (TRE) simulations.

Note only positions and scaled velocities are exchanged, while other internal states of the simulations are not exchanged. This means the class may not be properly applied to simulations involving other internal states. For example, Nose-Hoover integrator may not work properly as it includes internal chain states.

The parallelization is achieved with torch.distributed.

An example of setting environment variables in a slurm job script:

export WORLD_SIZE=$((${SLURM_NNODES}*${SLURM_NTASKS_PER_NODE}))

export MASTER_PORT=$(expr 30000 + $(echo -n ${SLURM_JOBID} | tail -c 4))

master_addr=$(scontrol show hostnames “${SLURM_JOB_NODELIST}” | head -n 1)

export MASTER_ADDR=${master_addr}

add_reporters(report_interval, report_state=True, report_dcd=True, output_dcd=None)

Add reporters for OpenMM simulation.

Whether to use PBC is read from self.system information.

Parameters

report_intervalint

Report interval.

report_statebool

Whether to report simulation state.

report_dcdbool

Whether to report trajectory as dcd file.

output_dcdstr or None

The output dcd file path. If None, then the output dcd file path is set as output.{self.rank}.dcd.

run_replica_exchange(n_steps, exchange_interval, verbose=True)

Perform replica exchange simulation.

Exchange atom positions and rescaled velocities. Other state variables are not exchanged.

The temperature of each replica remains unchanged.

Parameters

n_stepsint

Total number of simulation steps for each replica.

exchange_intervalint

Exchange interval.

verbosebool

Whether to report exchange acceptance ratio and simulation speed.

find_ca_pairs_from_atomistic_pdb(atomistic_pdb, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False)

Find protein CA atom pairs whose residues are in contact.

Parameters

atomistic_pdbstr

Path for the atomistic pdb file.

frameint

PDB frame index used for searching pairs.

radiusfloat or int

Shadow radius for candidate pair atoms and blocking atoms if not bonded to any pair atom.

bonded_radiusfloat or int

Shadow radius if the candidate blocking atom is directly bonded to any candidate pair atom.

cutoff: float or int

Cutoff distance for searching neighboring atom pairs.

boxNone or np.ndarray

Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool

Whether to use periodic boundary condition (PBC).

Returns

df_ca_pairspd.DataFrame

Pandas DataFrame including CA atom pair indices and distances.

find_res_pairs_from_atomistic_pdb(atomistic_pdb, frame=0, radius=0.1, bonded_radius=0.05, cutoff=0.6, box=None, use_pbc=False)

Find native pairs between residues following the shadow algorithm. They algorithm only searches native pairs between residues that do not have 1-2, 1-3, or 1-4 interactions. If two heavy atoms from different residues are in contact, then the two residues are in contact.

Parameters

atomistic_pdbstr

Path for the atomistic pdb file.

frameint

PDB frame index used for searching pairs.

radiusfloat or int

Shadow radius for candidate pair atoms and blocking atoms if not bonded to any pair atom.

bonded_radiusfloat or int

Shadow radius if the candidate blocking atom is directly bonded to any candidate pair atom.

cutoff: float or int

Cutoff distance for searching neighboring atom pairs.

boxNone or np.ndarray

Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool

Whether to use periodic boundary condition (PBC).

Returns

res_pairsnp.ndarray, shape = (n_pairs, 2)

Residue pairs found by the shadow algorithm.

df_atomspd.DataFrame

Pandas DataFrame including atom information from file atomistic_pdb.

get_bonded_neighbor_dict(atomistic_pdb)

Find atoms that are directly connected by bonds based on MDTraj.

Parameters

atomistic_pdbstr

Path for the atomistic pdb file.

Returns

bonded_neighbor_dictdict

A dictionary that the value is a list including all the atoms bonded to the atom of index key.

References

https://mdtraj.org/1.9.4/api/generated/mdtraj.Topology.html

get_neighbor_pairs_and_distances(coord, cutoff=0.6, box=None, use_pbc=False)

Find spatially neighboring atom pairs within cutoff range based on MDAnalysis.

Parameters

coordnp.ndarray, shape = (n_atoms, 3)

Input atom coordinates.

cutofffloat or int (default = 0.6)

Cutoff distance for searching neighboring atom pairs. Note coord and cutoff share the same length unit.

boxNone or np.ndarray

Specify box shape. Note use_pbc = False is only compatible with orthogonal box. If use_pbc = False, then set box as None. If use_pbc = True, then box = np.array([lx, ly, lz, alpha1, alpha2, alpha3]).

use_pbcbool

Whether to use periodic boundary condition (PBC).

Returns

neighbor_pairsnp.ndarray, shape = (n_pairs, 2)

Neighboring atom pairs.

neighbor_pair_distances: np.ndarray, shape = (N,)

Neighboring atom pair distances.

References

https://docs.mdanalysis.org/1.1.1/documentation_pages/lib/nsgrid.html

light_is_blocked(d12, d13, d23, r2, r3)

Check if the light from atom1 to atom2 is blocked by atom3.

Parameters

d12: float or int, positive

The distance between atom1 and atom2.

d13: float or int, positive

The distance between atom1 and atom3.

d23: float or int, positive

The distance between atom2 and atom3.

r2: float or int, non-negative

The radius of atom2. Require r2 <= d12.

r3: float or int, non-negative

The radius of atom3. Require r3 <= d13.

Returns

flagbool

Whether the light is blocked.

load_ca_pairs_from_gmx_top(top_file, ca_pdb, frame=0, use_pbc=False)

Load native pairs from GROMACS topology file. Note in GROMACS, atom indices start from 1, while in OpenMM, atom indices start from 0.

Parameters

top_filestr

Path for the GROMACS topology file.

ca_pdbstr

Path for the CA model PDB file.

frame: int

PDB frame index used for computing pair distances.

use_pbcbool

Whether to use periodic boundary condition (PBC).

Returns

df_ca_pairspd.DataFrame

Pandas DataFrame including CA atom pair indices and distances.

insert_molecules(new_pdb, output_pdb, n_mol, radius=0.5, existing_pdb=None, max_n_attempts=10000, box=[100, 100, 100], method='FastNS', reset_serial=True)

Insert multiple copies of given molecule into an existing pdb or a new empty box. Currently this function only supports using orthogonal box.

Parameters

new_pdbstr

Path of the new pdb file to be inserted.

output_pdbstr

Path of the output pdb file.

n_molint

Number of copies of the new molecule to be inserted.

radiusfloat or int

Radius for each atom in unit nm. Insertion ensures the distance between any two atoms are larger than 2*radius under periodic boundary condition.

existing_pdbNone or str

Path of an existing pdb file. If None, the molecules will be inserted into a new empty box, otherwise, inserted into a copy of existing_pdb.

max_n_attemptsint

Maximal number of attempts to insert. Ensure this value no smaller than n_mol.

boxlist or numpy array, shape = (3,)

New box size in unit nm.

methodstr

Which method to use for detecting if the inserted molecule has contact with existing atoms. Currently only ‘FastNS’ is supported. Any string will lead to method as ‘FastNS’.

reset_serialbool

Whether to reset serial to 0, 1, …, N - 1. If True, the serial in the final pdb is reset as 0, 1, …, N - 1. If False, the serial remains unchanged. If True and atom number > 1,000,000, the serial remains unchanged since the largest atom serial number allowed in pdb is 999,999.

insert_molecules_dataframe(new_atoms, n_mol, radius=0.5, existing_atoms=None, max_n_attempts=10000, box=[100, 100, 100], method='FastNS', reset_serial=True)

Insert multiple copies of given molecule as dataframes into an existing dataframe or a new empty box. Currently this function only supports using orthogonal box.

Parameters

new_atomspd.DataFrame

New atoms of the molecule to be inserted.

n_molint

Number of copies of the new molecule to be inserted.

radiusfloat or int

Radius for each atom in unit nm. Insertion ensures the distance between any two atoms are larger than 2*radius under periodic boundary condition.

existing_atomsNone or pd.DataFrame

Existing atoms. If None, the molecules will be inserted into a new empty box, otherwise, inserted into a copy of existing_atoms.

max_n_attemptsint

Maximal number of attempts to insert. Ensure this value no smaller than n_mol.

boxlist or numpy array, shape = (3,)

New box size in unit nm.

methodstr

Which method to use for detecting if the inserted molecule has contact with existing atoms. Currently only ‘FastNS’ is supported. Any string will lead to method as ‘FastNS’.

reset_serialbool

Whether to reset serial to 0, 1, …, N - 1. If True, the serial in the output dataframe is reset as 0, 1, …, N - 1. If False, the serial remains unchanged. If True and atom number > 1,000,000, the serial remains unchanged since the largest atom serial number allowed in pdb is 999,999.

Returns

atomspd.DataFrame

Output atoms.

parse_stride(stride_file)

Parse stride output file.

Parameters

stride_filestr

The stride output file.

Returns

df_datapd.DataFrame

A dataframe with the secondary structure information.

get_chromatin_rigid_bodies(n_nucl, nrl, n_rigid_bp_per_nucl=73)

Get chromatin rigid bodies. The chromatin should possess uniform linker length without additional linkers on both ends.

Parameters

n_nuclint

Nucleosome number.

nrlint

Nucleosome repeat length.

n_flexible_bp_per_nuclint

The number of flexible nucleosomal base pairs for each nucleosome.

Returns

rigid_bodieslist

List of rigid bodies.

remove_histone_tail_dihedrals(df_dihedrals)

Remove histone tail dihedral potentials from a single histone.

A dihedral potential is removed if at least one atom involved is within histone tail.

Parameters

df_dihedralspd.DataFrame

Dihedral potential. This should only include histones. Each histone includes 974 CA atoms, and there should be 974*n CG atoms in all (n is the number of histones).

Returns

new_df_dihedralspd.DataFrame

New dihedral potential.

remove_histone_tail_native_pairs_and_exclusions(df_native_pairs, df_exclusions)

Remove histone tail native pair potentials and corresponding exclusions from a single histone. A native pair potential is removed if at least one atom involved is within histone tail. The corresponding exclusions should also be removed if the native pair is removed. Also remove native pairs between atoms from different histones.

Parameters

df_native_pairspd.DataFrame

Native pair potential. This should only include histones. Each histone includes 974 CA atoms, and there should be 974*n CG atoms in all (n is the number of histones).

df_exclusionspd.DataFrame

Nonbonded exclusions.

Returns

new_df_native_pairspd.DataFrame

New native pair potential.

new_df_exclusionspd.DataFrame

New nonbonded exclusions.