PoseButcher Documentation

Logo

Installation

PoseButcher has several dependencies, which can’t all be managed via a pip installation. Notable vtkbool, open3d and PyGAMer can cause problems. In a fresh conda python=3.10 virtual environment execute the folowing:

conda install -y -c conda-forge vtkbool
pip install --upgrade posebutcher

Open3d is required in all cases. PyGaMER is only necessary to generate protein surfaces, which can be computed on a different machine, or omitted from calculations. vtkbool is required for pocket-protein clipping. In case you want to use posebutcher without one of the optional dependencies:

pip install --upgrade posebutcher --no-dependencies

Getting started

This example uses the open science 2a viral protease fragment screen from the ASAP consortium, available on Fragalysis <https://fragalysis.diamond.ac.uk>. The test data used in this example is available at test_data.

Creating a PoseButcher object

Specify a reference protein structure:

protein = 'test_data/2a_Ax0310a_apo-desolv.pdb'

In order to butcher ligands with reference to the space occupied by fragments, specify a set of hits in an SD file:

hits = 'test_data/2a_fragments.sdf'

Defining the pockets can be tricky, It is suggested you use an external tool (PyMOL/Fragalysis) to pick the correct atoms. Currently only spherical pockets are supported. {ockets should be a dict with str keys and dict values. Below, spherical pockets are defined at the centre of mass of several atoms with a radius defined by the average distance from CoM to the atoms or a given value:

pockets = {
    "P1": dict(type='sphere', atoms=['GLY 127 O', 'PRO 107 CG', 'CYS 110 SG'], radius='mean'),
    "P2": dict(type='sphere', atoms=['VAL 84 CG1', 'TYR 90 CD2', 'SER 87 CB'], radius='mean'),
    "P1'": dict(type='sphere', atoms=['GLU 88 CB', 'PRO 107 CB', 'HIS 21 CD2'], radius='mean'),
    "P2'": dict(type='sphere', atoms=['PRO 107 CB', 'LEU 22 CD1'], shift=[0, 0, 0], radius='mean'),
    "P3": dict(type='sphere', atoms=['GLY 127 O', 'GLU 85 CB'], radius=4),
    "P4": dict(type='sphere', atoms=['LEU 98 CD2'], radius=5),
    "P5": dict(type='sphere', atoms=['ASN 129 ND2', 'ASN 129 ND2', 'ILE 82 CG2'], radius=4),
    "P6": dict(type='sphere', atoms=['ILE 82 CG2'], shift=[-1,0,1], radius=4),
}

A chain letter can also be specified e.g. (‘GLY 127 O A’). The radius parameter can be a float, ‘mean’, ‘min’, or ‘max’. The shift keyword applies a translation to the center of the spherical pocket.

The PoseButcher object can then be created:

butcher = PoseButcher(
            protein,
            hits,
            pockets,
            pocket_clip=True,
            pocket_clip_protein=True,
            pocket_clip_hull=True,
           )

Pocket clipping can take a while, when iterating on the pocket definitions you may prefer to disable it. pocket_clip == False will disable all clipping, alternatively just disable clipping to the protein with pocket_clip_protein (or it’s hull with pocket_clip_hull). Pass a list of chain characters as pocket_chains to duplicate the pockets across the specified chains.

To view the 3d meshes created use PoseButcher.render():

butcher.render()
Chop 3d Output

Using PoseButcher.chop()

Use PoseButcher.chop() to chop up a posed de novo compound. First, let’s load a test molecule:

mol_df = PandasTools.LoadSDF('test_data/2a_bases.sdf')
mol = mol_df.iloc[1]['ROMol']
mol._Name = mol_df.iloc[1]['ID']

PoseButcher.chop() can tell you whether the atoms in a ligand are in user-defined pockets, overlapping with the fragment bolus, clashing with the protein, or in the solvent space:

result = butcher.chop(mol, draw='2d')
Chop 2d Output

A 3d view is also available:

result = butcher.chop(mol, draw='3d')

In both cases, the result is a dictionary with atom index keys:

{
    0: ('GOOD', 'pocket', 'P1'),
    1: ('GOOD', 'pocket', 'P1'),
    2: ('GOOD', 'pocket', 'P1'),
    3: ('GOOD', 'pocket', 'P1'),
    4: ('GOOD', 'pocket', 'P1'),
    5: ('GOOD', 'pocket', 'P1'),
    6: ('GOOD', 'pocket', 'P1'),
    7: ('GOOD', 'pocket', 'P1'),
    8: ('GOOD', 'pocket', 'P2'),
    9: ('GOOD', 'pocket', 'P2'),
   10: ('GOOD', 'pocket', 'P2'),
   11: ('GOOD', 'pocket', 'P1'),
   12: ('GOOD', 'pocket', 'P1'),
   13: ('GOOD', 'pocket', 'P1'),
   14: ('GOOD', 'pocket', 'P1'),
   15: ('BAD', 'solvent space'),
   16: ('BAD', 'solvent space'),
   17: ('BAD', 'solvent space'),
   18: ('BAD', 'solvent space'),
   19: ('BAD', 'solvent space')
}

If your ligand is an elaboration or expansion of a known base (parent) compound you can consider only the novel material using the base argument to chop.

If you want to use pockets to tag a compound but don’t care which atoms use the PoseButcher.tag() method which will return a set of the atom classifications:

>>> butcher.tag(mol)
{'P1', 'P2', 'solvent space'}
>>> butcher.tag(mol, pockets_only=True)
{'P1', 'P2'}

File I/O of a PoseButcher object

If you are happy with your PoseButcher object you can export it using PoseButcher.write().

A new instance can be created using the PoseButcher.from_directory() classmethod.

Changing the protein of a PoseButcher object

To change the reference protein simply:

butcher.protein = '/new/path/to/protein.pdb'

See also the example notebook on github.