Finding the largest induced subfragment

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Finding the largest induced subfragment

Sam Tonddast-Navaei
Hello all,

 Thank you in advance for taking your time and I hope to explain the problem well. I am trying to find the largest induced subfragment given two molecules. For example given ATP (Adenosine Triposphate) and Guanine, I would like the algorithm to return the Purine ring. 

 I wrote a piece of code using NetworkX library in Python to implement the algorithm described here (https://en.wikipedia.org/wiki/Modular_product_of_graphs) and find the maximum clique. I included the function below. 

 The problem here is that it gets slow as the size of the molecule increases due to the fact that the problem is NP-complete. However at the limit of one molecule being the substructure of the other one, the problem is the same as SMARTS pattern matching and the findall() function of pybel is much much faster than my version. I am wondering what's the algorithm behind SMARTS match and is there an easier way to do what I did using existing Openbabel libraries. 

Looking forward to your thoughts.

Best,
Sam


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

match.py (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Finding the largest induced subfragment

Andrew Dalke
On Jun 21, 2017, at 01:14, Sam Tonddast-Navaei <[hidden email]> wrote:
>  Thank you in advance for taking your time and I hope to explain the problem well. I am trying to find the largest induced subfragment given two molecules. For example given ATP (Adenosine Triposphate) and Guanine, I would like the algorithm to return the Purine ring.

You might try https://github.com/openbabel/contributed/tree/master/c%2B%2B/mcs-cliquer , which is part of the code contributed to Open Babel but not part of the core project. That is, you'll need to clone or download https://github.com/openbabel/contributed then compile the code under c++/mcs-cliquer .

I did that just now with your use case, and got:

[openbabel-contributed/c++/mcs-cliquer] dalke% cat atp.smi
O=P(O)(O)OP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](n2cnc1c(ncnc12)N)[C@H](O)[C@@H]3O ATP
[openbabel-contributed/c++/mcs-cliquer] dalke% cat guanine.smi
c1[nH]c2c(n1)c(=O)[nH]c(n2)N guanine
[openbabel-contributed/c++/mcs-cliquer] dalke% ./mcs atp.smi guanine.smi
[nH]1cnc2cncnc12 0.243902

The "0.243902" is a similarity score based on (the number of bonds "C" in the MCS)/(the number of bonds in both input structures - "C").


> I am wondering what's the algorithm behind SMARTS match and is there an easier way to do what I did using existing Openbabel libraries.

Open Babel uses the VF2 algorithm.



                                Andrew
                                [hidden email]



------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss