similarity search


Floriane Montanari
Hi all,

I am running a similarity search between a query molecule, given as a SMILES string, and a database of molecules. The database has been indexed from the command line. I am using MACCS fingerprints and programming in Python, but I see the same behaviour from the command line.
If my query is in the database, the output is correct: the first compound in the list of similar compounds is the query itself.
But when I pass the -s option a SMILES string for a molecule that is not in the database, the first molecule in the output file is no longer my query, but apparently a close molecule from my database.

I have read this in the fingerprints tutorial (http://openbabel.org/wiki/Tutorial:Fingerprints):

note: if the query molecule does not match the SMARTS string this will not work as expected, as the first molecule in the database that matches the SMARTS string will instead be used as the query

Is this what is happening to me? Is there a way to force the fingerprint comparison between my real query and the database?

If this is not possible, I plan to do more of the work in code, which brings me to a second question:
is it possible to build a Fingerprint object from a list of "on" bits, for example with SetBit()?

Thanks for any help,
Regards,

Floriane Montanari

_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
Re: similarity search

Chris Morley
On 12/01/2011 16:05, Floriane Montanari wrote:

> Hi all,
>
> I am doing a similarity search between a query molecule given as a
> smiles string and a database of molecules. This database has been
> properly indexed with the command line. I am using MACCS fingerprints
> and programming in Python. But actually I am seeing the same thing
> with the command line:
> So I have noticed that, if my query is in the database, the output is
> correct and the first compound of the list of similar compounds is my
> query itself.
> But when I give as -s option a smiles string that corresponds to a
> molecule that is not present in the database, the first molecule of
> the output file is not my query molecule anymore, but apparently one
> close molecule from my database.
>
> I have read here <http://openbabel.org/wiki/Tutorial:Fingerprints> this:
>
>     *note:* if the query molecule does not match the SMARTS string
>     this will not work as expected, as the first molecule in the
>     database that matches the SMARTS string will instead be used as
>     the query

That note applies when you use the fpt output format, which
calculates the Tanimoto coefficient between the first molecule it is
given and each of the rest. So you could supply the query molecule first:

   obabel -:"CCO" data.xxx -ofpt -xfMACCS

This gives an output line for every molecule, which may not be
what you want. If you try to filter using -s or --filter, you will get
the behaviour you observe.
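
To make the "first molecule becomes the reference" behaviour concrete, here is a toy model in plain Python. It is only a sketch of the idea described above, not Open Babel code: the function names, the molecule names and the bit sets are all made up for illustration, and the filter is just a stand-in for -s or --filter.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of "on" bit positions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_to_first(stream, keep=lambda name: True):
    """Filter the stream, then compare every survivor to the first survivor."""
    survivors = [(name, bits) for name, bits in stream if keep(name)]
    if not survivors:
        return None, []
    ref_name, ref_bits = survivors[0]
    scores = [(name, tanimoto(ref_bits, bits)) for name, bits in survivors[1:]]
    return ref_name, scores


# Hypothetical bit sets, not real MACCS fingerprints.
stream = [
    ("query", {1, 2, 3}),
    ("db1", {1, 2}),
    ("db2", {2, 3, 4}),
]

# No filter: the query stays first, so it is the reference.
ref_all, scores_all = compare_to_first(stream)

# A filter that rejects the query: db1 silently becomes the reference.
ref_filtered, scores_filtered = compare_to_first(
    stream, keep=lambda name: name.startswith("db")
)
```

In the second call the scores are relative to db1, not to the query, which is the kind of surprise described in the tutorial note.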

A more robust way, which I guess you are using, is to index the
database first and then run one or more similarity searches:

   obabel data.xxx -ofs -xfMACCS
   obabel data.fs -O out.smi -at10 -aa -sSMILES

This outputs the ten most similar molecules, each with its Tanimoto
coefficient appended. It seems to work OK for me (detailed output below).
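
Conceptually, the search ranks every database entry by its Tanimoto coefficient to the query and keeps the top N. A minimal pure-Python sketch of that ranking follows. It does not use or reproduce Open Babel's fast-search index; `most_similar`, the entry names and the bit sets are hypothetical.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of "on" bit positions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_bits, database, n=10):
    """Return the n (name, score) pairs most similar to the query, best first."""
    scored = ((tanimoto(query_bits, bits), name) for name, bits in database.items())
    return [(name, score) for score, name in heapq.nlargest(n, scored)]


# Hypothetical fingerprints; the query itself is not in the database.
database = {"a": {1, 2}, "b": {1, 2, 3}, "c": {7}}
top_two = most_similar({1, 2}, database, n=2)
```

The query is compared directly with every entry, so it does not matter whether it is present in the database.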

If you are still having difficulty, perhaps you could post the Python
code or command line you are using.


> Is it what is happening to me? Is there a way to force the fingerprint
> comparison between my /real query/ and the database?
>
> In case this is not possible, I was planning to use more programming,
> and doing that I have a second question:
> is it possible to get a Fingerprint object from a list of "on" bits?
> Using SetBit() for example?

I'm not clear what you want here. It is possible to define a new
type of fingerprint, without programming, in which each bit is defined
by a SMARTS string in a data file. The MACCS fingerprint is done this
way. I'm not sure where this is described, but I'll look it up if you
need it.
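
On the literal question of going from a list of "on" bits to a fingerprint, the following pure-Python sketch packs bit positions into 32-bit words and compares two such fingerprints. It assumes a layout of 32-bit unsigned words with bit n stored in word n // 32, and 0-based bit numbering; `pack_bits`, `tanimoto_words` and the 256-bit width are illustrative choices, not Open Babel API. Check whether your bit list is 0-based or 1-based before packing it.

```python
BITS_PER_WORD = 32  # assumption: fingerprint stored as 32-bit unsigned words


def pack_bits(on_bits, n_bits):
    """Pack 0-based "on" bit positions into a list of 32-bit words."""
    words = [0] * ((n_bits + BITS_PER_WORD - 1) // BITS_PER_WORD)
    for n in on_bits:
        if not 0 <= n < n_bits:
            raise ValueError(f"bit {n} outside 0..{n_bits - 1}")
        words[n // BITS_PER_WORD] |= 1 << (n % BITS_PER_WORD)
    return words


def tanimoto_words(a, b):
    """Tanimoto coefficient of two equal-length packed fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints differ in length")
    both = sum(bin(x & y).count("1") for x, y in zip(a, b))
    either = sum(bin(x | y).count("1") for x, y in zip(a, b))
    return both / either if either else 0.0


# Hypothetical bit lists; 256 bits is an assumed fingerprint width.
fp_a = pack_bits([0, 5, 33], 256)
fp_b = pack_bits([5, 33, 40], 256)
similarity = tanimoto_words(fp_a, fp_b)
```

Two of the four distinct bits are shared here, so the two fingerprints are half similar.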

Chris


 >type sim4.smi
CC
CCCC
COC
COCCC

 >obabel sim4.smi -ofs -xfMACCS
This will prepare an index of sim4.smi and may take some time...
It contains 4 molecules
  It took 0 seconds
4 molecules converted

 >obabel sim4.fs -osmi -aa -at5 -sCC
CC      1
CCCC    0.333333
COC     0.285714
COCCC   0.142857
4 molecules converted

 >obabel sim4.fs -osmi -aa -at5 -sCCC
CCCC    0.571429
CC      0.4
COC     0.333333
COCCC   0.266667
4 molecules converted

 >obabel -:CC sim4.smi -ofpt -xfMACCS
 >
 >   Tanimoto from first mol = 1
Possible superstructure of first mol
 >   Tanimoto from first mol = 0.333333
Possible superstructure of first mol
 >   Tanimoto from first mol = 0.285714
Possible superstructure of first mol
 >   Tanimoto from first mol = 0.142857
Possible superstructure of first mol
5 molecules converted

 >obabel -:CCC sim4.smi -ofpt -xfMACCS
 >
 >   Tanimoto from first mol = 0.4
 >   Tanimoto from first mol = 0.571429
 >   Tanimoto from first mol = 0.333333
 >   Tanimoto from first mol = 0.266667
5 molecules converted
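
Since the original question mentions Python, here is a small sketch for reading similarity-search output like the -aa listings above into (SMILES, Tanimoto) pairs. It assumes each result line has exactly two whitespace-separated fields, as in this sample where the molecules have no titles; `parse_similarity_output` is a hypothetical helper, not part of Open Babel.

```python
def parse_similarity_output(text):
    """Parse lines of the form 'SMILES  score' into (smiles, float) pairs."""
    results = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skips blank lines and the "N molecules converted" summary
        smiles, score = fields
        try:
            results.append((smiles, float(score)))
        except ValueError:
            continue  # second field was not a number
    return results


# Output of the -sCCC search shown above.
sample = """CCCC    0.571429
CC      0.4
COC     0.333333
COCCC   0.266667
4 molecules converted
"""
hits = parse_similarity_output(sample)
best_smiles, best_score = hits[0]
```

Titled molecules would add a third field per line, so the field count check would need adjusting for such files.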



