Need help in calculating Tanimoto coefficient

chakravarthy
Hi,

I am new to Open Babel. I learned of it recently and want to use it for
my academic work.

Here is my query:

I want to calculate the Tanimoto coefficient (similarity score) of two lipid molecules,
but I could not find a step-by-step procedure for doing so in the Open Babel documentation.

What I have done so far:

I converted the .sdf files of the two lipids into SMILES strings and obtained the FP2
fingerprints of those SMILES strings. The FP2 fingerprint of one of the molecules looks like this:

00000000 01000000 00000000 00000400 00080000 00000000
00000000 00000000 00000000 00000080 00000000 40010000
00000000 00800000 00000000 00080008 00000000 00000000
01402000 00000001 00000000 10000010 03000000 00000010
00000000 00000000 00000000 00000000 00000000 00040000
00020000 00000000

Now I would like to compare two such fingerprints (actually more than two) and
calculate their similarity scores. Can someone advise me on how to do it?

Thanks in advance,
Chak

PS: I am new to Open Babel and Linux. Until now, I have used the Open Babel GUI on Windows.
Although I have installed Open Babel on my Ubuntu machine, I have not used it yet.



------------------------------------------------------------------------------
Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
Finally, a world-class log management solution at an even better price-free!
Download using promo code Free_Logger_4_Dev2Dev. Offer expires
February 28th, so secure your free ArcSight Logger TODAY!
http://p.sf.net/sfu/arcsight-sfd2d
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: Need help in calculating Tanimoto coefficient

Chris Morley-3
When there are only a small number of molecules, the fpt format will give
you what you want; you are nearly there.

In the GUI, select sdf for the input format and fpt for the output.
Open the sdf file with the pattern molecule, then open the sdf file with
all the other molecules you want to compare to it, holding CTRL as you
leave the open dialog (which allows more than one input file).
Click Convert, and you will get a list of the Tanimoto coefficients
between the first molecule and each of the rest, using the default FP2
fingerprint.

On the command line (Windows or Linux):
   obabel  patternmol.sdf  othermols.sdf -ofpt

Chris

Re: Need help in calculating Tanimoto coefficient

chakravarthy
In reply to this post by chakravarthy
Dear All,

I made some progress with calculating the Tanimoto coefficient. I learned to do it from
the command line on my Ubuntu machine. I was able to do it for two molecules with the
following command:

PROMPT>  babel  mysmiles.smi  mymols.sdf -ofpt

When I tried it for a larger set, 1 molecule (mysmiles.smi) vs 8 (mymols.sdf), the following
error popped up:

=================================================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problems reading a MDL file
Cannot read atom and bond count
Expected standard 6 character atom and bond count
==================================================

Can any expert explain what went wrong?
I suppose the problem lies in how I grouped the 8 molecules into a single mymols.sdf file.

This is what the grouped mymols.sdf file looks like:

================================
[ OpenBabel12221015072D

 23 22  0  0  0  0  0  0  0  0999 V2000
    7.7424   -9.2926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    .................................................................
    .................................................................
    2.6281  -18.1511    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  ...................
  ...................
 22 23  1  0  0  0  0
M  END
$$$$
$$$$ ] 7 times

 OpenBabel12221015072D

 25 24  0  0  0  0  0  0  0  0999 V2000
    7.7424   -9.2926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    .................................................................
    .................................................................
    6.7196   -7.5210    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  ...................
  ...................
 24 25  1  0  0  0  0
M  END
$$$$
==================================

Any help is much appreciated

Thanks
Chak

Re: Need help in calculating Tanimoto coefficient

chakravarthy
Dear All,

I tried Chris's technique of holding CTRL to input more than one file. It helps, but
I did not get the desired output.

Here are my input and output.

========================================================
Input:
 [ OpenBabel12221015072D

  23 22  0  0  0  0  0  0  0  0999 V2000
     7.7424   -9.2926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     .................................................................
     .................................................................
     2.6281  -18.1511    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   1  2  1  0  0  0  0
   ...................
   ...................
  22 23  1  0  0  0  0
 M  END
 $$$$
 $$$$ ] 7 times

  OpenBabel12221015072D

  25 24  0  0  0  0  0  0  0  0999 V2000
     7.7424   -9.2926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     .................................................................
     .................................................................
     6.7196   -7.5210    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   1  2  1  0  0  0  0
   ...................
   ...................
  24 25  1  0  0  0  0
 M  END
 $$$$
=====================================================
Output:
>
>   Tanimoto from first mol = 0.869565
Possible superstructure of first mol

=====================================================

Any suggestions are welcome

Thank you, Floriane and Chris!

Chak

Re: Need help in calculating Tanimoto coefficient

Chris Morley-3
In reply to this post by chakravarthy
Your file othermols.sdf was a proper sdf file, but the molecules all had the
title "$$$$". Using the delimiter string as a title is not good
practice, and it confused Open Babel. I'll see if I can correct this, but if
you change the titles to Mol1, Mol2, etc. and give the pattern molecule the
title PatternMol, the command quoted below gives:

 >PatternMol
 >Mol1   Tanimoto from PatternMol = 0.5
 >Mol2   Tanimoto from PatternMol = 0.575
 >Mol3   Tanimoto from PatternMol = 0.95
 >Mol4   Tanimoto from PatternMol = 0.825
 >Mol5   Tanimoto from PatternMol = 0.673913

which is, I think, what you want. The revised othermols.sdf is attached.
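
A report in this shape is easy to post-process. What follows is a minimal Python sketch, not part of Open Babel, that collects the scores from text like the listing above into a dictionary. The function name parse_tanimoto_report and the regular expression are assumptions made for illustration, and the sketch assumes every compared molecule has a title without spaces (untitled molecules are skipped).

```python
import re

# Matches lines such as ">Mol1   Tanimoto from PatternMol = 0.5".
# The pattern is an assumption based on the output shown in this thread.
LINE_RE = re.compile(
    r"^\s*>\s*(?P<title>\S+)\s+Tanimoto from (?P<ref>.+?) = (?P<score>[0-9.]+)\s*$"
)


def parse_tanimoto_report(text):
    """Return {molecule title: Tanimoto score} for every matching line."""
    scores = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # lines without a score (e.g. the pattern molecule) are ignored
            scores[match.group("title")] = float(match.group("score"))
    return scores


if __name__ == "__main__":
    report = """\
 >PatternMol
 >Mol1   Tanimoto from PatternMol = 0.5
 >Mol3   Tanimoto from PatternMol = 0.95
"""
    for title, score in sorted(parse_tanimoto_report(report).items()):
        print(title, score)
```

Sorting the resulting dictionary by value gives a quick ranking of the molecules by similarity to the pattern molecule.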

Chris


On 25/01/2011 02:47, [hidden email] wrote:

> Dear Chris Morley,
>
> I tried your technique of holding CTRL to input more than one file. It helps, but
> I did not get the desired output.
>
> Here is my input and output.
>
> ========================================================
> Input:
>   [ OpenBabel12221015072D
>
>    23 22  0  0  0  0  0  0  0  0999 V2000
>       7.7424   -9.2926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       .................................................................
>       .................................................................
>       2.6281  -18.1511    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1  2  1  0  0  0  0
>     ...................
>     ...................
>    22 23  1  0  0  0  0
>   M  END
>   $$$$
>   $$$$ ] n times
>
>    OpenBabel12221015072D
>
>    25 24  0  0  0  0  0  0  0  0999 V2000
>       7.7424   -9.2926    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>       .................................................................
>       .................................................................
>       6.7196   -7.5210    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>     1  2  1  0  0  0  0
>     ...................
>     ...................
>    24 25  1  0  0  0  0
>   M  END
>   $$$$
> =====================================================
> Output:
>>    Tanimoto from first mol = 0.869565
> Possible superstructure of first mol
>
> =====================================================
> The command I am using (in Linux) is: babel  patternmol.sdf  othermols.sdf -ofpt
> In both the Windows GUI and Linux I receive the same error message. It reads:
>
> =================================================
> *** Open Babel Warning  in ReadMolecule
>    WARNING: Problems reading a MDL file
> Cannot read atom and bond count
> Expected standard 6 character atom and bond count
> ==================================================
>
> In both Linux and Windows, the output contains the Tanimoto coefficient only for the
> first molecule in othermols.sdf.
>
> Is the problem caused by the format of othermols.sdf, perhaps by the molecule
> separators "M  END" and "$$$$"? (I have enclosed the patternmol.sdf and othermols.sdf files.)
>
> It must be a trivial problem, but I could not figure it out by myself.
>
> Any help is greatly appreciated.
>
> Thank you
> Chakravarthy
>

othermols.sdf (10K) Download Attachment

Re: Need help in calculating Tanimoto coefficient

Noel O'Boyle
Administrator
Regarding the use of $$$$ in the title, Chris, you might want to read
http://blueobelisk.shapado.com/questions/what-is-the-minimum-needed-to-correctly-identify-records-in-an-sd-file.
It seems that this is not allowed by the spec...

- Noel

Re: Need help in calculating Tanimoto coefficient

Chris Morley-3
I read the discussion on Blue Obelisk immediately after posting. But in
fact the problem was that the sdf file was malformed (it had extra lines).
OB will read titles with $$$$, but the -f option, which skips molecules,
doesn't like them.
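
The warning "Expected standard 6 character atom and bond count" is consistent with this: in an MDL molfile the counts line is the fourth line of each record, so a single extra line shifts it out of place. A rough Python sketch of a pre-flight check follows. The name check_sd_records and the regular expression are hypothetical helpers, not Open Babel functions, and the check is deliberately simplistic: it treats every line consisting only of $$$$ as a record delimiter, so a record titled $$$$ is reported as a problem too.

```python
import re

# Counts line: two 3-character integer fields, then other fields, ending in a
# version tag. This is a loose approximation for illustration only.
COUNTS_RE = re.compile(r"^[ \d]{3}[ \d]{3}.*V[23]000\s*$")


def check_sd_records(text):
    """Return a list of (record number, problem) tuples for an SD file's text."""
    problems = []
    record = []   # lines of the record currently being read
    number = 1    # 1-based record counter
    for line in text.splitlines():
        if line.rstrip() == "$$$$":
            # End of a record: its fourth line should be the counts line.
            if len(record) < 4 or not COUNTS_RE.match(record[3]):
                problems.append((number, "line 4 is not a counts line"))
            record = []
            number += 1
        else:
            record.append(line)
    return problems


if __name__ == "__main__":
    good = (
        "Mol1\n"
        "  OpenBabel\n"
        "\n"
        "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
        "M  END\n"
        "$$$$\n"
    )
    print(check_sd_records(good))         # no problems expected
    print(check_sd_records("\n" + good))  # the extra leading line is reported
```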

Chris

Re: Need help in calculating Tanimoto coefficient

chakravarthy
In reply to this post by Noel O'Boyle
Dear All,

Thank you, Chris, for your help. It works well after removing $$$$.

I am curious to know what goes on behind the Open Babel program. In particular, I want
to understand two things:

1. How FP2 is generated and what each digit in FP2 stands for.
2. How two FP2 fingerprints are compared and the Tanimoto score is calculated. I found some
references for Tanimoto scoring, but I couldn't find references for FP2 (except
for a brief description in the Open Babel docs:
"http://openbabel.org/docs/dev/Fingerprints/fingerprints.html#fingerprint-format-details").

I would be grateful if anyone could provide me with references for understanding
fingerprints (especially FP2, but also others).

Thanks in advance,
Chakravarthy




Re: Need help in calculating tanimoto coefficient

Ernst-Georg Schmid-2
Hi,

FP2 is a Daylight-type fingerprint. See 6.1.2 of http://www.daylight.com/dayhtml/doc/theory/theory.finger.html for an explanation.

FP2 generates linear fragments of up to 7 atoms, or shorter if a ring is encountered. The fragments are then hashed into the first 1021 bits of the 1024-bit space FP2 uses by default. So every bit corresponds to the presence of one or more fragments (more than one if a hash collision occurs), but you cannot map a specific bit back to a specific fragment. If you need that, a substructure-pattern-based fingerprint like FP3 is the obvious choice.

See 6.3 of http://www.daylight.com/dayhtml/doc/theory/theory.finger.html for an explanation of how similarity can be calculated from fingerprints.
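
The Tanimoto comparison itself can be sketched in a few lines of Python. This is only an illustration, not Open Babel's own code: the function names are made up, the hex words are toy values rather than real FP2 output, and it assumes the fingerprint is printed as whitespace-separated 32-bit hexadecimal words, as in the first post of this thread. The coefficient is the number of bits set in both fingerprints divided by the number of bits set in either.

```python
def parse_fingerprint(text):
    """Combine whitespace-separated 32-bit hex words into one Python integer."""
    value = 0
    for word in text.split():
        value = (value << 32) | int(word, 16)
    return value


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: bits set in both / bits set in either."""
    common = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    # Returning 0.0 for two empty fingerprints is a choice made for this sketch.
    return common / either if either else 0.0


# Toy two-word fingerprints, not real FP2 output.
fp_a = parse_fingerprint("00000003 00000010")  # three bits set
fp_b = parse_fingerprint("00000001 00000010")  # two bits set, both also set in fp_a
print(tanimoto(fp_a, fp_b))  # 2 shared bits / 3 bits in the union
```

The order in which the words are combined does not affect the result, as long as both fingerprints are parsed the same way.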

Best regards,

Ergo



Re: Need help in calculating tanimoto coefficient

Noel O'Boyle
Administrator
Just a small comment...

Since Open Babel 2.3.0, it is possible to describe each of the set
bits using the "s" output option, e.g.

C:\Users\Noel>obabel -:"CCC(=O)Cl" -ofpt -xs
>
0 6 1 6 <670>
0 6 1 6 1 6 <260>
0 8 2 6 <623>
0 8 2 6 1 6 <329>
0 8 2 6 1 6 1 6 <652>
0 17 <17>
0 17 1 6 <328>
0 17 1 6 1 6 <219>
0 17 1 6 1 6 1 6 <1009>
0 17 1 6 2 8 <329>
1 molecule converted

e.g. the first bit described, at 670, is a linear fragment (0)
consisting of a carbon (6) connected by a single bond (1) to another
carbon (6).
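
A listing like the one above can be turned into a bit-to-fragment table with a short script. This is a sketch under assumptions: the function names are hypothetical, the element and bond tables cover only the codes that occur in this example, and the reading "leading code, then alternating atomic number and bond order" is generalised from the single fragment explained above.

```python
import re
from collections import defaultdict

# Output lines copied from the obabel run above.
OUTPUT = """\
0 6 1 6 <670>
0 6 1 6 1 6 <260>
0 8 2 6 <623>
0 8 2 6 1 6 <329>
0 8 2 6 1 6 1 6 <652>
0 17 <17>
0 17 1 6 <328>
0 17 1 6 1 6 <219>
0 17 1 6 1 6 1 6 <1009>
0 17 1 6 2 8 <329>
"""

LINE = re.compile(r"^([\d ]+?)\s*<(\d+)>$")

# Only the codes seen in this example; anything else is printed as a number.
ELEMENTS = {6: "C", 8: "O", 17: "Cl"}
BONDS = {1: "-", 2: "="}


def fragments_by_bit(text):
    """Map each bit number to the list of fragments (tuples of ints) that set it."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            fragment = tuple(int(tok) for tok in match.group(1).split())
            mapping[int(match.group(2))].append(fragment)
    return dict(mapping)


def describe(fragment):
    """Render a fragment as text, skipping the leading fragment-type code."""
    parts = []
    for i, code in enumerate(fragment[1:]):
        table = ELEMENTS if i % 2 == 0 else BONDS
        parts.append(table.get(code, str(code)))
    return "".join(parts)


by_bit = fragments_by_bit(OUTPUT)
shared = {bit: frags for bit, frags in by_bit.items() if len(frags) > 1}
for bit, frags in shared.items():
    print(bit, [describe(f) for f in frags])
```

In this listing, bit 329 is reported for two different fragments, so even this small molecule shows two fragments sharing one bit.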

I should really write this up for the docs...

- Noel



Re: Need help in calculating tanimoto coefficient

Ernst-Georg Schmid-2
>Just a small comment...

>Since Open Babel 2.3.0, it is possible to describe each of the set
>bits using the "s" output option, e.g.

Ah, nice feature.

Then one can build a dictionary of fragment<->bit mappings for a given input set alongside FP2 generation, including detection of ambiguous mappings.

Still, with FP3 the pattern<->bit mapping is ex ante valid for any possible input set, if the substructure patterns are unambiguous and the substructure detection algorithm works perfectly.

Best regards,
Ergo


Need help in Modifying Tanimoto coefficient

chakravarthy
In reply to this post by chakravarthy
Dear All,

I am facing a problem with Tanimoto scoring when comparing two molecules and would
like to seek your advice on it:

I noticed that molecules such as myristic acid and palmitic acid have the same
similarity score of 1 when queried with lauric acid. Open Babel considers lauric
acid a substructure of myristic and palmitic acid. I would like to differentiate
between substructures by giving them different scores. I tried other fingerprints,
but they all return the same score.

I am thinking of replacing the Tanimoto score with another coefficient, such as the
Kulczynski index or the Russell index. Even though I am not an expert at programming,
I can understand C, C++, Perl and Python. I would be grateful if anyone could
direct me to the file that contains the code for calculating the Tanimoto
coefficient. (I am using Open Babel 2.2.3 on Ubuntu.)

Please feel free to suggest better ways to score substructures.

Thanks in Advance
Chak




Re: Need help in Modifying Tanimoto coefficient

Andrew Dalke
On Jan 31, 2011, at 8:15 PM, [hidden email] wrote:
> I noticed that molecules such as Myristic acid and Palmitic acid have same
> similarity score of 1,
 ...
> I am thinking of modifying Tanimoto score to other coefficient's like Kulczynski
> index or Russel index.

The only way to get a Tanimoto score of 1 is if the two fingerprints are identical. In that case no scoring method can tell the difference between the two, because they are identical.

To get what you want you'll need to come up with a new fingerprinting scheme, not a new scoring method.
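
A small sketch of why swapping the coefficient does not help. It assumes fingerprints are represented as Python sets of on-bit positions and uses the usual textbook forms of the Tanimoto, Kulczynski and Russell-Rao coefficients; the function name and the bit positions are made up for illustration. All three are computed from the same bit counts, so two targets with identical fingerprints always receive identical scores.

```python
def coefficients(bits_a, bits_b, n_bits=1024):
    """Three similarity coefficients computed from sets of on-bit positions.

    Both sets are assumed to be non-empty.
    """
    a, b = len(bits_a), len(bits_b)
    c = len(bits_a & bits_b)  # bits set in both fingerprints
    return {
        "tanimoto": c / (a + b - c),
        "kulczynski": 0.5 * (c / a + c / b),
        "russell_rao": c / n_bits,
    }


# Hypothetical bit positions; the two targets have the same fingerprint as the query.
query = {17, 219, 260, 670}
target_1 = set(query)
target_2 = set(query)

print(coefficients(query, target_1))
print(coefficients(query, target_2))  # same numbers for every coefficient
```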

                                Andrew
                                [hidden email]




Re: Need help in Modifying Tanimoto coefficient

Chris Morley-3
On 01/02/2011 07:12, Andrew Dalke wrote:
> On Jan 31, 2011, at 8:15 PM, [hidden email] wrote:
>> I noticed that molecules such as Myristic acid and Palmitic acid have same
>> similarity score of 1,
>   ...
>> I am thinking of modifying Tanimoto score to other coefficient's like Kulczynski
>> index or Russel index.
> The only way to get a Tanimoto score of 1 is if the two fingerprints are identical. In that case no scoring method can tell the difference between the two, because they are identical.
>
> To get what you want you'll need to come up with a new fingerprinting scheme, not a new scoring method.

None of Open Babel's fingerprints provides a complete description of a
molecule. They are really intended as part of a fast screening method to
exclude molecules that, compared with a target molecule, are too
dissimilar (or too similar) or which are not a superstructure of it.
None of the current fingerprint types includes stereochemistry, and the
FP2 fingerprint has a built-in lack of certainty because different
fragments can be assigned to the same bit. It also indexes by the
presence or absence of fragments of up to 7 atoms, so it does not
discriminate well for long chains of carbon atoms, like fatty acids or
normal hydrocarbons. It is possible to make specialized FP3 fingerprints to
handle this type of structure by including the number of times a
substructure occurs. Further description is in the code, although
recompilation is not necessary to make a new fingerprint type. However, I
guess this is probably further than you want to go.
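
The presence-versus-count point can be illustrated with a toy model. This is not Open Babel's FP2 or FP3 code: the graph builder, the path labels and both scoring functions are hypothetical, bond orders and hashing are ignored, and fragments are simply all linear paths of up to 7 heavy atoms.

```python
from collections import Counter


def fatty_acid(n_carbons):
    """Heavy-atom graph of a saturated straight-chain fatty acid (toy model).

    Atoms 0..n_carbons-1 form the carbon chain; the last carbon carries two oxygens.
    """
    symbols = ["C"] * n_carbons + ["O", "O"]
    neighbours = {i: set() for i in range(len(symbols))}

    def bond(i, j):
        neighbours[i].add(j)
        neighbours[j].add(i)

    for i in range(n_carbons - 1):
        bond(i, i + 1)
    bond(n_carbons - 1, n_carbons)
    bond(n_carbons - 1, n_carbons + 1)
    return symbols, neighbours


def path_counts(symbols, neighbours, max_atoms=7):
    """Count linear paths of up to max_atoms atoms, labelled by element symbols.

    Multi-atom paths are visited once from each end; this doubling is the same
    for every molecule, so it does not affect the comparison.
    """
    counts = Counter()

    def extend(path):
        label = "".join(symbols[i] for i in path)
        counts[min(label, label[::-1])] += 1  # direction-independent label
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in neighbours:
        extend([start])
    return counts


def tanimoto_presence(a, b):
    """Tanimoto on presence/absence of fragments."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def tanimoto_counts(a, b):
    """Tanimoto-style score that also uses how often each fragment occurs."""
    keys = set(a) | set(b)
    return sum(min(a[k], b[k]) for k in keys) / sum(max(a[k], b[k]) for k in keys)


lauric = path_counts(*fatty_acid(12))
myristic = path_counts(*fatty_acid(14))
palmitic = path_counts(*fatty_acid(16))

for name, other in (("myristic", myristic), ("palmitic", palmitic)):
    print(name, tanimoto_presence(lauric, other), tanimoto_counts(lauric, other))
```

In this model the three acids contain exactly the same set of short paths, so the presence-based score cannot separate them, while the count-based score drops as the chain gets longer.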

Chris


Re: Need help in Modifying Tanimoto coefficient

Daniel Zaharevitz

On Feb 1, 2011, at 4:49 AM, Chris Morley wrote:

> On 01/02/2011 07:12, Andrew Dalke wrote:
>> On Jan 31, 2011, at 8:15 PM, [hidden email] wrote:
>>> I noticed that molecules such as Myristic acid and Palmitic acid have
>>> same similarity score of 1,
>>  ...
>>> I am thinking of modifying Tanimoto score to other coefficients like
>>> Kulczynski index or Russell index.
>> The only way to get a Tanimoto score of 1 is if the two fingerprints are
>> identical. In that case no scoring method can tell the difference
>> between the two, because they are identical.
>>
>> To get what you want you'll need to come up with a new fingerprinting
>> scheme, not a new scoring method.
> None of OpenBabel's fingerprints provide a complete description of a
> molecule. They are really intended as part of a fast screening method to
> exclude molecules that, compared with a target molecule, are too
> dissimilar (or too similar) or which are not a superstructure of it.
> None of the current fingerprint types include stereochemistry and the
> FP2 fingerprint has a built-in lack of certainty because different
> fragments can be assigned to the same bit. It also indexes by the
> presence or absence of fragments of up to 7 atoms, so does not
> discriminate well for long chains of carbon atoms, like fatty acids or
> normal hydrocarbons. It is possible to make specialized FP3 fingerprints
> to handle this type of structure, by including the number of times a
> substructure occurs. Further description is in the code, although
> recompilation is not necessary to make a new fingerprint type. However I
> guess this is probably further than you want to go.

It looks like that is where I want to go. It seems that FP2 isn't
going to be good enough, and I will probably end up having to customize
at least to some extent. If we are going to get into customization, we
may as well also look at tuning the fingerprints so that similar
structures result in somewhat similar activity. To that end we are
putting together a testing dataset from our open compounds. It looks
like it will be about 9000 compounds, chosen by these criteria:
1) The compound has been tested at least twice in the NCI-60 dose-response
assay. This should mean that the NCI-60 correlations are reasonably well
determined.
2) The 2D structure exists and is consistent with the molecular
formula stored independently in our database. This consistency is
checked via CDK, so it means that at least CDK is able to assign atom
types well enough to get to the correct molecular formula.

We will calculate all the NCI-60 pairwise correlations and will post
these and the structures. Should be done in a week or so. We will be
looking to find a set of fingerprints that
1) never (or as close to never as we can get) returns a value of 1.0
for different structures.
2) has a well-behaved (or maybe just well-documented) relation between
structure similarity and NCI-60 correlation. I'm not sure what we will
get here, but I would like to be able to say something like "a
similarity score of >0.9 gives an 80% chance of an NCI-60
correlation of >0.6".
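
A statement of that form could be checked with something like the sketch below. The function name and the default cut-offs mirror the example in the text, but the pair values are invented toy numbers, not NCI-60 data.

```python
def hit_rate(pairs, min_similarity=0.9, min_correlation=0.6):
    """Fraction of pairs above the similarity cut-off whose correlation also
    exceeds its cut-off; None when no pair passes the similarity cut-off."""
    selected = [corr for sim, corr in pairs if sim > min_similarity]
    if not selected:
        return None
    return sum(corr > min_correlation for corr in selected) / len(selected)


# Invented (structure similarity, activity correlation) pairs for illustration.
toy_pairs = [
    (0.95, 0.72),
    (0.93, 0.41),
    (0.91, 0.66),
    (0.97, 0.80),
    (0.92, 0.65),
    (0.50, 0.10),
    (0.30, 0.70),
]
print(hit_rate(toy_pairs))  # 4 of the 5 high-similarity pairs pass
```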

I'm thinking we might also put together a dataset from the compounds
tested in the one-dose assay. That set would make it possible to look
at the relation of structure similarity to the chances that a compound
will pass the one-dose criteria.

DanZ

/********************************************
  *  Daniel Zaharevitz
  *  Chief, Information Technology Branch
  *  Developmental Therapeutics Program
  *  National Cancer Institute
  *  [hidden email]
  *
  ********************************************/






Re: Need help in Modifying Tanimoto coefficient

Imran Haque
On 02/01/2011 05:37 AM, Daniel Zaharevitz wrote:
> We will be
> looking to find a set of fingerprints that
> 1) never (or as close to never as we can get) return a value of 1.0
> for different structures.

I'm not sure that it's implemented in OpenBabel, but if it's a 2D
structural descriptor you want, you could give LINGO (Vidal, Thormann,
and Pons, JCIM 2005; DOI: 10.1021/ci0496797) a shot. I've written a
LINGO implementation that's primarily targeted at GPUs but has a
reasonably fast CPU version (https://simtk.org/home/siml). The CPU code
is not as quick as the fastest DFA-based methods, but it'll handle your
9000^2 similarities in a matter of seconds.

(PS, it's BSD-licensed, in case anyone would like to integrate it into OB!)
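
To give a feel for the idea, here is a simplified LINGO-style sketch. It is not the SIML code and not necessarily the exact scoring formula of the cited paper: it skips the SMILES preprocessing a real implementation would do, the function names are made up, and the similarity is a plain multiset Tanimoto over overlapping 4-character SMILES substrings.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Count the overlapping q-character substrings of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def multiset_tanimoto(a, b):
    """Shared substring occurrences divided by total occurrences in the union."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


# Saturated fatty acids written as plain SMILES strings.
lauric = "C" * 12 + "(=O)O"    # 12 carbons
myristic = "C" * 14 + "(=O)O"  # 14 carbons
palmitic = "C" * 16 + "(=O)O"  # 16 carbons

print(multiset_tanimoto(lingos(lauric), lingos(myristic)))
print(multiset_tanimoto(lingos(lauric), lingos(palmitic)))
```

Because substring occurrences are counted, chains of different length get different scores against lauric acid instead of all scoring 1.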

> 2) has a well behaved (or maybe just well documented) relation between
> structure similarity and NCI-60 correlation. I'm not sure what we will
> get here, but I would like to be able to say something like a
> similarity score of >0.9 gives a 80% chance of a NCI-60 correlation
> of >0.6.

The VTP paper above as well as a later one (DOI: 10.1021/ci6002152) show
decent correlation with activity; in my experience, similar to any other
given 2D similarity measure. I'm sure you know this, but "tuning"
fingerprints to any given small dataset is a dangerous art; it's very
easy to overfit.

Cheers,

Imran


Re: Need help in Modifying Tanimoto coefficient

Daniel Zaharevitz

On Feb 1, 2011, at 12:34 PM, Imran Haque wrote:

> On 02/01/2011 05:37 AM, Daniel Zaharevitz wrote:
>> We will be
>> looking to find a set of fingerprints that
>> 1) never (or as close to never as we can get) return a value of 1.0
>> for different structures.
>
> I'm not sure that it's implemented in OpenBabel, but if it's a 2D
> structural descriptor you want, you could give LINGO (Vidal, Thormann,
> and Pons, JCIM 2005; DOI: 10.1021/ci0496797) a shot. I've written a
> LINGO implementation that's primarily targeted at GPUs but has a
> reasonably fast CPU version (https://simtk.org/home/siml). The CPU code
> is not as quick as the fastest DFA-based methods, but it'll handle your
> 9000^2 similarities in a matter of seconds.
>
> (PS, it's BSD-licensed, in case anyone would like to integrate it into OB!)
>
>> 2) has a well behaved (or maybe just well documented) relation between
>> structure similarity and NCI-60 correlation. I'm not sure what we will
>> get here, but I would like to be able to say something like a
>> similarity score of >0.9 gives a 80% chance of a NCI-60 correlation
>> of >0.6.
>
> The VTP paper above as well as a later one (DOI: 10.1021/ci6002152) show
> decent correlation with activity; in my experience, similar to any other
> given 2D similarity measure. I'm sure you know this, but "tuning"
> fingerprints to any given small dataset is a dangerous art; it's very
> easy to overfit.


Thanks for the interesting and useful pointers. I hope we can give it  
a try when I get the data together.

DanZ

/********************************************
  *  Daniel Zaharevitz
  *  Chief, Information Technology Branch
  *  Developmental Therapeutics Program
  *  National Cancer Institute
  *  [hidden email]
  *
  ********************************************/






Re: Need help in calculating tanimoto coefficient

chakravarthy
In reply to this post by Chris Morley-3
On 01/02/2011 09:49, Chris Morley wrote:
> On 01/02/2011 07:12, Andrew Dalke wrote:
>> On Jan 31, 2011, at 8:15 PM, [hidden email] wrote:
>>> I noticed that molecules such as Myristic acid and Palmitic acid have
>>> same similarity score of 1, ... I am thinking of modifying Tanimoto
>>> score to other coefficient's like Kulczynski index or Russel index.
>>
>> The only way to get a Tanimoto score of 1 is if the two fingerprints are
>> identical. In that case there is no scoring method can tell the
>> difference between the two because they are identical. To get what you
>> want you'll need to come up with a new fingerprinting scheme, not a new
>> scoring method.
>
> None of OpenBabel's fingerprints provide a complete description of a
> molecule. They are really intended as part of a fast screening method to
> exclude molecules that, compared with a target molecule, are too
> dissimilar (or too similar) or which are not a superstructure of it.
> None of the current fingerprint types include stereochemistry and the
> FP2 fingerprint has a built-in lack of certainty because different
> fragments can be assigned to the the same bit. It also indexes by the
> presence or absence of fragments of up to 7 atoms, so does not
> discriminate well for long chains of carbon atoms, like fatty acids or
> normal hydrocarbons. It is possible make specialized FP3 fingerprints to
> handle this type of structure, by including the number of times a
> substructure occurs. Further description is in the code, although
> recompilation is not necessary to make a new fingerprint type. However I
> guess this is probably further than you want to go.

Thank you Chris, Andrew, and Noel for your inputs.

My data set contains lipids/fatty acids (in SMILES format). At times, two
molecules differ from each other by only a single oxygen atom "O", a single
carbon atom "C", or a single double bond "=".

I tried patching up the fingerprint similarity by adding weights for the number
of carbon atoms, the number of double bonds, etc., but this turns out to be a
dirty job, as it can easily overfit or misfit. I would like to see a permanent
solution that is applicable to any given set of SMILES.

Can anyone point out existing 1D (input as SMILES string) similarity
methods that can differentiate single atom/bond differences?

Thanks
Chak


