[Open Babel] Fingerprints

drc-2
Hi,

I've just started looking at the use of fingerprints (I was thinking about trying to add to them).

 babel /Users/swain/Desktop/mymols.sdf -ofpt
>MOL_00000067
>MOL_00000083   Tanimoto from MOL_00000067 = 0.810811
>MOL_00000105   Tanimoto from MOL_00000067 = 0.833333
>MOL_00000296   Tanimoto from MOL_00000067 = 0.425926
>MOL_00000320   Tanimoto from MOL_00000067 = 0.534884
>MOL_00000328   Tanimoto from MOL_00000067 = 0.511111
>MOL_00000338   Tanimoto from MOL_00000067 = 0.522727

Which is fine,

babel /Users/swain/Desktop/mymols.sdf -ofpt -xfFP3  
Cannot open /usr/local/share/openbabel/patterns.txt

patterns.txt is actually in a subfolder of /usr/local/share/openbabel/

I moved patterns.txt to the expected place and now I get

babel /Users/swain/Desktop/mymols.sdf -ofpt -xfFP3
SMARTS Error: [#6]C(=[S)[#6]
                       ^
SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
                            ^
SMARTS Error: [#6]OOH
                    ^
>MOL_00000067
SMARTS Error: [#6]C(=[S)[#6]
                       ^
SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
                            ^
SMARTS Error: [#6]OOH
                    ^
>MOL_00000083   Tanimoto from MOL_00000067 = 1
Possible superstructure of MOL_00000067
SMARTS Error: [#6]C(=[S)[#6]
                       ^
SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
                            ^
SMARTS Error: [#6]OOH
                    ^
>MOL_00000105   Tanimoto from MOL_00000067 = 1
Possible superstructure of MOL_00000067
SMARTS Error: [#6]C(=[S)[#6]
                       ^
SMARTS Error: [CX3]=N[#6,#1])[#6,#1]

Chris





-------------------------------------------------------
SF.Net email is sponsored by:
Tame your development challenges with Apache's Geronimo App Server. Download
it for free - -and be entered to win a 42" plasma tv or your very own
Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: [Open Babel] Fingerprints

Geoffrey Hutchison

On Nov 7, 2005, at 9:09 AM, [hidden email] wrote:

> babel /Users/swain/Desktop/mymols.sdf -ofpt -xfFP3
> Cannot open /usr/local/share/openbabel/patterns.txt
>
> patterns.txt is actually in a subfolder of /usr/local/share/openbabel/

Oops, that's my fault. The fingerprint patterns aren't handled by
the same code that reads in the rest of the data files. I'll fix the
code ASAP.
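
To illustrate the lookup problem described above, here is a minimal Python sketch that searches a data directory and then its immediate subfolders for a named file. It is not how Open Babel resolves its data files; the function name find_data_file and its parameters are hypothetical, chosen only for this example.

```python
from pathlib import Path
from typing import Optional


def find_data_file(name: str, base_dir: str) -> Optional[Path]:
    """Return the path to `name` in base_dir or one of its immediate subfolders.

    Hypothetical helper: looks in base_dir first, then one level down.
    Returns None when the file is not found in either place.
    """
    base = Path(base_dir)
    direct = base / name
    if direct.is_file():
        return direct
    if not base.is_dir():
        return None
    # Check each immediate subfolder in a stable (sorted) order.
    for sub in sorted(p for p in base.iterdir() if p.is_dir()):
        candidate = sub / name
        if candidate.is_file():
            return candidate
    return None
```

With the layout reported in this thread, find_data_file("patterns.txt", "/usr/local/share/openbabel") would return the copy inside the subfolder instead of failing with "Cannot open".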

> SMARTS Error: [#6]C(=[S)[#6]
> ...

Yes, there seem to be some SMARTS errors in patterns.txt! Can you  
file this as a separate bug in the SF tracker?

Also, I think Chris Morley (who wrote this fingerprint module) was  
looking for some lists of organic functional groups. (Well, I guess  
it doesn't need to be limited to organic FG... inorganic or  
organometallic would probably be OK too.) Improving the list in  
patterns.txt would be nice.

Thanks for the bug reports -- it's a big help.

Cheers,
-Geoff




Re: [Open Babel] Fingerprints

Chris Morley-3
In reply to this post by drc-2
[hidden email] wrote:
> I've just started looking at the use of fingerprints, (I was thinking about trying to add to them).

Good, I'm pleased to hear this.

>  babel /Users/swain/Desktop/mymols.sdf -ofpt
>
>>MOL_00000067
>>MOL_00000083   Tanimoto from MOL_00000067 = 0.810811
>>MOL_00000105   Tanimoto from MOL_00000067 = 0.833333
>>MOL_00000296   Tanimoto from MOL_00000067 = 0.425926
>>MOL_00000320   Tanimoto from MOL_00000067 = 0.534884
>>MOL_00000328   Tanimoto from MOL_00000067 = 0.511111
>>MOL_00000338   Tanimoto from MOL_00000067 = 0.522727
>
> Which is fine,
This uses the default fingerprint type, FP2, which is the
only one that has received a significant amount of
testing (thanks to David Hoekman).
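
For readers new to the numbers quoted above: the Tanimoto coefficient of two fingerprints is the count of bits set in both divided by the count of bits set in either. The Python sketch below shows the arithmetic on fingerprints represented as sets of bit positions. The function names are made up for this example, and the subset test for "possible superstructure" is an assumption about what that message means, not a statement about the babel source.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of set-bit positions."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch; two empty fingerprints
    return len(a & b) / len(a | b)


def is_possible_superstructure(query: set, other: set) -> bool:
    """Assumed meaning: every bit set in the query is also set in the other molecule."""
    return query <= other


# Two fingerprints sharing 4 of 5 distinct bits.
first = {1, 2, 3, 5}
second = {1, 2, 3, 5, 8}
print(tanimoto(first, second))                    # 0.8
print(is_possible_superstructure(first, second))  # True
```

A fingerprint built from only a few keys can easily be identical for two different molecules, which gives a Tanimoto value of exactly 1.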

> babel /Users/swain/Desktop/mymols.sdf -ofpt -xfFP3  
> Cannot open /usr/local/share/openbabel/patterns.txt
>
> patterns.txt is actually in a subfolder of /usr/local/share/openbabel/
>
> I moved patterns.txt to the expected place and now I get
>
> babel /Users/swain/Desktop/mymols.sdf -ofpt -xfFP3
> SMARTS Error: [#6]C(=[S)[#6]
>                        ^
> SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
>                             ^
> SMARTS Error: [#6]OOH
>                     ^
>
>>MOL_00000067
>
> SMARTS Error: [#6]C(=[S)[#6]
>                        ^
> SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
>                             ^
> SMARTS Error: [#6]OOH
>                     ^
>
>>MOL_00000083   Tanimoto from MOL_00000067 = 1
>
> Possible superstructure of MOL_00000067
> SMARTS Error: [#6]C(=[S)[#6]
>                        ^
> SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
>                             ^
> SMARTS Error: [#6]OOH
>                     ^
>
>>MOL_00000105   Tanimoto from MOL_00000067 = 1
>
> Possible superstructure of MOL_00000067
> SMARTS Error: [#6]C(=[S)[#6]
>                        ^
> SMARTS Error: [CX3]=N[#6,#1])[#6,#1]
FP3 uses a list of substructures in SMARTS form to
construct the fingerprint. I was hoping that somebody
would contribute such a list, but that hasn't happened
(yet). I started to construct such a list by hand in
patterns.txt, based on the substructures used in Checkmol.
I found this very tedious and am far from finishing it.
Apparently I was so discouraged I didn't even check for
typos after the last edit. I have now corrected
patterns.txt so that at least it doesn't give SMARTS
errors. But this data set is not really useful at present,
and I have added a line to it saying so. What's needed is
for somebody to provide a (free) list of substructures
along the lines of the proprietary ones like the MACCS Keys,
described for instance in
http://www.mesaac.com/Fingerprint.htm

Chris

##############################################################################
#                                                                            #
#                Open Babel file: patterns.txt                        #
#                                                                            #
#  Copyright (c) 2005 Chris Morley                                           #
#  Part of the Open Babel package, under the GNU General Public License (GPL)#
#                                                                            #
# Functional groups for molecular fingerprinting based on Checkmol:          #
#   http://merian.pch.univie.ac.at/~nhaider/cheminf/fgtable.pdf              #
#                                                                            #
# SMARTS Patterns are used by finger3.cpp:PatternFP                          #
# Format of each line is a SMARTS pattern, then optionally                   #
#   followed by a tab character and a pattern number and/or description      #
#   (everything after the tab will be ignored by the code)                   #
#                                                                            #
#  INCOMPLETE!! Really only useful to test the fingerprint FP3               #
##############################################################################
[+] 1
[-] 2
[#6][CX3](=O) 3 aldehyde or ketone
[CX3H1](=O)[#6] 4 aldehyde
[#6][CX3](=O)[#6] 5 ketone
[#6][CX3](=S) 6 thioaldehyde or thioketone
[CX3H1](=S) 7 thioaldehyde
[#6]C(=[S])[#6] 8 thioketone
[CX3]=N([#6,#1])[#6,#1] 9 imine
[#6,#1]C([#6,#1])=[N][N]([#6,#1])[#6,#1] 10 hydrazone
[#6,#1]C([#6,#1])=[N][N]([#6,#1])C(=[O])[N]([#6,#1])[#6,#1] 11 semicarbazone
[#6,#1]C([#6,#1])=[N][N]([#6,#1])C(=[S])[N]([#6,#1])[#6,#1] 12 thiosemicarbazone
[#6,#1]C([#6,#1])=[N][OH] 13 oxime
[#6,#1]C([#6,#1])=[N][O][#6] 14 oxime ether
[CX3]=C=O 15 ketene
[CX3]=C=O 16 keten acetyl derivative***
[#6,#1]C([#6,#1])([OH])([OH]) 17 carbonyl hydrate
[#6,#1]C([#6,#1])([OH])(O[#6]) 18 hemiacetal
[#6,#1]C([#6,#1])(O[#6])(O[#6]) 19 acetal
[#6,#1]C([#6,#1])(N([#6,#1])[#6,#1])(O[#6]) 20 hemiaminal
[#6,#1]C([#6,#1])(N([#6,#1])[#6,#1])(N([#6,#1])[#6,#1]) 21 aminal
[#6,#1]C([#6,#1])(N([#6,#1])[#6,#1])([S][#6]) 22 thiohemiaminal
[#6,#1]C([#6,#1])([S][#6])([S][#6]) 23 thioacetal
[#6,#1]C([#6,#1])=C([#6,#1])N([#6,#1])[#6,#1] 24 enamine
[#6,#1]C([#6,#1])=C([#6,#1])[OH] 25 enol
[#6,#1]C([#6,#1])=C([#6,#1])O[#6] 26 enol ether
[#6][OH] 27 hydroxy compound
C[OH] 28 alcohol
[#6][CH2][OH] 29 primary alcohol  
[#6][CH]([#6])[OH] 30 secondary alcohol  
[#6][C]([#6])([#6])[OH] 31 tertiary alcohol  
[#6,#1]C([#6,#1])([OH])C([#6,#1])([#6,#1])[OH] 32 1,2-diol
[#6,#1]C([#6,#1])([OH])C([#6,#1])([#6,#1])[NH2] 33 1,2-aminoalcohol
c[OH] 34 phenol
[OH]cc[OH] 35 1,2-diphenol
[OH]C=C[OH] 36 enediol
[#6]O[#6] 37 ether
COC 38 dialkyl ether
cOC 39 alkylaryl ether
cOc 40 diaryl ether
[#6]S[#6] 41 thioether
[#6]SS[#6] 42 disulfide
[#6]OO[#6] 43 peroxide
[#6]O[OH] 44 hydroperoxide
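
The line format described in the file header (a SMARTS pattern, then optionally a tab and a number and/or description) can be read with a few lines of Python. The sketch below also includes a rough sanity check that would have flagged the three patterns reported earlier in this thread. It is not a SMARTS parser and not the check Open Babel performs: it only counts brackets and parentheses and looks for a bare H outside brackets. Both function names are hypothetical, and treating lines that start with # as comments is an assumption based on the header.

```python
def parse_pattern_line(line: str):
    """Split one patterns.txt-style line into (smarts, note).

    Returns None for blank lines and for lines starting with '#'
    (assumed here to be comments).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    smarts, _, note = stripped.partition("\t")
    return smarts.strip(), note.strip()


def lint_smarts(smarts: str) -> list:
    """Return a list of obvious problems; an empty list does not prove validity."""
    problems = []
    brackets = 0  # depth of '[' ... ']'
    parens = 0    # depth of '(' ... ')'
    for pos, ch in enumerate(smarts):
        if ch == "[":
            brackets += 1
        elif ch == "]":
            if brackets == 0:
                problems.append(f"unmatched ']' at {pos}")
            else:
                brackets -= 1
        elif ch == "(":
            parens += 1
        elif ch == ")":
            if parens == 0:
                problems.append(f"unmatched ')' at {pos}")
            else:
                parens -= 1
        elif ch == "H" and brackets == 0:
            # H is not an atom symbol that may appear outside brackets.
            problems.append(f"bare 'H' outside brackets at {pos}")
    if brackets:
        problems.append("unclosed '['")
    if parens:
        problems.append("unclosed '('")
    return problems


# The three patterns from the error output, then one corrected pattern.
for s in ["[#6]C(=[S)[#6]", "[CX3]=N[#6,#1])[#6,#1]", "[#6]OOH", "[#6]C(=[S])[#6]"]:
    print(s, lint_smarts(s))
```

Running such a check over every parsed line before installing the file would catch typos of this kind without needing a molecule file to trigger them.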