Hi,
I am new to Open Babel. I came across it recently and want to use it for my academic work.

Here is my query:

I want to calculate the Tanimoto coefficient (similarity score) of two lipid molecules, but I could not find a step-by-step procedure for this in the Open Babel documentation.

What I have done so far:

I converted the .sdf files of the two lipids into SMILES strings and obtained the FP2 fingerprints of the SMILES strings. The FP2 fingerprint of one of the molecules looks like this:

00000000 01000000 00000000 00000400 00080000 00000000
00000000 00000000 00000000 00000080 00000000 40010000
00000000 00800000 00000000 00080008 00000000 00000000
01402000 00000001 00000000 10000010 03000000 00000010
00000000 00000000 00000000 00000000 00000000 00040000
00020000 00000000

Now I would like to compare two such fingerprints (actually more than two) and calculate their similarity score. Can someone advise me how to do it?

Thanks in advance,
Chak

PS: I am new to Open Babel and Linux. Until now I have used the Open Babel GUI on Windows. Although I have installed Open Babel on my Ubuntu machine, I have not used it yet.

------------------------------------------------------------------------------
Special Offer-- Download ArcSight Logger for FREE (a $49 USD value)!
Finally, a world-class log management solution at an even better price-free!
Download using promo code Free_Logger_4_Dev2Dev. Offer expires
February 28th, so secure your free ArcSight Logger TODAY!
http://p.sf.net/sfu/arcsight-sfd2d
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
When there are only a small number of molecules, the fpt format will give you what you want; you are nearly there.

In the GUI, select sdf for the input format and fpt for the output. Open the sdf file with the pattern molecule, then open the sdf file with all the other molecules you want to compare to it, holding CTRL as you leave the open dialog (which allows more than one input file). Click Convert, and you will get a list of the Tanimoto coefficients between the first molecule and each of the rest, using the default FP2 fingerprint.

On the command line (Windows or Linux):
  obabel patternmol.sdf othermols.sdf -ofpt

Chris

On 24/01/2011 06:30, [hidden email] wrote:
> Now, I would like to compare two such fingerprints (actually more than two) and
> calculate their similarity score. Can someone advice me how to do it ?
In reply to this post by chakravarthy
Dear All,
I have made some progress with calculating the Tanimoto coefficient. I learned to do it from the command line on my Ubuntu machine. I was able to do it for two molecules with the following command:

PROMPT> babel mysmiles.smi mymols.sdf -ofpt

When I tried it for a larger set, 1 (mysmiles.smi) vs 8 (mymols.sdf), the following error popped up:

=================================================
*** Open Babel Warning in ReadMolecule
WARNING: Problems reading a MDL file
Cannot read atom and bond count
Expected standard 6 character atom and bond count
==================================================

Can any expert explain what went wrong? I suppose the problem lies in how I grouped the 8 molecules into the single mymols.sdf file.

This is what the grouped mymols.sdf file looks like:

================================
[ OpenBabel12221015072D

23 22 0 0 0 0 0 0 0 0999 V2000
7.7424 -9.2926 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
.................................................................
.................................................................
2.6281 -18.1511 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
...................
...................
22 23 1 0 0 0 0
M END
$$$$
$$$$ ] 7 times

OpenBabel12221015072D

25 24 0 0 0 0 0 0 0 0999 V2000
7.7424 -9.2926 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
.................................................................
.................................................................
6.7196 -7.5210 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
...................
...................
24 25 1 0 0 0 0
M END
$$$$
==================================

Any help is much appreciated.

Thanks,
Chak
Dear All,
I tried Chris's technique of holding CTRL to input more than one file. It helps, but I did not get the desired output. Here are my input and output.

========================================================
Input:
[ OpenBabel12221015072D

23 22 0 0 0 0 0 0 0 0999 V2000
7.7424 -9.2926 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
.................................................................
.................................................................
2.6281 -18.1511 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
...................
...................
22 23 1 0 0 0 0
M END
$$$$
$$$$ ] 7 times

OpenBabel12221015072D

25 24 0 0 0 0 0 0 0 0999 V2000
7.7424 -9.2926 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
.................................................................
.................................................................
6.7196 -7.5210 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
...................
...................
24 25 1 0 0 0 0
M END
$$$$
=====================================================
Output:
>
> Tanimoto from first mol = 0.869565
Possible superstructure of first mol
=====================================================

Any suggestions are welcome.

Thank you Floriane and Chris !!

Chak

From: Chris Morley <c.morley@ga...> - 2011-01-24 10:34
> On the command line (Windows or Linux):
> obabel patternmol.sdf othermols.sdf -ofpt
In reply to this post by chakravarthy
Your file othermol.sdf was a proper sdf file, but the molecules all had the title "$$$$". Using the delimiter string as a title is not good practice, and it confused OpenBabel. I'll see if I can correct this, but if you change the titles to Mol1, Mol2, etc. and name the pattern molecule PatternMol, the command below gives:

>PatternMol
>Mol1 Tanimoto from PatternMol = 0.5
>Mol2 Tanimoto from PatternMol = 0.575
>Mol3 Tanimoto from PatternMol = 0.95
>Mol4 Tanimoto from PatternMol = 0.825
>Mol5 Tanimoto from PatternMol = 0.673913

which is, I think, what you want. The revised othermols.sdf is attached.

Chris

On 25/01/2011 02:47, [hidden email] wrote:
> The command I am using is (in linux) -> babel patternmol.sdf othermols.sdf -ofpt
> Both in Windows GUI and linux I receive same error message. It reads...
>
> =================================================
> *** Open Babel Warning in ReadMolecule
> WARNING: Problems reading a MDL file
> Cannot read atom and bond count
> Expected standard 6 character atom and bond count
> ==================================================
>
> Both in linux and windows, output contains tanimoto coefficient only for first
> molecule in othermols.sdf.
>
> Is the problem arising because of othermols.sdf format ? Molecule separators "M
> END" "$$$$" ? ( Enclosed patternmol.sdf and othermols.sdf files )
>
> It must be a trivial problem, but I could not figure it out all by myself.
>
> Any help is greatly appreciated
>
> Thankyou
> Chakravarthy
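For anyone who wants to post-process a list of scores like the one above, here is a minimal Python sketch. It assumes the fpt output has been saved as text and that each score line has the form shown in this thread (">title Tanimoto from reference = value"); the function name parse_fpt_scores and the SAMPLE text are only illustrative and are not part of Open Babel.

```python
import re

# Assumed line shape, taken from the output quoted in this thread:
#   >Mol1 Tanimoto from PatternMol = 0.5
SCORE_RE = re.compile(
    r"^>\s*(?P<title>\S*)\s+Tanimoto from (?P<ref>.+?) = (?P<score>[0-9.]+)\s*$"
)

def parse_fpt_scores(text):
    """Return {molecule title: Tanimoto score} for every score line in text."""
    scores = {}
    for line in text.splitlines():
        match = SCORE_RE.match(line)
        if match:  # lines without a score (e.g. the pattern molecule) are skipped
            scores[match.group("title")] = float(match.group("score"))
    return scores

SAMPLE = """>PatternMol
>Mol1 Tanimoto from PatternMol = 0.5
>Mol2 Tanimoto from PatternMol = 0.575
>Mol3 Tanimoto from PatternMol = 0.95
"""

if __name__ == "__main__":
    # Print the molecules from most to least similar to the pattern.
    for title, score in sorted(parse_fpt_scores(SAMPLE).items(),
                               key=lambda item: item[1], reverse=True):
        print(title, score)
```

Molecules without a title end up under an empty key, so giving every record a distinct title (as suggested above) also makes the scores easier to tell apart.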
Regarding the use of $$$$ in the title, Chris, you might want to read
http://blueobelisk.shapado.com/questions/what-is-the-minimum-needed-to-correctly-identify-records-in-an-sd-file.
It seems that this is not allowed by the spec...

- Noel

2011/1/25 Chris Morley <[hidden email]>:
> Your file othermol.sdf was a proper sdf file but the molecules all had a
> title "$$$$". The use of the delimiter string as a title is not good
> practice and confused OpenBabel.
I read the discussion on Blue Obelisk immediately after posting. But in fact the problem was that the sdf was malformed (extra lines). OB will read titles with $$$$, but the -f option, which skips molecules, doesn't like it.

Chris

On 25/01/2011 16:45, Noel O'Boyle wrote:
> Regarding the use of $$$$ in the title, Chris, you might want to read
> http://blueobelisk.shapado.com/questions/what-is-the-minimum-needed-to-correctly-identify-records-in-an-sd-file.
> It seems that this is not allowed by the spec...
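A quick way to spot extra lines of this kind before running a comparison is to check each record yourself. The Python sketch below is only a rough check under stated assumptions: it splits the file on "$$$$" lines and expects the fourth line of every record to start with two 3-character integer fields (the atom and bond counts of a V2000 counts line). The helper names are hypothetical, and the check is not how Open Babel itself reads the file. Note that a record whose title is "$$$$" will also be flagged, because this naive splitter cannot tell a title from a delimiter.

```python
def counts_line_ok(line):
    """True if the first six characters hold two integers (atom and bond counts)."""
    try:
        int(line[0:3])
        int(line[3:6])
    except ValueError:
        return False
    return True

def find_suspect_records(sdf_text):
    """Return 1-based indices of "$$$$"-terminated records that look malformed."""
    suspects = []
    record = []
    index = 0
    for line in sdf_text.splitlines():
        if line.strip() != "$$$$":
            record.append(line)
            continue
        index += 1
        # A molfile record needs three header lines followed by the counts line.
        if len(record) < 4 or not counts_line_ok(record[3]):
            suspects.append(index)
        record = []
    return suspects

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        print("suspect records:", find_suspect_records(handle.read()))
```

Saved under a name of your choice and run with the path to an sdf file as its only argument, it prints the positions of the records worth inspecting by hand.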
In reply to this post by Noel O'Boyle
Dear All,
Thank you, Chris, for your help. It works well after removing $$$$.

I am curious to know what goes on behind the Open Babel program. In particular, I want to understand two things:

1. How FP2 is generated and what each digit in FP2 stands for.
2. How two FP2 fingerprints are compared and how the Tanimoto score is calculated.

I found some references for Tanimoto scoring, but I couldn't find references for FP2 (except for a brief description in the Open Babel docs - "http://openbabel.org/docs/dev/Fingerprints/fingerprints.html#fingerprint-format-details").

I would be grateful if anyone could provide me with references for understanding fingerprints (especially FP2, but also others).

Thanks in advance,
Chakravarthy
Hi,
FP2 is a Daylight-type fingerprint. See 6.1.2 of http://www.daylight.com/dayhtml/doc/theory/theory.finger.html for an explanation.

FP2 generates fragments up to 7 bonds in length, or shorter if a ring is encountered. The fragments are then hashed to the first 1021 bits of the 1024-bit space FP2 uses by default. So every bit corresponds to the presence of one or more (if a hash collision occurs) fragments, but you cannot map a specific bit back to a specific fragment. If you need that, a substructure-pattern-based fingerprint like FP3 is the obvious choice.

See 6.3 of http://www.daylight.com/dayhtml/doc/theory/theory.finger.html for an explanation of how similarity can be calculated from fingerprints.

Best regards,

Ergo
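To make the comparison step concrete, here is a small Python sketch of the Tanimoto calculation on fingerprints written as hexadecimal words, the way they were pasted at the start of this thread. It is an illustration under assumptions rather than Open Babel's own code: the function names are made up here, and both fingerprints are assumed to have the same number of words in the same order. The coefficient is the number of bits set in both fingerprints divided by the number of bits set in either.

```python
def fingerprint_to_int(hex_words):
    """Join whitespace-separated hex words into one Python integer."""
    return int("".join(hex_words.split()), 16)

def popcount(value):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: bits set in both / bits set in either."""
    a = fingerprint_to_int(fp_a)
    b = fingerprint_to_int(fp_b)
    in_either = popcount(a | b)
    if in_either == 0:
        return 0.0  # two empty fingerprints: treated as 0 here by convention
    return popcount(a & b) / in_either

if __name__ == "__main__":
    # Toy 64-bit fingerprints, not real FP2 output.
    print(tanimoto("0000000f 00000000", "00000003 00000000"))
```

Because only counts of shared and total set bits matter, the order in which the words are joined makes no difference as long as it is the same for both fingerprints.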
Just a small comment...
Since Open Babel 2.3.0, it is possible to describe each of the set bits using the "s" output option, e.g.

C:\Users\Noel>obabel -:"CCC(=O)Cl" -ofpt -xs
>
0 6 1 6 <670>
0 6 1 6 1 6 <260>
0 8 2 6 <623>
0 8 2 6 1 6 <329>
0 8 2 6 1 6 1 6 <652>
0 17 <17>
0 17 1 6 <328>
0 17 1 6 1 6 <219>
0 17 1 6 1 6 1 6 <1009>
0 17 1 6 2 8 <329>
1 molecule converted

For example, the first bit described, at 670, is a linear fragment (0) consisting of a carbon (6) connected by a single bond (1) to another carbon (6).

I should really write this up for the docs...

- Noel

On 26 January 2011 07:22, Ernst-Georg Schmid <[hidden email]> wrote:
> So every bit corresponds to the presence of one or more (if hash collision occurs) fragments, but you cannot map a specific bit back to a specific fragment.
> Just a small comment...
> Since Open Babel 2.3.0, it is possible to describe each of the set
> bits using the "s" output option, e.g.

Ah, nice feature. Then one can build a dictionary of fragment<->bit mappings for a given input set during FP2 generation, including detection of ambiguous mappings.

Still, with FP3 the pattern<->bit mapping is valid ex ante for any possible input set, provided the substructure patterns are unambiguous and the substructure detection algorithm works perfectly.

Best regards,

Ergo
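The dictionary idea can be sketched as follows, using the `-xs` output Noel posted for CCC(=O)Cl as input. This assumes every fragment line has the form "numbers <bit>" as in that post; the function names `bit_dictionary` and `ambiguous_bits` are hypothetical and not part of Open Babel.

```python
import re
from collections import defaultdict

# Output of: obabel -:"CCC(=O)Cl" -ofpt -xs  (copied from the post above)
XS_OUTPUT = """\
0 6 1 6 <670>
0 6 1 6 1 6 <260>
0 8 2 6 <623>
0 8 2 6 1 6 <329>
0 8 2 6 1 6 1 6 <652>
0 17 <17>
0 17 1 6 <328>
0 17 1 6 1 6 <219>
0 17 1 6 1 6 1 6 <1009>
0 17 1 6 2 8 <329>
1 molecule converted
"""

# A fragment line is a run of integers followed by the bit number in angle brackets.
FRAGMENT_LINE = re.compile(r"^(?P<frag>[\d ]+?)\s*<(?P<bit>\d+)>$")


def bit_dictionary(text):
    """Map each bit number to the set of fragments that were hashed to it."""
    bits = defaultdict(set)
    for line in text.splitlines():
        match = FRAGMENT_LINE.match(line.strip())
        if match:  # skips the "1 molecule converted" summary and blank lines
            fragment = tuple(int(tok) for tok in match.group("frag").split())
            bits[int(match.group("bit"))].add(fragment)
    return bits


def ambiguous_bits(bits):
    """Return only the bits that more than one distinct fragment maps to."""
    return {bit: frags for bit, frags in bits.items() if len(frags) > 1}


bits = bit_dictionary(XS_OUTPUT)
for bit, frags in sorted(ambiguous_bits(bits).items()):
    print(bit, sorted(frags))
```

Even for this small molecule, two different fragments share bit 329, which is the kind of ambiguous mapping the dictionary is meant to expose.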
In reply to this post by chakravarthy
Dear All,
I am facing a problem with Tanimoto scoring while comparing two molecules and would like to seek your advice on this:

I noticed that molecules such as myristic acid and palmitic acid have the same similarity score of 1 when queried with lauric acid. Open Babel considers lauric acid a substructure of myristic and palmitic acid. I would like to differentiate between such substructures by giving them different scores. I tried other fingerprints, but they all return the same score.

I am thinking of replacing the Tanimoto score with another coefficient, such as the Kulczynski index or the Russell index. Even though I am not an expert at programming, I can understand C, C++, Perl and Python. I would be grateful if anyone could direct me to the file that contains the code for calculating the Tanimoto coefficient. (I am using Open Babel 2.2.3 on Ubuntu.)

Please feel free to suggest better ways to score substructures.

Thanks in advance,
Chak
On Jan 31, 2011, at 8:15 PM, [hidden email] wrote:
> I noticed that molecules such as Myristic acid and Palmitic acid have same
> similarity score of 1,
> ...
> I am thinking of modifying Tanimoto score to other coefficient's like Kulczynski
> index or Russel index.

The only way to get a Tanimoto score of 1 is if the two fingerprints are identical. In that case no scoring method can tell the difference between the two, because they are identical.

To get what you want you'll need to come up with a new fingerprinting scheme, not a new scoring method.

Andrew
[hidden email]
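A small sketch of Andrew's point, treating a fingerprint as the set of its "on" bit positions. The bit numbers below are invented for illustration, and the 1024-bit length is the FP2 default mentioned earlier in the thread. When two candidates have the same fingerprint, every coefficient computed from the fingerprints alone scores them identically against a query.

```python
def tanimoto(a: set, b: set) -> float:
    """Shared bits divided by bits set in either fingerprint."""
    c = len(a & b)
    return c / (len(a) + len(b) - c)


def kulczynski(a: set, b: set) -> float:
    """Mean of the fraction of a's bits and of b's bits that are shared."""
    c = len(a & b)
    return 0.5 * (c / len(a) + c / len(b))


def russell_rao(a: set, b: set, nbits: int = 1024) -> float:
    """Shared bits divided by the total fingerprint length."""
    return len(a & b) / nbits


# Hypothetical bit sets: the three acids produce the same fingerprint.
lauric = {17, 260, 329, 670}
myristic = set(lauric)
palmitic = set(lauric)

for coefficient in (tanimoto, kulczynski, russell_rao):
    # The two scores in each row are always equal, whatever the coefficient.
    print(coefficient.__name__,
          coefficient(lauric, myristic),
          coefficient(lauric, palmitic))
```

Changing the coefficient rescales the scores but cannot separate inputs that are bit-for-bit the same.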
On 01/02/2011 07:12, Andrew Dalke wrote:
> The only way to get a Tanimoto score of 1 is if the two fingerprints are identical. In that case there is no scoring method can tell the difference between the two because they are identical.
>
> To get what you want you'll need to come up with a new fingerprinting scheme, not a new scoring method.

None of OpenBabel's fingerprints provide a complete description of a molecule. They are really intended as part of a fast screening method to exclude molecules that, compared with a target molecule, are too dissimilar (or too similar) or which are not a superstructure of it.

None of the current fingerprint types include stereochemistry, and the FP2 fingerprint has a built-in lack of certainty because different fragments can be assigned to the same bit. It also indexes by the presence or absence of fragments of up to 7 atoms, so it does not discriminate well for long chains of carbon atoms, like fatty acids or normal hydrocarbons.

It is possible to make specialized FP3 fingerprints to handle this type of structure, by including the number of times a substructure occurs. Further description is in the code, although recompilation is not necessary to make a new fingerprint type. However, I guess this is probably further than you want to go.

Chris
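The chain-length problem Chris describes can be illustrated with a toy model. This is not Open Babel's FP2 algorithm: it assumes a fatty acid can be approximated as an unbranched path of one oxygen followed by n carbons (ignoring the carbonyl oxygen, bond orders and hydrogens), and it lists the linear fragments of up to 7 atoms directly instead of hashing them. The function names are hypothetical.

```python
from collections import Counter


def chain(n_carbons: int) -> str:
    """Toy fatty acid: one oxygen followed by an unbranched carbon chain."""
    return "O" + "C" * n_carbons


def linear_fragments(path: str, max_atoms: int = 7) -> Counter:
    """Count every contiguous fragment of up to max_atoms atoms in the path."""
    fragments = Counter()
    for size in range(1, max_atoms + 1):
        for start in range(len(path) - size + 1):
            frag = path[start:start + size]
            # A fragment and its mirror image are the same fragment.
            fragments[min(frag, frag[::-1])] += 1
    return fragments


def count_tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto on counts: sum of per-fragment minima over sum of maxima."""
    return sum((a & b).values()) / sum((a | b).values())


lauric, myristic, palmitic = (linear_fragments(chain(n)) for n in (12, 14, 16))

# Presence/absence view: the three acids contain exactly the same fragments.
print(set(lauric) == set(myristic) == set(palmitic))

# Count view: the number of occurrences differs, so the scores drop below 1.
print(count_tanimoto(lauric, myristic), count_tanimoto(lauric, palmitic))
```

Under these assumptions, any chain of at least seven carbons yields the same set of fragments, and only the occurrence counts tell the acids apart.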
On Feb 1, 2011, at 4:49 AM, Chris Morley wrote:
> [...] It is possible to make specialized FP3 fingerprints to
> handle this type of structure, by including the number of times a
> substructure occurs. Further description is in the code, although
> recompilation is not necessary to make a new fingerprint type.
> However I guess this is probably further than you want to go.

It looks like that is where I want to go. It seems that FP2 isn't going to be good enough, and I will probably end up having to customize at least to some extent. If we are going to get into customization, we may as well also look at tuning the fingerprints so that similar structures result in somewhat similar activity.

To that end we are putting together a testing dataset from our open compounds. It looks like it will be about 9000 compounds, chosen by these criteria:

1) The compound has been tested at least twice in the NCI-60 dose response assay. This should mean that the NCI-60 correlations are reasonably well determined.
2) The 2D structure exists and is consistent with the molecular formula stored independently in our database. This consistency is checked via CDK, so it means that at least CDK is able to assign atom types well enough to get to the correct molecular formula.

We will calculate all the NCI-60 pairwise correlations and will post these and the structures. This should be done in a week or so. We will be looking to find a set of fingerprints that

1) never (or as close to never as we can get) returns a value of 1.0 for different structures;
2) has a well-behaved (or maybe just well-documented) relation between structure similarity and NCI-60 correlation. I'm not sure what we will get here, but I would like to be able to say something like "a similarity score of >0.9 gives an 80% chance of an NCI-60 correlation of >0.6".

I'm thinking we might also put together a dataset from the compounds tested in the onedose assay. That set would make it possible to look at the relation of structure similarity to the chances that a compound will pass the onedose criteria.

DanZ

/********************************************
* Daniel Zaharevitz
* Chief, Information Technology Branch
* Developmental Therapeutics Program
* National Cancer Institute
* [hidden email]
********************************************/
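A statement of the form "a similarity score of >0.9 gives an 80% chance of an NCI-60 correlation of >0.6" amounts to a conditional fraction over compound pairs. The sketch below shows one way to compute it; the function name `hit_rate` is hypothetical and the (similarity, correlation) pairs are invented placeholders, not NCI-60 results.

```python
def hit_rate(pairs, sim_cutoff=0.9, corr_cutoff=0.6):
    """Fraction of pairs above sim_cutoff whose correlation exceeds corr_cutoff.

    pairs is an iterable of (structure_similarity, activity_correlation).
    Returns None when no pair passes the similarity cutoff.
    """
    selected = [corr for sim, corr in pairs if sim > sim_cutoff]
    if not selected:
        return None
    return sum(corr > corr_cutoff for corr in selected) / len(selected)


# Made-up placeholder values purely to exercise the function.
toy_pairs = [
    (0.95, 0.80),
    (0.92, 0.70),
    (0.91, 0.30),
    (0.93, 0.65),
    (0.97, 0.90),
    (0.50, 0.10),
]
print(hit_rate(toy_pairs))
```

With real data, the same calculation repeated over a range of cutoffs would show how the relation between similarity and correlation behaves for a given fingerprint.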
On 02/01/2011 05:37 AM, Daniel Zaharevitz wrote:
> We will be looking to find a set of fingerprints that
> 1) never (or as close to never as we can get) return a value of 1.0
> for different structures.

I'm not sure that it's implemented in OpenBabel, but if it's a 2D structural descriptor you want, you could give LINGO (Vidal, Thormann, and Pons, JCIM 2005; DOI: 10.1021/ci0496797) a shot. I've written a LINGO implementation that's primarily targeted at GPUs but has a reasonably fast CPU version (https://simtk.org/home/siml). The CPU code is not as quick as the fastest DFA-based methods, but it'll handle your 9000^2 similarities in a matter of seconds.

(PS, it's BSD-licensed, in case anyone would like to integrate it into OB!)

> 2) has a well behaved (or maybe just well documented) relation between
> structure similarity and NCI-60 correlation. I'm not sure what we will
> get here, but I would like to be able to say something like a
> similarity score of >0.9 gives a 80% chance of a NCI-60 correlation of
> >0.6.

The VTP paper above as well as a later one (DOI: 10.1021/ci6002152) show decent correlation with activity; in my experience, similar to any other given 2D similarity measure. I'm sure you know this, but "tuning" fingerprints to any given small dataset is a dangerous art; it's very easy to overfit.

Cheers,
Imran
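A simplified sketch of the LINGO idea, for orientation only. It assumes that LINGOs are the overlapping 4-character substrings of a SMILES string and compares their counts with a multiset Tanimoto; the published method and the SIML code may differ in SMILES preprocessing and in the exact similarity formula. The function names are hypothetical and this is not the SIML API.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping q-character substring of the SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_similarity(smiles_a: str, smiles_b: str, q: int = 4) -> float:
    """Multiset Tanimoto: sum of per-substring minima over sum of maxima."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0


# Saturated fatty acids written as plain SMILES strings.
LAURIC = "C" * 12 + "(=O)O"
MYRISTIC = "C" * 14 + "(=O)O"
PALMITIC = "C" * 16 + "(=O)O"

print(lingo_similarity(LAURIC, MYRISTIC))
print(lingo_similarity(LAURIC, PALMITIC))
```

Because the substrings are counted rather than just recorded as present, each extra pair of chain carbons changes the score, so the three acids no longer look identical. The result depends on how the SMILES is written, so all inputs should come from the same canonicalization.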
On Feb 1, 2011, at 12:34 PM, Imran Haque wrote:
> [...] you could give LINGO (Vidal, Thormann,
> and Pons, JCIM 2005; DOI: 10.1021/ci0496797) a shot.
> [...]
> I'm sure you know this, but "tuning"
> fingerprints to any given small dataset is a dangerous art; it's very
> easy to overfit.

Thanks for the interesting and useful pointers. I hope we can give it a try when I get the data together.

DanZ

/********************************************
* Daniel Zaharevitz
* Chief, Information Technology Branch
* Developmental Therapeutics Program
* National Cancer Institute
* [hidden email]
********************************************/
In reply to this post by Chris Morley-3
On 01/02/2011 09:49, Chris Morley wrote:
> None of OpenBabel's fingerprints provide a complete description of a molecule. [...]
> It also indexes by the presence or absence of fragments of up to 7 atoms, so does not discriminate well for long chains of carbon atoms, like fatty acids or normal hydrocarbons. [...]

Thank you Chris, Andrew and Noel for your input.

My data set contains lipids/fatty acids (in SMILES format). In some cases, one molecule differs from another by only a single oxygen atom "O", a single carbon atom "C" or a single double bond "=". I tried patching up the fingerprint similarity by adding weights for the number of carbon atoms, the number of double bonds, etc. This turns out to be a messy job, as it can easily overfit or misfit. I would like to see a permanent solution that is applicable to any given set of SMILES.

Can anyone point out existing 1D (SMILES string as input) similarity methods that can differentiate single atom/bond differences?

Thanks,
Chak
This post has NOT been accepted by the mailing list yet.
In reply to this post by chakravarthy
Thank you for all the replies. I was searching for a better answer to this question. I like the forum. I am a travel agent for Kerala houseboats, but I would like to work in MNCs, so I am studying all these things.