Hello,
I was converting SDF molecules into canonical SMILES, but in my case I would like to keep the correspondence between the (order of appearance) index of the atoms in the SDF file and the (order of appearance) index of the atoms in the newly created SMILES string.

Would you say this is possible without first making the conversion and then doing graph isomorphism? Since I am doing large numbers of conversions, efficiency is of great importance, and that approach does not seem efficient at all.

I know this is probably very simple, but I have not gone into much detail on the inner workings of OpenBabel, so it's difficult for me to solve currently. I appreciate any advice anyone here may offer.

Regards,
Leonid Chepelev

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Canonical SMILES has the atoms in the same order whatever the input order (which is the reason it is used).

Because the atoms can be in any order in the SDF, an ordinary (non-canonical) SMILES with the same order might have to be very strange. For formaldehyde, HCHO, if the atoms in the SDF were in the order HOHC, the best SMILES I can find is:

    [H]1.O=2.[H]3.C123

but there may be something better. Anyway, OpenBabel does not have an option to produce forms like this.

However, in the reverse direction, I think the atom order in an output SDF will be the same as in an input SMILES. So to get SMILES and SDF identically ordered, you could first convert to SMILES and then convert back to SDF.

Chris

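A minimal sketch of how a SMILES like the one above could be generated mechanically: every atom is written as its own dot-separated fragment in input order, and every bond becomes a ring-closure number shared by its two atoms. The `Bond` struct and the function `orderPreservingSmiles` are hypothetical names for illustration and are not part of OpenBabel; the sketch assumes the atom tokens are already valid SMILES atoms and ignores aromaticity and stereochemistry.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical bond record: zero-based atom indices and bond order (1, 2 or 3).
struct Bond { int a; int b; int order; };

// Write each atom as a separate fragment, in input order, and express every
// bond as a ring-closure number. The bond symbol is written once, on the atom
// that appears first. Closure numbers are never reused, so at most 99 bonds.
std::string orderPreservingSmiles(const std::vector<std::string>& atoms,
                                  const std::vector<Bond>& bonds) {
    if (bonds.size() > 99) {
        throw std::length_error("sketch supports at most 99 bonds");
    }
    std::vector<std::string> out(atoms);
    for (std::size_t k = 0; k < bonds.size(); ++k) {
        const int n = static_cast<int>(k) + 1;
        // Single digits are written bare; two-digit closures need a '%' prefix.
        const std::string label =
            n < 10 ? std::to_string(n) : "%" + std::to_string(n);
        const int first = bonds[k].a < bonds[k].b ? bonds[k].a : bonds[k].b;
        const int second = bonds[k].a < bonds[k].b ? bonds[k].b : bonds[k].a;
        std::string symbol;
        if (bonds[k].order == 2) symbol = "=";
        if (bonds[k].order == 3) symbol = "#";
        out.at(first) += symbol + label;
        out.at(second) += label;
    }
    std::string smiles;
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (i > 0) smiles += ".";
        smiles += out[i];
    }
    return smiles;
}
```

Calling it with the atoms `[H]`, `O`, `[H]`, `C` and the bonds 0-3 (single), 1-3 (double), 2-3 (single) reproduces the formaldehyde string above. The result is legal but hard to read, which is the point Chris makes.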
In reply to this post by Leonid Chepelev
> appearance) index of the SDF file atoms and the (order of appearance)
> index of the atoms in the newly created SMILES string.

This would no longer be a canonical SMILES -- it would be a "regular" SMILES, where there may be several SMILES strings with different ordering.

There are ways to create a canonical SMILES and then re-order the SDF appropriately. If you're curious, I can write up code which would do that. (I don't think it's an option for SDF output yet.)

Hope that helps,
-Geoff

Thank you, Chris, though I must say that canonical SMILES definitely re-orders the atoms to create an algorithm-unique ordering (except for a couple of buggy cases with tautomers and aromatic systems), though I believe that in some "regular" SMILES the ordering is often the same as in the input SDF.

Geoffrey, thank you for your answer. Yes, I would really like that re-ordering of the SDF atoms. I believe an option like that already exists for non-canonical SMILES: to output the atom coordinates in the order they appear in canonical form. So I was wondering, does the following command already do the trick, and if so, what's the point of .can output?

    babel -isdf yournoncanonicalsdf.sdf -osmi yoursmilesfile.smi -xcx

The -xc part should output canonical SMILES, and -xx gives me the X-coordinate of the atoms in the order they appear in the canonical SMILES string. Is this usage safe/correct? Is the canonicalization algorithm in -osmi -xc the same as in -ocan? Because if that's the case, no extra work needs to be done.

Thank you all so much for your attention and help.

Leonid Chepelev

> babel -isdf yournoncanonicalsdf.sdf -osmi yoursmilesfile.smi -xcx
>
> The -xc part should output canonical smiles, and -xx gives me the
> X-coordinate of the atoms in the order they appear in the canonical
> SMILES string. Is this usage safe/correct? Is the canonicalization
> algorithm in -osmi -xc the same as in -ocan ? Because if that's the
> case, no extra work needs to be done.

If all you want is the XYZ coordinates, then you can certainly use that method. There is no difference between using "-xc" to indicate that you want canonical SMILES and using the so-called "can" format. There's just more than one way to do it.

Hope that helps,
-Geoff

Well, as I said, I really only want to preserve the mapping between the order of the atoms as they appear in the SDF and the order of the atoms as they appear in the canonical SMILES string. It would seem that, using the output options I specified, I could as a post-conversion step simply look up which input atoms have the reported combination of X and Y coordinates.

For some reason, the -xx option does not output the Z coordinate even though the coordinate is non-zero in my input. Would it be easy for someone to change that, so that the Z coordinate is printed? Because if my original molecule is planar (e.g. benzene) and is aligned in the xz plane, the X and Y coordinates may not identify the benzene ring atoms uniquely, and the process I've just outlined won't work.

Of course, it would be even better if someone actually added an option in -x such that the original atom positions (their indices in the input file) were reported instead of the X and Y coordinates.

Thank you so much, Geoff!

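The coordinate lookup described in this message could be sketched roughly as follows. It assumes the original coordinates and the coordinates reported in canonical order have already been parsed into arrays; `Point3` and `matchByCoordinates` are hypothetical names, not OpenBabel API. An atom is reported as unmatched (-1) whenever zero or several input atoms share its coordinates, which is exactly the planar-molecule problem mentioned above when Z is not available.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical coordinate record for one atom.
struct Point3 { double x, y, z; };

// For each reported point (in canonical SMILES order), find the index of the
// unique original atom whose coordinates agree within tol.
// An entry is -1 when no atom, or more than one atom, matches.
// With useZ == false only x and y are compared, mimicking output without Z.
std::vector<int> matchByCoordinates(const std::vector<Point3>& original,
                                    const std::vector<Point3>& reported,
                                    bool useZ, double tol = 1e-3) {
    std::vector<int> canToOrig;
    for (const Point3& r : reported) {
        int found = -1;
        int hits = 0;
        for (std::size_t j = 0; j < original.size(); ++j) {
            const Point3& o = original[j];
            const bool same = std::fabs(o.x - r.x) <= tol &&
                              std::fabs(o.y - r.y) <= tol &&
                              (!useZ || std::fabs(o.z - r.z) <= tol);
            if (same) {
                found = static_cast<int>(j);
                ++hits;
            }
        }
        canToOrig.push_back(hits == 1 ? found : -1);
    }
    return canToOrig;
}
```

The double loop is quadratic in the number of atoms, which is usually acceptable for small molecules but is still extra work per molecule compared with reading an atom-order string directly.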
In reply to this post by Leonid Chepelev
Leonid,
> I was converting SDF molecules into canonical SMILES, but in my case,
> I would like to keep the correspondence between the (order of
> appearance) index of the SDF file atoms and the (order of appearance)
> index of the atoms in the newly created SMILES string.
>
> Would you say this is possible without first making the conversion and
> then doing graph isomorphism? Since I am doing large numbers of
> conversions, efficiency is of great importance and this proposition is
> not efficient at all, it would seem.
>
> I know this is probably very simple, but I have not gone too much into
> detail of the inner workings of OpenBabel, so it's difficult for me to
> solve currently. I appreciate any advice anyone here may offer.

As Chris said, it is not practical to write SMILES with the atoms in a specific order.

I suggest you use the more "traditional" way. You write out the canonical SMILES, and you also write out an atom-mapping string that correlates the canonical order to the original order. For example:

    CCO          ==>  OCC        2,1,0
    c1c(O)cccc1  ==>  Oc1ccccc1  2,1,0,6,5,4,3

The canonical atom order is already stored as a string. I haven't compiled this example, but it shows the idea:

    if (mol.HasData("Canonical Atom Order")) {
      // The order is stored on the molecule as a generic data string.
      string canorder = mol.GetData("Canonical Atom Order")->GetValue();
      ofs << " " << canorder << endl;
    }

Once you have this string, you can use it to build a simple array that maps the canonical order back to the original order, or vice versa.

Craig

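A minimal sketch of that last step, assuming the mapping is a comma-separated list of zero-based original indices given in canonical order, as in the examples above (for instance "2,1,0"). The function names `parseOrder` and `invertOrder` are hypothetical and not part of OpenBabel, and the indexing convention of the string produced by a given OpenBabel version should be checked before relying on this.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse an order string such as "2,1,0" into a vector.
// canToOrig[i] is the original (input) index of the i-th canonical atom.
std::vector<int> parseOrder(const std::string& order) {
    std::vector<int> canToOrig;
    std::istringstream in(order);
    std::string token;
    while (std::getline(in, token, ',')) {
        canToOrig.push_back(std::stoi(token));  // throws on a malformed token
    }
    return canToOrig;
}

// Invert the mapping: origToCan[j] is the canonical position of the atom
// that had index j in the input. Rejects strings that are not permutations.
std::vector<int> invertOrder(const std::vector<int>& canToOrig) {
    std::vector<int> origToCan(canToOrig.size(), -1);
    for (std::size_t i = 0; i < canToOrig.size(); ++i) {
        const int orig = canToOrig[i];
        if (orig < 0 || static_cast<std::size_t>(orig) >= canToOrig.size() ||
            origToCan[orig] != -1) {
            throw std::invalid_argument("order string is not a permutation");
        }
        origToCan[orig] = static_cast<int>(i);
    }
    return origToCan;
}
```

With both arrays in hand, "atom x in the input is atom y in the canonical SMILES" is a single lookup in `origToCan`, and the reverse question is a lookup in `canToOrig`, with no graph matching involved.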
In reply to this post by Leonid Chepelev
Leonid Chepelev wrote:
> For some reason, the -xx option does not output the Z coordinate even
> though the coordinate is non-zero in my input.

Probably because I was in a hurry when I wrote it...

> Would it be easy for
> someone to change that, so that Z-coordinate is printed?

If you're in a hurry, just modify line 3958 of smilesformat.cpp:

    ofs << atom->GetX() << "," << atom->GetY();

to

    ofs << atom->GetX() << "," << atom->GetY() << "," << atom->GetZ();

But don't check that back in. It really needs to be a different option, like "z" instead of "x".

Craig

In reply to this post by Craig James-2
Thank you for both of your suggestions, Craig. Both are informative and will do the job. I'm in a hurry to get a result now, so I think I'll go with your suggestion to modify the appropriate line so that babel prints the Z coordinate, and I will work on your other suggestion when I write a clean and efficient version of my code (soon). In any case, all of this solves my problem really well.

Thank you very much for your help!

In reply to this post by Craig James-2
> As Chris said, it is not practical to write SMILES with the atoms in a
> specific order.
>
> I suggest you use the more "traditional" way. You write out the canonical
> SMILES, and you also write out an atom-mapping string that correlates the
> canonical order to the original order.

Oh, and I forgot to add, just to clarify, that I never wanted to write SMILES with atoms in a specific order. All I wanted was that atom-mapping string, that is, to know that atom number x in my input SDF is atom number y in my output canonical SMILES. I am sorry that I wasn't clear about my problem; it certainly led to a bit of misunderstanding.

Again, thank you to everyone who has answered!