Bad bug? Cis/Trans in SMILES dropped in 2.3.x


Craig James-2
This looks bad:

   echo "C/C=C/C" | babel -i smi -o sdf | babel -i sdf -o can
   CC=CC

Notice the cis/trans bonds are lost.  In OB 2.2.x, it works correctly:

   echo "C/C=C/C" | babel -i smi -o sdf | babel -i sdf -o can
   C/C=C/C

The problem seems to be here in 2.3.x:

   echo "C/C=C/C" | babel -i smi -o sdf
   
    OpenBabel05111215342D

     4  3  0  0  0  0  0  0  0  0999 V2000
       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
       0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     1  2  1  0  0  0  0
     2  3  2  0  0  0  0
     3  4  1  0  0  0  0
   M  END
   $$$$

Notice that the bond block has no stereo (cis/trans) markings.  Do the same thing in 2.2.x and the cis/trans bonds are properly marked:

    echo "C/C=C/C" | babel -i smi -o sdf
   
     OpenBabel05111215352D
   
      4  3  0  0  0  0  0  0  0  0999 V2000
        0.0000    0.0000    0.0000 C   0  0  0  0  0
        0.0000    0.0000    0.0000 C   0  0  0  0  0
        0.0000    0.0000    0.0000 C   0  0  0  0  0
        0.0000    0.0000    0.0000 C   0  0  0  0  0
      1  2  1  1  0  0
      2  3  2  3  0  0
      3  4  1  6  0  0
    M  END
   
    $$$$

The bond block is correct here in this output from 2.2.x.
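To make the difference between the two outputs concrete, here is a minimal sketch (not Open Babel's own parser) that reads the fixed-width bond block of a V2000 mol block and prints the fourth field, the bond stereo code. The function name `read_bond_block` is made up for illustration, and the mol block is the 2.2.x output above with its leading indentation removed. In the CTfile format, 1 and 6 on a single bond mean Up and Down, and 3 on a double bond means "cis or trans (either)".

```python
# Sketch only: a fixed-width reader for the V2000 counts line and bond block.
MOLBLOCK = """\

 OpenBabel05111215352D

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
  1  2  1  1  0  0
  2  3  2  3  0  0
  3  4  1  6  0  0
M  END
"""


def read_bond_block(molblock):
    """Return a list of (begin, end, order, stereo) tuples."""
    lines = molblock.splitlines()
    counts = lines[3]  # three header lines come before the counts line
    natoms, nbonds = int(counts[0:3]), int(counts[3:6])
    bonds = []
    for line in lines[4 + natoms:4 + natoms + nbonds]:
        # Each field is three characters wide: atom 1, atom 2, order, stereo.
        begin, end, order, stereo = (int(line[i:i + 3]) for i in (0, 3, 6, 9))
        bonds.append((begin, end, order, stereo))
    return bonds


for bond in read_bond_block(MOLBLOCK):
    print(bond)
```

Run against the 2.3.x output instead, every stereo field comes back as 0, which is the loss described above.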

Any ideas when this might have happened and if it was intentional?

Thanks,
Craig


_______________________________________________
OpenBabel-Devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
It's intentional, rather than a bug. I originally had some code in there to support stereo in 0D SDF, but the format doesn't officially support this - it's supposed to be either 2D or 3D. It's all very well for cis/trans, but it's not possible to store tetrahedral stereo without coordinates (which aren't present in 0D) or tetrahedral parities (which the spec explicitly says to ignore on reading).

In short, we could support this, but Open Babel would be the only software to do so, and these 0D SDF files would not be handled correctly by others...

As a workaround, if you use --gen2d or --gen3d it will work fine.

- Noel

Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
Sorry - I got things backward. It's storing the cis/trans stereochemistry in a 0D format that's the problem. See the post and comments at http://baoilleach.blogspot.com/2010/02/how-to-store-stereochemistry-in-mol.html

- Noel

Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Craig James-2
Hi Noel,

Thanks for the pointer to your blog post ... it explains the issue well.  I'll address the topic here, but let me know if it would be better to post on your blog for completeness or on the OB list for wider distribution.

My overall answer to this whole question is that it's always a mistake to lose information -- particularly in a toolkit like OpenBabel.  The primary raison d'etre of OpenBabel is to communicate between different file formats with the greatest fidelity possible.  With this change, we have a situation where a round-trip between two formats loses critical molecular information where previously it didn't.

I see it as more of a pragmatic question than anything else.  There is a way to keep the information, so why not do it?

The origin of this problem is the age-old complaint that the SD File Format has both ambiguity and redundancy.  Each developer interprets the spec differently and chaos results. My philosophy has always been to err on the side of too much information rather than just enough or too little. When a stereo center is present, mark it every way possible.  When a cis/trans bond is present, use both the 2D coordinates and the bond labels.

From your blog:
> My current understanding is that where 3D coordinates are present, there's no need
> to store stereochemical information in either the atom parity or the bond block. I think
> I'll probably set the atom parity anyway (since I've already written the code, and it
> helps when you look at the file to be able to easily identify the chiral centers).

There are three reasons why you should store stereo information everywhere.

First, because there's no reason not to (what's the harm?).

Second, it's often used to designate partially-known stereochemistry.  It's common for a molecule to have both known and unknown stereo centers.  SMILES handles this because each stereo center is specified independently.  People often will generate 3D coordinates for a molecule even though they don't know each stereo center -- they just arbitrarily pick a configuration for the unknown centers.  By marking some centers' parity bits or up/down bonds and leaving others out, you can make it clear that the stereochemistry is partially known.  (It would be nice if this were written into the CTFile specification.)

And third, there are applications out there that rely on the atom parity and bond blocks to specify chirality.  It's a bit of work to do the geometry to deduce stereochemistry from 3D coordinates, so many apps just count on the atom-parity bit or bond block.  My recollection is that Daylight's SDF-to-SMILES conversion programs used the atom parity and bond up/down flags if they could, and only used the 3D geometry as a last resort.
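The geometry mentioned here usually comes down to a signed volume. The following is a minimal sketch, assuming four neighbour positions around a tetrahedral center are already known; `signed_volume` is an illustrative helper, not an Open Babel or Daylight function. Turning the sign into R/S or SMILES @/@@ additionally needs a priority or neighbour ordering, which is not shown.

```python
def signed_volume(a, b, c, d):
    """Return six times the signed volume of the tetrahedron a, b, c, d.

    Each argument is an (x, y, z) tuple. Mirroring the coordinates or
    swapping any two points flips the sign, which is what distinguishes
    the two configurations of a tetrahedral center.
    """
    ax, ay, az = (a[i] - d[i] for i in range(3))
    bx, by, bz = (b[i] - d[i] for i in range(3))
    cx, cy, cz = (c[i] - d[i] for i in range(3))
    # Determinant of the 3x3 matrix with rows (a - d), (b - d), (c - d).
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))


original = signed_volume((1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0))
mirrored = signed_volume((1, 0, 0), (0, 1, 0), (0, 0, -1), (0, 0, 0))
print(original, mirrored)
```

With 0D coordinates every point is the origin and the volume is zero, so there is nothing to deduce; that is the case where a parity or bond flag is the only possible carrier of the information.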

> For 2D coordinates, there's no need to store the bond stereochemistry (as this can
> be worked out from the coordinates), but chirality needs to be stored explicitly. The
> normal way to store this is not using atom parity (but I'll set this anyway for the same
> reasons as above), but by setting one of the bonds on the tetrahedral center to up or down.

This is true in theory but useless in practice.  The first argument above ("what's the harm?") applies here too.  But more importantly, most molecule editors and 2D generators (including OpenBabel!) will use 120-degree bonds on every double bond they draw or lay out.  And in almost all cases, by default they draw the trans configuration.  In real life, a chemist will often draw a double bond in the trans configuration without actually knowing (or caring) whether it's cis or trans.

And like the 3D information, it's often the case that one double-bond's configuration is known while another's is not.  If you assume that you can derive the cis/trans configuration from the 2D coordinates, then there's no way to represent the information in "CC=CC/C=CC/".  On the other hand, by using the up/down bond flags, you can represent this molecule correctly.
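For reference, deriving cis/trans from 2D coordinates can be sketched as below, for a fragment a-b=c-d. This is an illustrative helper under simple assumptions (one substituent per end, plain tuples for coordinates), not code from Open Babel. Note that it can only ever answer "cis", "trans", or "undetermined" from the geometry; it has no way to tell a deliberately drawn trans bond from one drawn trans by default, which is the point made above.

```python
def cross_z(origin, p, q):
    """z-component of the cross product (p - origin) x (q - origin)."""
    return ((p[0] - origin[0]) * (q[1] - origin[1])
            - (p[1] - origin[1]) * (q[0] - origin[0]))


def cis_or_trans(a, b, c, d, eps=1e-6):
    """Classify a-b=c-d from 2D coordinates given as (x, y) tuples."""
    side_a = cross_z(b, c, a)  # which side of the b->c line atom a lies on
    side_d = cross_z(b, c, d)  # which side of the b->c line atom d lies on
    if abs(side_a) < eps or abs(side_d) < eps:
        # Collinear or coincident atoms (e.g. 0D, all zeros): no information.
        return "undetermined"
    return "cis" if side_a * side_d > 0 else "trans"


print(cis_or_trans((-0.5, 0.87), (0, 0), (1, 0), (1.5, -0.87)))
print(cis_or_trans((-0.5, 0.87), (0, 0), (1, 0), (1.5, 0.87)))
print(cis_or_trans((0, 0), (0, 0), (0, 0), (0, 0)))
```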

> For 0D coordinates, there are no guidelines. I propose to store cis/trans stereo
> using the bond stereo (you know, UP [or DOWN] at both ends of a double bond
> means cis),

But right now OpenBabel isn't even doing this.  It's just discarding the cis/trans information.

> and chirality using the atom parity. The MDL spec states that atom
> parity should be ignored when read, but the alternative is to just forget the
> stereochemistry, or else to store both cis/trans stereo *and* chirality in the bond
> block, which may just about be possible but is likely to be a real mess.

Here again, I'd argue for putting the information everywhere possible for reasons of portability. The CTFile spec, combined with various heroic attempts to work around its shortcomings, means that for every possible choice of how to write the chirality there's at least one app that does it that way.  If OpenBabel can write correct SD Files that put redundant but consistent chiral specifications (i.e. use 3D, atom parity and bond flags), then why not?

Here's a more pragmatic argument.  In OB 2.3.1, the only way to get a correct round-trip SMILES-SDF-SMILES generation is to use --gen2D.  That requires a very expensive and unnecessary ab initio calculation of 2D coordinates.  For many real molecules, generating 2D coordinates can be 10x or 100x slower than merely parsing the molecule ... and it was completely unnecessary in OB 2.2.x.

And more to the point, this is a showstopper for us.  In our experience, most pharmaceutical researchers use SMILES for molecular modeling, diversity analysis, toxicology analysis and so forth. Once they decide what to buy, they may send us the SMILES, or may send us SD Files. These files can range from a few compounds to hundreds of thousands of compounds.  It would be a disaster if the cis/trans information was lost at the end of this time-consuming analysis just because they (or we) converted their SMILES to SDF format using OpenBabel before buying the compounds.

Since I know about this problem, eMolecules can exercise diligence and never do a SMILES-to-SDF conversion.  But customers might not be aware of this restriction -- they use OpenBabel because it is known to be good at file-format conversion.  It would be really unpleasant for us to have to explain to a customer that they'd ordered hundreds of incorrect compounds because OpenBabel doesn't handle cis/trans the way you'd expect.

Thanks,
Craig


Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
The issue is only regarding cis/trans stereo so I'll confine my comments to that. (We do store tet stereo twice, as well as provide an option to read the chirality flag if desired.)

You assume that there is a convention for storing cis/trans stereo in the bond block, but that OB is not following it. There is no such convention, not for 3D, 2D or 0D. For cis/trans stereo, these bond flags are only used to indicate unknown double-bond stereo (via two separate conventions, either on the double bond itself or on an attached single bond).

Now...we can implement a way to do this, using two Up bonds (basically how SMILES are stored) for example, and that's what I had done earlier. But Open Babel will be the only software that will read this correctly as it's not even hinted at in the spec. Other software will regard the cis/trans stereo as undefined.
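As a rough illustration of the kind of Open Babel-only convention being discussed, the sketch below decodes cis/trans from Up/Down flags on the single bonds flanking a double bond: same flag at both ends is read as cis, different flags as trans. This follows the wording quoted from the blog post and matches the 2.2.x bond block shown earlier in the thread (trans C/C=C/C written with 1 and 6). It is not part of the CTfile spec, the function name is hypothetical, and it deliberately ignores which atom each bond is listed from, which a real implementation would have to take into account.

```python
UP, DOWN = 1, 6  # CTfile stereo codes for single bonds


def zero_d_cis_trans(bonds):
    """Map each double bond (begin, end) to 'cis', 'trans' or 'unspecified'.

    `bonds` is a list of (begin, end, order, stereo) tuples. Assumption:
    at most one flagged single bond is considered per double-bond end,
    and bond direction is ignored.
    """
    result = {}
    for begin, end, order, _ in bonds:
        if order != 2:
            continue
        marks = []
        for atom in (begin, end):
            # Up/Down flags on single bonds touching this double-bond atom.
            found = [s for (x, y, o, s) in bonds
                     if o == 1 and atom in (x, y) and s in (UP, DOWN)]
            marks.append(found[0] if found else None)
        if None in marks:
            result[(begin, end)] = "unspecified"
        else:
            result[(begin, end)] = "cis" if marks[0] == marks[1] else "trans"
    return result


print(zero_d_cis_trans([(1, 2, 1, 1), (2, 3, 2, 3), (3, 4, 1, 6)]))  # 2.2.x
print(zero_d_cis_trans([(1, 2, 1, 0), (2, 3, 2, 0), (3, 4, 1, 0)]))  # 2.3.x
```

Software that does not know this convention would see only ordinary wedge flags and a double bond marked "either", so it would treat the stereo as undefined.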

So...which is the lesser evil? For OB to implement a self-consistent cis/trans stereo representation for 0D SDF which is unrecognised by other software (thus causing silent loss of information), or for OB to not support storing cis/trans stereo for 0D SDF (which currently causes a warning about loss of stereo), a behaviour which is consistent with other software?

It's not a rhetorical question - I'm happy to be convinced either way (especially as the code is already written), as until now, there's been no-one interested in discussing this.

- Noel

Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
See also this thread: http://www.mail-archive.com/blueobelisk-discuss@.../msg00665.html

...where Greg Landrum dissuades me from including stereo in 0D Mol files.

- Noel

On 14 May 2012 11:32, Noel O'Boyle <[hidden email]> wrote:
The issue is only regarding cis/trans stereo so I'll confine my comments to that. (We do store tet stereo twice, as well as provide an option to read the chirality flag if desired.)

You assume that there is a convention for storing cis/trans stereo in the bond block, but that OB is not following it. There is no such convention; neither for 3D, 2D or 0D. For cis/trans stereo, these bonds are only used to indicate unknown dbl bond stereo (via two separate conventions, either on the double bond itself or on an attached single bond).

Now...we can implement a way to do this, using two Up bonds (basically how SMILES are stored) for example, and that's what I had done earlier. But Open Babel will be the only software that will read this correctly as it's not even hinted at in the spec. Other software will regard the cis/trans stereo as undefined.

So...which is the lesser evil? For OB to implement a self-consistent cis/trans stereo representation for 0D SDF which is unrecognised by other software (thus causing silent loss of information), or for OB not to support storing cis/trans stereo for 0D SDF (which currently causes a warning about loss of stereo), a behaviour which is consistent with other software?

It's not a rhetorical question - I'm happy to be convinced either way (especially as the code is already written), as until now, there's been no-one interested in discussing this.

- Noel


On 14 May 2012 02:05, Craig James <[hidden email]> wrote:
Hi Noel,

Thanks for the pointer to your blog post ... it explains the issue well.  I'll address the topic here, but let me know if it would be better to post on your blog for completeness or on the OB list for wider distribution.

My overall answer to this whole question is that it's always a mistake to lose information -- particularly in a toolkit like OpenBabel.  The primary raison d'etre of OpenBabel is to communicate between different file formats with the greatest fidelity possible.  With this change, we have a situation where a round-trip between two formats loses critical molecular information where previously it didn't.

I see it as more of a pragmatic question than anything else.  There is a way to keep the information, so why not do it?

The origin of this problem is the age-old complaint that the SD File Format has both ambiguity and redundancy.  Each developer interprets the spec differently and chaos results. My philosophy has always been to err on the side of too much information rather than just enough or too little. When a stereo center is present, mark it every way possible.  When a cis/trans bond is present, use both the 2D coordinates and the bond labels.

From your blog:
> My current understanding is that where 3D coordinates are present, there's no need
> to store stereochemical information in either the atom parity or the bond block. I think
> I'll probably set the atom parity anyway (since I've already written the code, and it
> helps when you look at the file to be able to easily identify the chiral centers).

There are three reasons why you should store stereo information everywhere.

First, because there's no reason not to (what's the harm?).

Second, it's often used to designate partially-known stereochemistry.  It's common for a molecule to have both known and unknown stereo centers.  SMILES handles this because each stereo center is specified independently.  People often will generate 3D coordinates for a molecule even though they don't know each stereo center -- they just arbitrarily pick a configuration for the unknown centers.  By marking some centers' parity bits or up/down bonds and leaving others out, you can make it clear that the stereochemistry is partially known.  (It would be nice if this were written into the CTFile specification.)

And third, there are applications out there that rely on the atom parity and bond blocks to specify chirality.  It's a bit of work to do the geometry to deduce stereochemistry from 3D coordinates, so many apps just count on the atom-parity bit or bond block.  My recollection is that Daylight's SDF-to-SMILES conversion programs used the atom parity and bond up/down flags if they could, and only used the 3D geometry as a last resort.

> For 2D coordinates, there's no need to store the bond stereochemistry (as this can
> be worked out from the coordinates), but chirality needs to be stored explicitly. The
> normal way to store this is not using atom parity (but I'll set this anyway for the same
> reasons as above), but by setting one of the bonds on the tetrahedral center to up or down.

This is true in theory but useless in practice.  The first argument above ("what's the harm?") applies here too.  But more importantly, most molecule editors and 2D generators (including OpenBabel!) will use 120-degree bonds on every double bond they draw or lay out.  And in almost all cases, by default they draw the trans configuration.  In real life, a chemist will often draw a double bond in the trans configuration without actually knowing (or caring) whether it's cis or trans.

And like the 3D information, it's often the case that one double-bond's configuration is known while another's is not.  If you assume that you can derive the cis/trans configuration from the 2D coordinates, then there's no way to represent the information in "CC=CC/C=C/C".  On the other hand, by using the up/down bond flags, you can represent this molecule correctly.
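To make "worked out from the coordinates" concrete, here is a rough Python sketch of the usual geometric test for a fragment a-b=c-d. It is only an illustration of the idea (the function name and tolerance are my own choices), not what OpenBabel or any other toolkit actually does.

```python
def cis_or_trans(a, b, c, d, eps=1e-6):
    """Classify the fragment a-b=c-d from 2D points given as (x, y) tuples.

    Returns "cis", "trans", or None when the geometry cannot decide
    (collinear substituents, or all coordinates identical as in a 0D file).
    """
    # Direction of the double bond b=c.
    bx, by = c[0] - b[0], c[1] - b[1]
    # Signed side of each substituent relative to the double-bond axis
    # (z component of the 2D cross product).
    side_a = bx * (a[1] - b[1]) - by * (a[0] - b[0])
    side_d = bx * (d[1] - c[1]) - by * (d[0] - c[0])
    if abs(side_a) < eps or abs(side_d) < eps:
        return None
    # Same sign: both substituents on the same side of the axis.
    return "cis" if side_a * side_d > 0 else "trans"
```

The test always returns an answer when a layout exists, so a double bond that an editor drew as a zig-zag by default comes out as "trans" whether or not the author knew the configuration. With all-zero coordinates it returns None, which is the 0D case this thread is about.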

> For 0D coordinates, there are no guidelines. I propose to store cis/trans stereo
> using the bond stereo (you know, UP [or DOWN] at both ends of a double bond
> means cis),

But right now OpenBabel isn't even doing this.  It's just discarding the cis/trans information.
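As a sketch of what the quoted proposal amounts to, the rule "UP [or DOWN] at both ends means cis" can be written in a few lines of Python. This assumes, purely for illustration, that both flanking single bonds are listed in chain order as in the C/C=C/C example, so no direction flipping is needed; a real implementation would have to handle bond direction, and this is not the OpenBabel code.

```python
UP, DOWN = 1, 6  # V2000 bond stereo codes for single bonds


def cis_trans_from_flags(left_flag, right_flag):
    """Classify a double bond from the flags on its two flanking single bonds.

    Returns "cis", "trans", or None if either end is unmarked.
    """
    if left_flag not in (UP, DOWN) or right_flag not in (UP, DOWN):
        return None  # at least one end unmarked: stereo unspecified
    # Same flag at both ends means cis; opposite flags mean trans.
    return "cis" if left_flag == right_flag else "trans"
```

The 2.2.x output for C/C=C/C marks bond 1-2 with 1 and bond 3-4 with 6, which this rule reads as trans. A double bond with no flags on its neighbours comes back as None, so known and unknown double bonds can coexist in one molecule.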

> and chirality using the atom parity. The MDL spec states that atom
> parity should be ignored when read, but the alternative is to just forget the
> stereochemistry, or else to store both cis/trans stereo *and* chirality in the bond
> block, which may just about be possible but is likely to be a real mess.

Here again, I'd argue for putting the information everywhere possible for reasons of portability. The CTFile spec, combined with various heroic attempts to work around its shortcomings, means that for every possible choice of how to write the chirality there's at least one app that does it that way.  If OpenBabel can write correct SD Files that put redundant but consistent chiral specifications (i.e. use 3D, atom parity and bond flags), then why not?

Here's a more pragmatic argument.  In OB 2.3.1, the only way to get a correct round-trip SMILES-SDF-SMILES generation is to use --gen2D.  That requires a very expensive and unnecessary ab initio calculation of 2D coordinates.  For many real molecules, generating 2D coordinates can be 10x or 100x slower than merely parsing the molecule ... and it was completely unnecessary in OB 2.2.x.

And more to the point, this is a showstopper for us.  In our experience, most pharmaceutical researchers use SMILES for molecular modeling, diversity analysis, toxicology analysis and so forth. Once they decide what to buy, they may send us the SMILES, or may send us SD Files. These files can range from a few compounds to hundreds of thousands of compounds.  It would be a disaster if the cis/trans information was lost at the end of this time-consuming analysis just because they (or we) converted their SMILES to SDF format using OpenBabel before buying the compounds.

Since I know about this problem, eMolecules can exercise diligence and never do a SMILES-to-SDF conversion.  But customers might not be aware of this restriction -- they use OpenBabel because it is known to be good at file-format conversion.  It would be really unpleasant for us to have to explain to a customer that they'd ordered hundreds of incorrect compounds because OpenBabel doesn't handle cis/trans the way you'd expect.

Thanks,
Craig



On Sat, May 12, 2012 at 6:37 AM, Noel O'Boyle <[hidden email]> wrote:
Sorry - I got things backward. It's storing the cis/trans stereochemistry in a 0D format that's the problem. See the post and comments at http://baoilleach.blogspot.com/2010/02/how-to-store-stereochemistry-in-mol.html

- Noel

On 12 May 2012 13:52, Noel O'Boyle <[hidden email]> wrote:
It's intentional, rather than a bug. I originally had some code in there to support stereo in 0D SDF, but the format really doesn't support this officially - it's supposed to be either 2D or 3D. It's all very well for cis/trans, but it's not possible to store tet stereo without coordinates (which aren't present in 0D) or tet parities (which the spec explicitly says to ignore on reading).

In short, we could support this, but Open Babel would be the only software to do so, and these 0D SDF files would not be handled correctly by others...

In short, if you use --gen2d or --gen3d it will work fine.

- Noel




Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Craig James-2
On Mon, May 14, 2012 at 3:46 AM, Noel O'Boyle <[hidden email]> wrote:
See also this thread: http://www.mail-archive.com/blueobelisk-discuss@.../msg00665.html

...where Greg Landrum dissuades me from including stereo in 0D Mol files.

Here's how that conversation ended. Greg wrote:
>> So...should we retain them or not? I think what I'll do is add an
>> option to allow users to retain them exactly. However, the default
>> will be that the wedges/hashes in the output will be solely dependent
>> on the perceived stereochemistry. *Sigh* This applies to all 2D output
>> formats.
>
> Having the option to retain them exactly sounds sensible, but that
> means retaining all of the user-provided markings, right? This almost
> sounds to me like it's a read setting, not a write one. But then I'm
> not familiar with the internal flow for processing mols in OB.

It seemed to me like the opposite (maybe Greg is following this now?).  At the end of this discussion, I was left with the understanding that OpenBabel *would* keep the 0D stereo information, or at least optionally be able to keep it.

And in that same thread, Peter beat me in making all my key arguments.  In particular:

http://www.mail-archive.com/blueobelisk-discuss@.../msg00666.html

... in which Peter argues that editors tend to draw zig-zag bonds by default, so you can't rely on 2D coordinates to deduce whether an author really meant cis or trans.  And more to the point, Peter closes by saying:
>> For 0D coordinates, there are no guidelines. I propose to store
>> cis/trans stereo using the bond stereo (you know, UP [or DOWN] at both
>> ends of a double bond means cis), and chirality using the atom parity.
>> The MDL spec states that atom parity should be ignored when read,
>
> I know this is the spec and I don't want to get into more arguments about
> whether it should be changed. At this stage I think it is useful if programs
> have the capability to read and interpret this field.

Exactly.

To reiterate my earlier argument, I just can't see any reason not to include cis/trans and chiral information in 0D SD Files.  It's genuinely useful, and it doesn't hurt anything.  The fact that it stretches the CTFile spec strikes me as uninteresting.  Apps are free to ignore it.

Craig


Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
Ok - I'll add it back.


Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Craig James-2
Hi Noel,

I'm back from my vacation and caught up with the seemingly inevitable backlog of stuff that builds up... now back to reality!

On Sun, May 20, 2012 at 10:15 AM, Noel O'Boyle <[hidden email]> wrote:
Ok - I'll add it back.

Can you tell me where we stand on this?

Thanks!
Craig



Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
On 21 June 2012 16:13, Craig James <[hidden email]> wrote:

> Can you tell me where we stand on this?

I'm working on it as we speak. I had to rewrite the old
implementation as it wasn't quite correct. I think it's working now,
but I need more time to be sure... In short, I intend to get this in
before release.

- Noel


Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Noel O'Boyle
Administrator
Now added in r4898. Option "S" for read or write gives former 2.3.1 behaviour.

"obabel -:C/C=C/Cl -omol" still includes a warning message as follows:

*** Open Babel Warning  in OpenBabel::MDLFormat::WriteMolecule
  No 2D or 3D coordinates exist. Stereochemical information will be stored using
 an Open Babel extension. To generate 2D or 3D coordinates instead use
--gen2D or --gen3D.
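For readers who want to spot such files on the receiving end, here is a small Python sketch that checks whether a V2000 molblock carries any real coordinates. It is only an illustration with a made-up function name; it assumes the standard three header lines followed by the counts line, and it is not the check Open Babel itself uses to decide when to emit the warning.

```python
def has_coordinates(molblock):
    """Return True if any atom in a V2000 molblock has a non-zero x, y or z."""
    lines = molblock.splitlines()
    counts = lines[3]              # 4th line is the counts line: aaabbb...
    n_atoms = int(counts[0:3])     # aaa = number of atoms
    for atom_line in lines[4:4 + n_atoms]:
        # x, y and z each occupy a 10-character fixed-width field.
        x, y, z = (float(atom_line[i:i + 10]) for i in (0, 10, 20))
        if (x, y, z) != (0.0, 0.0, 0.0):
            return True
    return False
```

A file for which this returns False has no geometry to fall back on, so any stereo it carries can only come from the bond or atom flags. A single atom placed at the origin is indistinguishable from the 0D case under this test.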

- Noel


Re: Bad bug? Cis/Trans in SMILES dropped in 2.3.x

Craig James-2
Hi Noel,

On Fri, Jun 22, 2012 at 4:01 AM, Noel O'Boyle <[hidden email]> wrote:
Now added in r4898. Option "S" for read or write gives former 2.3.1 behaviour.

"obabel -:C/C=C/Cl -omol" still includes a warning message as follows:

*** Open Babel Warning  in OpenBabel::MDLFormat::WriteMolecule
 No 2D or 3D coordinates exist. Stereochemical information will be stored using
 an Open Babel extension. To generate 2D or 3D coordinates instead use
--gen2D or --gen3D.

Perfect, thanks!  This is a good solution, a mix of pragmatism and informative warnings.  I'll test it either today or Monday.

Craig
 


