[Open Babel] Re: Snapshot, 2.0 status

Geoffrey Hutchison
Hi,

While I didn't mention it outright, there's a new 2005-09-01  
development snapshot. I noticed while uploading that the 2005-08-01  
snapshot was mangled -- but no one complained? Hopefully some people  
find these snapshots worthwhile, or I won't bother.
http://sourceforge.net/project/showfiles.php?group_id=40728&package_id=154019

I'm trying to put together a checklist for the 2.0 release. I think  
it'll probably slip from the already slippery "September sometime"  
goal that I stated this summer, but I think we're generally in good  
shape. I've been using it routinely for a while and thanks to quite a  
lot of help, I think it's already higher quality than 1.100.2.

Here's what I see that's remaining. Please let me know if there are  
other things to add:
* Improve API documentation: http://openbabel.sourceforge.net/dev-api/
  - Ensure all documentation is consistent with 2.0 changes (e.g.,
OBMol referenced the old OBFileFormat class until recently)
  - Add more example code and "tutorials"
  - Ensure good documentation coverage (currently ~75-80% there)

* A "migration" guide, explaining how to update code from OB 1.0 to 2.0

* User-level documentation for command-line tools and features  
supported in each file format
  (including limitations on the current SMARTS implementation)

* Improved, more stable CML parser (Chris Morley and others)

* Fix for stereochemistry conventions between SMILES and MDL (Nick  
England)
    http://sourceforge.net/tracker/index.php?func=detail&aid=1257494&group_id=40728&atid=428740

* Support for gzip/zlib compression for reading/writing files

* Fixes for important bugs:
http://sourceforge.net/tracker/index.php?func=detail&aid=1219329&group_id=40728&atid=428740
http://sourceforge.net/tracker/index.php?func=detail&aid=1246761&group_id=40728&atid=428740
http://sourceforge.net/tracker/index.php?func=detail&aid=1281758&group_id=40728&atid=428740
http://sourceforge.net/tracker/index.php?func=detail&aid=1262171&group_id=40728&atid=428740

* Additional file formats as possible


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
[Open Babel] Behavior of UP and Wedge bonds

N.W. England
I noticed that at the moment wedge and up bonds are completely separate.
Are they supposed to be used for different things? At the moment a lot of
code seems to have if (IsHash() || IsDown()) etc., implying they are supposed
to be equal.

Would it be a good idea to change them to both use the same flag to avoid
some formats setting Hash but another only looking for down?

mol.h:
    void SetHash()        { SetFlag(OB_HASH_BOND);     }
    void SetWedge()       { SetFlag(OB_WEDGE_BOND);    }
    void SetUp()          { SetFlag(OB_TORUP_BOND);    }
    void SetDown()        { SetFlag(OB_TORDOWN_BOND);  }

- Nick


Re: [Open Babel] Behavior of UP and Wedge bonds

Geoffrey Hutchison

On Sep 8, 2005, at 9:20 AM, N.W. England wrote:

> I noticed that at the moment wedge and up bonds are completely
> separate. Are they supposed to be used for different things? At the
> moment a lot of code seems to have if (IsHash() || IsDown()) etc.,
> implying they are supposed to be equal.

These are *NOT* equal. I repeat: NOT EQUAL.

I'm more than happy to add documentation and/or rename functions  
because this is clearly confusing. Here's the current definition of  
the flags:
//! A solid black wedge in 2D representations -- i.e., "up" from the 2D plane
#define OB_WEDGE_BOND     (1<<2)
//! A dashed "hash" bond in 2D representations -- i.e., "down" from the 2D plane
#define OB_HASH_BOND      (1<<3)
//! The "upper" bond in a double bond cis/trans isomer (i.e., "/" in SMILES)
#define OB_TORUP_BOND     (1<<5)
//! The "down" bond in a double bond cis/trans isomer (i.e., "\" in SMILES)
#define OB_TORDOWN_BOND   (1<<6)


OBBond::IsUp() and OBBond::IsWedge() are NOT equivalent. One implies  
a cis/trans bond, and one implies a pseudo-3D representation.
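
To make the distinction concrete, here is a minimal standalone sketch. It is not the actual OBBond class: SketchBond and its members are hypothetical stand-ins, and only the four flag values are taken from the definitions above. It shows that the two pairs of flags occupy different bits, so a test such as IsHash() || IsDown() mixes two unrelated concepts.

```cpp
#include <cstdint>

// Flag values copied from the definitions quoted in this message.
constexpr std::uint32_t OB_WEDGE_BOND   = 1u << 2;  // solid wedge: "up" from the 2D plane
constexpr std::uint32_t OB_HASH_BOND    = 1u << 3;  // dashed hash: "down" from the 2D plane
constexpr std::uint32_t OB_TORUP_BOND   = 1u << 5;  // "/" in SMILES (cis/trans marker)
constexpr std::uint32_t OB_TORDOWN_BOND = 1u << 6;  // "\" in SMILES (cis/trans marker)

// The pseudo-3D pair and the cis/trans pair never share a bit.
static_assert(((OB_WEDGE_BOND | OB_HASH_BOND) &
               (OB_TORUP_BOND | OB_TORDOWN_BOND)) == 0,
              "wedge/hash and up/down flags must be independent");

// Hypothetical stand-in for OBBond, for illustration only.
struct SketchBond {
    std::uint32_t flags = 0;

    void SetWedge() { flags |= OB_WEDGE_BOND; }
    void SetHash()  { flags |= OB_HASH_BOND; }
    void SetUp()    { flags |= OB_TORUP_BOND; }
    void SetDown()  { flags |= OB_TORDOWN_BOND; }

    bool IsWedge() const { return (flags & OB_WEDGE_BOND) != 0; }
    bool IsHash()  const { return (flags & OB_HASH_BOND) != 0; }
    bool IsUp()    const { return (flags & OB_TORUP_BOND) != 0; }
    bool IsDown()  const { return (flags & OB_TORDOWN_BOND) != 0; }
};
```

With this layout, a bond marked with SetWedge() reports IsWedge() as true and IsUp() as false, which is the behaviour described above: the first is a drawing convention for a stereocentre, the second is a cis/trans marker.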

Suggestions for renaming the functions for OB 2.0 are more than  
welcome. I've certainly been trying to make this clearer in the  
documentation as well, but there's obviously still confusion.

Thoughts?
-Geoff


Re: [Open Babel] Behavior of UP and Wedge bonds

N.W. England
On Sep 8 2005, Geoffrey Hutchison wrote:

> Here's the current definition of the flags:
>
> //! A solid black wedge in 2D representations -- i.e., "up" from the 2D plane
> #define OB_WEDGE_BOND     (1<<2)
> //! A dashed "hash" bond in 2D representations -- i.e., "down" from the 2D plane
> #define OB_HASH_BOND      (1<<3)
> //! The "upper" bond in a double bond cis/trans isomer (i.e., "/" in SMILES)
> #define OB_TORUP_BOND     (1<<5)
> //! The "down" bond in a double bond cis/trans isomer (i.e., "\" in SMILES)
> #define OB_TORDOWN_BOND   (1<<6)

Thanks for clarifying Geoff!

At the moment, in the molv3000 ReadBondBlock function at mdlformat.cpp:624:

    if (val == 1)
    {
        flag |= OB_TORUP_BOND;
    }
    else if (val == 3)
    {
        flag |= OB_TORDOWN_BOND;
    }

However, this value applies to wedge/hash bonds, not to cis/trans isomers. The
V3000 format doesn't mark cis/trans with any flags, relying solely on the 2D
co-ordinates.

At the moment the code allows SMILES cis/trans to round-trip through Mol
V3000, but the intermediate Mol V3000 file is not correct, as it labels the
bonds as wedge/hash.

The molV3000 code should be changed to set the OB_WEDGE_BOND and OB_HASH_BOND
flags. Otherwise chirality from 2D co-ordinates and bond wedge/hash won't
work.
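
A minimal sketch of that change, assuming the bond CFG values seen in the snippet above (1 and 3) and treating every other value as "no wedge/hash flag". StereoFlagsFromV3000Cfg is a hypothetical helper written for this illustration, not a function that exists in mdlformat.cpp, and the flag constants are redefined here only so the fragment stands alone.

```cpp
#include <cstdint>

// Values as defined in mol.h, repeated so the sketch is self-contained.
constexpr std::uint32_t OB_WEDGE_BOND = 1u << 2;
constexpr std::uint32_t OB_HASH_BOND  = 1u << 3;

// Hypothetical helper: translate a V3000 bond CFG value into bond flags.
// CFG 1 ("up") becomes a wedge and CFG 3 ("down") becomes a hash.
// Any other value sets no stereo flag in this sketch.
std::uint32_t StereoFlagsFromV3000Cfg(int cfg) {
    switch (cfg) {
    case 1:
        return OB_WEDGE_BOND;
    case 3:
        return OB_HASH_BOND;
    default:
        return 0;
    }
}
```

In the reader, the result would be OR-ed into the bond's flag word in place of the OB_TORUP_BOND / OB_TORDOWN_BOND assignments shown above.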

An extra routine for generating OB_TORUP and OB_TORDOWN from 2D
co-ordinates would be necessary to correctly convert from molv3000 to
SMILES.

- Nick


Re: [Open Babel] Behavior of UP and Wedge bonds

Geoffrey Hutchison

On Sep 8, 2005, at 10:32 AM, N.W. England wrote:

> At the moment the code allows smiles cis/trans to round trip  
> through Mol v3000, but the intermediate mol v3000 is not correct as  
> it is labeling the bonds as wedge/hash.

I thought I removed all of that code before. It's nice that it  
roundtrips SMILES cis/trans, but it's completely wrong and creates an  
incorrect MDL file.

> The molV3000 should be changed to set the OB_WEDGE_BOND and
> OB_HASH_BOND flags. Otherwise chirality from 2D co-ordinates and
> bond wedge/hash won't work.

Yes, that would be good -- I think the V2000 code should have this too.

> An extra routine for generating OB_TORUP and OB_TORDOWN from 2D
> co-ordinates would be necessary to correctly convert from molv3000 to
> smiles.

Assuming there are 2D or 3D coordinates, the SMILES code should  
correctly work out cis/trans and chiral information. Otherwise,  
there's simply no way to specify cis/trans in an MDL file.
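
As a rough illustration of working out cis/trans from 2D coordinates, here is a standalone sketch. It is not the Open Babel SMILES code: Point2D, DoubleBondGeometry, Cross and ClassifyDoubleBond are hypothetical names, and the tolerance is an arbitrary assumption. For a fragment A-B=C-D it checks whether A and D lie on the same side of the line through B and C.

```cpp
#include <cmath>

struct Point2D {
    double x;
    double y;
};

enum class DoubleBondGeometry { Cis, Trans, Undefined };

// z-component of the cross product (p - o) x (q - o).
// Positive when q lies to the left of the directed line o -> p.
double Cross(const Point2D& o, const Point2D& p, const Point2D& q) {
    return (p.x - o.x) * (q.y - o.y) - (p.y - o.y) * (q.x - o.x);
}

// Classify the geometry of A-B=C-D from 2D coordinates.
// Same side of the B=C axis means cis, opposite sides means trans.
// A substituent lying (nearly) on the axis gives Undefined.
DoubleBondGeometry ClassifyDoubleBond(const Point2D& a, const Point2D& b,
                                      const Point2D& c, const Point2D& d,
                                      double tolerance = 1e-6) {
    const double sideA = Cross(b, c, a);
    const double sideD = Cross(b, c, d);
    if (std::fabs(sideA) < tolerance || std::fabs(sideD) < tolerance) {
        return DoubleBondGeometry::Undefined;
    }
    return ((sideA > 0.0) == (sideD > 0.0)) ? DoubleBondGeometry::Cis
                                            : DoubleBondGeometry::Trans;
}
```

A real implementation would also have to choose which neighbours to use as A and D when a double-bond atom carries two substituents; that choice is left out of this sketch.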

The final "missing piece" for round-trip conversion from SMILES to  
preserve cis/trans and chirality would be to have coordinate  
generation. It might be possible to work out some of the chiral  
information (as you suggested) from CML or MDL files.

-Geoff


Re: [Open Babel] Behavior of UP and Wedge bonds

Chris Morley-3
Geoffrey Hutchison wrote:

>
> On Sep 8, 2005, at 10:32 AM, N.W. England wrote:
>
>> At the moment the code allows smiles cis/trans to round trip  through
>> Mol v3000, but the intermediate mol v3000 is not correct as  it is
>> labeling the bonds as wedge/hash.
>
>
> I thought I removed all of that code before. It's nice that it  
> roundtrips SMILES cis/trans, but it's completely wrong and creates an  
> incorrect MDL file.

I'm sorry to have committed this crime - twice. I'm glad it is
being sorted out.
>
>> The molV3000 should be changed to set the OB_WEDGE_BOND and
>> OB_HASH_BOND flags. Otherwise chirality from 2D co-ordinates and bond
>> wedge/hash won't work.
>
> Yes, that would be good -- I think the V2000 code should have this too.

I think the emphasis should be on calculating the parity, rather
than using wedge/hash flags, see below.

>> An extra routine for generating OB_TORUP and OB_TORDOWN from 2D
>> co-ordinates would be necessary to correctly convert from molv3000 to
>> smiles.
>
>
> Assuming there are 2D or 3D coordinates, the SMILES code should  
> correctly work out cis/trans and chiral information. Otherwise,  there's
> simply no way to specify cis/trans in an MDL file.

I find this surprising. V3000 was designed as an entity (not by
evolution like V2000). It supports 0D descriptions of tetrahedral
stereochemistry and I would have expected that a cis-trans
description (at least as chemically significant) would have been
included. The MDL specification document gives values for V3000
bond CFG 1 as "up", 3 as "down" with no further explanation as to
what they mean as far as I can see.

In 0D, wedge and hash have no meaning with regard to tetrahedral
stereo, but up or down on two single bonds unambiguously describe
cis and trans isomers. It would be consistent to have atom CFG
equivalent to SMILES @,@@ and bond CFG to SMILES /,\

In 2D, interpreting bond CFG as wedge and hash duplicates atom CFG
as a description of tetrahedral stereo. Interpreting it as up and
down duplicates the 2D coordinate info on cis/trans. Both are
unambiguous. Both could be seen as additional aids: the up/down as
reinforcing that coordinates were really meant to distinguish
isomers; the wedge/hash as an indication of the way it is written.
For instance, Marvin Sketch sometimes writes V3000 files for
chiral molecules with both atom and bond CFG specified.

It seems a pity that this useful and unambiguous practice produces
"incorrect MDL files". Perhaps there is further documentation,
maybe it would break some software, or is it just contrary to
common practice?

How to represent these concepts internally in OB is a matter of
philosophy. One possibility is to keep the description close to
the form in which it arrived, but I would prefer it to be a
standard form that described as much as is known about the
molecule. For 3D, it would be the coordinates alone. For 2D, it
could include wedge/hash, but it would be cleaner if we used atom
parity (with the ordering of the bonded atoms from a standard
rule). For 0D, atom parity and SMILES-like up/down seems fine. The
wedges and hashes could be (re)generated when needed - for 2D or
3D MDL files perhaps in response to an output option.

Incidentally, the documentation in mol.h on OB_TORUP_BOND
mentions the correspondence to SMILES / but uses the word "upper",
which is confusing. It is actually the direction of the bond that
is being described, not its position. So F/C=C/F is trans.
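
A small sketch of that rule, restricted to the simplest linear form A?C=C?D, where each ? is a / or \ mark written immediately before and after the double bond. GeometryFromSmilesMarks and DoubleBondGeometry are hypothetical names for illustration only; ring closures, branches and marks written on the other side of an atom are outside what this sketch assumes.

```cpp
#include <stdexcept>

enum class DoubleBondGeometry { Cis, Trans };

// Interpret the two direction marks in a linear SMILES fragment A?C=C?D.
// The marks describe bond direction, so equal marks (F/C=C/F, F\C=C\F)
// mean trans and unequal marks (F/C=C\F, F\C=C/F) mean cis.
DoubleBondGeometry GeometryFromSmilesMarks(char before, char after) {
    const bool beforeValid = (before == '/' || before == '\\');
    const bool afterValid = (after == '/' || after == '\\');
    if (!beforeValid || !afterValid) {
        throw std::invalid_argument("direction marks must be '/' or '\\'");
    }
    return (before == after) ? DoubleBondGeometry::Trans
                             : DoubleBondGeometry::Cis;
}
```

The function never asks which bond is "upper" or "lower"; only the relative direction of the two marks matters, which is the point being made about the wording in mol.h.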
>
> The final "missing piece" for round-trip conversion from SMILES to  
> preserve cis/trans and chirality would be to have coordinate  
> generation. It might be possible to work out some of the chiral  
> information (as you suggested) from CML or MDL files.

Coordinate generation would be a useful extra facility, but it seems
a pity if we have to generate artificial data to transmit real
chemical information.

I thought CML (which I'm working on at present) had a clear way of
conveying stereochemical info in 0D and 2D molecules. But there
seems to be a little untidiness when hydrogen attached to a chiral
centre is represented by hydrogenCount...



Re: [Open Babel] MDL v3000 standard (was Behavior of UP and Wedge bonds)

Geoffrey Hutchison
Not much of a crime -- it took me ages to figure out that OpenEye's
"UP" and "DOWN" designations had nothing to do with wedge/hash. And
if I remember correctly, some of *their* code confused wedge/hash
with up/down too.

The key is not to continue to make the same mistakes. :-)

> Incidentally, in the documentation from mol.h on OB_TORUP_BOND it
> mentions the correspondence to SMILES / but uses the word "upper",
> which is confusing. It is actually the direction of the bond that
> is being described, not its position. So F/C=C/F is trans.

If you have any suggestions on how to reword the documentation and/or  
the methods, please, please let me know. I think this is just plain  
stupid wording by the original coders and I'd prefer to break  
backwards compatibility right now and make it clear for OB 2.0 forward.

> In 0D, wedge and hash have no meaning with regard to tetrahedral  
> stereo, but up or down on two single bonds unambiguously describe  
> cis and trans isomers. It would be consistent to have atom CFG  
> equivalent to SMILES @,@@ and bond CFG to SMILES /,\

That might be consistent, but that does not seem to be how any other  
software interprets the standard, including MDL.

> wedge/hash as an indication of the way it is written. For instance,  
> Marvin Sketch sometimes writes V3000 files for chiral molecules  
> with both atom and bond CFG specified.

Every program I've tried has used the bond CFG markers to specify
bond wedge/hash information (and hence atom stereochemistry), regardless
of whether the coordinates are 0D, 2D, or 3D. That includes a variety of
commercial software.

> It seems a pity that this useful and unambiguous practice produces
> "incorrect MDL files". Perhaps there is further documentation,
> maybe it would break some software, or is it just contrary to
> common practice?

If I import an MDL v3000 file from Open Babel into other software, I  
see wedges and hashes around double bonds. I count this as a bug.  
Certainly we weren't trying to imply chirality information there, and  
it's contrary to user intuition.


But I *know* there are multiple subscribers to this mailing list from  
commercial developers. CambridgeSoft, MDLI, and Accelrys are all on  
this list. Several have complained publicly that earlier versions of  
Babel have "corrupted" data. Anyone care to clarify their  
interpretation here?

> Coordinate generation would be a useful extra facility, but it seems a
> pity if we have to generate artificial data to transmit real
> chemical information.

How true. I'm not sure, however, if there's anything that can be done  
here. It seems pretty clear that bond CFG markers are interpreted as  
wedge/hash representations.

Cheers,
-Geoff



[Open Babel] Public engagement in OpenBabel and interoperable semantics

peter murray-rust
At 00:48 09/09/2005, Geoffrey Hutchison wrote:

Thanks very much for this Geoff.  It seems clear that OpenBabel
represents the prime forum where this issue is being constructively
addressed in public. It is technically difficult but I think there
are some areas where agreement is possible - such as isolated
one-center atom-based stereochemistry.


>But I *know* there are multiple subscribers to this mailing list from
>commercial developers. CambridgeSoft, MDLI, and Accelrys are all on
>this list. Several have complained publicly that earlier versions of
>Babel have "corrupted" data. Anyone care to clarify their
>interpretation here?

This is the key point.  History so far has shown that almost every
commercial software company develops its code without attention to
any interoperability. In the 20th Century this was understandable,
but now we are seeing the serious problems posed by its lack -
particularly in the chemistry/bioscience interface. I know that
companies' primary concern is to generate revenue and have been
consistently told that "we don't intend to address Foo because there
is no demand for it". I'm also aware that standards generally only
occur when there is a business reason for them. This is underlined by
the virtually complete apathy in the chemical input into semantics
for LifeSciences such as the OMG LSR and the SemanticWeb for Life
Sciences. I was involved with these efforts and it was clear that
neither software manufacturers nor pharma are generally interested in
interoperability.  It is a pity if software companies on this list
seem to have the same negative image. (I think it is formally
incorrect to accuse OpenBabel of corruption when the semantics of the
input and output are not defined).

I would like to thank Merck for supporting Nick who is addressing the
problem of interconversion of stereochemistry. I am not sure how far
we can get before the end of summer but we hope to be able to at
least address whether atomParity and wedge/hash can be
interconverted within Openbabel. We also thank MDL for support to
Ramin who is working to see how V3000 can be converted to CML  (which
is capable of representing both approaches in a single
document).  Note that the results of this work are necessarily Open
(both are/will_be on sf.net though neither should be regarded as completed).

The essentials for solving stereochemical conversion include:
* formal specification of the semantics, including machine
interpretability where possible.
* test cases and roundtripping
* open specifications of the algorithms and data structures involved
* resources to implement it.

Apart from the last, Openbabel and other efforts in the OpenSource
community are addressing these. While we do get some encouragement
from outside the community, most are publicly apathetic and we also
get criticism, mainly of the type "this should be left to commercial
organisations who will do it better". It is also noticeable that the
mailing list for InChI (which has a complementary role) has had no
contribution from anyone in the private sector. We know there are
companies intending to use it and companies intending to implement it
in software.

There are, however, signs that this disengagement is changing (e.g.
the IUPAC group on structure representation) and so I make an appeal
to encourage wider engagement from the private sector in Openbabel
and more generally in interoperable semantics.

P.


Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069


