[Open Babel] XML formats

Chris Morley-3
I have written some OpenBabel code for XML formats using libxml2,
which are now in CVS. It has been a challenge fitting this C
library to the C++ iostreams that OpenBabel uses, while at the
same time handling other things that OB does, such as multiple
input files and molecules, large files and indexing for fast
searching. The existing CML format is not written in a way that
makes it easy to fit in with these extensions and it was felt that
using libxml2 would ensure that OB could be in a position to take
advantage of its wider XML facilities.
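
To illustrate the "one molecule at a time" idea for large files, here is
a minimal sketch in Python using only the standard library. It is not the
OpenBabel code (which is C++ on top of libxml2); the function name and
the sample document are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a "{namespace-uri}" prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def iter_molecules(stream):
    """Yield (id, atom count) for each <molecule>, without building the whole tree first."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "molecule":
            atoms = [e for e in elem.iter() if local_name(e.tag) == "atom"]
            yield elem.get("id"), len(atoms)
            # Drop the children of the molecule we have just handed out.
            elem.clear()


# Hypothetical two-molecule file; the namespace URI is only illustrative.
SAMPLE = b"""<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1"><atomArray><atom id="a1" elementType="C"/></atomArray></molecule>
  <molecule id="m2"><atomArray><atom id="a1" elementType="O"/><atom id="a2" elementType="H"/></atomArray></molecule>
</cml>"""

for mol_id, n_atoms in iter_molecules(io.BytesIO(SAMPLE)):
    print(mol_id, n_atoms)
```

The caller pulls molecules as it needs them, which is roughly the
behaviour a converter wants when the input holds many molecules.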

The new format for CML will read the various forms of CML1 and
CML2 molecules and write CML2 molecules (no CML1 writing yet). It
has atomParity and bondStereo implemented but this aspect needs
tweaking to be consistent with Nick England's recent changes to
chiral implementation. Other features, such as those directed
towards crystallography, have not yet been added.

There is a format to read the chemical structure part of PubChem
XML, minimally at present.

With the CMLReact format you can read and write chemical reactions
(again minimally). It handles the form with the molecules in a
list at the beginning, as well as the normal form. This is because
I want to move the format towards use with atmospheric and
combustion reaction mechanisms.
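
The two forms can be pictured with a small Python sketch (standard
library only, not the OpenBabel implementation). It assumes that a
molecule declared once in a list carries an "id", and that a reaction
points at it with a "ref" attribute; the element names, the "title"
attribute and the sample data are assumptions for the example.

```python
import xml.etree.ElementTree as ET


def resolve_molecules(reaction, role, molecules_by_id):
    """Return the titles of the molecules under <reactant> or <product> elements."""
    titles = []
    for holder in reaction.iter(role):
        for mol in holder.iter("molecule"):
            ref = mol.get("ref")
            # "List first" form: follow the reference; "normal" form: use the inline molecule.
            target = molecules_by_id[ref] if ref else mol
            titles.append(target.get("title"))
    return titles


def read_reactions(xml_text):
    """Return one (reactant titles, product titles) pair per <reaction>."""
    root = ET.fromstring(xml_text)
    molecules_by_id = {
        m.get("id"): m
        for m in root.iter("molecule")
        if m.get("id") is not None and m.get("ref") is None
    }
    return [
        (resolve_molecules(r, "reactant", molecules_by_id),
         resolve_molecules(r, "product", molecules_by_id))
        for r in root.iter("reaction")
    ]


# Hypothetical mechanism fragment mixing referenced and inline molecules.
SAMPLE = """<cml>
  <moleculeList>
    <molecule id="OH" title="hydroxyl"/>
    <molecule id="CH4" title="methane"/>
    <molecule id="CH3" title="methyl"/>
  </moleculeList>
  <reactionList>
    <reaction id="r1">
      <reactantList>
        <reactant><molecule ref="OH"/></reactant>
        <reactant><molecule ref="CH4"/></reactant>
      </reactantList>
      <productList>
        <product><molecule ref="CH3"/></product>
        <product><molecule id="H2O" title="water"/></product>
      </productList>
    </reaction>
  </reactionList>
</cml>"""

print(read_reactions(SAMPLE))
```

Declaring each species once and referring to it by id is what makes the
list form attractive for mechanisms where the same few species appear in
many reactions.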

For files with .xml extensions, the XMLFormat can deduce how to
read from the xml namespace declarations in the file. New XML
formats can be added without changing any existing code.
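
A rough sketch of namespace-based dispatch, in Python with the standard
library only. This is not the XMLFormat code itself: the registry, the
function names and the two namespace URIs are assumptions chosen for the
example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace URI -> name of the format that handles it.
FORMATS_BY_NAMESPACE = {}


def register_format(namespace_uri, name):
    """Add a format; existing entries and the detection code stay untouched."""
    FORMATS_BY_NAMESPACE[namespace_uri] = name


def detect_format(stream):
    """Return the format registered for the first recognised namespace, else None."""
    for _event, (_prefix, uri) in ET.iterparse(stream, events=("start-ns",)):
        if uri in FORMATS_BY_NAMESPACE:
            return FORMATS_BY_NAMESPACE[uri]
    return None


# Illustrative URIs only; check the real files for the namespaces they declare.
register_format("http://www.xml-cml.org/schema", "cml")
register_format("http://www.ncbi.nlm.nih.gov", "pubchem")

print(detect_format(io.BytesIO(b'<cml xmlns="http://www.xml-cml.org/schema"/>')))
```

Supporting another XML format then amounts to one more
register_format() call.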

Currently only UTF-8 (well, ASCII really) encoded XML files are
handled, but adding UTF-16 support shouldn't be too difficult with
the libxml2 base.
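
For reference, telling UTF-8 from UTF-16 input usually starts with the
byte order mark. The Python sketch below (standard library only) shows
that check; it is an illustration, not how OpenBabel or libxml2 do it,
and it ignores the XML declaration and UTF-32.

```python
import codecs


def sniff_encoding(head):
    """Guess a codec name from the first bytes of a file, using the BOM only."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM when decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    return "utf-8"          # no BOM: assume UTF-8, of which ASCII is a subset


data = "<cml/>".encode("utf-16")  # Python writes a BOM for plain "utf-16"
print(sniff_encoding(data), data.decode(sniff_encoding(data)))
```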

Chris


-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: [Open Babel] XML formats

Jean Bréfort-3
On Saturday 10 September 2005 at 15:51 +0100, Chris Morley wrote:

> I have written some OpenBabel code for XML formats using libxml2,
> which are now in CVS. It has been a challenge fitting this C
> library to the C++ iostreams that OpenBabel uses, while at the
> same time handling other things that OB does, such as multiple
> input files and molecules, large files and indexing for fast
> searching. The existing CML format is not written in a way that
> makes it easy to fit in with these extensions and it was felt that
> using libxml2 would ensure that OB could be in a position to take
> advantage of its wider XML facilities.
>
> The new format for CML will read the various forms of CML1 and
> CML2 molecules and write CML2 molecules (no CML1 writing yet). It
> has atomParity and bondStereo implemented but this aspect needs
> tweaking to be consistent with Nick England's recent changes to
> chiral implementation. Other features, such as those directed
> towards crystallography, have not yet been added.
>
> There is a format to read the chemical structure part of PubChem
> XML, minimally at present.
>
> With the CMLReact format you can read and write chemical reactions
> (again minimally). It handles the form with the molecules in a
> list at the beginning, as well as the normal form. This is because
> I want to move the format towards use with atmospheric and
> combustion reaction mechanisms.
>
> For files with .xml extensions, the XMLFormat can deduce how to
> read from the xml namespace declarations in the file. New XML
> formats can be added without changing any existing code.
>
> Currently only UTF-8 (well, ASCII really) encoded XML files are
> handled, but adding UTF-16 support shouldn't be too difficult with
> the libxml2 base.

Great, I should build on your work to add support for GChemPaint files.




Re: [Open Babel] XML formats

Michael Banck
In reply to this post by Chris Morley-3
On Sat, Sep 10, 2005 at 03:51:57PM +0100, Chris Morley wrote:
> I have written some OpenBabel code for XML formats using libxml2,
> which are now in CVS. It has been a challenge fitting this C
> library to the C++ iostreams that OpenBabel uses, while at the
> same time handling other things that OB does, such as multiple
> input files and molecules, large files and indexing for fast
> searching. The existing CML format is not written in a way that
> makes it easy to fit in with these extensions and it was felt that
> using libxml2 would ensure that OB could be in a position to take
> advantage of its wider XML facilities.

Great!

> Currently only UTF-8 (well, ASCII really) encoded XML files are
> handled, but adding UTF-16 support shouldn't be too difficult with
> the libxml2 base.

I am a bit concerned about maintenance issues.  What is the rationale
for not requiring libxml headers at configure time?  Do you think we
need to branch libxml? (Or are we doing so already?)

We could provide users with an archive of required third-party software
accompanying releases, if that is a concern.


Michael



Re: [Open Babel] XML formats

Geoffrey Hutchison

On Sep 10, 2005, at 3:53 PM, Michael Banck wrote:

> I am a bit concerned about maintenance issues.  What is the rationale
> for not requiring libxml headers at configure time?  Do you think we
> need to branch libxml? (Or are we doing so already?)

I don't know.

Chris, why have you included all these libxml headers? I realize that
the current UNIX build environment doesn't necessarily ensure that
libxml headers are included. But under either Windows or UNIX,
things like libxml and libz (for the upcoming gzip support)
should be picked up by the build environment.

(BTW, Michael, if you or someone else would like to make sure libxml  
headers are picked up, I'd appreciate it. I probably can't get to  
that today.)

> We could provide users with an archive of required third-party software
> accompanying releases, if that is a concern.

Under most recent UNIX environments, there shouldn't be much  
required. For the upcoming gzip support, zlib is required (standard  
on just about everything). For this, libxml2 is required (also pretty  
standard now).

And of course, if you want to rebuild parts of the package, you may  
need Autoconf, Automake, Libtool, or SWIG.

Cheers,
-Geoff



Re: [Open Babel] XML formats

Geoffrey Hutchison
In reply to this post by Chris Morley-3

On Sep 10, 2005, at 10:51 AM, Chris Morley wrote:


> I have written some OpenBabel code for XML formats using libxml2,  
> which are now in CVS.
>

Fantastic! Obviously, many formats are now based on XML, and it's
nice to have a framework for adding support for these new file
types later.

Thanks a lot for your work!

Cheers,
-Geoff




Re: [Open Babel] XML formats

Chris Morley-3
In reply to this post by Geoffrey Hutchison


Geoffrey Hutchison wrote:

>
> On Sep 10, 2005, at 3:53 PM, Michael Banck wrote:
>
>> I am a bit concerned about maintenance issues.  What is the rationale
>> for not requiring libxml headers at configure time?  Do you think we
>> need to branch libxml? (Or are we doing so already?)
>
>
> I don't know.
>
> Chris, why have you included all these libxml headers? I realize that
> the current UNIX build environment doesn't necessarily ensure that
> libxml headers are included. But under either Windows or UNIX,
> things like libxml and libz (for the upcoming gzip support)
> should be picked up by the build environment.

I don't really understand what the issue is here, so I apologize
if I'm missing the point. I can only comment on what was needed to
compile under Windows in order to access the precompiled libxml
DLL. The headers are needed during compilation and are not usually
present on Windows machines. I had to download them, together with
the libxml.lib (needed for linking) and libxml.dll (needed at
runtime). I should also have put these on CVS in the windows
folder. If libxml is considered standard on UNIX machines, the
headers could go in there as well. Using headers, lib and DLL is
normal on Windows and compiler independent (I think) if the
interface is C (i.e. not C++) - it was fine with InChI and will
probably be ok for zlib.

> (BTW, Michael, if you or someone else would like to make sure libxml  
> headers are picked up, I'd appreciate it. I probably can't get to  that
> today.)
>
>> We could provide users with an archive of required third-party software
>> accompanying releases, if that is a concern.
>
>
> Under most recent UNIX environments, there shouldn't be much  required.
> For the upcoming gzip support, zlib is required (standard  on just about
> everything). For this, libxml2 is required (also pretty  standard now).
>
> And of course, if you want to rebuild parts of the package, you may  
> need Autoconf, Automake, Libtool, or SWIG.
>
> Cheers,
> -Geoff

Re: [Open Babel] XML formats

peter murray-rust
In reply to this post by Chris Morley-3
At 15:51 10/09/2005, Chris Morley wrote:
>I have written some OpenBabel code for XML formats using libxml2,
>which are now in CVS. It has been a challenge fitting this C library
>to the C++ iostreams that OpenBabel uses, while at the same time
>handling other things that OB does, such as multiple input files and
>molecules, large files and indexing for fast searching. The existing
>CML format is not written in a way that makes it easy to fit in with
>these extensions and it was felt that using libxml2 would ensure
>that OB could be in a position to take advantage of its wider XML facilities.

This sounds great, Chris. Many thanks, and we'll talk on Monday.

Is it the formulation in a schema that makes it difficult? Because it
is relatively easy to change if necessary. We can create a stylesheet
to remove the worst bits of XSD (for example unions). Also remember
that in CML you can create your own schema from the components and it
would be an excellent idea to do this for OB.

Also, at least in principle, we can autogenerate the C++ code for
some of the CML.

>The new format for CML will read the various forms of CML1 and CML2
>molecules and write CML2 molecules (no CML1 writing yet). It has
>atomParity and bondStereo implemented but this aspect needs tweaking
>to be consistent with Nick England's recent changes to chiral
>implementation. Other features, such as those directed towards
>crystallography, have not yet been added.

This looks very good. It fits in with what Ramin and I have been
doing with JUMBO5.0 (see the jumbo50 module on cml.sf.net). We have been
able to simplify the JUMBO code enormously by using XOM (see xom.nu)
rather than W3C DOM (which I hope I never have to use again). I think
that libxml is also simpler than W3C DOM, so it may be easier to
create code for it.

>There is a format to read the chemical structure part of PubChem
>XML, minimally at present.
>
>With the CMLReact format you can read and write chemical reactions
>(again minimally). It handles the form with the molecules in a list
>at the beginning, as well as the normal form. This is because I want
>to move the format towards use with atmospheric and combustion
>reaction mechanisms.

I have committed the JUMBO CMLReact support to SF. It may be useful
to explore the classes even though they are in Java, as I think the
functionality could be ported to C++ fairly easily.

>For files with .xml extensions, the XMLFormat can deduce how to read
>from the xml namespace declarations in the file. New XML formats can
>be added without changing any existing code.
>
>Currently only UTF-8 (well, ASCII really) encoded XML files are
>handled, but adding UTF-16 support shouldn't be too difficult with
>the libxml2 base.
>
>Chris

Thanks again,

P.




Peter Murray-Rust
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road,  Cambridge CB2 1EW, UK
+44-1223-763069




Re: [Open Babel] XML formats

Michael Banck
In reply to this post by Chris Morley-3
On Sun, Sep 11, 2005 at 12:09:16AM +0100, Chris Morley wrote:

> I don't really understand what the issue is here, so I apologize
> if I'm missing the point. I can only comment on what was needed to
> compile under Windows in order to access the precompiled libxml
> DLL. The headers are needed during compilation and are not usually
> present on Windows machines. I had to download them, together with
> the libxml.lib (needed for linking) and libxml.dll (needed at
> runtime). I should also have put these on CVS in the windows
> folder. If libxml is considered standard on UNIX machines, the
> headers could go in there as well. Using headers, lib and DLL is
> normal on Windows and compiler independent (I think) if the
> interface is C (i.e. not C++)

Well, standard practice on Unix is to install the development
libraries needed for compilation through the distributor's package
management system.

The problem I see with including libxml in our source tree is that we
need to keep an eye on libxml development as well and generally support
the source, thus duplicating work the libxml developers are doing
already.

Most software packages just say 'You need libfoo, libbar and baz to
compile me, see INSTALL for further instructions on how to install
them', not sure whether this would be acceptable for the Windows port.
I guess we provide users with archives of a libxml .dll etc.  along with
our pre-compiled Windows binary?  

I don't know how many people compile OpenBabel from source on Windows,
or how well it works (I saw some people having issues with it in the
archive).  Maybe the best thing would be for somebody to step forward
and commit to providing regular Windows binaries of the development branch
for users to test, so that not everybody needs to do all the necessary
steps.

On a technical note, how difficult would it be to change the #includes
from "libxml/xml.h" to <libxml/xml.h> and change the include path passed
to the Windows compiler accordingly?


Michael

--
Michael Banck
Debian Developer
[hidden email]
http://www.advogato.org/person/mbanck/diary.html

