Processing a multi-conformational sd-file


Hans De Winter
Dear,

Can the OBConversion class treat an input sd-file as a multi-conformational
file and hence automatically generate molecules that contain more than one
set of coordinates?
More specifically, when reading an sd-file, can the OBConversion class
'detect' whether consecutive molecules are identical in terms of their
connectivity and atom type ordering, but different in terms of their
coordinates (and should therefore be considered conformers of the same
molecule)? I know this is possible in the OEChem library of OpenEye,
but from reading the documentation I can't figure it out for OB...

Many thx,

Hans
www.silicos.com


------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: Processing a multi-conformational sd-file

Noel O'Boyle
Administrator
Hi Hans,

Currently it doesn't do that automatically. Nor is it possible to
specify an option to do it. However, I think this *could* be added,
right, Chris? The only problem is that we don't currently have
isomorph code (identity checking), so in the meantime we would have
to adopt a convention such as treating all molecules with the same title
as the same molecule.
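A minimal sketch of the title convention, assuming plain SD-file text where each record ends with a `$$$$` line and the title is the first line of the molfile header. It uses no Open Babel API; `split_sd_records` and `group_by_title` are hypothetical helpers. Because it uses `itertools.groupby`, only *consecutive* records with the same title are merged.

```python
from itertools import groupby
from typing import List, Tuple


def split_sd_records(text: str) -> List[str]:
    """Split SD-file text into records at each '$$$$' delimiter line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # trailing record without '$$$$'
        records.append("\n".join(current))
    return records


def group_by_title(text: str) -> List[Tuple[str, List[str]]]:
    """Group consecutive records whose first (title) line is identical.

    Assumption: same title == same molecule. Records with the same title
    that are not adjacent end up in separate groups.
    """
    def title(record: str) -> str:
        return record.split("\n", 1)[0].strip()

    return [(t, list(group)) for t, group in groupby(split_sd_records(text), key=title)]


sd = "mol_a\nconf 1\n$$$$\nmol_a\nconf 2\n$$$$\nmol_b\nconf 1\n$$$$\n"
for t, group in group_by_title(sd):
    print(t, len(group))  # prints "mol_a 2", then "mol_b 1"
```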

- Noel

On 7 July 2010 15:58, Hans De Winter <[hidden email]> wrote:

> Dear,
>
> Can the OBConversion class treat an input sd-file as a multi-
> conformational file and hence automatically generate molecules that
> contain more than set of coordinates?
> More specifically, when reading an sd-file, can the OBConversion class
> 'detect' whether subsequent molecules are identical in terms of their
> connectivity and atom type ordering, but different in terms of their
> coordinates (and therefore considered to be conformers of the same
> molecule). I know this is possible in the OEChem library of OpenEye,
> but from reading the documentation I can't figure it out for OB...
>
> Many thx,
>
> Hans
> www.silicos.com


Re: Processing a multi-conformational sd-file

Geoffrey Hutchison
> isomorph code (identity checking), so in the meanwhile we would have
> to specify a convention such as all molecules with the same title are
> the same.

I've been thinking about this for a while. The easiest detection is that:
a) The number of atoms is the same
b) The element list is the same (and in the same order)
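Criteria a) and b) amount to a cheap comparison that ignores coordinates entirely. A sketch, assuming each record has already been parsed into a list of `(element, x, y, z)` tuples; `same_atoms_in_order` is a hypothetical helper, not an Open Babel function.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)


def same_atoms_in_order(mol_a: List[Atom], mol_b: List[Atom]) -> bool:
    """Cheap conformer test from criteria a) and b); coordinates are ignored."""
    if len(mol_a) != len(mol_b):                            # a) same atom count
        return False
    return all(x[0] == y[0] for x, y in zip(mol_a, mol_b))  # b) same elements, same order


water_1 = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
water_2 = [("O", 0.0, 0.0, 0.1), ("H", 0.95, 0.1, 0.0), ("H", -0.25, 0.92, 0.0)]
print(same_atoms_in_order(water_1, water_2))        # True: same element sequence
print(same_atoms_in_order(water_1, water_2[::-1]))  # False: H, H, O is a different order
```

Note that this test says nothing about bonds, so two different molecules with the same element sequence would still pass.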

I don't think I'd want it to be automatic, though. There has to be a mechanism to "turn off" (or "turn on") the feature in case you want to splice out a particular frame or conformer, much like we do with multi-molecule files.

Cheers,
-Geoff

Re: Processing a multi-conformational sd-file

Tim Vandermeersch
On Wed, Jul 7, 2010 at 6:54 PM, Geoffrey Hutchison
<[hidden email]> wrote:
>> isomorph code (identity checking), so in the meanwhile we would have
>> to specify a convention such as all molecules with the same title are
>> the same.
>
> I've been thinking about this for a while. The easiest detection is that:
> a) The number of atoms is the same
> b) The element list is the same (and in the same order)

Another option is to keep track of canonical SMILES. a & b don't
guarantee the molecules are the same (especially their bonding). However,
the symmetry classes encode a variety of properties (connectivity,
element, aromaticity, formal charge, ring membership, bond orders) and can be
used for this purpose.
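Open Babel's symmetry classes and canonical SMILES are not reproduced here. As a simpler stand-in (not Open Babel's algorithm), this sketch builds an order-dependent identity key from the element list, formal charges and the bond table, which is enough to show why a & b alone can be fooled. `Record` and `connectivity_key` are hypothetical names; aromaticity and stereochemistry are deliberately left out.

```python
from typing import FrozenSet, Hashable, List, NamedTuple, Tuple


class Record(NamedTuple):
    elements: List[str]
    charges: List[int]                  # formal charge per atom
    bonds: List[Tuple[int, int, int]]   # (atom index i, atom index j, bond order)


def connectivity_key(rec: Record) -> Hashable:
    """Order-dependent identity key: elements, charges and the bond table.

    Bonds are normalised so (i, j) and (j, i) compare equal; atom indices are
    NOT renumbered, so two records only match if they list atoms in the same order.
    """
    bond_set: FrozenSet[Tuple[int, int, int]] = frozenset(
        (min(i, j), max(i, j), order) for i, j, order in rec.bonds
    )
    return (tuple(rec.elements), tuple(rec.charges), bond_set)


# Heavy atoms only, both listed as C, O, C: criteria a) and b) cannot tell them apart.
dimethyl_ether = Record(["C", "O", "C"], [0, 0, 0], [(0, 1, 1), (1, 2, 1)])  # C-O-C
ethanol = Record(["C", "O", "C"], [0, 0, 0], [(0, 2, 1), (2, 1, 1)])         # C-C-O
print(connectivity_key(dimethyl_ether) == connectivity_key(ethanol))  # False
```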

We actually have an implementation using a & b in Avogadro. It's in
the ReadFileThread in libavogadro/src/moleculefile.cpp
(http://github.com/cryos/avogadro/blob/master/libavogadro/src/moleculefile.cpp).
This implementation assumes the file contains either all different
molecules or conformers of a single molecule. It doesn't support N
conformers of each of several molecules in the same file, but this should
not be hard to write. There is a lot of extra code there, though.

My suggestion would be to add an option to OBConversion to enable
conformer reading. When enabled, a Read call will keep reading conformers
of the same molecule until it finds a different molecule. The first
molecule is kept and the coordinates of the following conformers are added to it.
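The proposed read mode can be sketched as a generator that keeps the first record of each run and appends the coordinates of every following record with the same identity key. `Record`, `MultiConfMol` and `read_conformers` are hypothetical names, not Open Babel API; `key` stands for whichever identity test is chosen (title, element list, SMILES, ...). It only compares each record with the previous one, so conformers must be adjacent in the file.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, Iterable, Iterator, List, Optional, Tuple

Coords = List[Tuple[float, float, float]]


@dataclass
class Record:                       # one parsed SD record (hypothetical type)
    title: str
    elements: List[str]
    coords: Coords


@dataclass
class MultiConfMol:                 # first record plus every conformer's coordinates
    first: Record
    conformers: List[Coords] = field(default_factory=list)


def read_conformers(records: Iterable[Record],
                    key: Callable[[Record], Hashable]) -> Iterator[MultiConfMol]:
    """Merge runs of consecutive records that share the same identity key."""
    current: Optional[MultiConfMol] = None
    current_key: Hashable = None
    for rec in records:
        k = key(rec)
        if current is not None and k == current_key:
            current.conformers.append(rec.coords)   # same molecule: keep its coordinates
            continue
        if current is not None:
            yield current                           # different molecule: emit the previous one
        current = MultiConfMol(rec, [rec.coords])   # the first record becomes the molecule
        current_key = k
    if current is not None:
        yield current


records = [
    Record("A", ["O", "H", "H"], [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]),
    Record("A", ["O", "H", "H"], [(0.0, 0.0, 0.1), (0.95, 0.1, 0.0), (-0.25, 0.92, 0.0)]),
    Record("B", ["C"], [(0.0, 0.0, 0.0)]),
]
for mol in read_conformers(records, key=lambda r: tuple(r.elements)):
    print(mol.first.title, len(mol.conformers))   # prints "A 2", then "B 1"
```

Passing the key in as a function keeps the "turn on / turn off" decision and the identity test separate from the reading loop.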

> I don't think I'd want it to be automatic, though. There has to be a mechanism to "turn off" (or "turn on") the feature in case you want to splice out a particular frame or conformer, much like we do with multi-molecule files.

Yes, usually when you need it the user will know. Or a program that
uses it (e.g. obspectrophore) can always enable it.

Tim


Re: Processing a multi-conformational sd-file

Craig James-2
On 7/7/10 11:20 AM, Tim Vandermeersch wrote:

> On Wed, Jul 7, 2010 at 6:54 PM, Geoffrey Hutchison
> <[hidden email]>  wrote:
>>> isomorph code (identity checking), so in the meanwhile we would have
>>> to specify a convention such as all molecules with the same title are
>>> the same.
>>
>> I've been thinking about this for a while. The easiest detection is that:
>> a) The number of atoms is the same
>> b) The element list is the same (and in the same order)
>
> Another option is to keep track of canonical smiles. a & b don't
> guarantee the molecules are the same (especially binding).

An interesting fact: in our database, only 3% have a unique molecular formula.  The rest (97%) share their MF with at least one other compound.

The MF of C20H21N3O3 is the most common with 1990 molecules, and there are 169 molecular formulas that are shared by over 1000 different molecules.
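A quick way to see why composition alone is a weak identity test: two different molecules can share a formula. This sketch computes a Hill-order formula (C, then H, then the other elements alphabetically; fully alphabetical when there is no carbon) from an element list; `hill_formula` is a hypothetical helper, and ethanol vs. dimethyl ether is just a textbook example, not data from the database mentioned here.

```python
from collections import Counter
from typing import Iterable


def hill_formula(elements: Iterable[str]) -> str:
    """Molecular formula in Hill order: C, then H, then the rest alphabetically.

    Without carbon, all elements (including H) are listed alphabetically.
    """
    counts = Counter(elements)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)


ethanol = ["C", "C", "O"] + ["H"] * 6
dimethyl_ether = ["C", "O", "C"] + ["H"] * 6
print(hill_formula(ethanol), hill_formula(dimethyl_ether))  # C2H6O C2H6O
```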

It's not completely relevant to the question, but I thought it was interesting.

Craig


Re: Processing a multi-conformational sd-file

Hans De Winter
In reply to this post by Tim Vandermeersch
> On Wed, Jul 7, 2010 at 6:54 PM, Geoffrey Hutchison
> <[hidden email]> wrote:
>>> isomorph code (identity checking), so in the meanwhile we would have
>>> to specify a convention such as all molecules with the same title are
>>> the same.
>>
>> I've been thinking about this for a while. The easiest detection is that:
>> a) The number of atoms is the same
>> b) The element list is the same (and in the same order)
>
> Another option is to keep track of canonical smiles. a & b don't
> guarantee the molecules are the same (especially binding). However,
> the symmetry classes encode a variety of properties (connectivity,
> element, aromaticity, formal charge, in ring, bond orders) and can be
> used for this purpose.

I agree that keeping track of the canonical SMILES would be the most
robust way of doing it, and also easy to implement if one imposes the
requirement that different conformers of the same molecule must be
consecutive in the input file.


>
> We actually have an implementation using a & b in avogadro. It's in
> the ReafFileThread in libavogadro/src/moleculefile.cpp
> (http://github.com/cryos/avogadro/blob/master/libavogadro/src/moleculefile.cpp).
> This implementation assumes you have a file with different molecules
> or all the same. It doesn't support N conformers of different
> molecules in the same file but this should not be hard to write. There
> is much extra code here though.
>
> My suggestion would be to add an option to OBConversion to enable
> conformer reading. When enabled, a Read call will read the same
> molecules until it finds a different molecule. The first molecule is
> kept and coordinates of the following conformers are added to it.

Right

>
>> I don't think I'd want it to be automatic, though. There has to be  
>> a mechanism to "turn off" (or "turn on") the feature in case you  
>> want to splice out a particular frame or conformer, much like we do  
>> with multi-molecule files.
>
> Yes, usually when you need it the user will know. Or a program that
> uses it (e.g. obspectrophore) can always enable it.

Yes

Bye,
Hans



Re: Processing a multi-conformational sd-file

Noel O'Boyle
Administrator
On 8 July 2010 08:08, Hans De Winter <[hidden email]> wrote:

>> On Wed, Jul 7, 2010 at 6:54 PM, Geoffrey Hutchison
>> <[hidden email]> wrote:
>>>>
>>>> isomorph code (identity checking), so in the meanwhile we would have
>>>> to specify a convention such as all molecules with the same title are
>>>> the same.
>>>
>>> I've been thinking about this for a while. The easiest detection is that:
>>> a) The number of atoms is the same
>>> b) The element list is the same (and in the same order)
>>
>> Another option is to keep track of canonical smiles. a & b don't
>> guarantee the molecules are the same (especially binding). However,
>> the symmetry classes encode a variety of properties (connectivity,
>> element, aromaticity, formal charge, in ring, bond orders) and can be
>> used for this purpose.
>
> I agree that keeping track of the canonical smiles would be the most robust
> way of doing it, and also easy to implement
> if one imposes the requirement that different conformers of the same
> molecule should be subsequent to each other in
> the input file.

Not to prolong this thread, but I think regular SMILES is more
appropriate here. We want to keep the atoms in the input order.
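Why input order matters: conformer coordinates are attached atom by atom, by index. A comparison that ignores atom order (sorted elements below, a crude stand-in for a canonical comparison) can declare two records "the same" even when their atoms are listed differently, and copying coordinates index by index would then assign them to the wrong atoms. All function names are hypothetical, and `misplaced_atoms` only detects element-level mismatches (swapping the two H atoms would go unnoticed).

```python
from typing import List, Sequence, Tuple


def order_independent_key(elements: Sequence[str]) -> Tuple[str, ...]:
    """Stand-in for a canonical comparison: ignores the order atoms are listed in."""
    return tuple(sorted(elements))


def order_dependent_key(elements: Sequence[str]) -> Tuple[str, ...]:
    """Stand-in for an input-order comparison: atom i must match atom i."""
    return tuple(elements)


def misplaced_atoms(first: Sequence[str], other: Sequence[str]) -> List[int]:
    """Indices whose element would change if `other`'s coordinates were
    copied index by index onto `first` (element-level check only)."""
    return [i for i, (a, b) in enumerate(zip(first, other)) if a != b]


first = ["O", "H", "H"]
permuted = ["H", "O", "H"]  # same water molecule, atoms listed in another order

print(order_independent_key(first) == order_independent_key(permuted))  # True
print(order_dependent_key(first) == order_dependent_key(permuted))      # False
print(misplaced_atoms(first, permuted))                                 # [0, 1]
```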


Re: Processing a multi-conformational sd-file

Tim Vandermeersch
On Thu, Jul 8, 2010 at 10:13 AM, Noel O'Boyle <[hidden email]> wrote:

> On 8 July 2010 08:08, Hans De Winter <[hidden email]> wrote:
>>> On Wed, Jul 7, 2010 at 6:54 PM, Geoffrey Hutchison
>>> <[hidden email]> wrote:
>>>>>
>>>>> isomorph code (identity checking), so in the meanwhile we would have
>>>>> to specify a convention such as all molecules with the same title are
>>>>> the same.
>>>>
>>>> I've been thinking about this for a while. The easiest detection is that:
>>>> a) The number of atoms is the same
>>>> b) The element list is the same (and in the same order)
>>>
>>> Another option is to keep track of canonical smiles. a & b don't
>>> guarantee the molecules are the same (especially binding). However,
>>> the symmetry classes encode a variety of properties (connectivity,
>>> element, aromaticity, formal charge, in ring, bond orders) and can be
>>> used for this purpose.
>>
>> I agree that keeping track of the canonical smiles would be the most robust
>> way of doing it, and also easy to implement
>> if one imposes the requirement that different conformers of the same
>> molecule should be subsequent to each other in
>> the input file.
>
> Not to prolong this thread, but I think regular SMILES is more
> appropriate here. We want to keep the atoms in the input order.

Yes, since the atom order will be the same for conformers, a regular
SMILES should also work...
