Change to SMILES writer for hypervalent atoms

Change to SMILES writer for hypervalent atoms

Noel O'Boyle
Administrator
Hi there,

In the course of sorting out the handling of implicit Hs, I've found
that the current SMILES writer writes hypervalent atoms from the
organic subset in square brackets. E.g. Texas carbons:

>obabel -:C(C)(C)(C)(C)C -osmi
[C](C)(C)(C)(C)C

or "FIF" as "F[I]F".

This is unusual behaviour compared to other toolkits, and I think the
lack of brackets is preferred where possible, so I've changed this (on
my branch). If this is an issue for anyone, now's the time to duke it
out.
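
The two writer behaviours described above can be sketched in a few lines of Python. This is only an illustration, not Open Babel's actual code: the function name `atom_token`, the `bracket_hypervalent` flag, and the lookup table are assumptions made for this example. The table holds the highest normal valence usually given for each organic-subset element, and the atom is assumed to be neutral with no hydrogens, isotope, or stereo information that would force brackets anyway.

```python
# Highest "normal" valence for each organic-subset element
# (values as commonly listed for SMILES; treated here as an assumption).
MAX_NORMAL_VALENCE = {
    "B": 3, "C": 4, "N": 5, "O": 2, "P": 5,
    "S": 6, "F": 1, "Cl": 1, "Br": 1, "I": 1,
}


def atom_token(symbol, bond_order_sum, bracket_hypervalent):
    """Return the SMILES token for a neutral, hydrogen-free atom.

    bond_order_sum is the sum of the orders of the atom's explicit bonds.
    bracket_hypervalent=True mimics the behaviour reported for the current
    writer; False mimics the proposed behaviour.
    """
    hypervalent = bond_order_sum > MAX_NORMAL_VALENCE[symbol]
    if hypervalent and bracket_hypervalent:
        return f"[{symbol}]"
    return symbol


# Central carbon of C(C)(C)(C)(C)C has five single bonds.
print(atom_token("C", 5, bracket_hypervalent=True))
print(atom_token("C", 5, bracket_hypervalent=False))
# Iodine in FIF has two single bonds.
print(atom_token("I", 2, bracket_hypervalent=True))
print(atom_token("I", 2, bracket_hypervalent=False))
```

With the flag set, the hypervalent atoms come out as `[C]` and `[I]`; without it they are written bare, which is the change being proposed.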

- Noel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-Devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
Re: Change to SMILES writer for hypervalent atoms

Craig James-2
Hi Noel,

On Thu, Mar 2, 2017 at 10:11 AM, Noel O'Boyle <[hidden email]> wrote:
> In the course of sorting out the handling of implicit Hs, I've found
> that the current SMILES writer writes hypervalent atoms from the
> organic subset in square brackets. E.g. Texas carbons:
>
> >obabel -:C(C)(C)(C)(C)C -osmi
> [C](C)(C)(C)(C)C
>
> or "FIF" as "F[I]F".
>
> This is unusual behaviour compared to other toolkits, and I think the
> lack of brackets is preferred where possible, so I've changed this (on
> my branch). If this is an issue for anyone, now's the time to duke it
> out.

Well, "FIF" violates the OpenSMILES spec in section 3.1.5, which states that "organic subset" atoms are only allowed outside of brackets if they're in their normal lowest-valence state. Actually, now that I read it, it's not well written and leaves room for (mis)interpretation. The phrase that I think applies in OpenSMILES is:

An atom is specified [without brackets] has the following properties:
  • "implicit hydrogens" are added such that valence of the atom is in the lowest normal state for that element
You might argue from this that since you don't have to add any hydrogens, it's clear what it means. But someone else might say, "you have to add charge to balance it."

Daylight's page is more clear. It says:
... the "organic subset" B, C, N, O, P, S, F, Cl, Br, and I may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds.

If you can say, "It's obvious ...", and this is a feature everyone would like, then the OpenSMILES spec could be changed.

Craig
 

Re: Change to SMILES writer for hypervalent atoms

Noel O'Boyle
Administrator
Hi Craig,

From what you say, it sounds like this is a better discussion for the
OpenSMILES list. My intention is to match existing usage and what I
believe to be Daylight usage. Let's have this discussion over there,
and we can clarify the spec either way depending on the outcome.

- Noel



Re: Change to SMILES writer for hypervalent atoms

Andrew Dalke
In reply to this post by Craig James-2
On Mar 2, 2017, at 20:34, Craig James <[hidden email]> wrote:
> Well, "FIF" violates the OpenSMILES spec in section 3.1.5, which states that the "organic subset" are only allowed outside of brackets if they're in their normal lowest-valence state. Actually, now that I read it, it's not well written and has room for (mis)interpretation. The phrase that I think applies in OpenSMILES is:

My understanding of Daylight SMILES is that when the explicit valence based on the bonds is higher than the maximum natural valence then the deduced hydrogen count is 0.
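
As a rough illustration of that reading, the hydrogen count for a bare organic-subset atom could be deduced as in the Python sketch below. It is not taken from Daylight or Open Babel: the function name `implicit_h_count` and the table layout are assumptions for this example, the valence lists are the ones usually quoted for the organic subset, and aromatic bonds and charges are ignored.

```python
# Normal valences for the organic subset, lowest first
# (as commonly listed for SMILES; treated here as an assumption).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(symbol, bond_order_sum):
    """Deduce the hydrogen count for an atom written without brackets.

    Picks the lowest normal valence that is at least the sum of the
    explicit bond orders and fills the gap with hydrogens. If the bonds
    already exceed the highest normal valence, the count is 0.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # hypervalent: no hydrogens are added


print(implicit_h_count("C", 1))  # methyl carbon in CN(=O)=O
print(implicit_h_count("N", 5))  # nitrogen in CN(=O)=O
print(implicit_h_count("I", 2))  # iodine in FIF
print(implicit_h_count("C", 5))  # central carbon in C(C)(C)(C)(C)C
```

Under these assumptions the methyl carbon gets 3 hydrogens and the other three atoms get 0. Note that 5 is itself a normal valence for nitrogen in this table, so only the iodine and the five-coordinate carbon take the hypervalent branch.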

For example, quoting http://www.daylight.com/meetings/summerschool98/course/dave/smiles-intro.html :

> In practice, one chemist might represent nitromethane as C[N+](=O)[O-] with a nitrogen of valence 3 in a charge-separated structure while another might represent it as CN(=O)=O with a neutral 5-valent nitrogen. Which SMILES is correct? Both are.




On Mar 2, 2017, at 20:34, Craig James <[hidden email]> wrote:
> If you can say, "It's obvious ...", and this is a feature everyone would like, then the OpenSMILES spec could be changed.

I think it should be changed.

                                Andrew
                                [hidden email]



Re: Change to SMILES writer for hypervalent atoms

Noel O'Boyle
Administrator
To avoid multiple threads, let's move this over to the opensmiles list.

Re: Change to SMILES writer for hypervalent atoms

Andrew Dalke
In reply to this post by Andrew Dalke
On Mar 2, 2017, at 21:31, Andrew Dalke <[hidden email]> wrote:
>> In practice, one chemist might represent nitromethane as C[N+](=O)[O-] with a nitrogen of valence 3 in a charge-separated structure while another might represent it as CN(=O)=O with a neutral 5-valent nitrogen. Which SMILES is correct? Both are.

I'm sorry. I sent that without fully double-checking/proof-reading. That is not a counter-example.

I'll see if I can find an actual counter-example.

I still think it should be changed.


                                Andrew
                                [hidden email]



Re: Change to SMILES writer for hypervalent atoms

Noel O'Boyle
Administrator
The outcome of the discussion over at opensmiles was that both forms are reasonable, and Geoff prefers the current behavior, so I'll stick to that.
