Canonical SMILES with disconnected parts


xh s
In my program I convert the molecule ClCCl.O.[Cl-] (from both SMILES and OBMol) to canonical SMILES but get two different answers, ClCCl.O.[Cl-] and ClCCl.[Cl-].O

My question is: does the algorithm fix the order of the disconnected parts, so that there is only one canonical result?

Thanks!
Xianghai

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: Canonical SMILES with disconnected parts

Noel O'Boyle
Administrator
Should be the same. Can you provide the files?

On 27 May 2017 12:51 a.m., "xh s" <[hidden email]> wrote:
In my program I convert the molecule ClCCl.O.[Cl-] (from both SMILES and OBMol) to canonical SMILES but get two different answers, ClCCl.O.[Cl-] and ClCCl.[Cl-].O

My question is: does the algorithm fix the order of the disconnected parts, so that there is only one canonical result?

Thanks!
Xianghai


Re: Canonical SMILES with disconnected parts

Geoff Hutchison
No, this is a known bug with disconnected fragments. The canonical algorithm does not canonicalize the fragments.

Geoff

On May 27, 2017, at 4:20 AM, Noel O'Boyle <[hidden email]> wrote:

Should be the same. Can you provide the files?


Re: Canonical SMILES with disconnected parts

Tim Vandermeersch
Hi,

I can't reproduce this. The order of fragments should be canonical (the algorithm takes this into account). What is the original source of the OBMol that gives a different result?

For example, the following SMILES all convert to the same canonical form:

ClCCl.O.[Cl-]
O.ClCCl.[Cl-]
O.[Cl-].ClCCl
ClCCl.[Cl-].O
[Cl-].ClCCl.O
[Cl-].O.ClCCl

=> ClCCl.O.[Cl-]
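
A library-independent sanity check for a claim like this can be sketched in plain Python. This is only string handling, not Open Babel's canonicalization: the helper name `fragment_key` is hypothetical, and it assumes that each dot-separated part is spelled identically wherever it appears and that no ring-closure digits span a dot.

```python
from itertools import permutations


def fragment_key(smiles):
    """Return the dot-separated fragments as a sorted tuple.

    Assumption: '.' only separates fully disconnected parts, and each
    part is written the same way in every input. This compares strings;
    it does not canonicalize the chemistry.
    """
    return tuple(sorted(smiles.split(".")))


fragments = ["ClCCl", "O", "[Cl-]"]

# The six orderings listed above, generated rather than typed out.
orderings = [".".join(p) for p in permutations(fragments)]

# Collapse each ordering to its order-independent key.
keys = {fragment_key(s) for s in orderings}
print(len(orderings), len(keys))
```

The two strings from the original question, ClCCl.O.[Cl-] and ClCCl.[Cl-].O, also share one key under this helper, which suggests they differ only in fragment order.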

Kind regards,
Tim

On Sat, May 27, 2017 at 4:50 PM, Geoff Hutchison <[hidden email]> wrote:
No, this is a known bug with disconnected fragments. The canonical algorithm does not canonicalize the fragments.

Geoff


Re: Canonical SMILES with disconnected parts

xh s
Hi Tim, 

I was able to convert the problematic molecule to an SDF file and reproduce the error. Here's the test.sdf file. I used "obabel test.sdf -Otest.can".


 OpenBabel05311714443D

  7  4  0  0  0  0  0  0  0  0999 V2000
    1.6789   -1.9571    0.9205 Cl  0  5  0  0  0  0  0  0  0  0  0  0
    2.9546   -0.8420    0.3747 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7238   -0.0642    1.7843 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    4.2107   -1.8015   -0.4843 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.7232    1.4157   -0.6233 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0756    1.0202   -0.3210 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.5841   -0.0998   -0.3221 H   0  0  0  0  0  0  0  0  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
  6  5  1  0  0  0  0
  7  5  1  0  0  0  0
M  CHG  1   1  -1
M  END
$$$$


Best regards,
Xianghai

On Wed, May 31, 2017 at 8:09 AM, Tim Vandermeersch <[hidden email]> wrote:
Hi,

I can't reproduce this. The order of fragments should be canonical (the algorithm takes this into account). What is the original source of the OBMol that gives a different result?

For example, the following SMILES all convert to the same canonical form:

ClCCl.O.[Cl-]
O.ClCCl.[Cl-]
O.[Cl-].ClCCl
ClCCl.[Cl-].O
[Cl-].ClCCl.O
[Cl-].O.ClCCl

=> ClCCl.O.[Cl-]

Kind regards,
Tim


Re: Canonical SMILES with disconnected parts

Noel O'Boyle
Administrator
Ah, it is a bug (or I regard it as one). This is a difference in how explicit hydrogens are handled, and is fixed by #1576 (https://github.com/openbabel/openbabel/pull/1576).
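
To see where the explicit hydrogens sit in the posted test.sdf, the atom and bond blocks can be inspected without any chemistry toolkit. The sketch below is only an illustration of what the file contains, not of how Open Babel processes it; the atom symbols and bonds are copied by hand from the file, and `connected_components` is a hypothetical helper.

```python
# Atoms and bonds copied from the test.sdf posted earlier in this thread
# (1-based atom indices, as in the molfile bond block).
atoms = ["Cl", "C", "Cl", "Cl", "O", "H", "H"]
bonds = [(2, 3), (2, 4), (6, 5), (7, 5)]


def connected_components(n_atoms, bonds):
    """Group 1-based atom indices into connected fragments (union-find)."""
    parent = list(range(n_atoms + 1))  # index 0 is unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in bonds:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(1, n_atoms + 1):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


components = connected_components(len(atoms), bonds)
explicit_h = [i for i, sym in enumerate(atoms, start=1) if sym == "H"]

print(components)  # three fragments: chloride, the C/Cl part, the O/H part
print(explicit_h)  # indices of the explicit hydrogen atoms
```

According to the bond block, both explicit hydrogens are attached to the oxygen (atom 5), while the carbon has no explicit hydrogens listed.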

For the moment, to get the same canonical SMILES string in this case, you need to remove the hydrogens using "-d".

ClCCl.[Cl-].O vs ClCCl.O.[Cl-]
(with #1576 they both give ClCCl.[Cl-].O)
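
A minimal sketch of scripting that workaround from Python is shown here. It only assembles the command line quoted in this thread plus "-d"; the function name `obabel_canonical_cmd` is hypothetical, and the command is run only if an `obabel` executable and the input file are actually present.

```python
import os
import shutil
import subprocess


def obabel_canonical_cmd(infile, outfile, delete_hydrogens=True):
    """Build an obabel command like the one quoted in this thread."""
    # Same form as "obabel test.sdf -Otest.can" from the earlier message.
    cmd = ["obabel", infile, "-O" + outfile]
    if delete_hydrogens:
        cmd.append("-d")  # remove hydrogens, the workaround suggested above
    return cmd


cmd = obabel_canonical_cmd("test.sdf", "test.can")
print(" ".join(cmd))

# Run only when obabel is installed and the input file exists.
if shutil.which("obabel") and os.path.exists("test.sdf"):
    subprocess.run(cmd, check=True)
```

Passing `delete_hydrogens=False` reproduces the original command, which makes it easy to compare the two outputs side by side.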

There is still an "unimplemented feature", and I think this is what Geoff was thinking of. The order within the individual dot-disconnected SMILES is not canonical (if I remember correctly).

- Noel

On 31 May 2017 10:54 p.m., "xh s" <[hidden email]> wrote:
Hi Tim, 

I was able to convert the problematic molecule to an SDF file and reproduce the error. Here's the test.sdf file. I used "obabel test.sdf -Otest.can".


 OpenBabel05311714443D

  7  4  0  0  0  0  0  0  0  0999 V2000
    1.6789   -1.9571    0.9205 Cl  0  5  0  0  0  0  0  0  0  0  0  0
    2.9546   -0.8420    0.3747 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7238   -0.0642    1.7843 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    4.2107   -1.8015   -0.4843 Cl  0  0  0  0  0  0  0  0  0  0  0  0
    0.7232    1.4157   -0.6233 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0756    1.0202   -0.3210 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.5841   -0.0998   -0.3221 H   0  0  0  0  0  0  0  0  0  0  0  0
  2  3  1  0  0  0  0
  2  4  1  0  0  0  0
  6  5  1  0  0  0  0
  7  5  1  0  0  0  0
M  CHG  1   1  -1
M  END
$$$$


Best regards,
Xianghai
