1-substituted adamantane InChIKeys

M.D. Driver

Hi,

I've got a problem with the InChIKeys being generated from CML for a series of adamantanes. The structures in the attached CML were generated in torch (from a SMILES string) and then converted from SDF to CML using Open Babel. I'm trying to use the function in the attached Python script to add the InChIKey to the attributes of each molecule's CML (the function takes an lxml.etree.Element representation of the molecule's CML block as input and adds the generated InChIKey). I want to be able to match these 3D structures to their experimental data, which is stored in XML and uses the InChIKey as the ID for each molecule.
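A minimal sketch of the kind of function described here, assuming a caller-supplied `key_from_cml` callable that turns a CML string into an InChIKey. The attached script is not reproduced in the thread, so this is not Mark's code. It uses the standard-library `xml.etree.ElementTree` to stay self-contained; `lxml.etree` offers the same `tostring`/`set` calls, and since ElementTree renames namespace prefixes (e.g. to `ns0:`), lxml may be the safer choice for namespaced CML. The attribute name `inchikey` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical attribute name; the thread does not say which attribute is used.
INCHIKEY_ATTR = "inchikey"


def add_inchikey_attribute(molecule: ET.Element,
                           key_from_cml: Callable[[str], str],
                           attr_name: str = INCHIKEY_ATTR) -> str:
    """Serialise a CML <molecule> element, compute its InChIKey and store it.

    key_from_cml is supplied by the caller. With the Open Babel 3.x Python
    bindings it could be, for example:
        from openbabel import pybel
        key_from_cml = lambda cml: pybel.readstring("cml", cml).write("inchikey")
    """
    cml_text = ET.tostring(molecule, encoding="unicode")
    # Writer output may end with a newline, so strip surrounding whitespace.
    key = key_from_cml(cml_text).strip()
    molecule.set(attr_name, key)
    return key
```

Keeping the key computation outside the function makes it easy to swap toolkits or to test the XML handling without Open Babel installed.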

The CSV file contains the expected InChIKey and the canonicalised SMILES used to generate it (columns exp_inchikey and exp_smiles, respectively). The InChIKey actually generated from the CML is in the cml_inchikey column. The second block of the InChIKey differs, and I was wondering why. Is it due to some hidden stereochemistry that isn't in the SMILES used to generate the expected key, to the options I'm using for the conversion, or to something else I haven't thought of?

Note: the expected InChIKey is taken from the ChemSpider entry for the molecule.
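Since only one block differs, it helps to report exactly which one. A sketch, assuming the CSV has a header row containing the exp_smiles, exp_inchikey and cml_inchikey columns named above. A standard InChIKey is 14-10-1 characters: the first block hashes the main (connectivity) layer, the second hashes the remaining layers (stereo, isotopes and so on) followed by `S` for standard and `A` for the version, and the last character is the protonation flag. The sample row reuses the butan-2-ol keys from John May's reply, not Mark's data.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_inchikey(key: str) -> Dict[str, str]:
    """Split a standard InChIKey (14-10-1 characters) into its three blocks."""
    parts = key.strip().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return {
        "skeleton": parts[0],      # hash of the main (connectivity) layer
        "other_layers": parts[1],  # 8-char hash of other layers, 'S' = standard, 'A' = version 1
        "protonation": parts[2],   # 'N' = no (de)protonation
    }


def differing_blocks(expected: str, actual: str) -> List[str]:
    """Names of the InChIKey blocks that differ between two keys."""
    exp, act = split_inchikey(expected), split_inchikey(actual)
    return [name for name in exp if exp[name] != act[name]]


def compare_csv(csv_text: str) -> List[Tuple[str, List[str]]]:
    """Pair each row's exp_smiles with the blocks where the two keys disagree."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["exp_smiles"],
             differing_blocks(row["exp_inchikey"], row["cml_inchikey"]))
            for row in reader]


if __name__ == "__main__":
    sample = ("exp_smiles,exp_inchikey,cml_inchikey\n"
              "CCC(C)O,BTANRVKWQNVYAZ-UHFFFAOYSA-N,BTANRVKWQNVYAZ-SCSAIBSYSA-N\n")
    print(compare_csv(sample))  # [('CCC(C)O', ['other_layers'])]
```

A result of `['other_layers']` with an unchanged skeleton means the two structures have the same connectivity and differ only in layers such as stereochemistry.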

Thanks,

Mark Driver

PhD student

University of Cambridge


------------------------------------------------------------------------------
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://makebettercode.com/inteldaal-eval
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

adamantaneInChIKeys.csv (500 bytes)
obabelInchKeyfunction.py (788 bytes)
adamantaneexamples.cml (21K)

Re: 1-substituted adamantane InChIKeys

John May
Hi,

I think this is pretty easy to explain, but just to clarify: you converted SMILES to SDF/CML with 3D coordinates? If your input didn't have stereochemistry before this conversion, it will always have it defined afterwards. A simpler example, butan-2-ol, demonstrates this:

> obabel -:'CCC(C)O' -osdf --gen3d | obabel -isdf -osmi
1 molecule converted
CC[C@@H](C)O

> obabel -:'CCC(C)O' -osdf | obabel -isdf -oinchikey
==============================
*** Open Babel Warning  in WriteMolecule
  No 2D or 3D coordinates exist. Stereochemical information will be stored using an Open Babel extension. To generate 2D or 3D coordinates instead use --gen2D or --gen3D.
1 molecule converted
==============================
*** Open Babel Warning  in InChI code
  #1 :Omitted undefined stereo
BTANRVKWQNVYAZ-UHFFFAOYSA-N
 
> obabel -:'CCC(C)O' -osdf --gen3d | obabel -isdf -oinchikey
1 molecule converted
BTANRVKWQNVYAZ-SCSAIBSYSA-N 
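John's point can be made concrete: once an atom has real 3D neighbours, they sit in one of two mirror-image arrangements, so the coordinates alone pick a configuration even when the SMILES did not. Below is a generic sketch of the signed-volume (triple product) test, not Open Babel's actual stereo-perception code. It does not decide whether an atom is a genuine stereocentre; that is a separate graph-symmetry question.

```python
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(p: Vec, q: Vec) -> Tuple[float, float, float]:
    """Vector p - q."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def signed_volume(center: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Triple product (a - center) . ((b - center) x (c - center))."""
    u, v, w = _sub(a, center), _sub(b, center), _sub(c, center)
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def handedness(center: Vec, a: Vec, b: Vec, c: Vec, tol: float = 1e-6) -> int:
    """+1 or -1 for the two mirror-image arrangements of neighbours a, b, c
    (in that order) around center; 0 if they are (nearly) coplanar with it."""
    vol = signed_volume(center, a, b, c)
    if abs(vol) < tol:
        return 0
    return 1 if vol > 0 else -1
```

Swapping any two neighbours, or reflecting the whole geometry, flips the sign; a flat arrangement gives 0, which is why coordinates without depth do not fix a configuration on their own.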

What's more fun is that I can get a different key just by changing the atom order of the input:

> obabel -:'CCC(O)C' -osdf --gen3d | obabel -isdf -oinchikey
1 molecule converted
BTANRVKWQNVYAZ-BYPYZUCNSA-N
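The three butan-2-ol keys above share their first block and differ only in the second, the same pattern Mark reports. A sketch that groups keys by their first block and flags keys whose second block does not start with `UHFFFAOY`, the hash standard InChIKeys typically carry when the InChI has no stereo or isotopic layers. Matching on the first block alone is a workaround not proposed in the thread, and it conflates genuinely different stereoisomers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Second-block hash typically seen when the InChI has no stereo or isotopic
# layers (e.g. the first butan-2-ol key above).
NO_EXTRA_LAYERS = "UHFFFAOY"


def group_by_skeleton(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group InChIKeys by their first (connectivity) block."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        key = key.strip()
        groups[key.split("-")[0]].append(key)
    return dict(groups)


def has_extra_layers(key: str) -> bool:
    """True if the second block encodes more than the main layer (e.g. stereo)."""
    return not key.strip().split("-")[1].startswith(NO_EXTRA_LAYERS)
```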
 
John

Regards,
John W May
[hidden email]

On 8 March 2016 at 11:55, M.D. Driver <[hidden email]> wrote:

Re: 1-substituted adamantane InChIKeys

Geoff Hutchison
In reply to this post by M.D. Driver
On Mar 9, 2016, at 11:53 AM, John M <[hidden email]> wrote:
> I think this is pretty easy to explain, but just to clarify: you converted SMILES to SDF/CML with 3D coordinates? If your input didn't have stereochemistry before this conversion, it will always have it defined afterwards. A simpler example, butan-2-ol, demonstrates this:


No, I think the question is: "Why is there stereochemistry in these modified adamantanes, which don't look chiral?"

Indeed, Open Babel is declaring that the SMILES have undefined stereochemistry:

obabel -:'CC(C)(C)C(=O)C12CC3CC(CC(C3)C1)C2' -oinchi
==============================
*** Open Babel Warning  in InChI code
  #1 :Omitted undefined stereo

The key to debugging this is not in the InChIKey but in the InChI itself. I don't know where the supposedly "undefined" stereocenter is.
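Following the suggestion to look at the InChI rather than the key, here is a small helper that lists the stereo layers (`/b`, `/t`, `/m`, `/s`) of a standard InChI string, e.g. one written from the 3D CML, so the offending atoms can be read off the `/t` layer. It is a string-level sketch that assumes standard InChI layer order and stops at the isotopic (`/i`) layer; it is not part of Open Babel.

```python
from typing import Dict

# Stereo-related InChI layer prefixes (main layer only).
STEREO_LAYERS = {
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "inversion flag",
    "s": "stereo type",
}


def stereo_layers(inchi: str) -> Dict[str, str]:
    """Return the main-layer stereo layers of a standard InChI, keyed by prefix.

    Fields are '/'-separated: version ('InChI=1S'), formula, then layers that
    each begin with a lowercase prefix letter. Scanning stops at the isotopic
    (/i) layer, so isotopic stereo sublayers are not reported.
    """
    found: Dict[str, str] = {}
    for field in inchi.strip().split("/")[2:]:
        if field.startswith("i"):
            break
        if field and field[0] in STEREO_LAYERS:
            found[field[0]] = field[1:]
    return found
```

For example, the unsubstituted adamantane InChI quoted by Stefano Forli below returns an empty dict, because the undefined stereo was omitted from it.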

-Geoff

On Mar 8, 2016, at 6:55 AM, M.D. Driver <[hidden email]> wrote:

Re: 1-substituted adamantane InChIKeys

Stefano Forli
I had a similar experience, and it seems that Open Babel has a problem with adamantane in general (i.e., even unsubstituted):

  obabel -:'C1C3CC2CC(CC1C2)C3' -oinchi
  ==============================
  *** Open Babel Warning  in InChI code
    #1 :Omitted undefined stereo
  InChI=1S/C10H16/c1-7-2-9-4-8(1)5-10(3-7)6-9/h7-10H,1-6H2
  1 molecule converted

In my code, I had to blacklist the adamantane group from chirality analysis.


On 03/09/2016 10:28 AM, Geoffrey Hutchison wrote:

--

  Stefano Forli, PhD

  Assistant Professor of Integrative
  Structural and Computational Biology,
  Molecular Graphics Laboratory

  Dept. of Integrative Structural
   and Computational Biology, MB-112A
  The Scripps Research Institute
  10550  North Torrey Pines Road
  La Jolla,  CA 92037-1000,  USA.

     tel: +1 (858)784-2055
     fax: +1 (858)784-2860
     email: [hidden email]
     http://www.scripps.edu/~forli/
