(no subject)

(no subject)

Marcos Villarreal
Hello,

For an application we are developing, we would like to get an atom typing that is independent of the input format.
For example, a mol2 file with all hydrogen atoms and a pdb file without hydrogens of the same molecule (i.e. identical heavy-atom coordinates) should get the same atom types.
The program below is our attempt in that direction, but unfortunately it does not succeed. How can one get rid of all the input information and let babel recalculate the atom types from scratch?

Thank you in advance.


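#include <iostream>
#include <string>
#include <openbabel/obconversion.h>
#include <openbabel/obiter.h>
#include <openbabel/mol.h>
#include <openbabel/atom.h>

int main(int argc, char **argv)
{
  if (argc < 2) {
    std::cerr << "expected an input file name as the first argument" << std::endl;
    return 1;
  }

  OpenBabel::OBConversion conv;
  OpenBabel::OBMol mol;
  const std::string filename = argv[1];

  // Stop if the file cannot be read.
  if (!conv.ReadFile(&mol, filename)) {
    std::cerr << "could not read " << filename << std::endl;
    return 1;
  }

  // Our attempt: drop the hydrogens, then re-perceive bonds and bond orders.
  mol.DeleteHydrogens();
  mol.ConnectTheDots();
  mol.PerceiveBondOrders();

  // Print the Open Babel type of each atom, numbered from 1.
  int i = 0;
  FOR_ATOMS_OF_MOL(atom, mol) {
    i++;
    std::cout << i << ": " << atom->GetType() << std::endl;
  }

  return 0;
}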



--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Córdoba
Argentina.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: (no subject)

Noel O'Boyle
Administrator
In other words, you want to assign atom types based on the structure.
The source of the structure is immaterial except in so far as it
introduces noise. For example, to read a PDB file you need to guess
various things. To read a MOL file, you don't need to guess anything.

Regarding your code, you should never throw away information and then
try to guess it. Also, I note in passing that DeleteHydrogens()
doesn't delete anything, it just suppresses any explicit hydrogens.

I'm a bit unclear why you are using the internal Open Babel atom
types. Personally, I would avoid this as the atom types may not be
suitable. Instead, just implement your own atom type function to suit
your needs. Any atom typing can be implemented as a function that
takes an OBAtom* and returns the type, perhaps as an enum.
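
A minimal sketch of that suggestion, assuming a made-up set of types: the enum values and the `classifyAtom` name are illustrative and not part of Open Babel, and `FakeAtom` is only a stand-in so that the snippet compiles without the library. The function uses nothing but `GetAtomicNum()` and `IsAromatic()`, which `OBAtom` also provides, so the same template should accept an `OBAtom*`.

```cpp
// Hypothetical application-level atom types; illustrative names only,
// not Open Babel's internal type strings.
enum class AtomType {
  AromaticCarbon,
  AliphaticCarbon,
  AromaticNitrogen,
  AliphaticNitrogen,
  Oxygen,
  Sulfur,
  Other
};

// Stand-in for OpenBabel::OBAtom so the sketch compiles without the library.
// It mirrors only the two accessors used below.
struct FakeAtom {
  unsigned int atomicNum;
  bool aromatic;
  unsigned int GetAtomicNum() const { return atomicNum; }
  bool IsAromatic() const { return aromatic; }
};

// Accepts a pointer to any type offering GetAtomicNum() and IsAromatic().
template <typename Atom>
AtomType classifyAtom(Atom* atom) {
  const bool aromatic = atom->IsAromatic();
  switch (atom->GetAtomicNum()) {
    case 6:
      return aromatic ? AtomType::AromaticCarbon : AtomType::AliphaticCarbon;
    case 7:
      return aromatic ? AtomType::AromaticNitrogen : AtomType::AliphaticNitrogen;
    case 8:
      return AtomType::Oxygen;
    case 16:
      return AtomType::Sulfur;
    default:
      return AtomType::Other;
  }
}
```

With Open Babel one would presumably call this as `classifyAtom(&*atom)` inside a `FOR_ATOMS_OF_MOL` loop. The result can only be as consistent as the aromaticity flags it reads.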

- Noel

On 22 May 2017 at 18:56, Marcos Villarreal <[hidden email]> wrote:

> [...]

Re: (no subject)

mirix
In reply to this post by Marcos Villarreal
Quick and dirty workaround: Convert it to .xyz (removing the Hs if needed) then compute the atom types from that file and see what happens... 

Re: (no subject)

Marcos Villarreal
In reply to this post by Noel O'Boyle

Dear Noel, thank you for your answer. Please see my comments below.

2017-05-22 16:00 GMT-03:00 Noel O'Boyle <[hidden email]>:
> In other words, you want to assign atom types based on the structure.

Yes, that's right.

> The source of the structure is immaterial except in so far as it
> introduces noise. For example, to read a PDB file you need to guess
> various things. To read a MOL file, you don't need to guess anything.

That noise is what we are trying to avoid by always calculating (guessing) things with the same algorithm.

> Regarding your code, you should never throw away information and then
> try to guess it.

Well, that depends on how much faith you have in the quality of the information in the input file.
One can always set a flag to keep the input information if it is considered accurate enough, but if you want consistency across input file formats, I don't see any other way than to strip all the information from the input and recalculate it.

> Also, I note in passing that DeleteHydrogens()
> doesn't delete anything, it just suppresses any explicit hydrogens.
>
> I'm a bit unclear why you are using the internal Open Babel atom
> types. Personally, I would avoid this as the atom types may not be
> suitable.
> Instead, just implement your own atom type function to suit
> your needs. Any atom typing can be implemented as a function that
> takes an OBAtom* and returns the type, perhaps as an enum.

Are you referring to functions like "IsAmideNitrogen"? We used these functions, and they worked just fine for our needs.
The problem we faced was with "IsAromatic", which we couldn't make input-format agnostic. Our guess is that some information from the input format always remains when calling it, even if UnsetAromaticPerceived and the like were called beforehand.
This led us to try the route of expressing all the atom types as internal Open Babel types and building upon that.

Re: (no subject)

Marcos Villarreal
In reply to this post by mirix
Thank you, Miro, for your answer. That is in the spirit of what we want to do, but without writing an intermediate file. We think that all the conversions can be done inside the code.

Marcos.

2017-05-22 16:06 GMT-03:00 Miro Moman <[hidden email]>:
> Quick and dirty workaround: Convert it to .xyz (removing the Hs if needed) then compute the atom types from that file and see what happens...

Re: (no subject)

Noel O'Boyle
Administrator
In reply to this post by Marcos Villarreal
Maybe if you can give an example of the problem with aromaticity, we
can help? The only information that is used by that function is the
structure, so it was probably wrong at that point.

On 23 May 2017 at 13:16, Marcos Villarreal <[hidden email]> wrote:

> [...]

Re: (no subject)

Noel O'Boyle
Administrator
When I convert the molecules as given with obabel, you're right - you
run into a bug that's been fixed on the development branch -
aromaticity is perceived differently depending on the presence/absence
of explicit hydrogens:

> obabel 3rlb_ligand.* -osmi
Cc1nc(N)c(Cn2csc(CCO)c2C)cn1    3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1   ./3rlb_ligand.pdb

If you delete the explicit Hs first, you can get the same aromaticity
perception for both:
> obabel 3rlb_ligand.* -d -O tmp.sdf
> obabel tmp.sdf -osmi
Cc1nc(N)c(CN2=CSC(=C2C)CCO)cn1  3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1   ./3rlb_ligand.pdb

If you paste these SMILES into MarvinSketch you can see the
difference. The MOL2 file contains an extra double bond to a nitrogen.
So what's going on?...

I'm guessing that the correct structure is in the MOL2 file, but it
was read incorrectly by Open Babel and so is missing the charge on the
4-valent nitrogen. MOL2 is a horrible format but we should do a better
job. I note in passing that MarvinSketch interprets it the same as
Open Babel but that's no excuse.

The PDB file of course does not contain any bond orders and so we
guess them. We do an okay job - this is an example where we miss the
bond. If you removed these bond orders from the MOL2 file you would
get the same wrong structure too.
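
One rough way to quantify the difference between the SMILES strings above without a sketcher is to count the atoms written as aromatic, i.e. with a lowercase element symbol. The function below is only an illustrative sketch (the name `countAromaticAtoms` is made up, and it is not a SMILES parser). It relies on the convention that, outside brackets, only b, c, n, o, p and s are aromatic symbols, and that inside brackets an aromatic atom starts with a lowercase letter after an optional isotope number.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Counts atoms written as aromatic (lowercase element symbol) in a SMILES string.
std::size_t countAromaticAtoms(const std::string& smiles) {
  const std::string aromaticSymbols = "bcnops";
  std::size_t count = 0;
  for (std::size_t i = 0; i < smiles.size(); ++i) {
    const char ch = smiles[i];
    if (ch == '[') {
      // Bracket atom: skip an optional isotope number, then inspect the symbol.
      std::size_t j = i + 1;
      while (j < smiles.size() &&
             std::isdigit(static_cast<unsigned char>(smiles[j]))) {
        ++j;
      }
      if (j < smiles.size() &&
          std::islower(static_cast<unsigned char>(smiles[j]))) {
        ++count;
      }
      // Jump to the closing bracket so H counts and charges are ignored.
      while (i < smiles.size() && smiles[i] != ']') {
        ++i;
      }
    } else if (aromaticSymbols.find(ch) != std::string::npos) {
      // The lowercase letters of Cl and Br are not in the aromatic set.
      ++count;
    }
  }
  return count;
}
```

Applied to the strings above, it gives 11 for the first MOL2 SMILES and 6 for the PDB one, and also 6 for the MOL2 one after `-d`. The five atoms of the sulfur-containing ring are the ones that lose their aromatic flag.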

- Noel



On 23 May 2017 at 15:24, Marcos Villarreal <[hidden email]> wrote:

> Here is one example from the PDBbind refined data set.
> Please find below the code, the output, and, attached, the mol2 and pdb
> input files.
>
> Code:
>
> #include <iostream>
> #include <openbabel/obconversion.h>
> #include <openbabel/obiter.h>
> #include <openbabel/mol.h>
> #include <openbabel/atom.h>
>
> int main(int argc,char **argv)
> {
>
>   OpenBabel::OBConversion conv;
>   OpenBabel::OBMol mol;
>   std::string filename;
>   filename = argv[1];
>
>   conv.ReadFile(&mol,filename);
>
>   mol.DeleteHydrogens();
>   mol.ConnectTheDots();
>   mol.PerceiveBondOrders();
>   mol.UnsetAromaticPerceived();
>
>   FOR_ATOMS_OF_MOL(atom, mol) {
>      std::cout << atom->IsAromatic() ;
>   }
>
> }
>
> Output:
> 000000111110000000 (mol2)
> 000000000000111111 (pdb)
>
>
>
> 2017-05-23 9:43 GMT-03:00 Noel O'Boyle <[hidden email]>:
>> [...]

Re: (no subject)

Marcos Villarreal
Thank you, Noel, for looking into this.

So how do you suggest doing this inside the code, that is, without going through an intermediate file?
As a reminder, our goal is to get the same atom types (say, aromatic ones) regardless of the input format.
For now we are interested in consistency before "accuracy", which is another subject. As a related note, we have tested several atom typing programs (Knodle, I-interpret, Unicon and also Open Babel), and the number of atoms perceived as aromatic typically differs by 10-20% when analyzing a set of 3600 structures from the PDBbind database.
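
A consistency check of this kind can be sketched independently of any toolkit: collect one flag per heavy atom from each input and count the positions where the two disagree. The helper names below are hypothetical, and the comparison assumes that both files list the same heavy atoms in the same order, which has to be verified separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of positions at which two per-atom flag strings disagree.
// Assumes both strings describe the same heavy atoms in the same order.
std::size_t countMismatches(const std::string& a, const std::string& b) {
  assert(a.size() == b.size());
  std::size_t mismatches = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i]) {
      ++mismatches;
    }
  }
  return mismatches;
}

// Fraction of atoms flagged differently; 0.0 means the two inputs agree.
double mismatchFraction(const std::string& a, const std::string& b) {
  if (a.empty()) {
    return 0.0;
  }
  return static_cast<double>(countMismatches(a, b)) /
         static_cast<double>(a.size());
}
```

For the two aromaticity strings posted earlier in the thread (000000111110000000 for mol2 and 000000000000111111 for pdb), `countMismatches` returns 11 out of 18 atoms, under that same-order assumption.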


2017-05-23 11:47 GMT-03:00 Noel O'Boyle <[hidden email]>:
When I convert the molecules as given with obabel, you're right - you
run into a bug that's been fixed on the development branch -
aromaticity is perceived differently depending on the presence/absence
of explicit hydrogens:

> obabel 3rlb_ligand.* -osmi
Cc1nc(N)c(Cn2csc(CCO)c2C)cn1    3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1   ./3rlb_ligand.pdb

If you delete the explicit Hs first, you can get the same aromaticity
perception for both:
>obabel 3rlb_ligand.* -d -O tmp.sdf
>obabel tmp.sdf -osmi
Cc1nc(N)c(CN2=CSC(=C2C)CCO)cn1  3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1   ./3rlb_ligand.pdb

If you paste these SMILES into Marvin Sketch you can see the
difference. The MOL2 file contains an extra double bond to a nitrogen.
So what's going on?...

I'm guessing that the correct structure is in the MOL2 file, but it
was read incorrectly by Open Babel and so is missing the charge on the
4-valent nitrogen. MOL2 is a horrible format but we should do a better
job. I note in passing that MarvinSketch interprets it the same as
Open Babel but that's no excuse.

The PDB file of course does not contain any bond orders and so we
guess them. We do an okay job - this is an example where we miss the
bond. If you removed these bond orders from the MOL2 file you would
get the same wrong structure too.

- Noel



On 23 May 2017 at 15:24, Marcos Villarreal <[hidden email]> wrote:
> Here is one example from the PDBBind refine data set.
> Please find bellow the code, the output, and attached the mol2 and the pdb
> input files.
>
> Code:
>
> #include <iostream>
> #include <openbabel/obconversion.h>
> #include <openbabel/obiter.h>
> #include <openbabel/mol.h>
> #include <openbabel/atom.h>
>
> int main(int argc,char **argv)
> {
>
>   OpenBabel::OBConversion conv;
>   OpenBabel::OBMol mol;
>   std::string filename;
>   filename = argv[1];
>
>   conv.ReadFile(&mol,filename);
>
>   mol.DeleteHydrogens();
>   mol.ConnectTheDots();
>   mol.PerceiveBondOrders();
>   mol.UnsetAromaticPerceived();
>
>   FOR_ATOMS_OF_MOL(atom, mol) {
>      std::cout << atom->IsAromatic() ;
>   }
>
> }
>
> Output:
> 000000111110000000 (mol2)
> 000000000000111111 (pdb)
>
>
>
> 2017-05-23 9:43 GMT-03:00 Noel O'Boyle <[hidden email]>:
>>
>> Maybe if you can give an example of the problem with aromaticity, we
>> can help? The only information that is used by that function is the
>> structure, so it was probably wrong at that point.
>>
>> On 23 May 2017 at 13:16, Marcos Villarreal <[hidden email]> wrote:
>> >
>> > Dear Noel Thank you for your answer. Please see my comments bellow.
>> >
>> > 2017-05-22 16:00 GMT-03:00 Noel O'Boyle <[hidden email]>:
>> >>
>> >> In other words, you want to assign atom types based on the structure.
>> >
>> >
>> >    Yes, that's right.
>> >
>> >>
>> >> The source of the structure is immaterial except in so far as it
>> >> introduces noise. For example, to read a PDB file you need to guess
>> >> various things. To read a MOL file, you don't need to guess anything.
>> >
>> >
>> > That noise is what we are trying to avoid by always calculating
>> > (guessing)
>> > things with the same algorithm.
>> >
>> >>
>> >> Regarding your code, you should never throw away information and then
>> >> try to guess it.
>> >
>> >
>> > Well, that depend on your faith on the quality of the information putted
>> > in
>> > the input format.
>> > One can always set a flag to keep the input information if its
>> > considered
>> > accurate enough, but if you want consistency regarding the input file
>> > format
>> > I don't see other way but to strip off all the information in the input
>> > and
>> > recalculate it.
>> >
>> >> Also, I note in passing that DeleteHydrogens()
>> >> doesn't delete anything, it just suppresses any explicit hydrogens.
>> >
>> >
>> >> I'm a bit unclear why you are using the internal Open Babel atom
>> >> types. Personally, I would avoid this as the atom types may not be
>> >> suitable.
>> >>
>> >> Instead, just implement your own atom type function to suit
>> >> your needs. Any atom typing can be implemented as a function that
>> >> takes an OBAtom* and returns the type, perhaps as an enum.
>> >
>> >
>> > Are you referring to functions like "IsAmideNitrogen" or so?.  We used
>> > these
>> > functions, and they worked just fine for our needs.
>> > The problem we faced was with "IsAromatic" that we couldn't make it
>> > input-format agnostic. Our guess is that some information of the input
>> > format is always remaining when calling it, regardless
>> > UnsetAromaticPerceived and the like were called before.
>> > This lead us to try the route of put all the atom types in internal Open
>> > Babel types and build upon it.
>> >
>> >>
>> >> - Noel
>> >>



--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba


Re: (no subject)

Geoff Hutchison
> For now we are interested in consistency before "accuracy", which is another subject. As a related note, we have tested several atom typing programs (Knodle, I-interpret, Unicon and also Open Babel), and the perceived number of aromatic atoms typically differs by 10-20% when analyzing 3600 structures in the PDBbind database.

This is hardly surprising. For one, if I take 10 organic chemists in a room and ask them to identify aromatic rings, I’ll get at least 10-20% variation.

More specifically, there is not one uniform cheminformatics model for aromaticity - because there is no well-defined chemical definition. That’s omitting the hard cases, even given a specific aromatic model. I’d guess we get 5-10 bug reports per year on specific cases for OB aromaticity detection.

But your question is how do you get uniform atom types, regardless of the input file format. This is probably impossible. If you have data in format X with correct bond and formal charge assignments (e.g., SDF) and data in XYZ format with atoms and no bonds or formal charges, you have to assume that all the bond perception is perfect. I don’t have a good metric for OB’s implementation, but I’d guess somewhere in the ~90-95% range.
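To make the "bond perception" step concrete, here is a minimal sketch of the usual distance-based approach: two atoms are considered bonded when they are closer than the sum of their covalent radii plus a tolerance. This is only an illustration, not Open Babel's actual implementation. The `Atom` struct, the `CovalentRadius` and `PerceiveBonds` names, the small radii table, the fallback radius and the 0.45 Å tolerance are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical minimal atom record: atomic number plus coordinates in angstroms.
struct Atom {
  int atomicNum;
  double x, y, z;
};

// Approximate covalent radii in angstroms for a few elements.
// Elements missing from the table get a generic fallback chosen for this sketch.
double CovalentRadius(int atomicNum) {
  static const std::map<int, double> radii = {
      {1, 0.31}, {6, 0.76}, {7, 0.71}, {8, 0.66}, {16, 1.05}};
  const auto it = radii.find(atomicNum);
  return it != radii.end() ? it->second : 0.75;
}

// Returns index pairs (i < j) of atoms closer than the sum of their covalent
// radii plus `tolerance`. Pairs closer than 0.4 angstrom are skipped as
// overlapping atoms. Only connectivity is guessed; bond orders are not.
std::vector<std::pair<std::size_t, std::size_t>> PerceiveBonds(
    const std::vector<Atom>& atoms, double tolerance = 0.45) {
  assert(tolerance >= 0.0);
  std::vector<std::pair<std::size_t, std::size_t>> bonds;
  for (std::size_t i = 0; i < atoms.size(); ++i) {
    for (std::size_t j = i + 1; j < atoms.size(); ++j) {
      const double dx = atoms[i].x - atoms[j].x;
      const double dy = atoms[i].y - atoms[j].y;
      const double dz = atoms[i].z - atoms[j].z;
      const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
      const double cutoff = CovalentRadius(atoms[i].atomicNum) +
                            CovalentRadius(atoms[j].atomicNum) + tolerance;
      if (dist > 0.4 && dist < cutoff) {
        bonds.emplace_back(i, j);
      }
    }
  }
  return bonds;
}
```

Even when this step finds the right connectivity, bond orders, aromaticity and formal charges still have to be guessed on top of it, which is where the remaining errors tend to come from.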

In short, please don’t throw away good data. Stick to file formats that retain as much information as possible.

-Geoff

Re: (no subject)

Marcos Villarreal
Hello Geoff, thank you for your answer. Please see my comments inline below.

2017-05-23 12:38 GMT-03:00 Geoffrey Hutchison <[hidden email]>:
>> For now we are interested in consistency before "accuracy", which is another subject. As a related note, we have tested several atom typing programs (Knodle, I-interpret, Unicon and also Open Babel), and the perceived number of aromatic atoms typically differs by 10-20% when analyzing 3600 structures in the PDBbind database.
>
> This is hardly surprising. For one, if I take 10 organic chemists in a room and ask them to identify aromatic rings, I’ll get at least 10-20% variation.
>
> More specifically, there is not one uniform cheminformatics model for aromaticity - because there is no well-defined chemical definition. That’s omitting the hard cases, even given a specific aromatic model. I’d guess we get 5-10 bug reports per year on specific cases for OB aromaticity detection.

That was exactly the point implied in this comment. Open Babel seems as good as any other program at detecting aromaticity.

> But your question is how do you get uniform atom types, regardless of the input file format. This is probably impossible. If you have data in format X with correct bond and formal charge assignments (e.g., SDF) and data in XYZ format with atoms and no bonds or formal charges, you have to assume that all the bond perception is perfect. I don’t have a good metric for OB’s implementation, but I’d guess somewhere in the ~90-95% range.

 
Well, as long as coordinates and atomic numbers are provided in a file, it should always be possible to come up with the same atom typing, regardless of the format. Indeed, you will have to lose information for the sake of consistency.
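As a rough sketch of what "losing information for the sake of consistency" could look like, the snippet below reduces any input to its heavy atoms, keyed only by atomic number and rounded coordinates, so that a mol2 with hydrogens and a pdb without them yield the same record. The `Atom` struct, the `AtomKey` alias, the `HeavyAtomKeys` name and the rounding to 0.001 Å are assumptions for this example and not part of Open Babel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <tuple>
#include <vector>

// Hypothetical minimal atom record: atomic number plus coordinates in angstroms.
struct Atom {
  int atomicNum;
  double x, y, z;
};

// Atomic number and coordinates in thousandths of an angstrom.
using AtomKey = std::tuple<int, long, long, long>;

// Drops hydrogens, rounds coordinates to 0.001 angstrom and sorts the result,
// so the output does not depend on atom order or on hydrogens in the input.
std::vector<AtomKey> HeavyAtomKeys(const std::vector<Atom>& atoms) {
  std::vector<AtomKey> keys;
  for (const Atom& a : atoms) {
    assert(std::isfinite(a.x) && std::isfinite(a.y) && std::isfinite(a.z));
    if (a.atomicNum == 1) {
      continue;  // hydrogens are deliberately discarded
    }
    keys.emplace_back(a.atomicNum, std::lround(a.x * 1000.0),
                      std::lround(a.y * 1000.0), std::lround(a.z * 1000.0));
  }
  std::sort(keys.begin(), keys.end());
  return keys;
}
```

Two inputs describe the same heavy-atom skeleton when their key lists compare equal. Coordinates that sit exactly on a rounding boundary could still end up with different keys, so a real comparison would probably want a distance tolerance instead.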
 
> In short, please don’t throw away good data. Stick to file formats that retain as much information as possible.

I agree with you in principle, but consider the following not uncommon scenario. We are working on docking (AutoDock Vina), whose score depends on atom typing. As you know, ligands come in different formats, usually pdb, mol2 or sdf. We would expect to obtain the same docking result regardless of the input format.
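One way to quantify this expectation is to type the same ligand from two formats and count the heavy atoms whose types disagree, matching atoms by position. The sketch below assumes the typing has already been done elsewhere. `TypedAtom`, `KeyOf` and `CountTypeMismatches` are hypothetical names for this example, and the type labels in the usage note are only illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record: heavy-atom coordinates in angstroms plus an assigned type label.
struct TypedAtom {
  double x, y, z;
  std::string type;
};

// Coordinates in thousandths of an angstrom, used to match atoms across inputs.
using CoordKey = std::tuple<long, long, long>;

CoordKey KeyOf(const TypedAtom& a) {
  assert(std::isfinite(a.x) && std::isfinite(a.y) && std::isfinite(a.z));
  return CoordKey(std::lround(a.x * 1000.0), std::lround(a.y * 1000.0),
                  std::lround(a.z * 1000.0));
}

// Counts atoms in `a` that have no counterpart at the same position in `b`,
// or whose counterpart carries a different type label.
std::size_t CountTypeMismatches(const std::vector<TypedAtom>& a,
                                const std::vector<TypedAtom>& b) {
  std::map<CoordKey, std::string> typesB;
  for (const TypedAtom& atom : b) {
    typesB[KeyOf(atom)] = atom.type;
  }
  std::size_t mismatches = 0;
  for (const TypedAtom& atom : a) {
    const auto it = typesB.find(KeyOf(atom));
    if (it == typesB.end() || it->second != atom.type) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

For instance, if the mol2 route labels an atom "C.ar" and the pdb route labels the atom at the same position "C.3", that atom counts as one mismatch. A count of zero over a ligand set would indicate the typing is consistent across formats, though not necessarily correct.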

-Marcos. 


-Geoff



--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba


Re: (no subject)

Dimitri Maziuk
On 05/23/2017 12:24 PM, Marcos Villarreal wrote:

> I agree with you in principle, but consider the following not uncommon
> scenario. We are working on docking (AutoDock Vina), whose score depends on
> atom typing. As you know, ligands come in different formats, usually
> pdb, mol2 or sdf. We would expect to obtain the same docking result
> regardless of the input format.

Why? PDB files contain a 3D structure, complete with stereo config
(because that's how the crystal structure works). MOL/SDF doesn't have
to include 3D coordinates, nor any usable stereo flags. Unless all my
MOL/SDFs were generated from PDBs with zero information loss, I wouldn't
expect anything from them.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

