Hi, I have numerous multi-molecule SD files of unknown origin. Each file reads OK and a few initial tests in pybel work (e.g. counting the molecules). I now want to compute molecular weight distributions and run similarity and identity searches across them.
This is not a problem in itself, but since the files may well come from different sources, they might have different charge assignments and hydrogens added or not added. Will pybel fp_1 | fp_2 type similarity and canonical SMILES identity deal with this kind of issue automagically, or is manual standardisation required before the searches? I was thinking along the lines of stripping any existing hydrogens and then adding them back by calling OB to ensure consistency. Is this the best thing to do, and how best to deal with formal charges and possibly salts?

Thanks, Andy