PAINS filtering

PAINS filtering

mirix
Hello,

I have downloaded a subset of the PubChem database in SDF format. I also have a file containing SMARTS patterns for about 500 undesirable functionalities.

I am able to filter my SDF file using one SMARTS pattern at a time, but I was wondering if OB provides a simple way of filtering all PAINS in one go.

Kind regards,

Miro

Re: PAINS filtering

Geoff Hutchison
> I am able to filter my SDF file using one SMARTS pattern at a time, but I
> was wondering if OB provides a simple way of filtering all PAINS in one go.

At the moment, no. But if someone posts the set of SMARTS patterns, it’s easy to add this as a filter to plugindefines.txt.

-Geoff
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: PAINS filtering

mirix
There you go:

http://pastebin.ca/raw/3401877

This comes from here:

http://blog.rguha.net/?p=850

Re: PAINS filtering

Stefano Forli
About that, a while ago I compiled an OB-compatible data file with the three pattern
classes (L15, L150 and M150) from Rajarshi Guha [1].

A dirt-cheap implementation with Pybel works fine, but the main issue is timing. Processing
a Mol2 file with 1000 random molecules from the ZINC database takes about 13 seconds.
That's not bad, but not fast either: processing a fairly large virtual screening library (e.g.,
the ChemBridge library, 1.5M compounds) would take about 5.4 hours.
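
For reference, the 5.4 hour figure follows from a simple linear extrapolation of the measured timing. The sketch below just reproduces that arithmetic; the function name and the `workers` parameter are illustrative assumptions, and the parallel estimate assumes ideal scaling, which real runs rarely reach.

```python
def estimated_hours(n_molecules, sample_size=1000, sample_seconds=13.0, workers=1):
    """Linearly extrapolate a measured timing to a larger library.

    Assumes the per-molecule cost stays constant and, when workers > 1,
    that the work splits perfectly across processes (an idealization).
    """
    seconds = n_molecules / sample_size * sample_seconds
    return seconds / 3600.0 / workers


# 1.5M compounds at 13 s per 1000 molecules
print(f"{estimated_hours(1_500_000):.1f} h")  # 5.4 h
```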

I'm not sure if there's a way to speed up the process, but if so, it should definitely be
considered.
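
As a rough illustration of the Pybel approach described above, here is a minimal sketch, not the actual implementation. It assumes a pattern file with one `SMARTS name` pair per line (the real data file may be laid out differently), and the function names are made up for the example. Two cheap speed-ups are included: each SMARTS is compiled once up front, and matching stops at the first hit for a molecule.

```python
def parse_patterns(lines):
    """Parse 'SMARTS<whitespace>name' lines; skip blank lines and '#' comments.

    The file layout is an assumption made for this sketch.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        smarts = parts[0]
        name = parts[1] if len(parts) > 1 else smarts
        patterns.append((name, smarts))
    return patterns


def _load_pybel():
    """Import Pybel lazily so the pure-Python helpers work without Open Babel."""
    try:
        from openbabel import pybel  # Open Babel 3.x layout
    except ImportError:
        import pybel  # Open Babel 2.x layout
    return pybel


def compile_patterns(patterns):
    """Compile every SMARTS once, so the cost is not paid again per molecule."""
    pybel = _load_pybel()
    return [(name, pybel.Smarts(smarts)) for name, smarts in patterns]


def first_hit(mol, compiled):
    """Return the name of the first matching pattern, or None if nothing matches."""
    for name, matcher in compiled:
        if matcher.findall(mol):
            return name  # early exit: one hit is enough to flag the molecule
    return None


def filter_file(in_path, out_path, pattern_path, fmt="sdf"):
    """Write molecules without any pattern hit to out_path; return (kept, flagged)."""
    pybel = _load_pybel()
    with open(pattern_path) as handle:
        compiled = compile_patterns(parse_patterns(handle))
    kept = flagged = 0
    out = pybel.Outputfile(fmt, out_path, overwrite=True)
    try:
        for mol in pybel.readfile(fmt, in_path):
            if first_hit(mol, compiled) is None:
                out.write(mol)
                kept += 1
            else:
                flagged += 1
    finally:
        out.close()
    return kept, flagged
```

A call would look like `filter_file("library.sdf", "clean.sdf", "pains.txt")`, where all three file names are hypothetical. Since molecules are independent, the input could also be split into chunks and run in separate processes, but that is not shown here.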

Anyway, let me know what would be the best way to share the files, and I'll do it.

Cheers,

S


[1] http://blog.rguha.net/?p=850


On 03/15/2016 08:21 AM, mirix wrote:

> There you go:
>
> http://pastebin.ca/raw/3401877
>
> This comes from here:
>
> http://blog.rguha.net/?p=850

--

  Stefano Forli, PhD

  Assistant Professor of Integrative
  Structural and Computational Biology,
  Molecular Graphics Laboratory

  Dept. of Integrative Structural
   and Computational Biology, MB-112A
  The Scripps Research Institute
  10550  North Torrey Pines Road
  La Jolla,  CA 92037-1000,  USA.

     tel: +1 (858)784-2055
     fax: +1 (858)784-2860
     email: [hidden email]
     http://www.scripps.edu/~forli/


Re: PAINS filtering

mwojcikowski
Hi,
There are improved SMARTS for PAINS in RDKit.

You can also use ODDT to do the filtering; see example #2 at http://oddt.readthedocs.org/en/latest/#oddt-command-line-interface-cli (although it currently uses Rajarshi's SMARTS definitions). I have to update those to Greg's revised version.
----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
[hidden email]
