[Open Babel] Fastsearch


drc-2
Hi,

Just started using the fastsearch facility within the latest snapshot and I'm really impressed. Substructure searching a million-record SMILES file takes about 15 seconds on my G4 laptop (once you have created the index). A quick question: if I now want to add a few hundred records, will I have to rebuild the whole index?

Thanks

Chris



-------------------------------------------------------
This SF.Net email is sponsored by the JBoss Inc.  Get Certified Today
Register for a JBoss Training Course.  Free Certification Exam
for All Training Attendees Through End of 2005. For more info visit:
http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: [Open Babel] Fastsearch

Chris Morley-3
[hidden email] wrote:

> Just started using the fastsearch facility within the latest snapshot
> and I'm really impressed. Substructure searching a million record
> SMILES file takes about 15 seconds on my G4
> laptop (Once you have created the index).
> A quick question, if I now want to add a few hundred records will
> I have to rebuild the whole index?
>
I'm afraid you will have to rebuild. If the records were
added at the end of the data file and the rest was
untouched, I think it would be possible to have an
'update' facility. This might only take a few seconds for
a hundred records rather than the 2 hours(?) that it would
take when redoing the whole index. Thanks for the
suggestion; I'll think about it.
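
The condition described here (new records added at the end, everything before them untouched) can be checked before attempting any incremental update. The following is only a sketch of such a check, not part of fastsearch: it assumes you kept a snapshot of the data file from when the index was built, and the file names and stand-in SMILES records are hypothetical.

```shell
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
old="$workdir/bigdata.old.smi"   # hypothetical snapshot taken when the index was built
new="$workdir/bigdata.smi"       # hypothetical current data file

# Tiny stand-in files so the sketch runs on its own.
printf 'CCO\nc1ccccc1\n' > "$old"
printf 'CCO\nc1ccccc1\nCCN\n' > "$new"

# Size of the old file in bytes (arithmetic expansion strips any padding from wc).
old_size=$(( $(wc -c < "$old") ))

# The new file is "append-only" if its first old_size bytes equal the old file.
if head -c "$old_size" "$new" | cmp -s - "$old"; then
    append_only=yes
else
    append_only=no
fi
echo "append-only: $append_only"
```

If the check reports `no`, earlier records were changed and a full rebuild of the index is the safe choice.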

Chris



Re: [Open Babel] Fastsearch

peter murray-rust
On Nov 16 2005, Chris Morley wrote:

>[hidden email] wrote:
>
>> Just started using the fastsearch facility within the latest snapshot
>> and I'm really impressed. Substructure searching a million record
>> SMILES file takes about 15 seconds on my G4
>> laptop (Once you have created the index).
>> A quick question, if I now want to add a few hundred records will
>> I have to rebuild the whole index?
>>
>I'm afraid you will have to rebuild. If the records were
>added at the end of the data file and the rest was
>untouched, I think it would be possible to have an
>'update' facility. This might only take a few seconds for
>a hundred records rather than the 2 hours(?) that it would
>take when redoing the whole index. Thanks for the
>suggestion; I'll think about it.

This looks very exciting. There are, and will be, many cases where
collections are semi-static, e.g. updated once a month or so.
For example, we have computed MOPAC structures for the whole of the NCI
database and put these in our institutional repository. This resource might
be updated at irregular intervals, and it would be easy to recompute the
index at update time. It is also possible to download the whole of PubChem
(5,000,000 structures), and your figures suggest that an index could be
rebuilt overnight. (Of course this can be searched on the PubChem site at
present, but it gives an idea of the scale of the problem.)
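
As a rough back-of-envelope check of the "overnight" estimate, the arithmetic can be written out as below. It rests on two assumptions that the thread does not confirm: that the tentative "2 hours(?)" figure applies to the million-record file, and that indexing time grows linearly with the number of records.

```shell
# Assumed inputs, taken from the tentative figures quoted in this thread.
records_indexed=1000000     # size of the million-record SMILES file
hours_for_indexed=2         # the "2 hours(?)" guess for a full index build
pubchem_records=5000000     # approximate PubChem size mentioned above

# Linear scaling: hours grow in proportion to the record count.
estimated_hours=$(( pubchem_records * hours_for_indexed / records_indexed ))
echo "Estimated full rebuild: about ${estimated_hours} hours"
```

Under those assumptions the estimate comes to about 10 hours, which is consistent with an overnight rebuild.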

P.



--
Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076




Re: [Open Babel] Fastsearch

Chris Morley-3

>
>> [hidden email] wrote:
>>
>>> Just started using the fastsearch facility within the latest snapshot
>>> and I'm really impressed. Substructure searching a million record
>>> SMILES file takes about 15 seconds on my G4 laptop (Once you have
>>> created the index).  
>>
>> > A quick question, if I now want to add a few hundred records will
>> > I have to rebuild the whole index?

> On Nov 16 2005, Chris Morley wrote:
>> I'm afraid you will have to rebuild. If the records were added at the
>> end of the data file and the rest was untouched, I think it would be
>> possible to have an 'update' facility. This might only take a few
>> seconds for a hundred records rather than the 2 hours(?) that it would
>> take when redoing the whole index. Thanks for the suggestion; I'll
>> think about it.

>Dr P. Murray-Rust wrote:
> This looks very exciting. There are and will be many cases where there
> are semi-static collections - e.g. they are updated once a month or
> whatever. For example we have computed MOPAC structures for the whole of
> the NCI database and put these in our Institutional repository. This
> resource might get updated at irregular intervals and it would be easy
> to recompute the index at update time. It is also possible to download
> the whole of PubChem - 5,000,000 structures and your figures suggest
> that an index could be rebuilt overnight. (Of course this can be
> searched on the PubChem site at present but it gives an idea of the
> scale of the problem.)
>
I have now added this update facility to the fastsearch
format. CVS is currently frozen, but it happens that the
compiled Windows snapshot OBwin2.0.0rc1-20051118 (now
available for download) contains it. The main code will
get it later.

Illustration of its use

Make an index:
   babel bigdata.xxx -ofs     (Takes minutes or hours)

Use the index bigdata.fs for substructure searches, etc.:
   babel bigdata.fs -sSMILES -oyyy   (Takes seconds)

Add more molecules to bigdata.xxx:
   copy bigdata.xxx+extramols.xxx bigdata.xxx
(Or equivalent in UNIX)

Update the index:
   babel bigdata.xxx -ofs -xu    (Takes seconds, probably)

Continue using bigdata.fs
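
For the "equivalent in UNIX" of the copy step, a minimal sketch is shown below. The file names and stand-in SMILES records are hypothetical, and the sketch works in a scratch directory; the babel update command is shown only as a comment, copied from the illustration above, and is not run here.

```shell
# Work in a scratch directory so no real data is touched.
workdir=$(mktemp -d)
data="$workdir/bigdata.smi"      # hypothetical stand-in for bigdata.xxx
extra="$workdir/extramols.smi"   # hypothetical stand-in for extramols.xxx

# Tiny stand-in files so the sketch runs on its own.
printf 'CCO\nc1ccccc1\n' > "$data"
printf 'CC(=O)O\nCCN\n' > "$extra"

# If the data file does not end with a newline, add one so the first new
# record is not glued onto the last existing record.
if [ -n "$(tail -c 1 "$data")" ]; then
    echo >> "$data"
fi

# Append the new records to the END of the data file; existing records are untouched.
cat "$extra" >> "$data"

# Number of records (lines) now in the data file.
wc -l < "$data"

# The index would then be updated with the command from the illustration:
#   babel bigdata.xxx -ofs -xu
```

Appending with `>>` rather than rewriting the file keeps the earlier records byte-for-byte identical, which is the condition the update facility was described as relying on.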

Chris

