SMARTS with optional branches

SMARTS with optional branches

Andreas Maunz-2
Hi list,

this is, strictly speaking, not an OB question: how can I match a SMARTS
pattern with an optional branch?
Example: my compound is "C-C(-O)=C-C". The SMARTS expression
s1="[#6]-[#6](-[#8])=[#6]" matches it, as does s2="[#6]-[#6]=[#6]".
Since s2 is contained in s1 as a subgraph, I was thinking about
combining s1 and s2 into a single SMARTS by enforcing the common part
and making the part that occurs only in s1 (i.e. the branch to oxygen)
optional.

Is there a way to combine s1 and s2 into a single SMARTS expression? I
know about "([#6]-[#6](-[#8])=[#6],[#6]-[#6]=[#6])", but that is not
what I want.
I want an expression that states:

 carbon - single_edge - carbon - optional(begin) (single_edge oxygen)
optional(end) - double_edge - carbon

i.e. the optional part is "inside" the SMARTS pattern. Any ideas?

Greetings
Andreas

--
http://www.maunz.de
OpenPGP key: http://www.maunz.de/andreas@...

   I do know everything, just not all at once. It's a virtual memory
problem.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
OpenBabel-discuss mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Re: SMARTS with optional branches

Andrew Dalke
On Nov 16, 2009, at 1:46 PM, Andreas Maunz wrote:
> this is strictly speaking not an OB question: How can I match a SMARTS
> pattern with optional branch?

SMARTS matches each atom and bond term to an atom and bond. It does  
not support optional terms.

> Is there a way to combine s1 and s2 into a single SMARTS expression?

Sadly, no. I seem to recall other molecular structure query languages  
which are a bit more capable and which might support this, but my  
recollection is they are a lot more verbose, and I can't think of any  
references off-hand.
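
Since the pattern itself cannot mark a term as optional, one way around it is to match the mandatory core and test for the branch afterwards in code. Below is a minimal sketch in plain Python. It does not use Open Babel or a SMARTS parser: `find_matches`, `optional_oxygen`, and the hand-written atom and bond tables are hypothetical stand-ins for the core pattern [#6]-[#6]=[#6] and the compound C-C(-O)=C-C.

```python
def find_matches(atoms, bonds, q_atoms, q_bonds):
    """Toy backtracking subgraph search (not a SMARTS engine).

    atoms   -- list of element symbols
    bonds   -- {(i, j): order} with i < j
    q_atoms -- one predicate per query atom, called with a molecule atom index
    q_bonds -- {(a, b): order} between query atoms, with a < b
    """
    def order(i, j):
        return bonds.get((min(i, j), max(i, j)))

    def extend(assigned):
        k = len(assigned)
        if k == len(q_atoms):
            yield tuple(assigned)
            return
        for i in range(len(atoms)):
            if i in assigned or not q_atoms[k](i):
                continue
            # every query bond ending at query atom k must exist in the molecule
            if all(order(assigned[a], i) == o
                   for (a, b), o in q_bonds.items() if b == k):
                yield from extend(assigned + [i])

    return list(extend([]))


atoms = ["C", "C", "O", "C", "C"]                      # C-C(-O)=C-C
bonds = {(0, 1): 1, (1, 2): 1, (1, 3): 2, (3, 4): 1}


def is_carbon(i):
    return atoms[i] == "C"


# Mandatory core: carbon - carbon = carbon
core = find_matches(atoms, bonds, [is_carbon] * 3, {(0, 1): 1, (1, 2): 2})


def optional_oxygen(center, used):
    """Return a single-bonded oxygen on `center` that is not in the core hit."""
    for (i, j), o in bonds.items():
        if o == 1 and center in (i, j):
            other = j if i == center else i
            if atoms[other] == "O" and other not in used:
                return other
    return None


# Each hit is (core atoms, oxygen index or None); indices are 0-based.
hits = [(m, optional_oxygen(m[1], m)) for m in core]
print(hits)   # [((0, 1, 3), 2), ((4, 3, 1), None)]
```

The core is found twice, once from each end of the double bond; only the first hit carries the optional oxygen. With a real toolkit the same two-step idea would apply, with the toolkit's own matcher in place of `find_matches`.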


                                Andrew
                                [hidden email]




Re: SMARTS with optional branches

Rajarshi Guha-4


On Mon, Nov 16, 2009 at 8:52 AM, Andrew Dalke <[hidden email]> wrote:
> On Nov 16, 2009, at 1:46 PM, Andreas Maunz wrote:
>
>> Is there a way to combine s1 and s2 into a single SMARTS expression?
>
> Sadly, no. I seem to recall other molecular structure query languages
> which are a bit more capable and which might support this, but my
> recollection is they are a lot more verbose, and I can't think of any
> references off-hand.

MQL (http://en.wikipedia.org/wiki/Molecular_Query_Language) might support this, but as Andrew pointed out, it can be quite a bit more verbose than SMARTS.



--
Rajarshi Guha
NIH Chemical Genomics Center


Re: SMARTS with optional branches

Andreas Maunz-2
Thanks for the replies. I guess it's time for a new query language with
capabilities beyond SMARTS.
I found at least a "workaround" with SMARTS: one can require
the part "before" the branching point and then attach two versions, with
and without the branch, using OR.

Consider these examples created with the ruby code by Rich
(http://depth-first.com/articles/2006/10/31/obruby-a-ruby-interface-to-open-babel):

1) require the core structure "PCCC" with an optional branch to "O" at
position 2 (the 2nd "C"):
Found 1 instances of the SMARTS pattern 'P-[$(C-C),$(C(-O)-C)]' in the
SMILES string 'C-P-C(-O)-C-S'. Here are the atom indices:
  Hit 0: [ 2 3 ]

2) apply the same pattern to the reduced molecule without the branch to "O":
Found 1 instances of the SMARTS pattern 'P-[$(C-C),$(C(-O)-C)]' in the
SMILES string 'C-P-C-C-S'. Here are the atom indices:
  Hit 0: [ 2 3 ]

3) this also works with "exchanged" branches, i.e. core
structure "PCO" and optional branch "CS" at position 2:
Found 1 instances of the SMARTS pattern 'P-[$(C-O),$(C(-C)-O)]' in the
SMILES string 'C-P-C(-C-S)-O'. Here are the atom indices:
  Hit 0: [ 2 3 ]

The drawback is that it is necessary to specify both possibilities, with
and without the branch. On the positive side, we save some space compared
to writing out all combinations of ground patterns, but that hurts
readability.
Also, OB returns only one match, which consists of the part up to the
branching position.
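
To illustrate why the hit list stops at the branching atom, here is a minimal sketch in plain Python. It is a toy model, not Open Babel: each recursive alternative is written by hand as a predicate on a single atom (`c_with_c` stands in for $(C-C), `c_with_o_and_c` for $(C(-O)-C)), and `find_matches` and `neighbours` are hypothetical helpers. The atom and bond tables encode the two SMILES strings from examples 1) and 2).

```python
def neighbours(bonds, i, order=1):
    """Atoms joined to atom i by a bond of the given order."""
    return [b if a == i else a
            for (a, b), o in bonds.items() if o == order and i in (a, b)]


def find_matches(atoms, bonds, q_atoms, q_bonds):
    """Toy backtracking matcher; predicates get (atoms, bonds, atom_index)."""
    def order(i, j):
        return bonds.get((min(i, j), max(i, j)))

    def extend(assigned):
        k = len(assigned)
        if k == len(q_atoms):
            yield tuple(assigned)
            return
        for i in range(len(atoms)):
            if i in assigned or not q_atoms[k](atoms, bonds, i):
                continue
            if all(order(assigned[a], i) == o
                   for (a, b), o in q_bonds.items() if b == k):
                yield from extend(assigned + [i])

    return list(extend([]))


def is_p(atoms, bonds, i):
    return atoms[i] == "P"


def c_with_c(atoms, bonds, i):            # stands in for $(C-C)
    return atoms[i] == "C" and any(atoms[n] == "C" for n in neighbours(bonds, i))


def c_with_o_and_c(atoms, bonds, i):      # stands in for $(C(-O)-C)
    near = [atoms[n] for n in neighbours(bonds, i)]
    return atoms[i] == "C" and "O" in near and "C" in near


def either(atoms, bonds, i):              # the comma (OR) inside the brackets
    return c_with_c(atoms, bonds, i) or c_with_o_and_c(atoms, bonds, i)


# The query has only two atoms: P and the environment-checked C.
query_atoms = [is_p, either]
query_bonds = {(0, 1): 1}

with_branch = (["C", "P", "C", "O", "C", "S"],       # C-P-C(-O)-C-S
               {(0, 1): 1, (1, 2): 1, (2, 3): 1, (2, 4): 1, (4, 5): 1})
without_branch = (["C", "P", "C", "C", "S"],         # C-P-C-C-S
                  {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1})

for atoms, bonds in (with_branch, without_branch):
    hits = find_matches(atoms, bonds, query_atoms, query_bonds)
    # Convert to 1-based indices for comparison with the hits listed above.
    print([[i + 1 for i in hit] for hit in hits])    # [[2, 3]] both times
```

The environment atoms are only consulted by the predicates, so they never enter the returned tuple. That is the toy-model counterpart of getting just "[ 2 3 ]" back.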

Greetings
Andreas


SMARTS: "P-[$(C-C),$(C(-O)-C)]" <= either PCC or PC(O)C
This matches both "C-P-C(-O)-C-S" and "C-P-C-C-S". It makes the part
up to the position where the branch to "O" is attached mandatory ("P-"),
but not the part after the branch.

Rajarshi Guha wrote on 11/16/2009 03:10 PM:

> On Mon, Nov 16, 2009 at 8:52 AM, Andrew Dalke <[hidden email]>wrote:
>
>> On Nov 16, 2009, at 1:46 PM, Andreas Maunz wrote:
>>
>>> Is there a way to combine s1 and s2 into a single SMARTS expression?
>> Sadly, no. I seem to recall other molecular structure query languages
>> which are a bit more capable and which might support this, but my
>> recollection is they are a lot more verbose, and I can't think of any
>> references off-hand.
>>
>
> MQL (http://en.wikipedia.org/wiki/Molecular_Query_Language)  might support
> this, but as Andrew pointed out, it can be quite a bit more verbose than
> SMARTS

--
http://www.maunz.de
OpenPGP key: http://www.maunz.de/andreas@...

   I do know everything, just not all at once. It's a virtual memory
problem.


Re: SMARTS with optional branches

Craig James-2
In reply to this post by Andreas Maunz-2
Andreas Maunz wrote:

> Hi list,
>
> this is strictly speaking not an OB question: How can I match a SMARTS
> pattern with optional branch?
> Example: My compound is "C-C(-O)=C-C". The SMARTS expressions are
> s1="[#6]-[#6](-[#8])=[#6]" matches, as well as s2="[#6]-[#6]=[#6]".
> Since s2 is contained in s1 in terms of subgraphs, I was thinking about
> combining s1 and s2 into a single SMARTS by enforcing the common part
> and make the part that solely occurs in s1 (i.e. the branch to oxygen)
> optional.
>
> Is there a way to combine s1 and s2 into a single SMARTS expression? I
> know about "([#6]-[#6](-[#8])=[#6],[#6]-[#6]=[#6])", but that is not
> what I want.
> I want an expression that states:
>
>  carbon - single_edge - carbon - optional(begin) (single_edge oxygen)
> optional(end) - double_edge - carbon
>
> i.e. the optional part is "inside" the SMARTS pattern. Any ideas?

You may be able to do this using recursive SMARTS.  In a recursive SMARTS, an atom expression can itself be a whole SMARTS, ad infinitum.  And since an atom can be an OR list of alternatives, you can have an atom that is "This OR that", where "this" and "that" are entire SMARTS expressions.  I'm not sure if this is what you want, but it might be something like this:

  [#6]-[$([#6]),$([#6]-[#8])]=[#6]

In this case, the second atom in the SMARTS has two comma-separated recursive expressions, [$(...),$(...)], and if the whole expression matches, then the atom is considered a match.

But beware of recursion: you have to understand it.  The recursive expression matches a SINGLE ATOM, that is, the match of the oxygen is "forgotten" in this case once it's determined that the carbon to which it is attached matches.

Craig


Re: SMARTS with optional branches

Andreas Maunz-2
Hello Craig,

Craig A. James wrote on 11/16/2009 06:12 PM:
> You may be able to do this using recursive SMARTS.  In a recursive
> SMARTS, an atom expression can itself be a whole SMARTS, ad infinitum.
> And since an atom can be an OR list of alternatives, you can have an
> atom that is "This OR that", where "this" and "that" are entire SMARTS
> expressions.  I'm not sure if this is what you want, but it might be
> something like this:
>
>  [#6]-[$([#6]),$([#6]-[#8])]=[#6]

This reads:

1. (carbon) attached to
2. ((a) a carbon OR (b) a carbon attached to an oxygen) double-bonded to
3. (a carbon),

where it is implied that in case 2.(b) a branch occurs:

1.2.3.
C-C=C
  |
  O

this is due to the semantics of a recursive expression as "a single atom
with special properties".
In case of 2.(b) the C has the special property of being
(branch-)connected to an O, compared to case 2.(a)
with no special properties for the C.
Am I right here? If yes, this would be the solution I was looking for.

Another aspect of this specific example:
Case 2.(b) is a special case of 2.(a), so that every time 2.(b) matches,
2.(a) also matches.
Therefore, in this case, 2.(b) is not needed at all and the expression
[#6]-[#6]=[#6] would have done the job as well.
In general, a recursive smarts is only necessary if the alternatives are
mutually exclusive.
Is this also correct?

Best regards
Andreas

--
http://www.maunz.de
OpenPGP key: http://www.maunz.de/andreas@...

Real programmers don't document. If it was hard to write, it should be
hard to understand.


Re: SMARTS with optional branches

Craig James-2
Andreas Maunz wrote:

> Hello Craig,
>
> Craig A. James wrote on 11/16/2009 06:12 PM:
>> You may be able to do this using recursive SMARTS.  In a recursive
>> SMARTS, an atom expression can itself be a whole SMARTS, ad infinitum.
>> And since an atom can be an OR list of alternatives, you can have an
>> atom that is "This OR that", where "this" and "that" are entire SMARTS
>> expressions.  I'm not sure if this is what you want, but it might be
>> something like this:
>>
>>  [#6]-[$([#6]),$([#6]-[#8])]=[#6]
>
> This reads:
>
> 1. (carbon) attached to
> 2. ((a) a carbon OR (b) a carbon attached to an oxygen) double-bonded to
> 3. (a carbon),
>
> whereas it is implicitly meant that in case of 2.(b) a branch occurs:
>
> 1.2.3.
> C-C=C
>   |
>   O
>
> this is due to the semantics of a recursive expression as "a single atom
> with special properties".
> In case of 2.(b) the C has the special property of being
> (branch-)connected to an O, compared to case 2.(a)
> with no special properties for the C.
> Am I right here? If yes, this would be the solution I was looking for.

Yes, that's right.

> Another aspect of this specific example:
> Case 2.(b) is a special case of 2.(a), so that every time 2.(b) matches,
> 2.(a) also matches.
> Therefore, in this case, 2.(b) is not needed at all and the expression
> [#6]-[#6]=[#6] would have done the job as well.

Exactly.  This was just a toy example.  In real life, your alternatives would be chemically meaningful, such as electronegativity or aromaticity, things that can't be easily expressed as a single atom expression.

> In general, a recursive smarts is only necessary if the alternatives are
> mutually exclusive.
> Is this also correct?

That's right.  Simple atomic expressions are often sufficient, for example it's easy to write "halogens but not iodine" as [Cl,Br,F].  It's very rare to need a recursive SMARTS, but when you do, it's invaluable.

One last thing to beware of.  The recursive match is truly recursive, which means that only the "head" atom of the recursive expression is marked as "used" once the recursion finishes.  For example, the SMARTS "C[$(CO)]O" would match the molecule "CCO".  The recursive "CO" expression would match the center carbon, because it does in fact have an attached oxygen.  But then the final "O" in the SMARTS would *also* match that same oxygen, because once the match for [$(CO)] finished, the pattern matcher "forgot" about the oxygen, leaving only the central "C" marked as the matched atom.
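
The "forgetting" can be reproduced in a minimal sketch in plain Python. This is a toy model rather than a SMARTS engine: the recursive term [$(CO)] is written by hand as the predicate `c_with_o`, which only inspects the neighbours of one atom, and `find_matches` is a hypothetical helper.

```python
def find_matches(atoms, bonds, q_atoms, q_bonds):
    """Toy backtracking matcher; predicates are called with an atom index."""
    def order(i, j):
        return bonds.get((min(i, j), max(i, j)))

    def extend(assigned):
        k = len(assigned)
        if k == len(q_atoms):
            yield tuple(assigned)
            return
        for i in range(len(atoms)):
            # only atoms already placed in the outer match count as "used"
            if i in assigned or not q_atoms[k](i):
                continue
            if all(order(assigned[a], i) == o
                   for (a, b), o in q_bonds.items() if b == k):
                yield from extend(assigned + [i])

    return list(extend([]))


atoms = ["C", "C", "O"]                 # CCO
bonds = {(0, 1): 1, (1, 2): 1}


def carbon(i):
    return atoms[i] == "C"


def oxygen(i):
    return atoms[i] == "O"


def c_with_o(i):
    """Stands in for [$(CO)]: a carbon with at least one oxygen neighbour."""
    near = [b if a == i else a for (a, b) in bonds if i in (a, b)]
    return carbon(i) and any(oxygen(n) for n in near)


# Toy version of C[$(CO)]O: three query atoms in a chain.
hits = find_matches(atoms, bonds, [carbon, c_with_o, oxygen],
                    {(0, 1): 1, (1, 2): 1})
print(hits)   # [(0, 1, 2)]
```

Atom 2 is the oxygen that satisfies `c_with_o` for the central carbon, and it is also the atom assigned to the final "O", because the predicate never adds anything to the set of used atoms.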

Craig


Re: SMARTS with optional branches

Andreas Maunz-2
Craig,

Craig A. James wrote on 11/17/2009 04:47 PM:
> One last thing to beware of.  The recursive match is truly recursive,
> which means that only the "head" atom of the recursive expression is
> marked as "used" once the recursion finishes.  For example, the smarts
> "C[$(CO)]O" would match the molecule "CCO".  The recursive "CO"
> expression would match the center carbon, because it does in fact have
> an attached oxygen.  But then the final "O" in the SMARTS would *also*
> match that same oxygen, because once the match for [$(CO)] finished, the
> pattern matcher "forgot" about the oxygen, leaving only the central "C"
> marked as the matched atom.

This is an important point. I want to find

CCO
 |
 O

and not

CCO

I found two ways to deal with it. The first enforces a certain degree on
the center node where the branch starts, like so: C[$([CD3]O)]O, for our
toy example.
If you have more than one branch on the second C you will have to
increase to D4 and so on. This leads to a disjunction of node degrees
like C[$([C;D3,D4]O)]O if you also want to detect

 O
 |
CCO
 |
 O

with either one or both of the branches attached. It would be much
cleaner if it was possible to specify a minimum degree instead of exact
degrees. A more elegant solution that achieves this is to enforce a
certain environment around the node, like this: C[C;$(C(O)(*)*)]O. It
specifies a degree of at least 3 for the second C because the O and the
two '*' atoms must all match different neighbours. Do you agree?
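
The degree-based filter can be tried out in a minimal sketch in plain Python. It is again a toy model, not a SMARTS engine: `c_d3_with_o` is a hand-written stand-in for a recursive term that requires a carbon of degree 3 with an oxygen neighbour, `find_matches` and `neighbours` are hypothetical helpers, and the toy graphs contain no hydrogens, so "degree" here means the number of heavy-atom neighbours.

```python
def find_matches(atoms, bonds, q_atoms, q_bonds):
    """Toy backtracking matcher; predicates get (atoms, bonds, atom_index)."""
    def order(i, j):
        return bonds.get((min(i, j), max(i, j)))

    def extend(assigned):
        k = len(assigned)
        if k == len(q_atoms):
            yield tuple(assigned)
            return
        for i in range(len(atoms)):
            if i in assigned or not q_atoms[k](atoms, bonds, i):
                continue
            if all(order(assigned[a], i) == o
                   for (a, b), o in q_bonds.items() if b == k):
                yield from extend(assigned + [i])

    return list(extend([]))


def neighbours(bonds, i):
    return [b if a == i else a for (a, b) in bonds if i in (a, b)]


def carbon(atoms, bonds, i):
    return atoms[i] == "C"


def oxygen(atoms, bonds, i):
    return atoms[i] == "O"


def c_d3_with_o(atoms, bonds, i):
    """Carbon with exactly three neighbours, at least one of them an oxygen."""
    near = neighbours(bonds, i)
    return (atoms[i] == "C" and len(near) == 3
            and any(atoms[n] == "O" for n in near))


query_atoms = [carbon, c_d3_with_o, oxygen]
query_bonds = {(0, 1): 1, (1, 2): 1}

cco = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})                      # CCO
cc_o_o = (["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 1, (1, 3): 1})   # CC(O)O

hits_cco = find_matches(cco[0], cco[1], query_atoms, query_bonds)
hits_branched = find_matches(cc_o_o[0], cc_o_o[1], query_atoms, query_bonds)
print(hits_cco)        # []
print(hits_branched)   # [(0, 1, 2), (0, 1, 3)]
```

The unbranched molecule is rejected because its central carbon has only two neighbours. The branched one is reported twice, once for each oxygen that can take the place of the final "O".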

Best regards
Andreas

--
http://www.maunz.de
OpenPGP key: http://www.maunz.de/andreas@...

Real programmers don't document. If it was hard to write, it should be
hard to understand.


Re: SMARTS with optional branches

Andrew Dalke
In reply to this post by Rajarshi Guha-4
On Nov 16, 2009, at 3:10 PM, Rajarshi Guha wrote:
> MQL (http://en.wikipedia.org/wiki/Molecular_Query_Language)  might  
> support this, but as Andrew pointed out, it can be quite a bit more  
> verbose than SMARTS

Thanks! Yes, MQL was the one I was trying to think of.

                                Andrew
                                [hidden email]


