JSON parsing of elements


Noel O'Boyle
Administrator
Hi Matt,

I'm in the middle of
https://github.com/openbabel/enhancement-proposals/pull/4 and have
come to the JSON formats.

When parsing the PubChem JSON, you first check whether the element is an
integer and then later whether it's a string. I think it's always an
integer and plan to remove the string code. Is this okay? I assume that
this is a copy+paste of logic from the ChemDoodle JSON parsing, where
(presumably) this can occur.

- Noel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
OpenBabel-Devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
Re: JSON parsing of elements

Matt Swain
It’s been so long since I wrote this, I’ve just had a quick look to refresh my memory.

It looks to me like my code always assumes the element to be provided as a string containing the element symbol. But you are right, the PubChem REST API is clearly now returning the element as an integer atomic number, so the code currently fails completely. The fact that no one noticed shows how widely used this format is :)

I checked some old files that I have and it definitely used to be provided as a string (all lowercase element symbol, with some additional special cases). I doubt anyone else will have old files like this, so it’s probably safe to switch completely to integers and remove the string code? The writer will need updating also, to write integers instead of strings.
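
As a rough illustration of the switch discussed above (this is not the actual Open Babel code, which is C++), a reader could treat the integer atomic number as the normal case and, only if old files matter, fall back to the legacy lowercase symbol. The function name `parse_element` and the deliberately tiny `LEGACY_SYMBOLS` table are hypothetical; a real table would need every entry from the spec.

```python
# Hypothetical sketch: accept an element given either as an integer atomic
# number (current PubChem output) or as a legacy lowercase symbol string.

# Deliberately partial table, for illustration only.
LEGACY_SYMBOLS = {"h": 1, "c": 6, "n": 7, "o": 8}


def parse_element(value):
    """Return the atomic number for a JSON element value, or raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError(f"unsupported element value: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return LEGACY_SYMBOLS[value.lower()]
        except KeyError:
            raise ValueError(f"unknown element symbol: {value!r}") from None
    raise ValueError(f"unsupported element value: {value!r}")
```

Dropping the string branch, as proposed in this thread, would reduce the function to the integer check alone.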

By the way, I suspect the PubChem ASN spec is the closest thing to a spec for the JSON format:
ftp://ftp.ncbi.nih.gov//pubchem/specifications/pubchem.asn
Here’s the element section:

PC-Element::= INTEGER {
    -- Illegal Atom Numbers that may be Interpreted to be something else
    a  (255),                                    -- Unspecified Atom (Asterick)
    d  (254),                                    -- Dummy Atom
    r  (253),                                    -- Rgroup Label
    lp (252),                                    -- Lone Pair

    -- Elements
    h  (1), he (2), li (3), be (4), b  (5),
    c  (6), n  (7), o  (8), f  (9), ne(10),
    na(11), mg(12), al(13), si(14), p (15),
    s (16), cl(17), ar(18), k (19), ca(20),
    sc(21), ti(22), v (23), cr(24), mn(25),
    fe(26), co(27), ni(28), cu(29), zn(30),
    ga(31), ge(32), as(33), se(34), br(35),
    kr(36), rb(37), sr(38), y (39), zr(40),
    nb(41), mo(42), tc(43), ru(44), rh(45),
    pd(46), ag(47), cd(48), in(49), sn(50),
    sb(51), te(52), i (53), xe(54), cs(55),
    ba(56), la(57), ce(58), pr(59), nd(60),
    pm(61), sm(62), eu(63), gd(64), tb(65),
    dy(66), ho(67), er(68), tm(69), yb(70),
    lu(71), hf(72), ta(73), w (74), re(75),
    os(76), ir(77), pt(78), au(79), hg(80),
    tl(81), pb(82), bi(83), po(84), at(85),
    rn(86), fr(87), ra(88), ac(89), th(90),
    pa(91), u(92),  np(93), pu(94), am(95),
    cm(96), bk(97), cf(98), es(99), fm(100),
    md(101), no(102), lr(103), rf(104), db(105),
    sg(106), bh(107), hs(108), mt(109), ds(110),
    rg(111)
}
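
A reader working from the listing above needs to separate the four special codes from real atomic numbers. The following is a minimal sketch of that check, assuming only the values quoted here (the current spec may list more elements); `classify_element`, `SPECIAL_CODES` and `MAX_ATOMIC_NUMBER` are names made up for this example.

```python
# Special (non-element) codes from the PC-Element section quoted above.
SPECIAL_CODES = {
    255: "unspecified atom",
    254: "dummy atom",
    253: "R-group label",
    252: "lone pair",
}

# rg(111) is the last element in the listing as quoted here.
MAX_ATOMIC_NUMBER = 111


def classify_element(code):
    """Return ("special", label) or ("element", atomic_number).

    Raises ValueError for integers outside the quoted listing.
    """
    if code in SPECIAL_CODES:
        return ("special", SPECIAL_CODES[code])
    if 1 <= code <= MAX_ATOMIC_NUMBER:
        return ("element", code)
    raise ValueError(f"code {code} is not in the quoted PC-Element listing")
```

How a parser should treat the special codes (for example, mapping a dummy atom to atomic number 0) is a design decision this thread does not settle, so the sketch only labels them.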


Matt


Re: JSON parsing of elements

Noel O'Boyle
Administrator
It took me a little while to realise that was the case also. I thought
it was fallback code if it wasn't parsed as an integer. I'll file a
bug on this and point to this discussion...

On 30 June 2017 at 20:49, Matt Swain <[hidden email]> wrote:

> By the way, I suspect the pubchem ASN spec is the closest thing to a spec
> for the JSON format:
> ftp://ftp.ncbi.nih.gov//pubchem/specifications/pubchem.asn
