Re: [netcdfgroup] How to dump netCDF to JSON?


Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro Vicente
Hi Charlie!


So, I am doing that exact same thing.

I wrote:

1) The specification for converting netCDF/HDF5 to "a" JSON format (note the
"a" here).

2) The parsing of that JSON format, using the open-source C JSON library
Jansson.

http://www.digip.org/jansson/

3) Left to do: the actual C code for reading/writing netCDF/HDF5 to JSON and
vice versa (the straightforward part).

The "a" above means that JSON is not really a format in the sense of netCDF,
but rather a format that allows one to define formats, for lack of a better
explanation.

This means that anyone who writes such a tool has to write code for a
particular JSON representation, valid only for that tool.

Like you, I searched and did not find a good one, so I wrote one.

The first criterion was that it had to be obvious to anyone looking at the
JSON text file that it was indeed a netCDF/HDF5 file: the hierarchy,
metadata, and data all clearly shown.

My first look was HDF5-JSON

http://hdf5-json.readthedocs.io/en/latest/

but the format seemed like a mess to look at

example

http://hdf5-json.readthedocs.io/en/latest/examples/nullspace_dset.html


and the reader is written in Python

@John Readey

(why Python? HDF5 developer tools should be all about writing in C/C++)



The specification is here

http://www.space-research.org/

Click on menu
"Code blog",
then
"netCDF/HDF5 to JSON and vice-versa"


In the process I learned all about JSON, and it is a neat format for
representing data.

In particular, it allows nested structures and arrays, which suits netCDF
perfectly.

Here are two nested groups:

{
 "group_name1":
 {
  "group_name2": "group"
 }
}

A dataset:

{
 "dset1": ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}
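An entry like this can be consumed with any JSON library. A minimal Python sketch (standard library only, since the C/Jansson reader is still in progress) that unpacks the entry and checks that the flat data matches the declared shape; the field order is taken from the example above:

```python
import json

# A STAR JSON dataset entry as shown above:
# [class, type, rank, shape, flat row-major data]
doc = json.loads("""
{
 "dset1": ["dataset", "STAR_INT32", 2, [3, 4],
           [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}
""")

def read_dataset(entry):
    """Unpack a dataset entry and sanity-check its redundant fields."""
    obj_class, dtype, rank, shape, flat = entry
    assert obj_class == "dataset"
    assert rank == len(shape)      # rank duplicates the length of the shape
    size = 1
    for dim in shape:
        size *= dim
    assert size == len(flat)       # data length must match the shape
    return dtype, shape, flat

dtype, shape, flat = read_dataset(doc["dset1"])
print(dtype, shape)
```

The same split (class, type, rank, shape, data) would map one-to-one onto a Jansson `json_unpack` call in the C reader.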



This is still under development.

I would like to make this some kind of "official" netCDF/HDF5 JSON format
for the community, so I encourage anyone to read the specification.

direct link

http://www.space-research.org/blog/star_json.html

If you see any flaw in the design, or anything in it that you would like to
have changed, please let me know now.

At the moment it only (intentionally) uses common generic features of both
netCDF and HDF5, which are the numeric atomic types and strings.

Enjoy


----------------------
Pedro Vicente
[hidden email]
http://www.space-research.org/




----- Original Message -----
From: "Charlie Zender" <[hidden email]>
To: "netCDF Mail List" <[hidden email]>
Sent: Thursday, October 13, 2016 11:10 PM
Subject: [netcdfgroup] How to dump netCDF to JSON?


> Hello netCDFers,
>
> A project I am working on wants to convert netCDF files to JSON.
> The requirements are to dump an arbitrary netCDF-extended file
> (with groups but without funky vlen/compound types) to JSON.
> The first few solutions that we googled (ncdump-json, netcdf2json.py)
> do not satisfy these requirements. What is the most robust and easy
> command-line tool (not web-service) that dumps netCDF to JSON?
> Ideally it would be somewhat configurable like ncdump -h/-x or
> ncks --cdl/--xml.
>
> Charlie
> --
> Charlie Zender, Earth System Sci. & Computer Sci.
> University of California, Irvine 949-891-2429 )'(
>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web.  Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> netcdfgroup mailing list
> [hidden email]
> For list information or to unsubscribe,  visit:
> http://www.unidata.ucar.edu/mailing_lists/ 


_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: [netcdfgroup] How to dump netCDF to JSON?

Walter Landry
Hi Pedro,

Is there a way to represent compound HDF5 objects in your scheme?  I
see STAR_INT8, STAR_FLOAT, etc., but no STAR_COMPOUND.

In any event, FYI, I have implemented a similar thing in a narrower
domain.  I needed to map VOTable [1] (an XML format with a defined data
model for astronomy) to HDF5.  Along the way, I also mapped it (with
varying levels of fidelity) to JSON, CSV, HTML, FITS, and plain old ASCII.
This makes my work very table-focused, with a particular eye towards what
relational databases like to output.

The code is on github [2].  It is mainly meant to be used as a
library, but the code also builds a simple program that converts files
between formats.  In case you are interested, I am attaching the same
file in plain text, JSON, and HDF5.

Cheers,
Walter Landry

[1] http://www.ivoa.net/documents/VOTable/
[2] https://github.com/Caltech-IPAC/tablator


| object                                   | ra        | dec       | htm20               | htm7      | htm3   | shtm20              | shtm7      |  shtm3 | flags | SSO   |
| char                                     | double    | real      | ulong               | uint      | ushort | long                |  int       |  short | byte  | bool  |
 118289arstratraetratratsrastratsrastrats    359.88703   50.832570  16446744073709551616  3294967296    12000   8223372036854775808   1147483648    12000    122    0      
 113368                                      344.41273   -29.622250 8446744073709551616   294967296     43002  -7223372036854775808  -2047483648    13002    0xf2   true  
 113368                                      344.41273   -29.622250 8446744073709551616   294967296     43002  -7223372036854775808  -2047483648   -23002    211    False  
 113368                                      344.41273   -29.622250 8446744073709551616   294967296     43002  -7223372036854775808  -2047483648   -31002    211    1      

{
    "VOTABLE":
    {
        "<xmlattr>":
        {
            "version": "1.3",
            "xmlns:xsi": "http:\/\/www.w3.org\/2001\/XMLSchema-instance",
            "xmlns": "http:\/\/www.ivoa.net\/xml\/VOTable\/v1.3",
            "xmlns:stc": "http:\/\/www.ivoa.net\/xml\/STC\/v1.30"
        },
        "RESOURCE":
        {
            "TABLE":
            {
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "object",
                        "datatype": "char",
                        "arraysize": "*"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "ra",
                        "datatype": "double"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "dec",
                        "datatype": "double"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "htm20",
                        "datatype": "ulong"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "htm7",
                        "datatype": "uint"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "htm3",
                        "datatype": "ushort"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "shtm20",
                        "datatype": "long"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "shtm7",
                        "datatype": "int"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "shtm3",
                        "datatype": "short"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "flags",
                        "datatype": "unsignedByte"
                    }
                },
                "FIELD":
                {
                    "<xmlattr>":
                    {
                        "name": "SSO",
                        "datatype": "boolean"
                    }
                },
                "DATA":
                {
                    "TABLEDATA":
                    [
                        [
                            "118289arstratraetratratsrastratsrastrats",
                            "359.88702999999998",
                            "50.832569999999997",
                            "16446744073709551616",
                            "3294967296",
                            "12000",
                            "8223372036854775808",
                            "1147483648",
                            "12000",
                            "0x7a",
                            "0"
                        ],
                        [
                            "113368",
                            "344.41273000000001",
                            "-29.622250000000001",
                            "8446744073709551616",
                            "294967296",
                            "43002",
                            "-7223372036854775808",
                            "-2047483648",
                            "13002",
                            "0xf2",
                            "1"
                        ],
                        [
                            "113368",
                            "344.41273000000001",
                            "-29.622250000000001",
                            "8446744073709551616",
                            "294967296",
                            "43002",
                            "-7223372036854775808",
                            "-2047483648",
                            "-23002",
                            "0xd3",
                            "0"
                        ],
                        [
                            "113368",
                            "344.41273000000001",
                            "-29.622250000000001",
                            "8446744073709551616",
                            "294967296",
                            "43002",
                            "-7223372036854775808",
                            "-2047483648",
                            "-31002",
                            "0xd3",
                            "1"
                        ]
                    ]
                }
            }
        }
    }
}


Re: [netcdfgroup] How to dump netCDF to JSON?

JohnReadey
In reply to this post by Pedro Vicente

Hey,

 

The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json and docs are here:  http://hdf5-json.readthedocs.io/en/latest/

 

The package is both a library of HDF5 <-> JSON conversion functions and some simple scripts for converting HDF5 to JSON and vice versa.  E.g.

$ python h5tojson.py -D <hdf5-file>

outputs JSON minus the dataset data values.

 

While it may not be the most elegant JSON schema, it's designed with the following goals in mind:

1. Complete fidelity to all HDF5 features (i.e. the goal is that you should be able to take any HDF5 file, convert it to JSON, convert that back to HDF5, and wind up with a file that is semantically equivalent to what you started with).

2. Support group structures that are general graphs, not just trees.  E.g. <root> links to A and B, and A and B both link to C.  The output should produce only one representation of C.

Since netCDF doesn't use all these features, it's certainly possible to come up with something simpler for just netCDF files.

 

Suggestions, feedback, and pull requests are welcome!

 

Cheers,

John

 

From: Chris Barker <[hidden email]>
Date: Friday, October 14, 2016 at 12:32 PM
To: Pedro Vicente <[hidden email]>
Cc: netCDF Mail List <[hidden email]>, Charlie Zender <[hidden email]>, John Readey <[hidden email]>, HDF Users Discussion List <[hidden email]>, David Pearah <[hidden email]>
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

Pedro,

 

When I first started reading this thread, I thought "there should be a spec for how to represent netcdf in JSON"

 

and then I read:

 

1) The specification to convert netCDF/HDF5 to "a" JSON format (note the "a" here)

 

Awesome -- that's exactly what we need -- as you say there is not one way to represent netcdf data in JSON, and probably far more than one "obvious" way.

 

Without looking at your spec yet, I do think it should probably look as much like CDL as possible -- we are all familiar with that.

 

(why Python? HDF5 developer tools should be all about writing in C/C++)

 

Because Python is an excellent language with which to "drive" C/C++ libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python. Even if you want to get to a C++ implementation eventually, you'd probably benefit from prototyping and working out the kinks with a Python version first.

 

But whoever is writing the code....

 

 

The specification is here

http://www.space-research.org/

 

Just took a quick look -- nice start. 

 

I've only used HDF through the netcdf4 spec, so there may be richness needed that I'm missing, but my first thought is to make greater use of "objects" in JSON (key-value structures, hash tables, dicts in Python), rather than array position, for heterogeneous structures. For instance, you have:

 

 a dataset


{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}

 

I would perhaps do that as something like:

 

{
  ...
  "dset1": {"object_type": "dataset",
            "dtype": "INT32",
            "rank": 2,
            "dimensions": [3, 4],
            "data": [[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12]]}
  ...
}

 

NOTES:

 

* I used nested arrays, rather than flattening the 2-d array -- this maps nicely to things like numpy arrays, for example -- not sure about the C++ world. (you can flatten and un-flatten numpy arrays easily, too, but this seems like a better mapping to the structure) And HDF is storing this all in chunks and who knows what -- so it's not a direct mapping to the memory layout anyway.
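The flatten/un-flatten step above can be sketched in a few lines of plain Python; the key names ("object_type", "dtype", etc.) are only illustrative, not from any official spec:

```python
import json

def to_object_form(name, entry):
    """Convert an array-style entry ["dataset", dtype, rank, shape, flat]
    into the nested object style suggested above.  All key names here
    are illustrative only."""
    _, dtype, rank, shape, flat = entry
    rows, cols = shape                         # 2-D case only, for brevity
    nested = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    return {name: {"object_type": "dataset",
                   "dtype": dtype,
                   "rank": rank,
                   "dimensions": shape,
                   "data": nested}}

entry = ["dataset", "STAR_INT32", 2, [3, 4],
         [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
obj = to_object_form("dset1", entry)
print(json.dumps(obj["dset1"]["data"]))
```

Going the other direction (nested back to flat, row-major) is an equally small loop, which is why the nested form loses nothing relative to the flat one.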

 

* Do you need "rank"? -- can't you check the length of the dimensions array?

 

* Do you need "object_type" -- will it always be a dataset? Or you could have something like:

 

{

...

"datasets": {"dset1": {the actual dataset object},

             "dset2": {another dataset object},

 ....

 

Then you don't need object_type or a name
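That grouping idea can be sketched like so (hypothetical key names throughout): the container key says what every child is, and each child's key is its name, so both fields fall out of the structure itself.

```python
# Group all datasets under one "datasets" object: the container key
# identifies the kind of object, and the child's key is its name, so
# per-entry "object_type"/name fields become unnecessary.
# "dset2" and its contents are made up for illustration; none of these
# key names come from any official spec.
root = {
    "datasets": {
        "dset1": {"dtype": "INT32", "dimensions": [3, 4],
                  "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]},
        "dset2": {"dtype": "FLOAT64", "dimensions": [2],
                  "data": [0.5, 1.5]},
    }
}

for name, dset in sorted(root["datasets"].items()):
    print(name, dset["dtype"], dset["dimensions"])
```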

 

 

(BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)

 

I would like to make this some kind of "official" netCDF/HDF5 JSON format for the community, so I encourage anyone to read the specification

 

If you see any flaw in the design or anything in the design that you would like to have change please let me know now

 

done :-)

 

It would be really great to have this become an "official" spec -- if you want to get it there, you're probably going to need to develop it more out in the open with a wider community. These lists are the way to get that started, but I suggest:

 

1) put it up somewhere that people can collaborate on it, make suggestions, capture the discussion, etc. GitHub is one really nice way to do that. See, for example, the UGRID spec project:

 

 

(NOTE that that one got put on gitHub after there was a pretty complete draft spec, so there isn't THAT much discussion captured. But also note that that is too bad -- there is no good record of the decision process that led to the spec)

 

At the moment it only (intentionally) uses common generic features of both netCDF and HDF5, which are the numeric atomic types and strings.

 

Good plan.

 

-Chris

 

 

--


Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]



Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro Vicente

@John
 
>> 1.       Complete fidelity to all HDF5 features
>> 2.       Support graphs that are not acyclic.
 
ok, understood.
 
In my case I needed a simple schema for a particular set of files.
 
But why didn't you start with the official HDF5 DDL and try to adapt it to JSON?
 
Same thing for netCDF, there is already an official CDL, so any JSON spec should be "identical".
 
 
 
@Chris
 
{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}
 
>> * Do you need "rank"?
 
sometimes a bit of redundancy is useful, to make it visually clear
 
>> BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)
 
yes
 
>>It would be really great to have this become an "official" spec -- if you want to get it there, you're probably going to need to develop it more out in the open with a wider community. These lists are the way to get that started, but I suggest 
>>1) put it up somewhere that people can collaborate on it, make suggestions, capture the discussion, etc. gitHub is one really nice way to do that. See, for example the UGRID spec project: 
 
 
ok, anyone interested send me an off-list email
 
 
-Pedro

 

 
 

Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro Vicente
In reply to this post by Walter Landry
I knew that something was left unanswered

@Walter

> Is there a way to represent compound HDF5 objects in your scheme?  I
> see STAR_INT8, STAR_FLOAT, etc., but no STAR_COMPOUND.

no, only the atomic types.
see spec

http://www.space-research.org/blog/star_json.html


> In any event, FYI, I have implemented a similar thing but in a
> narrower domain.  I needed to map VOTable [1] (an xml format with a
> defined data model for astronomy) to HDF5.
> The code is on github [2].  It is mainly meant to be used as a
> library, but the code also builds a simple program that converts files
> between formats.

ok, thanks,
I did a clone of

https://github.com/Caltech-IPAC/tablator

but it seems there is no build system?
By the way, HDF5 has its own table API; it seems you did not use it:

https://support.hdfgroup.org/HDF5/doc/HL/RM_H5TB.html



-Pedro


----- Original Message -----
From: "Walter Landry" <[hidden email]>
To: <[hidden email]>; <[hidden email]>
Cc: <[hidden email]>; <[hidden email]>
Sent: Friday, October 14, 2016 2:10 AM
Subject: Re: [Hdf-forum] [netcdfgroup] How to dump netCDF to JSON?


> Hi Pedro,
>
> Is there a way to represent compound HDF5 objects in your scheme?  I
> see STAR_INT8, STAR_FLOAT, etc., but no STAR_COMPOUND.
>
> In any event, FYI, I have implemented a similar thing but in a
> narrower domain.  I needed to map VOTable [1] (an xml format with a
> defined data model for astronomy) to HDF5.  Along the way, I also
> mapped it (with varying levels of fidelity) to JSON, CSV, HTML, FITS,
> and plain old ascii.  This makes my work very table focused, with a
> particular eye towards what relational databases like to output.
>
> The code is on github [2].  It is mainly meant to be used as a
> library, but the code also builds a simple program that converts files
> between formats.  In case you are interested, I am attaching the same
> file in plain text, JSON, and HDF5.
>
> Cheers,
> Walter Landry
>
> [1] http://www.ivoa.net/documents/VOTable/
> [2] https://github.com/Caltech-IPAC/tablator
>
> Pedro Vicente <[hidden email]> wrote:
>> ----- Original Message -----
>> From: "Charlie Zender" <[hidden email]>
>> To: "netCDF Mail List" <[hidden email]>
>> Sent: Thursday, October 13, 2016 11:10 PM
>> Subject: [netcdfgroup] How to dump netCDF to JSON?
>>
>>
>>> Hello netCDFers,
>>>
>>> A project I am working on wants to convert netCDF files to JSON.
>>> The requirements are to dump an arbitrary netCDF-extended file
>>> (with groups but without funky vlen/compound types) to JSON.
>>> The first few solutions that we googled (ncdump-json, netcdf2json.py)
>>> do not satisfy these requirements. What is the most robust and easy
>>> command-line tool (not web-service) that dumps netCDF to JSON?
>>> Ideally it would be somewhat configurable like ncdump -h/-x or
>>> ncks --cdl/--xml.
>>>
>>> Charlie
>>> --
>>> Charlie Zender, Earth System Sci. & Computer Sci.
>>> University of California, Irvine 949-891-2429 )'(
>>>
>>> _______________________________________________
>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>> recorded in the Unidata inquiry tracking system and made publicly
>>> available through the web.  Users who post to any of the lists we
>>> maintain are reminded to remove any personal information that they
>>> do not want to be made public.
>>>
>>>
>>> netcdfgroup mailing list
>>> [hidden email]
>>> For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [hidden email]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5
>


--------------------------------------------------------------------------------


> | object                                   | ra        | dec        | htm20                | htm7       | htm3   | shtm20               | shtm7       | shtm3  | flags | SSO   |
> | char                                     | double    | real       | ulong                | uint       | ushort | long                 | int         | short  | byte  | bool  |
> 118289arstratraetratratsrastratsrastrats    359.88703   50.832570   16446744073709551616   3294967296   12000    8223372036854775808   1147483648    12000   122    0
> 113368                                      344.41273  -29.622250   8446744073709551616    294967296    43002   -7223372036854775808  -2047483648    13002   0xf2   true
> 113368                                      344.41273  -29.622250   8446744073709551616    294967296    43002   -7223372036854775808  -2047483648   -23002   211    False
> 113368                                      344.41273  -29.622250   8446744073709551616    294967296    43002   -7223372036854775808  -2047483648   -31002   211    1
>


--------------------------------------------------------------------------------


> {
>    "VOTABLE":
>    {
>        "<xmlattr>":
>        {
>            "version": "1.3",
>            "xmlns:xsi": "http:\/\/www.w3.org\/2001\/XMLSchema-instance",
>            "xmlns": "http:\/\/www.ivoa.net\/xml\/VOTable\/v1.3",
>            "xmlns:stc": "http:\/\/www.ivoa.net\/xml\/STC\/v1.30"
>        },
>        "RESOURCE":
>        {
>            "TABLE":
>            {
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "object",
>                        "datatype": "char",
>                        "arraysize": "*"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "ra",
>                        "datatype": "double"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "dec",
>                        "datatype": "double"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "htm20",
>                        "datatype": "ulong"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "htm7",
>                        "datatype": "uint"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "htm3",
>                        "datatype": "ushort"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "shtm20",
>                        "datatype": "long"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "shtm7",
>                        "datatype": "int"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "shtm3",
>                        "datatype": "short"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "flags",
>                        "datatype": "unsignedByte"
>                    }
>                },
>                "FIELD":
>                {
>                    "<xmlattr>":
>                    {
>                        "name": "SSO",
>                        "datatype": "boolean"
>                    }
>                },
>                "DATA":
>                {
>                    "TABLEDATA":
>                    [
>                        [
>                            "118289arstratraetratratsrastratsrastrats",
>                            "359.88702999999998",
>                            "50.832569999999997",
>                            "16446744073709551616",
>                            "3294967296",
>                            "12000",
>                            "8223372036854775808",
>                            "1147483648",
>                            "12000",
>                            "0x7a",
>                            "0"
>                        ],
>                        [
>                            "113368",
>                            "344.41273000000001",
>                            "-29.622250000000001",
>                            "8446744073709551616",
>                            "294967296",
>                            "43002",
>                            "-7223372036854775808",
>                            "-2047483648",
>                            "13002",
>                            "0xf2",
>                            "1"
>                        ],
>                        [
>                            "113368",
>                            "344.41273000000001",
>                            "-29.622250000000001",
>                            "8446744073709551616",
>                            "294967296",
>                            "43002",
>                            "-7223372036854775808",
>                            "-2047483648",
>                            "-23002",
>                            "0xd3",
>                            "0"
>                        ],
>                        [
>                            "113368",
>                            "344.41273000000001",
>                            "-29.622250000000001",
>                            "8446744073709551616",
>                            "294967296",
>                            "43002",
>                            "-7223372036854775808",
>                            "-2047483648",
>                            "-31002",
>                            "0xd3",
>                            "1"
>                        ]
>                    ]
>                }
>            }
>        }
>    }
> }
>


--------------------------------------------------------------------------------


> [binary HDF5 attachment (the same table in HDF5 format); raw bytes omitted]



Re: [netcdfgroup] How to dump netCDF to JSON?

Walter Landry
Pedro Vicente <[hidden email]> wrote:

>> In any event, FYI, I have implemented a similar thing but in a
>> narrower domain.  I needed to map VOTable [1] (an xml format with a
>> defined data model for astronomy) to HDF5.
>> The code is on github [2].  It is mainly meant to be used as a
>> library, but the code also builds a simple program that converts files
>> between formats.
>
> ok, thanks,
> I did a clone of
>
> https://github.com/Caltech-IPAC/tablator
>
> but it seems there is not a build system?

It has a build system, but it was missing an INSTALL file :( I have now
added one.

> by the way HDF5 has its own table API, it seems you did not use it
>
> https://support.hdfgroup.org/HDF5/doc/HL/RM_H5TB.html

Correct.  The tablator library reads the metadata to set up the column
types and the number of rows.  Then populating the table is a binary
read().  I needed the absolute best performance, and this strategy
avoids the need to parse each column and row.  This puts some limits
on what kind of HDF5 tables it can read.  For example, it cannot
handle variable-length strings.
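[Editor's sketch] The fixed-record strategy Walter describes can be illustrated in a few lines. This is a minimal, hypothetical example (tablator itself is C++ and derives the column types from the HDF5 metadata; the three-column layout here is invented): once the per-row format is known, populating the table is one decode per row with no per-field text parsing.

```python
import struct

# Assumed column layout for illustration: double ra, double dec, int32 flags.
row_format = "<ddi"                      # little-endian: 2 doubles + 1 int32
row_size = struct.calcsize(row_format)   # 8 + 8 + 4 = 20 bytes per row

def read_rows(buf, n_rows):
    """Decode n_rows fixed-width records from a bytes buffer."""
    return [struct.unpack_from(row_format, buf, i * row_size)
            for i in range(n_rows)]

# Round-trip demo: pack two rows, then read them back.
raw = (struct.pack(row_format, 359.88703, 50.83257, 122) +
       struct.pack(row_format, 344.41273, -29.62225, 242))
rows = read_rows(raw, 2)
```

The limitation Walter mentions follows directly: a variable-length string has no fixed record size, so it cannot be read this way.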

Cheers,
Walter Landry


Re: [netcdfgroup] How to dump netCDF to JSON?

JohnReadey
In reply to this post by Pedro Vicente

Hi Pedro,

 

  For the purposes of the HDF Server project, we needed a way to identify resources (Datasets, Groups, Datatypes) that were independent of paths.  Since a path to an object can change without affecting the object itself, to have a RESTful service we wanted a canonical identifier for an object that didn’t rely on a path lookup.  So we came up with a scheme of Group, Dataset, and Datatype collections with a UUID to identify each object.  That way, if you have a reference to a specific UUID, you can always access the object regardless of what shenanigans may be happening with the links in the file.

 

It’s true that this makes path lookups a bit more cumbersome, but it’s a more general way of specifying a directed graph (the HDF5 link structure) on a tree (the JSON hierarchy).
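[Editor's sketch] The UUID-collection idea can be sketched as follows. The layout below is illustrative only (it is not the actual hdf5-json schema): objects live in flat collections keyed by UUID, and links reference UUIDs rather than paths, so two different link paths can reach the same object and renaming a link never changes an object's identity.

```python
import uuid

# Flat collections keyed by UUID; the link graph references UUIDs.
root_id, a_id, c_id = (str(uuid.uuid4()) for _ in range(3))

doc = {
    "groups": {
        root_id: {"links": [{"title": "A", "id": a_id}]},
        a_id: {"links": [{"title": "C", "id": c_id}]},
    },
    "datasets": {
        c_id: {"type": "H5T_STD_I32LE", "shape": [3, 4]},
    },
    "root": root_id,
}

def resolve(doc, path):
    """Follow a /-separated link path from the root down to an object id."""
    obj_id = doc["root"]
    for name in path.strip("/").split("/"):
        links = doc["groups"][obj_id]["links"]
        obj_id = next(l["id"] for l in links if l["title"] == name)
    return obj_id
```

Path lookup becomes a walk over the link lists, but any holder of a UUID can go straight to the object without one.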

 

John

 

From: Pedro Vicente <[hidden email]>
Date: Tuesday, October 18, 2016 at 9:37 PM
To: John Readey <[hidden email]>, Chris Barker <[hidden email]>
Cc: netCDF Mail List <[hidden email]>, HDF Users Discussion List <[hidden email]>
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

@John

 

>> 1.       Complete fidelity to all HDF5 features

>> 2.       Support graphs that are not acyclic.

 

ok, understood.

 

In my case I needed a simple schema for a particular set of files.

 

But why didn't you start with the official HDF5 DDL

 

 

and try to adapt to JSON?

 

Same thing for netCDF, there is already an official CDL, so any JSON spec should be "identical".
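[Editor's sketch] To make the CDL-mirroring idea concrete, here is a hypothetical illustration (this layout is invented for the example, not a proposed spec): the JSON keeps the same sections a CDL header has (dimensions, variables, attributes), so anyone who can read ncdump output can read the JSON.

```python
import json

# A CDL-like header rendered directly as JSON (invented layout).
cdl_like = {
    "dimensions": {"time": 2, "lat": 3},
    "variables": {
        "temp": {
            "type": "float",
            "dimensions": ["time", "lat"],
            "attributes": {"units": "K"},
        }
    },
}

text = json.dumps(cdl_like, indent=1)
```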

 

 

 

@Chris

 

{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}

 

>> * Do you need "rank"?

 

sometimes a bit of redundancy is useful, to make it visually clear

 

>> BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)

 

yes

 

>>It would be really great to have this become an "official" spec -- if you want to get it there, you're probably going to need to develop it more out in the open with a wider community. These lists are the way to get that started, but I suggest 

>>1) put it up somewhere that people can collaborate on it, make suggestions, capture the discussion, etc. gitHub is one really nice way to do that. See, for example the UGRID spec project: 

 

 

ok, anyone interested, send me an off-list email

 

 

-Pedro

 

 

 

----- Original Message -----

Sent: Tuesday, October 18, 2016 11:15 PM

Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

Hey,

 

The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json and docs are here:  http://hdf5-json.readthedocs.io/en/latest/

 

The package is both a library of HDF5 <-> JSON conversion functions and some simple scripts for converting HDF5 to JSON and vice-versa.  E.g.

$ python h5tojson.py -D <hdf5-file>

outputs JSON minus the dataset data values.

 

While it may not be the most elegant JSON schema, it’s designed with the following goals in mind:

1.       Complete fidelity to all HDF5 features (i.e. the goal is that you should be able to take any HDF5 file, convert it to JSON, convert back to HDF5, and wind up with a file that is semantically equivalent to what you started with).

2.       Support link graphs that are not simple trees, e.g. a group structure where <root> links to A and B, and A and B both link to C.  The output should produce only one representation of C.

Since NetCDF doesn’t use all these features, it’s certainly possible to come up with something simpler for just netCDF files.

 

Suggestions, feedback, and pull requests are welcome!

 

Cheers,

John

 

From: Chris Barker <[hidden email]>
Date: Friday, October 14, 2016 at 12:32 PM
To: Pedro Vicente <[hidden email]>
Cc: netCDF Mail List <[hidden email]>, Charlie Zender <[hidden email]>, John Readey <[hidden email]>, HDF Users Discussion List <[hidden email]>, David Pearah <[hidden email]>
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

Pedro,

 

When I first started reading this thread, I thought "there should be a spec for how to represent netcdf in JSON"

 

and then I read:

 

1) The specification to convert netCDF/HDF5 to "a" JSON format (note the "a" here)

 

Awesome -- that's exactly what we need -- as you say there is not one way to represent netcdf data in JSON, and probably far more than one "obvious" way.

 

Without looking at your spec yet, I do think it should probably look as much like CDL as possible -- we are all familiar with that.

 

(why Python? HDF5 developer tools should be all about writing in C/C++)

 

Because Python is an excellent language with which to "drive" C/C++ libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python. Even if you want to get to a C++ implementation eventually, you'd probably benefit from prototyping and working out the kinks with a Python version first.

 

But whoever is writing the code....

 

 

The specification is here

http://www.space-research.org/

 

Just took a quick look -- nice start. 

 

I've only used HDF through the netcdf4 spec, so there may be richness needed that I'm missing, but my first thought is to make greater use of "objects" in JSON (key-value structures, hash tables, dicts in Python), rather than array position, for heterogeneous structures. For instance, you have:

 

 a dataset


{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}

 

I would perhaps do that as something like:

 

{
  ...
  "dset1": {"object_type": "dataset",
            "dtype": "INT32",
            "rank": 2,
            "dimensions": [3, 4],
            "data": [[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12]]
           }
  ...
}

 

NOTES:

 

* I used nested arrays, rather than flattening the 2-d array -- this maps nicely to things like numpy arrays, for example -- not sure about the C++ world. (you can flatten and un-flatten numpy arrays easily, too, but this seems like a better mapping to the structure) And HDF is storing this all in chunks and who knows what -- so it's not a direct mapping to the memory layout anyway.

 

* Do you need "rank"? -- can't you check the length of the dimensions array?

 

* Do you  need "object_type" -- will it always be a dataset? Or you could have something like:

 

{

...

"datasets": {"dset1": {the actual dataset object},

             "dset2": {another dataset object},

 ....

 

Then you don't need object_type or a name
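[Editor's sketch] The nested-versus-flat question from the notes above is cheap either way: converting between a flat row-major list plus a dimensions array and nested lists is a few lines of plain Python (no netCDF library assumed), so the choice is about readability of the JSON, not capability.

```python
def unflatten(flat, dims):
    """Reshape a flat list into nested lists matching dims (row-major)."""
    if len(dims) == 1:
        return list(flat)
    inner = len(flat) // dims[0]
    return [unflatten(flat[i * inner:(i + 1) * inner], dims[1:])
            for i in range(dims[0])]

def flatten(nested):
    """Collapse nested lists back into one flat row-major list."""
    if not isinstance(nested, list):
        return [nested]
    return [x for sub in nested for x in flatten(sub)]

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
nested = unflatten(flat, [3, 4])
```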

 

 

(BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)

 

I would like to make this some kind of "official" netCDF/HDF5 JSON format for the community, so I encourage anyone to read the specification

 

If you see any flaw in the design or anything in the design that you would like to have change please let me know now

 

done :-)

 

It would be really great to have this become an "official" spec -- if you want to get it there, you're probably going to need to develop it more out in the open with a wider community. These lists are the way to get that started, but I suggest:

 

1) put it up somewhere that people can collaborate on it, make suggestions, capture the discussion, etc. gitHub is one really nice way to do that. See, for example the UGRID spec project:

 

 

(NOTE that that one got put on gitHub after there was a pretty complete draft spec, so there isn't THAT much discussion captured. But also note that that is too bad -- there is no good record of the decision process that led to the spec)

 

At the moment it only (intentionally) uses common generic features of both netCDF and HDF5, which are the numeric atomic types and strings.

 

Good plan.

 

-Chris

 

 

--


Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]



Re: [netcdfgroup] How to dump netCDF to JSON?

JohnReadey

Right, a reasonable approach would be to use the hdf5-json code to output JSON for datasets and attributes, but arrange it by groups in a tree hierarchy.  Although I suspect you’d want dimension scales treated as a distinct object type.

 

John

 

From: Chris Barker <[hidden email]>
Date: Thursday, October 20, 2016 at 1:48 PM
To: John Readey <[hidden email]>
Cc: Pedro Vicente <[hidden email]>, netCDF Mail List <[hidden email]>, HDF Users Discussion List <[hidden email]>
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

On Thu, Oct 20, 2016 at 12:02 PM, John Readey <[hidden email]> wrote:

So we came up with a scheme of Group, Dataset, and Datatype collections with a UUID to identify each object.  That way, if you have a reference to a specific UUID, you can always access the object regardless of what shenanigans may be happening with the links in the file.

 

It’s true that this makes path lookups a bit more cumbersome, but it’s a more general way of specifying a directed graph (the HDF5 link structure) on a tree (the JSON hierarchy).

 

Hmm -- interesting. I hadn't realized that HDF was this flexible. For my part, I've only really used netcdf.

 

This is making me think that we may want a spec for netcdf-json that would be a subset of the hdf-json spec.

 

That way they can be as compatible as possible without "cluttering up" the netcdf spec too much.

 

-CHB

 

 

 

 

 


Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro Vicente
In reply to this post by JohnReadey

>>> This is making me think that we may want a spec for netcdf-json that would be a subset of the hdf-json spec.
 
that is one option;
another option is to make a JSON form of netCDF CDL, completely unaware of HDF5 (just like the netCDF API is),
 
with the "data" part being optional, which was one of the goals of my design: to transmit just metadata over the web for quick remote inspection
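[Editor's sketch] The metadata-only idea can be sketched in a few lines. The document layout below follows the STAR JSON dataset example from earlier in the thread, and the helper name is made up: the data array is simply dropped from each dataset entry before the JSON is sent.

```python
import json

# A STAR-JSON-like document: [kind, type, rank, dims, data].
full = {
    "group1": {
        "dset1": ["dataset", "STAR_INT32", 2, [3, 4],
                  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
    }
}

def strip_data(node):
    """Recursively drop the trailing data array from dataset entries."""
    if isinstance(node, dict):
        return {k: strip_data(v) for k, v in node.items()}
    if isinstance(node, list) and node and node[0] == "dataset":
        return node[:4]          # keep kind, type, rank, dims; drop values
    return node

payload = json.dumps(strip_data(full))
```

The stripped payload still shows the full hierarchy, types, and shapes, which is enough for a quick remote inspection.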
 
-Pedro
----- Original Message -----
Sent: Thursday, October 20, 2016 4:48 PM
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

On Thu, Oct 20, 2016 at 12:02 PM, John Readey <[hidden email]> wrote:

So we came up with a scheme of Group, Dataset, and Datatype collections with a UUID to identify each object.  That way if you a reference to a specific UUID, you can always access the object regardless of what shenanigans may be happening with the links in the file.

 

It’s true that this makes path look ups a bit more cumbersome, but it’s a more general way of specify a directed graph (the HDF5 link structure) on a tree (the JSON hierarchy).


Hmm -- interesting. I hadn't realized that HDF was this flexible. For my part, I've only really used netcdf.

This is making me think that we may want a spec for netcdf-json that would be a subset of the hdf-json spec.

That way they can be as compatible as possible without "cluttering up" the netcdf spec too much.

-CHB



 

 

John

 

From: Pedro Vicente <[hidden email]>
Date: Tuesday, October 18, 2016 at 9:37 PM
To: John Readey <[hidden email]>, Chris Barker <[hidden email]>
Cc: netCDF Mail List <[hidden email]>, HDF Users Discussion List <[hidden email]>


Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

@John

 

>> 1.       Complete fidelity to all HDF5 features

>> 2.       Support graphs that are not acyclic.

 

ok, understood.

 

In my case I needed a simple schema for a particular set of files.

 

But why didn't you start with the official HDF5 DDL and try to adapt it to JSON?

 

Same thing for netCDF: there is already an official CDL, so any JSON spec should be "identical".

 

 

 

@Chris

 

{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}

 

>> * Do you need "rank"?

 

sometimes a bit of redundancy is useful, to make it visually clear

 

>> BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)

 

yes

 

>>It would be really great to have this become an "official" spec -- if you want to get it there, you're probably going to need to develop it more out in the open with a wider community. These lists are the way to get that started, but I suggest 

>>1) put it up somewhere that people can collaborate on it, make suggestions, capture the discussion, etc. gitHub is one really nice way to do that. See, for example the UGRID spec project: 

 

 

ok, anyone interested, send me an off-list email

 

 

-Pedro

 

 

 

----- Original Message -----

Sent: Tuesday, October 18, 2016 11:15 PM

Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

Hey,

 

The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json and docs are here:  http://hdf5-json.readthedocs.io/en/latest/

 

The package is both a library of HDF5 <-> JSON conversion functions and some simple scripts for converting HDF5 to JSON and vice-versa.  E.g.

$ python h5tojson.py -D <hdf5-file>

outputs JSON minus the dataset data values.

 

While it may not be the most elegant JSON schema, it’s designed with the following goals in mind:

1.       Complete fidelity to all HDF5 features (i.e. the goal is that you should be able to take any HDF5 file, convert it to JSON, convert back to HDF5, and wind up with a file that is semantically equivalent to what you started with).

2.       Support graphs that are not acyclic.  I.e. a group structure where <root> links to A and B, and A and B both link to C.  The output should only produce one representation of C.

Since NetCDF doesn’t use all these features, it’s certainly possible to come up with something simpler for just netCDF files.

 

Suggestions, feedback, and pull requests are welcome!

 

Cheers,

John

 

From: Chris Barker <[hidden email]>
Date: Friday, October 14, 2016 at 12:32 PM
To: Pedro Vicente <[hidden email]>
Cc: netCDF Mail List <[hidden email]>, Charlie Zender <[hidden email]>, John Readey <[hidden email]>, HDF Users Discussion List <[hidden email]>, David Pearah <[hidden email]>
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

 

Pedro,

 

When I first started reading this thread, I thought "there should be a spec for how to represent netcdf in JSON"

 

and then I read:

 

1) The specification to convert netCDF/HDF5 to "a" JSON format (note the "a" here)

 

Awesome -- that's exactly what we need -- as you say there is not one way to represent netcdf data in JSON, and probably far more than one "obvious" way.

 

Without looking at your spec yet, I do think it should probably look as much like CDL as possible -- we are all familiar with that.

 

(why Python? HDF5 developer tools should be all about writing in C/C++)

 

Because Python is an excellent language with which to "drive" C/C++ libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python. Even if you want to get to a C++ implementation eventually, you'd probably benefit from prototyping and working out the kinks with a Python version first.

 

But whoever is writing the code....

 

 

The specification is here

http://www.space-research.org/

 

Just took a quick look -- nice start. 

 

I've only used HDF through the netcdf4 spec, so there may be richness needed that I'm missing, but my first thought is to make greater use of "objects" in JSON (key-value structures, hash tables, dicts in Python), rather than array position, for heterogeneous structures. For instance, you have:

 

 a dataset


{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}

 

I would perhaps do that as something like:

 

{

...

"dset1": {"object_type": "dataset",
          "dtype": "INT32",
          "rank": 2,
          "dimensions": [3, 4],
          "data": [[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]]
         }

...

}
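For illustration, a minimal numpy/json sketch that would emit a variable in this object style (the key names follow the hypothetical layout above, not any official spec):

```python
import json
import numpy as np

def dataset_to_json(arr):
    """Serialize a numpy array in the object-style layout above (hypothetical keys)."""
    return {
        "object_type": "dataset",
        "dtype": arr.dtype.name.upper(),   # "int32" -> "INT32"
        "rank": arr.ndim,
        "dimensions": list(arr.shape),
        "data": arr.tolist(),              # nested lists, one level per dimension
    }

arr = np.arange(1, 13, dtype=np.int32).reshape(3, 4)
print(json.dumps({"dset1": dataset_to_json(arr)}))
```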

 

NOTES:

 

* I used nested arrays, rather than flattening the 2-d array -- this maps nicely to things like numpy arrays, for example -- not sure about the C++ world. (you can flatten and un-flatten numpy arrays easily, too, but this seems like a better mapping to the structure) And HDF is storing this all in chunks and who knows what -- so it's not a direct mapping to the memory layout anyway.

 

* Do you need "rank"? -- can't you check the length of the dimensions array?

 

* Do you need "object_type"? -- will it always be a dataset? Or you could have something like:

 

{

...

"datasets": {"dset1": {the actual dataset object},

             "dset2": {another dataset object},

 ....

 

Then you don't need object_type or a name
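On the "rank" and flattening points: either form is recoverable from the other, e.g. (a numpy sketch):

```python
import numpy as np

dimensions = [3, 4]
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

rank = len(dimensions)  # rank is redundant with the dimensions list -> 2

# flat <-> nested round trip:
nested = np.array(flat).reshape(dimensions).tolist()
restored = np.array(nested).ravel().tolist()
print(rank, nested[2], restored == flat)  # 2 [9, 10, 11, 12] True
```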

 

 

(BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)

 

I would like to make this some kind of "official" netCDF/HDF5 JSON format for the community, so I encourage anyone to read the specification

 

If you see any flaw in the design, or anything in the design that you would like to have changed, please let me know now

 

done :-)

 

It would be really great to have this become an "official" spec -- if you want to get it there, you're probably going to need to develop it more out in the open with a wider community. These lists are the way to get that started, but I suggest:

 

1) put it up somewhere that people can collaborate on it, make suggestions, capture the discussion, etc. gitHub is one really nice way to do that. See, for example the UGRID spec project:

 

 

(NOTE that that one got put on gitHub after there was a pretty complete draft spec, so there isn't THAT much discussion captured. But also note that that is too bad -- there is no good record of the decision process that led to the spec)

 

At the moment it only (intentionally) uses common generic features of both netCDF and HDF5, which are the numeric atomic types and strings.

 

Good plan.

 

-Chris

 

 

--


Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[hidden email]





_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro Vicente

>>my thought was to make a netcdfJSON, then add features to make an hdfJSON. (and netcdfJSON would look a lot like CDL)
>>So a netcdfJSON file would be a valid hdfJSON file, but not the other way around.
 
yes, sounds like a good plan
I'll send you an email when I have things ready, thanks
-Pedro
----- Original Message -----
Sent: Thursday, October 20, 2016 6:17 PM
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?



On Thu, Oct 20, 2016 at 3:00 PM, Pedro Vicente <[hidden email]> wrote:
>>> This is making me think that we may want a spec for netcdf-json that would be a subset of the hdf-json spec.
 
that is one option;
other option is to make a JSON form of netCDF CDL , completely unaware of HDF5 (just like the netCDF API is)
 

yup.

Are they mutually exclusive approaches? My thought was to make a netcdfJSON, then add features to make an hdfJSON (and netcdfJSON would look a lot like CDL).

So a netcdfJSON file would be a valid hdfJSON file, but not the other way around.

Like a netcdf4 file is a valid hdf5 file now.

-CHB

 

Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro Vicente

>>my thought was to make a netcdfJSON, then add features to make an hdfJSON. (and netcdfJSON would look a lot like CDL)
>>So a netcdfJSON file would be a valid hdfJSON file, but not the other way around.
 
On further thought, this design has the problem that netCDF has things HDF5 does not (named dimensions),
and HDF5 has things that netCDF does not, so it's a bit of a catch-22; maybe just keep them separate
 
My design method is usually a bit of specification, then a bit of code; when something new comes up that was not planned, go back to step 1
and rewrite the spec, and sometimes the code
 
-Pedro
 
 
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web.  Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.


netcdfgroup mailing list
[hidden email]
For list information or to unsubscribe,  visit: http://www.unidata.ucar.edu/mailing_lists/
