questions about how VLEN/string is stored in a HDF5 file

questions about how VLEN/string is stored in a HDF5 file

Ed Hartnett
Howdy all!

Here in sunny Boulder, Colorado, I was biking with the netCDF dev team to our favorite free-trade organic vegan gluten-free coffee shop, when we started to wonder how strings are stored in a HDF5 file.

We know that strings are stored as VLENs - does that mean that all strings are stored with their length explicitly?

If we create a new dataset of VLEN, and write one value to the dataset, leaving the rest untouched, we will get fill values in the rest of the chunk. When those fill values are written, they consist of a bunch of VLENs which contain empty strings. What is actually stored on disk?

We recently noticed we can't turn off fill mode with VLENs. Is this why?

Thanks!
Ed Hartnett

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: questions about how VLEN/string is stored in a HDF5 file

Elena Pourmal
Hi Ed,

Happy New Year!

Since no one answered, I will try. Hope the HDF5 developers will correct me if I am wrong.

> On Dec 21, 2017, at 4:39 PM, Ed Hartnett <[hidden email]> wrote:
>
> Howdy all!
>
> Here in sunny Boulder, Colorado, I was biking with the netCDF dev team to our favorite free-trade organic vegan gluten-free coffee shop, when we started to wonder how strings are stored in a HDF5 file.
Wow! It looks like Rocky Mountain views cannot compete with HDF5 :-)

>
> We know that strings are stored as VLENs - does that mean that all strings are stored with their length explicitly?

HDF5 has two types of strings: fixed-length and variable-length (VL). The length of a fixed-length string is part of the datatype description, while the length of each VL string is part of the "raw data description" that is stored in the dataset's elements (heap ID + length of the type).
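
To make the difference concrete, here is a small Python sketch of the two layouts. It is only a toy model and not the actual HDF5 file format: the `heap` list, the helper names `pack_fixed`, `pack_vlen` and `unpack_vlen`, and the descriptor layout (4-byte length + 8-byte heap ID) are all assumptions made for illustration.

```python
import struct

# Toy model only: the names and the byte layout below are illustrative
# assumptions, not the real HDF5 on-disk format.
DESCRIPTOR = "<IQ"  # assumed: 4-byte length + 8-byte heap ID


def pack_fixed(strings, size):
    """Fixed-length: every element occupies `size` bytes. The length is
    part of the datatype (the `size` argument), not of the data."""
    out = b""
    for s in strings:
        raw = s.encode("ascii")
        if len(raw) > size:
            raise ValueError("string longer than the fixed size")
        out += raw.ljust(size, b"\0")  # pad with NUL bytes
    return out


def pack_vlen(strings, heap):
    """Variable-length: every element is a small descriptor (length +
    heap ID); the characters themselves are appended to `heap`."""
    out = b""
    for s in strings:
        raw = s.encode("ascii")
        heap.append(raw)
        out += struct.pack(DESCRIPTOR, len(raw), len(heap) - 1)
    return out


def unpack_vlen(data, heap):
    """Follow each descriptor back into the heap to recover the strings."""
    step = struct.calcsize(DESCRIPTOR)
    result = []
    for offset in range(0, len(data), step):
        length, heap_id = struct.unpack_from(DESCRIPTOR, data, offset)
        result.append(heap[heap_id][:length].decode("ascii"))
    return result
```

In this model, packing three strings with `pack_fixed(..., 8)` always takes 24 bytes regardless of their content, while `pack_vlen` stores three 12-byte descriptors in the dataset and puts the characters in the heap.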
>
> If we create a new dataset of VLEN, and write one value to the dataset, leaving the rest untouched, we will get fill values in the rest of the chunk. When those fill values are written, they consist of a bunch of VLENs which contain empty strings. What is actually stored on disk?

Data elements of datasets with VL fill values will contain heap IDs that point to the fill value (an empty string).
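
As a rough illustration of that statement, the following Python sketch models a chunk in which one element has been written and the rest still reference the fill value. Again this is a toy model, not the real file format: the `ELEM` descriptor layout, the `heap` list, the helper names, and in particular the choice to let all unwritten elements share a single heap entry are assumptions.

```python
import struct

# Toy model only: not the real HDF5 on-disk format.
ELEM = struct.Struct("<IQ")  # assumed descriptor: 4-byte length + 8-byte heap ID


def new_chunk(n_elems, heap, fill=b""):
    """Allocate a chunk whose elements all reference the fill value.
    Sharing one heap entry between them is an assumption of this sketch."""
    heap.append(fill)
    fill_id = len(heap) - 1
    return bytearray(ELEM.pack(len(fill), fill_id) * n_elems)


def write_elem(chunk, index, value, heap):
    """Store `value` in the heap and point element `index` at it."""
    heap.append(value)
    ELEM.pack_into(chunk, index * ELEM.size, len(value), len(heap) - 1)


def read_elem(chunk, index, heap):
    """Resolve element `index` through its descriptor."""
    length, heap_id = ELEM.unpack_from(chunk, index * ELEM.size)
    return heap[heap_id][:length]
```

With a four-element chunk and a single `write_elem(chunk, 1, b"Boulder", heap)`, every element still occupies descriptor space in the chunk, and the three untouched elements read back as empty strings.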
>
> We recently noticed we can't turn off fill mode with VLENs. Is this why?

Could you please send us an example? It should work!

Thank you!

Elena
>
> Thanks!
> Ed Hartnett