Handling incomplete columns for integer datatypes

Handling incomplete columns for integer datatypes

Daniel Rimmelspacher

Dear Forum,


I have run into a conceptual problem when trying to store incomplete columns of integer datatypes.

 

My issue relates to the problem explained here:


http://stackoverflow.com/questions/33656043/hdf5-how-to-handle-empty-rows


Basically the author wants to store this table:


| time | x1  | y1  | x2  | y2  |
| 0    | 2.0 | 1.0 | 2.0 | 3.0 |
| 1    | 2.1 | 1.0 | 2.3 | 3.1 |
| 2    | 2.4 | 1.4 |     |     |
| 3    | 2.2 | 1.5 | 2.4 | 3.1 |
| 4    |     |     | 2.3 | 3.2 |


It contains incomplete columns of floating-point datatypes, which can be handled by filling in NaNs.


For other datatypes, e.g. integers, no such NaN is available. Is there some kind of textbook approach that describes how to handle this problem?


Thanks,

Daniel


_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
Re: Handling incomplete columns for integer datatypes

Gerd Heber
Daniel, unless "magic values" are accompanied by unambiguous metadata, the
NaN/fill-value approach is risky and limits portability.
A better approach might be to maintain a "shadowing" dataset of a suitable
bitfield type whose values would indicate the validity of a dataset element
or fields in a compound, etc. By using compression on the shadowing dataset
the storage overhead should be negligible. Of course, if the shadowed dataset
ever gets updated and elements or fields change their status between
'valid' and 'N/A', both datasets must be updated and kept in sync.
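
As a rough illustration of the shadowing idea, here is a minimal sketch using h5py and NumPy. The dataset names "x1" and "x1_mask", the sample values, and the use of uint8 in place of a true HDF5 bitfield type are all assumptions made for the example, not something prescribed above.

```python
import h5py
import numpy as np

# Hypothetical integer column; rows 2 and 4 are missing.
# The 0 stored in those slots is an arbitrary placeholder, not a magic value:
# validity is decided only by the mask.
values = np.array([7, 3, 0, 5, 0], dtype=np.int32)
valid = np.array([1, 1, 0, 1, 0], dtype=np.uint8)  # 1 = valid, 0 = N/A

# In-memory file (core driver, no backing store) so nothing is written to disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("x1", data=values)
    # uint8 stands in for a bitfield type; gzip keeps the storage overhead small.
    f.create_dataset("x1_mask", data=valid, compression="gzip")

    # Reading back: use the mask to select only the valid elements.
    stored = f["x1"][...]
    mask = f["x1_mask"][...].astype(bool)
    present = stored[mask]

print(present.tolist())
```

Any code that later changes an element's status has to write both "x1" and "x1_mask", as noted above.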

Depending on how elaborate you want this to be, you could decorate the shadowed dataset
with a "MASK" attribute whose value is an object reference to the shadowing dataset.
Alternatively, if the shadowed dataset is linked to exactly one group and there is no potential for
name conflicts, you could have a convention that lets you derive the name of the shadowing dataset
from the link name of the shadowed dataset.
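
Both ways of locating the shadowing dataset could look roughly like this in h5py. This is only a sketch: the "_mask" suffix, the helper `mask_name`, and the dataset name "x2" are hypothetical choices for illustration.

```python
import h5py
import numpy as np


def mask_name(link_name):
    """Hypothetical naming convention: '<link name>_mask'."""
    return link_name + "_mask"


# In-memory file (core driver, no backing store) so nothing is written to disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    data = f.create_dataset("x2", data=np.array([4, 0, 9], dtype=np.int32))
    mask = f.create_dataset(mask_name("x2"),
                            data=np.array([1, 0, 1], dtype=np.uint8))

    # Option 1: a "MASK" attribute holding an object reference to the mask.
    data.attrs["MASK"] = mask.ref
    via_ref = f[data.attrs["MASK"]]   # dereference the stored reference
    name_via_ref = via_ref.name

    # Option 2: derive the mask's name from the dataset's link name.
    via_name = f[mask_name("x2")]
    name_via_convention = via_name.name

    # Either way, the mask can be combined with the data on read.
    masked = np.ma.masked_array(data[...], mask=~via_ref[...].astype(bool))
    valid_values = masked.compressed().tolist()

print(name_via_ref, name_via_convention, valid_values)
```

The object reference survives renaming or moving the mask dataset within the file, whereas the naming convention needs no extra metadata but breaks if either link is renamed.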

Best, G.
________________________________________
From: Hdf-forum <[hidden email]> on behalf of Daniel Rimmelspacher <[hidden email]>
Sent: Thursday, January 12, 2017 6:15:32 AM
To: HDF Users Discussion List
Subject: [Hdf-forum] Handling incomplete columns for integer datatypes

