Best way to repackage a dataset? (C program)


Landon Clipp
Hello everyone,

I am working with an HDF5 file that contains a 5D dataset. I want to write a C program that reads this dataset into memory and then outputs it into a newly created file containing only that dataset (perhaps at the root of the file tree). What I don't understand is how to read the entire 5D array with H5Dread into a 5D buffer that has been previously allocated on the heap (note that I cannot use a stack-allocated array; it would be too large and would cause segmentation faults).

What is the general process for doing this, and is there perhaps a more elegant solution than reading the entire dataset into memory? The process seems straightforward to me for a 1D or 2D array, but I am lost with higher-dimensional arrays. Thanks.

Regards,
Landon

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: Best way to repackage a dataset? (C program)

Nelson, Jarom

You might look at h5copy as a reference, or just use that tool to do the work for you.
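For reference, the h5copy invocation might look like this (file and dataset names are illustrative):

```shell
# Copy the dataset /group1/dset5d (illustrative path) out of in.h5
# and place it at the root of out.h5:
h5copy -i in.h5 -o out.h5 -s /group1/dset5d -d /dset5d
```

If the destination path is nested, the -p flag tells h5copy to create the intermediate groups.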

 

Jarom

 


Re: Best way to repackage a dataset? (C program)

Michael Jackson
You can take a look at the following source files. They are meant for C++
and use templates, but assuming you know a bit of C++ you can convert them
back to pure "C" without any issues. The template parameter is the
POD type.

https://github.com/BlueQuartzSoftware/SIMPL/tree/develop/Source/H5Support

Take a look at H5Lite.h and H5Lite.cpp. There are functions in there such as
readPointerDataset(), writePointerDataset() and getDatasetInfo().

The basic flow would be the following (using some pure "C").

// Open the file and get the "Location ID"
hid_t fileId = ...

char* datasetName = ...
// Since you know it is a 5D array:
hsize_t dims[5];
H5T_class_t classType;
size_t type_size;
H5Lite::getDatasetInfo(fileId, datasetName, dims, classType, type_size);

// Multiply the dims[] values to compute the total number of
// elements to allocate; let's assume they are 32-bit signed ints
size_t totalElements = dims[0] * dims[1] * dims[2] * dims[3] * dims[4];
// Allocate the data
int* dataPtr = malloc(totalElements * sizeof(int));

herr_t err = H5Lite::readPointerDataset(fileId, datasetName, dataPtr);
// Check error
if (err < 0) { ... }

// Open a new file for writing
hid_t outFileId = ...
int rank = 5;
err = H5Lite::writePointerDataset(outFileId, datasetName, rank, dims,
dataPtr);
// Check error
if (err < 0) { ... }

This assumes that you take the code from GitHub and convert the
necessary functions into pure "C", which should be straightforward to do.

The code referenced above is BSD licensed.
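If you would rather stay with the plain HDF5 C API instead of porting H5Lite, the same flow might look roughly like the sketch below. File and dataset names are illustrative, the element type is assumed to be a native int, and error checking is minimal for brevity:

```c
#include <stdio.h>
#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    /* Illustrative names -- adjust to your own file and dataset. */
    hid_t fileId = H5Fopen("in.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dsetId = H5Dopen2(fileId, "/group1/dset5d", H5P_DEFAULT);

    /* Ask the dataspace for the rank and per-dimension sizes. */
    hid_t spaceId = H5Dget_space(dsetId);
    hsize_t dims[5];
    int rank = H5Sget_simple_extent_dims(spaceId, dims, NULL);
    if (rank != 5) { fprintf(stderr, "expected a 5D dataset\n"); return 1; }

    /* A flat heap buffer is enough: H5Dread fills it in row-major order. */
    size_t totalElements =
        (size_t)(dims[0] * dims[1] * dims[2] * dims[3] * dims[4]);
    int *data = malloc(totalElements * sizeof *data);
    if (H5Dread(dsetId, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                H5P_DEFAULT, data) < 0)
        return 1;

    /* Write the buffer into a fresh file, at the root. */
    hid_t outFile  = H5Fcreate("out.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, H5P_DEFAULT);
    hid_t outSpace = H5Screate_simple(5, dims, NULL);
    hid_t outDset  = H5Dcreate2(outFile, "/dset5d", H5T_NATIVE_INT, outSpace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    if (H5Dwrite(outDset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                 H5P_DEFAULT, data) < 0)
        return 1;

    free(data);
    H5Dclose(outDset); H5Sclose(outSpace); H5Fclose(outFile);
    H5Sclose(spaceId); H5Dclose(dsetId);   H5Fclose(fileId);
    return 0;
}
```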

--
Michael A. Jackson
BlueQuartz Software, LLC
[e]: [hidden email]




Re: Best way to repackage a dataset? (C program)

Landon Clipp
Hello,

Thank you everyone for your help. I figured out the problem; I was just misunderstanding how the functions work. I was able to successfully read the dataset into a buffer. I did not realize that a 1D buffer was sufficient: for some reason I was thinking it had to be a true multidimensional array, but it turns out the functions know how to interpret a flat buffer as long as you give them the rank and the size of each dimension.

It turns out I have another problem, however. I am now trying to write this buffer into a new file, and the error happens when I try to create the new dataset. When I ran my code, I got errors such as: "H5D.c line 194 in H5Dcreate2(): unable to create dataset." From what I found online, there is a size limit on the dataset and mine almost certainly exceeds it, so the solution should be to create a dataset creation property list and enable chunking on it. Even after setting a reasonable chunk size, however, I still get the same errors. I will attach my code and the errors I am receiving; the relevant code starts at line 122. Thank you SO MUCH for your help, I'm still trying to learn all of this.
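For comparison, enabling chunking on the creation property list usually looks like the sketch below (outFileId, the dataset name, and the dimension values are all placeholders). Note that H5Dcreate2 can also fail for unrelated reasons, for example when an intermediate group in the dataset path does not exist or a dataset with that name already exists, so those are worth checking too:

```c
/* Sketch only: outFileId and the dimension values are placeholders. */
hsize_t dims[5]  = {10, 20, 30, 40, 50};
hsize_t chunk[5] = { 1, 20, 30, 40, 50};   /* one 4D slab per chunk */

hid_t space = H5Screate_simple(5, dims, NULL);
hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 5, chunk);              /* switch to chunked layout */

hid_t dset = H5Dcreate2(outFileId, "/dset5d", H5T_NATIVE_INT, space,
                        H5P_DEFAULT, dcpl, H5P_DEFAULT);
if (dset < 0)
    H5Eprint2(H5E_DEFAULT, stderr);  /* full error stack names the real cause */
H5Pclose(dcpl);
```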

Landon


Attachments: error.txt (1K), testcode.c (6K)

Re: Best way to repackage a dataset? (C program)

Martijn Jasperse

Hi,
Maybe I misunderstood the requirements, but if you just want to copy a dataset to another file, why not use H5Ocopy? It allows you to use a different file as the destination, and it could be a lot faster and simpler than loading the data into memory.

https://www.hdfgroup.org/HDF5/doc/RM/RM_H5O.html#Object-Copy
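A minimal sketch of that approach (file and dataset names are illustrative):

```c
#include "hdf5.h"

/* Copy /group1/dset5d (illustrative path) from src.h5 into the root of
   dst.h5 without ever loading the data into application memory. */
hid_t src = H5Fopen("src.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
hid_t dst = H5Fcreate("dst.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

herr_t status = H5Ocopy(src, "/group1/dset5d",  /* source location + name */
                        dst, "/dset5d",         /* destination location + name */
                        H5P_DEFAULT, H5P_DEFAULT);
if (status < 0) { /* handle the error */ }

H5Fclose(dst);
H5Fclose(src);
```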

Cheers,
Martijn

