Writing raw (byte array) data to dataset?


Writing raw (byte array) data to dataset?

jolindbe
Hi,

Is there a way to write raw binary byte array data to an existing dataset of a different type? E.g., if I have a byte array that represents an array of doubles (the byte array thus has 8 times as many elements as the double array, where each set of 8 bytes represents a double), can I somehow write that data to a double dataset in an HDF5 file? Trying this the naïve way with HDF.write just returns a -1 status.

The reason why I don't just convert it to a double array before writing is that I have an instrument which returns all its data in byte arrays, no matter the type, and then I'd have to write a converter for each of the 10 different types it outputs.

Thank you,
Johan Lindberg

--
Dr. Johan E. Lindberg

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: Writing raw (byte array) data to dataset?

Rafal Lichwala
Hi Johan,

The second argument to the H5Dwrite function (referring to the C API, not C++)
sets the type of a single element expected in the data buffer. If you set this
to H5T_NATIVE_DOUBLE, everything should be fine and the data should be written
properly, since the data buffer is just a pointer (void*).

https://support.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-Write

Usually a -1 status means that you got something wrong with the memory space
and file space (3rd and 4th arguments of H5Dwrite) and/or the dataset
dimensions. Please send some of your code (including how the dataset is
created) for further investigation...

Regards,
Rafal


On 2017-10-18 at 14:21, Johan Lindberg wrote:

> [...]



Re: Writing raw (byte array) data to dataset?

jolindbe
Hi Rafael,

Thank you for your reply!

I am using Visual Studio (C#) with HDF5 P/Invoke. While the syntax is a bit different from C/C++, the HDF5 functions should behave the same.

I am indeed setting the type argument to the native double type. I paste my (simplified yet still quite lengthy) code below.

It is the status in the try block at the very end that returns -1. This works fine if I pass the double[] valuesOneDim to H5D.write instead of the byte[] byteData.

// Create a H5F file and H5G group (not included)
hid_t groupId = ...

// Create dataspace and dataset:
int nCols = 30;
int nRows = 5;
int rank = 2;
ulong[] dims = new ulong[2] { 0, (ulong)nCols };
ulong[] maxDims = new ulong[2] { H5S.UNLIMITED, (ulong)nCols };

string name = "TestDataset";

hid_t dataspaceId = H5S.create_simple(rank, dims, maxDims);
hid_t pList = H5P.create(H5P.DATASET_CREATE);
H5P.set_layout(pList, H5D.layout_t.CHUNKED);
H5P.set_chunk(pList, rank, new ulong[] { 1, (ulong)maxDims[1] });
hid_t datasetId = H5D.create(groupId, name, H5T.NATIVE_DOUBLE, dataspaceId, 
                H5P.DEFAULT, pList, H5P.DEFAULT);
H5P.close(pList);

// Generate a random 2D double array (5 rows x 30 columns) and convert it to a 1D byte array.

Random random = new Random();
double[,] values = new double[nCols, nRows];
double[] valuesOneDim = new double[nCols * nRows];
int nBytes = 8;
byte[] byteData = new byte[nCols * nRows * nBytes];
for (int i = 0; i < nCols; i++)
{
    for (int j = 0; j < nRows; j++)
    {
        values[i, j] = random.NextDouble();
        valuesOneDim[i + nCols * j] = values[i, j];
        byte[] thisByteValue = BitConverter.GetBytes(values[i, j]);
        for (int k = 0; k < nBytes; k++)
        {
            byteData[k + nBytes * (i + nCols * j)] = thisByteValue[k];
        }
    }
}

// Write byte array to dataset

int status = -1;

int arrayCols = (int)dims[1];
int existingRows = (int)dims[0];
int appendRows = byteData.Length / sizeof(double) / arrayCols;  // This number is 5, just like nRows.

int nTotalBytes = sizeof(double) * arrayCols * appendRows; // = 8*30*5 = 1200

ulong[] appendDims = new ulong[] { (ulong)appendRows, (ulong)arrayCols }; // [5, 30]

hid_t memSpaceId = H5S.create_simple(2, appendDims, null);

ulong[] start = new ulong[2] { (ulong)existingRows, 0 };    // [0, 0]
ulong[] count = new ulong[2] { (ulong)appendRows, (ulong)arrayCols }; // [5, 30]

dataspaceId = H5D.get_space(datasetId);
H5S.select_hyperslab(dataspaceId, H5S.seloper_t.SET, start, null, count, null);

GCHandle handle = default(GCHandle);
try
{
    handle = GCHandle.Alloc(byteData, GCHandleType.Pinned);
    using (SafeArrayBuffer buffer = new SafeArrayBuffer(Marshal.AllocHGlobal(nTotalBytes)))
    {
        Marshal.Copy(byteData, 0, buffer.DangerousGetHandle(), nTotalBytes);
        status = H5D.write(datasetId, H5T.NATIVE_DOUBLE, memSpaceId,
            dataspaceId, H5P.DEFAULT, buffer.DangerousGetHandle());
    }
}
finally
{
    handle.Free();
}


// Close dataspaces, datasets, types, etc (not included).

...





--
Dr. Johan E. Lindberg
Mobile phone: +46 (0)76-209 14 13
e-mail: [hidden email]


Re: Writing raw (byte array) data to dataset?

Rafal Lichwala
Hi Johan,

Everything in your code seems to be fine, but since the first dimension of
your initial dataset is 0 (no rows at creation time), you have to call:

H5D.set_extent(datasetId, count);

right after you define the new space dimensions in "count".

See also my minimal working example in C++ (using the C API):

     const int rank = 2;
     const int nCols = 30;
     const int nRows = 5;
     hsize_t tdims[rank] = {0, nCols};
     hsize_t maxdims[rank] = {H5S_UNLIMITED, nCols};
     hid_t space_2d = H5Screate_simple(rank, tdims, maxdims);
     hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
     H5Pset_layout(plist, H5D_CHUNKED);
     hsize_t chunk[rank] = {1, nCols};
     H5Pset_chunk(plist, rank, chunk);

     // file_id: handle to an already open HDF5 file
     hid_t datasetId = H5Dcreate(file_id, "TestDataset", H5T_NATIVE_DOUBLE,
                                 space_2d, H5P_DEFAULT, plist, H5P_DEFAULT);

     // Grow the dataset before writing -- this is the missing step
     hsize_t newdims[rank] = {nRows, nCols};
     H5Dset_extent(datasetId, newdims);
     hid_t newspace = H5Dget_space(datasetId);
     hsize_t offset[rank] = {0, 0};
     H5Sselect_hyperslab(newspace, H5S_SELECT_SET, offset, NULL,
                         newdims, NULL);

     hid_t tmemspace = H5Screate_simple(rank, newdims, NULL);
     double data[nCols * nRows];
     double step = 1.1;
     for (int r = 0; r < nRows; r++)
     {
         for (int c = 0; c < nCols; c++)
         {
             data[c + r * nCols] = step;
             step += 1.1;
         }
     }

     // The same memory passed as raw bytes -- H5Dwrite only sees a void*
     uint8_t *data_bytes = reinterpret_cast<uint8_t*>(data);

     H5Dwrite(datasetId, H5T_NATIVE_DOUBLE, tmemspace, newspace,
              H5P_DEFAULT, data_bytes);

     H5Sclose(newspace);
     H5Sclose(tmemspace);
     H5Dclose(datasetId);
     H5Pclose(plist);
     H5Sclose(space_2d);


Best regards,
Rafal


On 2017-10-19 at 09:33, Johan Lindberg wrote:

> [...]



Re: Writing raw (byte array) data to dataset?

jolindbe
Hi Rafal,
(Sorry for misspelling your name earlier!)

Thank you for finding this error. I had that line in the code for appending double data, but then when I was writing the code for appending byte arrays it somehow got lost. Now it works perfectly fine!

All the best,
Johan


 