Output Array Slices

Output Array Slices

Aaron Friesz
Hello,

I have a very specific usage question.

I would like to output a part of an array. Other libraries may call this a slice. For example, say I have a 3-dimensional array of size 5 in each direction, distributed over several separate processes. I would like to write to file all the data with y coordinate 3.

I currently use parallel file I/O by hyperslab, following the example here:
https://support.hdfgroup.org/HDF5/Tutor/phypecont.html

It seems to me that there might be a very easy way to do this using the memory and file dataspaces, one that doesn't require my code to check each process for participation in the output, etc., but I have failed to find examples or documentation that outline the constraints. Any help is appreciated.

--
Aaron Friesz


_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: Output Array Slices

Aaron Friesz
I take it this is not something that HDF5 supports within the API?


--
Aaron Friesz
-----------------------------------------------------
University of Southern California
Department of Electrical Engineering
MicroPhotonic Devices Group
213 740 9208
[hidden email]


Re: Output Array Slices

Miller, Mark C.
"Hdf-forum on behalf of Aaron Friesz" wrote:

I take it this is not something that hdf5 supports within the API?


I am pretty sure you should be able to do what you describe (e.g. write to the file all the data with y coordinate 3 from several separate processors).

Once you have *created* the target dataset (and its dataspace) in the file (which *requires* all processors that opened the file to participate in the creation -- there is no way around that), each processor that *has data* to write can define its own memory and file dataspaces and then write its data using independent rather than collective I/O.

The trick is in defining the size/shape of the memory and file dataspaces. But, I am pretty confident it can be done. Alas, I don't have any examples to send. But, you could play with your code writing bogus data (e.g. 1's from proc 1, 2's from proc 2, etc.) and then examine the results with h5dump or h5ls to confirm you have code that is touching the correct parts of the array.
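
As a rough illustration of that size/shape bookkeeping, here is a minimal sketch in plain C with no HDF5 or MPI calls. It assumes zero-based indices, that each process owns one rectangular block of the global array, and that the slice is stored as a 2-D dataset. Block, SlicePlan and plan_slice are hypothetical names made up for this sketch, and size_t stands in for HDF5's hsize_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the block of the global 3-D array owned by one process. */
typedef struct {
    size_t start[3]; /* global index of the block's first element, per axis */
    size_t count[3]; /* block extent, per axis */
} Block;

/* Hypothetical: selection parameters for one process's share of the slice. */
typedef struct {
    int    participates;  /* 0 if the slice plane misses this block */
    size_t mem_start[3];  /* offset inside the local block */
    size_t mem_count[3];  /* elements selected from the local block */
    size_t file_start[2]; /* offset inside the 2-D slice dataset */
    size_t file_count[2]; /* extent written to the 2-D slice dataset */
} SlicePlan;

/* Plan the write of the plane (coordinate on `axis` == `index`) for one block. */
SlicePlan plan_slice(const Block *b, int axis, size_t index)
{
    SlicePlan p = {0};
    assert(axis >= 0 && axis < 3);

    /* The block takes part only if the plane passes through it. */
    if (index < b->start[axis] || index >= b->start[axis] + b->count[axis])
        return p;

    p.participates = 1;
    int f = 0; /* next axis of the 2-D file dataset */
    for (int d = 0; d < 3; ++d) {
        if (d == axis) {
            /* One layer of the local block, at the local offset of the plane. */
            p.mem_start[d] = index - b->start[d];
            p.mem_count[d] = 1;
        } else {
            /* Full local extent, placed at the block's global position. */
            p.mem_start[d] = 0;
            p.mem_count[d] = b->count[d];
            p.file_start[f] = b->start[d];
            p.file_count[f] = b->count[d];
            ++f;
        }
    }
    return p;
}
```

For the y = 3 example, a process would call plan_slice(&block, 1, 3), taking axis 1 to be y. The mem_* values are what one would hand to H5Sselect_hyperslab on the memory dataspace, and the file_* values on the file dataspace. A process whose plan has participates == 0 simply skips the write under independent I/O; under collective I/O it would typically still call H5Dwrite after H5Sselect_none on both dataspaces.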

Mark


-- 
Mark C. Miller, LLNL

Happiness is a sheer act of will, not simply a by-product
of happenstance. - MCM





Re: Output Array Slices

Aaron Friesz
Hi Mark, Thanks for your response.

In terms of efficiency, since I will likely do output of this type several hundred or several thousand times in a single run, I will end up defining a new MPI communicator with just the required processes. I was just hoping that HDF5 could do it for me, since writing code to determine which processes should participate, dataset sizes, etc. is going to be tedious. And I'm sure it would be better code if the HDF5 team had written it.

Regardless, I'll give it a crack and see if I can't come up with a decent way to get the job done.


Re: Output Array Slices

Miller, Mark C.
"Hdf-forum on behalf of Aaron Friesz" wrote:

> Hi Mark, Thanks for your response.
>
> In terms of efficiency, since I will likely do output of this type several hundred or thousand times in a single run, I will end up defining a new MPI communicator with just the required processes.

Well, if you do that, *and* if you want to use only those processors to make HDF5 calls to create datasets *and* write data, then you will also need to *open* the HDF5 file with that new communicator.

> I was just hoping that HDF5 could do it for me

Nope. Alas, HDF5 isn't going to help in creating MPI communicators for only those processors with data.

> since writing code to determine which processes should participate, dataset sizes, etc is going to be tedious.

int haveData = 0;
MPI_Comm newComm;
/* set haveData to 1 if this proc has data */
MPI_Comm_split(MPI_COMM_WORLD, haveData, haveData, &newComm);

Now, on the processors with data, newComm is a communicator that involves *only* those processors, ranked in the same order as their old ranks.
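
Two details of MPI_Comm_split are worth spelling out here. Because the color is haveData, the processors without data also receive a communicator of their own (the color-0 group); passing MPI_UNDEFINED as the color for them yields MPI_COMM_NULL instead. Within one color, processors with equal keys are ranked by their rank in the old communicator. The sketch below is a plain-C model of that ordering with no MPI calls, meant only to show which rank each writer ends up with; assign_writer_ranks is a hypothetical helper, not part of MPI or HDF5.

```c
#include <assert.h>

/*
 * Model of the rank assignment for the "has data" group.
 * have_data[r] is the flag of old rank r. On return, new_rank[r] holds the
 * rank that process would get in the sub-communicator, or -1 if it is left
 * out. Returns the size of the sub-communicator.
 */
int assign_writer_ranks(const int *have_data, int nprocs, int *new_rank)
{
    assert(nprocs >= 0);
    int next = 0;
    for (int r = 0; r < nprocs; ++r) {
        /* Equal keys: writers keep the relative order of their old ranks. */
        new_rank[r] = have_data[r] ? next++ : -1;
    }
    return next;
}
```

With flags {0, 1, 0, 1, 1} for old ranks 0 to 4, the model leaves out ranks 0 and 2 and gives old ranks 1, 3 and 4 the new ranks 0, 1 and 2.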

--
Miller, Mark C.

"Those who would [, even temporarily,] sacrafice
essential liberties in the name of security deserve
neither." BF*
