[hdf-forum] Scattered read and write


[hdf-forum] Scattered read and write

Mezghani, Mokhles B
Good morning,

I'm working with a parallel Fortran program for unstructured-grid fluid flow modeling. In this program each processor writes data from a contiguous buffer into irregularly scattered locations in the file. To do that I use the subroutine h5sselect_elements_f(space_id, operator, num_elements, coord, hdferr) to specify the write pattern, followed by the standard write routine. Unfortunately this approach is very slow, and I have the same problem for scattered reading.

Could you please tell me if there is another way to read/write scattered data?

Regards,

Mokhles
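
A minimal, self-contained sketch of the pattern described above, for reference (this is not the application code; the file name, dataset name, sizes and the strided coordinate layout below are illustrative stand-ins for a genuinely irregular scatter):

PROGRAM point_selection_sketch
  USE hdf5
  USE mpi
  IMPLICIT NONE

  INTEGER, PARAMETER :: npoints = 1000          ! points per rank (illustrative)
  INTEGER(HID_T)   :: plist_id, file_id, filespace, memspace, dset_id
  INTEGER(HSIZE_T) :: dims(1), mdims(1)
  INTEGER(HSIZE_T), ALLOCATABLE :: coord(:,:)   ! coord(rank, num_elements)
  DOUBLE PRECISION :: buf(npoints)
  INTEGER :: mpierr, hdferr, myrank, nprocs, i

  CALL MPI_Init(mpierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, myrank, mpierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, mpierr)
  CALL h5open_f(hdferr)

  ! Open the file with the MPI-IO file driver (parallel HDF5 build required).
  CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, hdferr)
  CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
  CALL h5fcreate_f('scatter.h5', H5F_ACC_TRUNC_F, file_id, hdferr, access_prp=plist_id)
  CALL h5pclose_f(plist_id, hdferr)

  ! One shared 1-D dataset holding nprocs*npoints doubles.
  dims(1) = INT(npoints, HSIZE_T) * INT(nprocs, HSIZE_T)
  CALL h5screate_simple_f(1, dims, filespace, hdferr)
  CALL h5dcreate_f(file_id, 'cells', H5T_NATIVE_DOUBLE, filespace, dset_id, hdferr)

  ! Scattered (here simply strided) element coordinates owned by this rank.
  ! NOTE: coord is coord(rank, num_elements); the coordinate origin and ordering
  ! expected by the Fortran h5sselect_elements_f have differed across releases,
  ! so check the reference manual for your version. 1-based is assumed here.
  ALLOCATE(coord(1, npoints))
  DO i = 1, npoints
     coord(1, i) = INT(myrank + 1 + (i - 1) * nprocs, HSIZE_T)
     buf(i) = DBLE(myrank)
  END DO
  CALL h5sselect_elements_f(filespace, H5S_SELECT_SET_F, 1, &
                            INT(npoints, SIZE_T), coord, hdferr)

  ! Contiguous memory buffer.
  mdims(1) = npoints
  CALL h5screate_simple_f(1, mdims, memspace, hdferr)

  ! Standard write; with a point selection this ends up as independent I/O,
  ! which is the slowness discussed later in the thread.
  CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, mdims, hdferr, &
                  mem_space_id=memspace, file_space_id=filespace)

  CALL h5sclose_f(memspace, hdferr)
  CALL h5sclose_f(filespace, hdferr)
  CALL h5dclose_f(dset_id, hdferr)
  CALL h5fclose_f(file_id, hdferr)
  CALL h5close_f(hdferr)
  CALL MPI_Finalize(mpierr)
END PROGRAM point_selection_sketch

The selection and the write are standard calls; as the rest of the thread explains, the slowness comes from how the library currently services point selections, not from anything unusual in code like this.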




[hdf-forum] Scattered read and write

Rob Latham
On Tue, May 19, 2009 at 04:04:29PM +0300, Mezghani, Mokhles B wrote:

> I'm working with a parallel FORTRAN program for unstructured grid
> fluid flow modeling. In this program each processor writes data from
> the contiguous buffer into irregularly scattered locations in the
> file. For that I used the subroutine: h5sselect_elements_f(space_id,
> operator, num_elements, coord, hdferr) to specify my writing pattern
> followed by the standard writing routine. Unfortunately this
> approach is really very slow. I have also the same problem for the
> scattered reading.
>
> Could you please tell me if there is another way to read/write
> scattered data?

Do you know where your program is spending the bulk of its time?  Is
it spending a lot of time in HDF5 processing the dataset, or is it
spending a lot of time writing to or reading from the file system?

I know answering that question is not entirely straightforward.  If
you had a library that would report time spent in HDF5 calls and time
spent in MPI-IO calls, that would tell you where you should spend your
tuning efforts.

If you can put together a small self-contained test program that
demonstrates this slow I/O performance, that would be pretty helpful.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
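
One crude way to start answering Rob's question, without a special profiling library, is to bracket the HDF5 write call with MPI_Wtime on every rank and report the maximum. It only separates the write call from the rest of the program (splitting HDF5 time from MPI-IO time still needs something like a PMPI wrapper or a tracing tool), but it confirms where the bulk of the time goes. A fragment, assuming the surrounding variables (dset_id, buf, mdims, memspace, filespace, myrank, hdferr) already exist as in the sketch above:

  DOUBLE PRECISION :: t0, t1, tlocal, tmax
  INTEGER :: mpierr

  t0 = MPI_Wtime()
  CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, mdims, hdferr, &
                  mem_space_id=memspace, file_space_id=filespace)
  t1 = MPI_Wtime()

  ! Report the slowest rank; I/O time is usually dominated by the laggard.
  tlocal = t1 - t0
  CALL MPI_Reduce(tlocal, tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, &
                  MPI_COMM_WORLD, mpierr)
  IF (myrank == 0) PRINT '(A, F10.3, A)', 'h5dwrite_f took ', tmax, ' s (max over ranks)'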






[hdf-forum] Scattered read and write

Mezghani, Mokhles B

Good morning Rob,

The bulk of the time is spent in the reading/writing phase. You will find attached a small Fortran program that writes scattered data; you can use it to reproduce the problem. Please let me know if you need any additional information or examples.

Regards,

Mokhles

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Main.f90
Type: application/octet-stream
Size: 4190 bytes
Desc: Main.f90
URL: <http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/attachments/20090520/89278066/attachment.obj>


[hdf-forum] Scattered read and write

Rob Latham
On Wed, May 20, 2009 at 09:08:32AM +0300, Mezghani, Mokhles B wrote:
>
> Good morning Rob,
>
> The bulk time is spent in the reading/writing phase. You will find
> in attachment a Fortran small program to write scattered data. You
> can use this program to see the problem. Please let me know if you
> need any additional information or examples.

What I meant to determine is whether the overhead is in the MPI-IO layer
or in the HDF5 layer.

Thank you very much for the testcase.  It's exactly what I hoped you'd
send.  I can confirm that the code you sent is slow -- dirt slow.
Roughly 1 MB per 10 minutes; I had to cut the number of points down
to 100k just so it would finish in a reasonable amount of time :>

I can see that for me, HDF5 is turning a collective h5dwrite_f into N
individual MPI_File_write_at calls.  I don't know anything about HDF5
internals, but you've described all the elements of the dataset you
want with h5sselect_elements_f.  I would have expected HDF5 to
construct a monster datatype, feed that into MPI_File_write_at_all ...
and then send me a bug report when that doesn't work :>

I'm testing with HDF5-1.8.0.  

HDF5 folks: is it possible I have an improperly built HDF5?  What I
mean is: would you expect h5sselect_elements_f to behave as I
described, making a single (or a few) calls to MPI_File_write_at_all?

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA






[hdf-forum] Scattered read and write

Quincey Koziol
Hi Rob,

On May 20, 2009, at 2:41 PM, Rob Latham wrote:

> On Wed, May 20, 2009 at 01:25:26PM -0500, Elena Pourmal wrote:
>> Currently HDF5 doesn't support collective calls for point
>> selections; it quietly switches to independent I/O.
>>
>> I hope Quincey or someone else who is involved in the parallel work
>> will elaborate.
>>
>> Maybe you can try to use hyperslab selections instead of point
>> selections, even if it is only one element in each hyperslab? And I
>> would definitely go with 1.8.3.
>
> Hi Elena.  I guess I'm in Quincey territory now.
>
> For this case where a program makes a single HDF5 call, I'd like to
> see HDF5 make as few MPI-IO calls as possible.  Even if you don't use
> collective I/O, you could still create an indexed or hindexed MPI
> datatype describing all the types in memory and then make a single
> MPI_FILE_WRITE call.
>
> It is highly likely there is something about the HDF5 file format I do
> not understand and which would preclude this approach.
>
> I just wanted to point out that you could see tremendous performance
> gains with this workload and not even have to go all the way down to
> full collective i/o.
>
> I know Quincey's next email will contain a phrase along the lines of
> "as funding sources permit us to work on this", so you won't hurt my
> feelings if you have to shelve this for a while :>

        Elena's right - we don't have a "fast" path in the library for
performing I/O on point selections.  We felt that it would only be
used for 'small' numbers of elements and that it wouldn't require any
special support.  I'm actually very surprised that someone is using it
for selections with many elements - Elena's suggestion of using a
hyperslab selection for that case would almost certainly work better.

        As you say, if someone with funding found this important, we would be
happy to optimize this case.  We'd also accept a well-written patch
that improved the performance.

        Quincey
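
A sketch of Elena's suggestion as applied to the earlier point-selection example: the h5sselect_elements_f call is replaced by a union of single-element hyperslabs, and a collective transfer property list is passed to the write (collective transfer is supported for hyperslab selections, which is the point of the suggestion). The names filespace, dset_id, buf, mdims, memspace, npoints, myrank, nprocs, i and hdferr come from the earlier sketch; only the new variables are declared here.

  INTEGER(HID_T)   :: xfer_plist
  INTEGER(HSIZE_T) :: start(1), count(1)
  INTEGER          :: op

  ! Union of single-element hyperslabs covering the same scattered cells.
  ! Hyperslab offsets in the Fortran API are 0-based.
  count(1) = 1
  DO i = 1, npoints
     start(1) = INT(myrank + (i - 1) * nprocs, HSIZE_T)
     IF (i == 1) THEN
        op = H5S_SELECT_SET_F
     ELSE
        op = H5S_SELECT_OR_F
     END IF
     CALL h5sselect_hyperslab_f(filespace, op, start, count, hdferr)
  END DO

  ! Ask for collective transfer on the write.
  CALL h5pcreate_f(H5P_DATASET_XFER_F, xfer_plist, hdferr)
  CALL h5pset_dxpl_mpio_f(xfer_plist, H5FD_MPIO_COLLECTIVE_F, hdferr)
  CALL h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, mdims, hdferr, &
                  mem_space_id=memspace, file_space_id=filespace, &
                  xfer_prp=xfer_plist)
  CALL h5pclose_f(xfer_plist, hdferr)

One caveat: building the union one element at a time with H5S_SELECT_OR_F may itself become expensive for very large point counts, so it is worth timing the selection step separately from the write.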



[hdf-forum] Scattered read and write

Mezghani, Mokhles B
In reply to this post by Rob Latham

Good morning Rob,

First of all I would like to thank the HDF community for the help and support. As I said, I really need to find a solution to this problem. I have already adopted parallel HDF5 as the file format for my application, and I'm surprised by the performance of the scattered read and write. I think the overhead is in the HDF5 layer. In fact, one of my colleagues is using MPI-2 for scattered read and write and the performance is really very good. If it could help, I can try to make a simple program using MPI-2.

Please, any suggestion is really appreciated.

Regards,

Mokhles
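
For comparison, here is roughly what such an MPI-2 (MPI-IO) test could look like: the scattered file locations are described once with an indexed datatype and written with a single collective call, essentially the single-datatype, single-write path Rob describes earlier in the thread, done by hand below the HDF5 layer. This is only a sketch, not the colleague's program; the file name, sizes and the strided offset pattern are illustrative.

PROGRAM mpiio_scatter_sketch
  USE mpi
  IMPLICIT NONE

  INTEGER, PARAMETER :: npoints = 1000          ! points per rank (illustrative)
  INTEGER :: mpierr, myrank, nprocs, filetype, fh, i
  INTEGER :: displs(npoints)
  INTEGER(KIND=MPI_OFFSET_KIND) :: disp
  DOUBLE PRECISION :: buf(npoints)
  INTEGER :: status(MPI_STATUS_SIZE)

  CALL MPI_Init(mpierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, myrank, mpierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, nprocs, mpierr)

  ! Scattered element offsets (in units of doubles) owned by this rank.
  DO i = 1, npoints
     displs(i) = myrank + (i - 1) * nprocs
     buf(i) = DBLE(myrank)
  END DO

  ! One datatype describing every scattered location this rank writes.
  CALL MPI_Type_create_indexed_block(npoints, 1, displs, &
                                     MPI_DOUBLE_PRECISION, filetype, mpierr)
  CALL MPI_Type_commit(filetype, mpierr)

  CALL MPI_File_open(MPI_COMM_WORLD, 'scatter.dat', &
                     MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, fh, mpierr)
  disp = 0_MPI_OFFSET_KIND
  CALL MPI_File_set_view(fh, disp, MPI_DOUBLE_PRECISION, filetype, &
                         'native', MPI_INFO_NULL, mpierr)

  ! Single collective write of the contiguous buffer into the scattered view.
  CALL MPI_File_write_all(fh, buf, npoints, MPI_DOUBLE_PRECISION, status, mpierr)

  CALL MPI_File_close(fh, mpierr)
  CALL MPI_Type_free(filetype, mpierr)
  CALL MPI_Finalize(mpierr)
END PROGRAM mpiio_scatter_sketch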






[hdf-forum] Scattered read and write

Mezghani, Mokhles B
In reply to this post by Quincey Koziol

Hi Quincey,

The code that we are developing is for parallel flow modeling using unstructured grids. In this framework, before starting the simulation you need to partition the grid cells among the different processors. The grid partitioning is done using different criteria (minimizing MPI communication, load balancing, etc.). Because of the unstructured framework, each partition may not be contiguous, so you need scattered writes to output your simulation results. In our application the grid can be made of billions of cells, and the scattered writing becomes a really big issue. To be honest, I'm surprised that nobody is using scattered read and write with the parallel HDF5 library. As I said, I will run additional tests on Saturday using the 1.8.3 release.

Thanks

Mokhles




[hdf-forum] Scattered read and write

Ruth Aydt
Administrator
Hello Mokhles,

While it's possible that others are using HDF5 in the manner you
describe (scattered read and write with parallel HDF5), none are
providing us with the funding necessary to improve the performance for
these scenarios.  The HDF Group tries to address performance issues that
are brought to our attention, but other things currently have higher
priority in our self-funded work queue.

The HDF Group does offer custom development and performance tuning
services, and I'd be happy to discuss rates with you (or others) if
you find that the current behavior is significantly hampering your progress.

-Ruth

------------------------------------------------------------
Ruth Aydt
Director of Sponsored Projects and Business Development
The HDF Group
1901 South First Street,  Suite C-2
Champaign, IL 61820

aydt at hdfgroup.org      (217)265-7837
------------------------------------------------------------

