SWMR slow call to dset.id.refresh()

SWMR slow call to dset.id.refresh()

Aardeagle
Hi, 

I am experiencing some major performance issues in SWMR mode when a reader refreshes the metadata in a file. 

I’m testing I/O performance on our data systems using SWMR and h5py. I have a process writing out random data to a single dataset, which is then read by a single reader. The files are striped across a Lustre file system with three OSTs. 

In a nutshell, the reader process refreshes the metadata, loops through and reads the dataset until it runs out of data, then refreshes the metadata again, loops through the new data and so on. 
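For anyone following along, the loop described above might look roughly like the sketch below. It is not the code used in the test; `read_new_blocks`, `follow`, `handle_block`, `block_rows` and `rounds` are hypothetical names chosen for illustration. The extent is read through `dset.id.shape` right after `dset.id.refresh()` on the assumption that the low-level value is not cached; h5py also offers a high-level `dset.refresh()` that could be used instead.

```python
def read_new_blocks(dset, start, block_rows=1024):
    """Refresh the dataset metadata once, then yield (offset, rows) blocks
    covering everything between `start` and the refreshed end of the dataset."""
    dset.id.refresh()            # same low-level call as in the post
    end = dset.id.shape[0]       # current extent along the first axis
    for lo in range(start, end, block_rows):
        hi = min(lo + block_rows, end)
        yield lo, dset[lo:hi]


def follow(path, name, handle_block, block_rows=1024, rounds=10):
    """Hypothetical driver: open `path` as a SWMR reader and pass every
    newly available block of dataset `name` to `handle_block`."""
    import h5py  # imported here so read_new_blocks works with any dataset-like object

    with h5py.File(path, "r", libver="latest", swmr=True) as f:
        dset = f[name]
        pos = 0
        for _ in range(rounds):
            # One refresh per round, then drain everything that is visible.
            for lo, rows in read_new_blocks(dset, pos, block_rows):
                handle_block(rows)
                pos = lo + len(rows)
```

With this shape, the number of refresh calls depends on how often the reader runs dry rather than on the number of blocks read.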

Refreshing the metadata with dset.id.refresh() stalls both the reader and writer processes for several seconds. The stall gets longer the more data is being refreshed (usually several GB at a time). 

Please see the attached image for plots of the reading/writing rates to disk. The reading in this test case is slightly faster than the writing. Eventually the reader catches up to the writer and tries to call dset.id.refresh() on each loop iteration. At this point the I/O gets gridlocked and comes to a near standstill. 

Thanks, 
Eliseo




_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: SWMR slow call to dset.id.refresh()

Quincey Koziol-3
Hi Eliseo,
        This is very useful and interesting data; thanks for providing it.  When the I/O gets gridlocked at the end, are you in a tight loop polling the file, or do you wait between poll operations?  Are you able to share your programs?  In a couple of months, I’ll be working on some improvements to the performance and memory use of getting data from a writer to a reader, and it would be good to have test code like this to work with.

        Quincey


> On Oct 20, 2017, at 3:47 PM, Eliseo Gamboa <[hidden email]> wrote:

Re: SWMR slow call to dset.id.refresh()

Aardeagle
Hi Quincey,

Here is a minimal working example:
https://github.com/slac-lcls/lcls2/tree/master/evtbuild/min_swmr_example

I’ve tested this on our flash-based filesystem. Note that the reader and writer processes have to be run on different machines. Otherwise the reader just grabs the data from the memory buffer.

When the I/O gets gridlocked, I wait 50 ms between refreshes.
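As a rough illustration of waiting between refreshes, the sketch below sleeps only when a refresh shows no new rows. It is not taken from the linked example; `wait_for_growth`, `known_rows`, `max_polls` and the injectable `sleep` argument are assumptions made for this sketch, and only the 50 ms default comes from the message above.

```python
import time


def wait_for_growth(dset, known_rows, wait_s=0.05, max_polls=100, sleep=time.sleep):
    """Poll until the dataset holds more than `known_rows` rows.

    Returns the new row count, or `known_rows` unchanged if nothing
    arrived within `max_polls` refreshes.
    """
    for _ in range(max_polls):
        dset.id.refresh()            # re-read the dataset metadata
        rows = dset.id.shape[0]      # extent along the first axis after the refresh
        if rows > known_rows:
            return rows
        sleep(wait_s)                # 50 ms by default, the interval quoted above
    return known_rows
```

Passing `sleep` as a parameter is only a convenience for trying the function without real delays; the default `time.sleep` is what a reader process would use.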

Thanks,
Eliseo


> On Oct 23, 2017, at 7:13 AM, Quincey Koziol <[hidden email]> wrote:

Re: SWMR slow call to dset.id.refresh()

Quincey Koziol-3
Hi Eliseo,
        Super, thanks!  I’ll pull it down and test with it when I get closer to being ready to work on the feature.

                Quincey

> On Oct 23, 2017, at 12:51 PM, Eliseo Gamboa <[hidden email]> wrote: