Write cache?

Andrey Paramonov
Hello!

HDF5 implements data caching which improves read performance substantially:
https://support.hdfgroup.org/HDF5/doc/H5.user/Caching.html

However, it turns out that there seems to be no write cache. Consider the
following Pascal procedure, which fills a 4Kx4K matrix row-wise:

procedure Test;
var
   Dll: THDF5Dll;
   dims, start, count: array of hsize_t;
   n: hsize_t;
   f: hid_t;
   d: hid_t;
   mems, s: hid_t;
   cpl: hid_t;
   v: array of Double;
   i: Integer;
begin
   Dll := THDF5Dll.Create('hdf5.dll');
   f := Dll.H5Fopen('test.hdf5', H5F_ACC_RDWR or H5F_ACC_CREAT,
     H5P_DEFAULT);

   SetLength(dims, 2);
   dims[0] := 4096;
   dims[1] := 4096;
   s := Dll.H5Screate_simple(2, Phsize_t(dims), nil);

   cpl := Dll.H5Pcreate(Dll.H5P_DATASET_CREATE);
   dims[0] := 1;
   dims[1] := 4096;
   Dll.H5Pset_chunk(cpl, 2, Phsize_t(dims));
   d := Dll.H5Dcreate2(f, 'matrix', Dll.H5T_INTEL_F64, s,
     H5P_DEFAULT, cpl, H5P_DEFAULT);

   RandSeed := 6031986;  // seed the generator; Random(N) alone only draws a number
   SetLength(v, 4096);
   for i := 0 to 4095 do
     v[i] := Random;
   n := 4096;
   mems := Dll.H5Screate_simple(1, @n, nil);

   dims[1] := 4096;
   SetLength(start, 2);
   start[1] := 0;
   SetLength(count, 2);
   count[0] := 1;
   count[1] := 4096;

   for i := 0 to 4095 do
   begin
     s := Dll.H5Dget_space(d);
     start[0] := i;
     Dll.H5Sselect_hyperslab(s, H5S_SELECT_SET,
       Phsize_t(start), nil, Phsize_t(count), nil);
     Dll.H5Dwrite(d, Dll.H5T_NATIVE_DOUBLE, mems, s, H5P_DEFAULT,
       PDouble(v));
     Dll.H5Sclose(s);
   end;
   Dll.H5Fflush(d, H5F_SCOPE_LOCAL);
end;

It's a minimal example, so memory leaks are possible; in any case, the
program finishes very quickly.
However, when I change

1) cpl := Dll.H5Pcreate(Dll.H5P_DATASET_CREATE);
   dims[0] := 1;
   dims[1] := 4096;
   Dll.H5Pset_chunk(cpl, 2, Phsize_t(dims));

to

2) cpl := Dll.H5Pcreate(Dll.H5P_DATASET_CREATE);
   dims[0] := 64;
   dims[1] := 64;
   Dll.H5Pset_chunk(cpl, 2, Phsize_t(dims));

I see the program slowing down by a factor of >30!

The detailed stats for the former and latter cases reveal the dramatic
difference:

1) 1x4K chunks
Clock time (sec): 0.171
CPU time (sec): 0.141
I/O read (MB): 0.000
I/O read ops: 0
I/O write (MB): 134.416
I/O write ops: 4171

2) 64x64 chunks
Clock time (sec): 6.046
CPU time (sec): 5.969
I/O read (MB): 8455.717
I/O read ops: 258048
I/O write (MB): 8590.132
I/O write ops: 262222

It appears that in the 64x64 case the data is read and written multiple
times, yielding a whopping 17 GB of total I/O.
Is it possible to get data caching for writes as well?
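
The reported totals can be cross-checked with a little arithmetic. The sketch below is only an accounting model, not a description of HDF5 internals: it assumes that every row write brings in each 64x64 chunk it touches and later writes it back whole, that a chunk is not read the first time it is touched (it does not exist on disk yet), and that "MB" in the stats means 10^6 bytes. All names are illustrative.

```python
# Accounting sketch for the 64x64 case; every name here is illustrative.
ROWS = COLS = 4096   # dataset shape from the post
CHUNK = 64           # chunk edge length, in elements
ELEM_BYTES = 8       # one IEEE double

chunk_bytes = CHUNK * CHUNK * ELEM_BYTES           # bytes in one chunk
chunks_per_row = COLS // CHUNK                     # chunks touched by one row write
total_chunks = (ROWS // CHUNK) * (COLS // CHUNK)   # chunks in the whole dataset

# Assumption: one whole-chunk write per (row, touched chunk) pair, and one
# whole-chunk read for every such pair except the first touch of each chunk.
chunk_writes = ROWS * chunks_per_row
chunk_reads = chunk_writes - total_chunks


def megabytes(n_chunks):
    """Convert a chunk count to MB, assuming MB = 10**6 bytes."""
    return n_chunks * chunk_bytes / 1e6


print("reads :", chunk_reads, round(megabytes(chunk_reads), 3))
print("writes:", chunk_writes, round(megabytes(chunk_writes), 3))
```

Under these assumptions the model gives 258048 reads totalling 8455.717 MB, which agrees with the read figures above. The modelled writes (262144 operations, 8589.935 MB) fall slightly below the reported 262222 operations and 8590.132 MB; the small remainder is presumably file metadata, though the stats do not say so.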

Best wishes,
Andrey Paramonov

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
Re: Write cache?

Quincey Koziol-3
Hi Andrey,
I believe the chunk cache is too small (by default, it’s 1MB) for your I/O pattern once you make the dataset chunks less aligned with that pattern (by making them square while still writing by rows).  Can you try either increasing the chunk cache size (with H5Pset_chunk_cache - https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache) or changing your access pattern to be more aligned with the chunk dimensions?
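
To make "aligned with the I/O pattern" concrete, here is a small sketch that counts how many chunks a single one-row selection intersects for each of the two chunk shapes in the thread. The helper `chunks_touched` is hypothetical and written for illustration only; it is not an HDF5 call.

```python
from math import prod


def chunks_touched(start, count, chunk_dims):
    """Number of chunks intersected by a contiguous (block) selection.

    start, count and chunk_dims are per-axis tuples, in elements.
    """
    per_axis = []
    for s, c, k in zip(start, count, chunk_dims):
        first = s // k              # index of the first chunk hit on this axis
        last = (s + c - 1) // k     # index of the last chunk hit on this axis
        per_axis.append(last - first + 1)
    return prod(per_axis)


row = dict(start=(0, 0), count=(1, 4096))            # one full row of the matrix
print(chunks_touched(**row, chunk_dims=(1, 4096)))   # 1
print(chunks_touched(**row, chunk_dims=(64, 64)))    # 64
```

With 1x4096 chunks each write covers exactly one chunk. With 64x64 chunks each write touches 64 chunks but fills only one of the 64 rows in each, and the same 64 chunks are touched again by the next 63 row writes.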

        Quincey

> On May 2, 2017, at 6:09 AM, Андрей Парамонов <[hidden email]> wrote:
> [...]


Re: Write cache?

Andrey Paramonov
06.05.2017 8:43, Quincey Koziol wrote:
> Hi Andrey,
> I believe the chunk cache is too small (by default, it’s 1MB) for your I/O pattern once you make the dataset chunks less aligned with that pattern (by making them square while still writing by rows).  Can you try either increasing the chunk cache size (with H5Pset_chunk_cache - https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache) or changing your access pattern to be more aligned with the chunk dimensions?

Hello Quincey!

I could of course implement an outer layer of caching myself to help
HDF5 behave more efficiently, but I prefer it when HDF5 just does the
right thing by itself ;-)

Fortunately, increasing cache size to 2MB (the total size of "volatile"
chunks) helps! Thank you for the pointer.
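
The 2MB figure follows from the chunk geometry. As a rough sketch (the function name is made up for illustration, and it assumes 8-byte elements and whole chunks held in the cache), the bytes needed to keep every chunk under one matrix row resident are:

```python
DEFAULT_CACHE_BYTES = 1024 * 1024   # the 1MB default quoted in this thread


def row_working_set_bytes(n_cols, chunk_dims, elem_bytes=8):
    """Bytes needed to hold every chunk that one full row passes through."""
    chunk_rows, chunk_cols = chunk_dims
    chunks_per_row = -(-n_cols // chunk_cols)   # ceiling division
    return chunks_per_row * chunk_rows * chunk_cols * elem_bytes


square = row_working_set_bytes(4096, (64, 64))
strip = row_working_set_bytes(4096, (1, 4096))

print(square, square <= DEFAULT_CACHE_BYTES)   # 2097152 False
print(strip, strip <= DEFAULT_CACHE_BYTES)     # 32768 True
```

So the 64x64 layout needs exactly 2097152 bytes to keep one band of chunks in memory, twice the default, while the 1x4096 layout needs only a single 32768-byte chunk.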

On further investigation, I was shocked to find that the behavior changes
dramatically when the cache is even 1 byte short.

For cache size of 2097152 bytes, I get:
Clock time (sec): 0.533
CPU time (sec): 0.531
I/O read (MB): 0.000
I/O read ops: 0
I/O write (MB): 134.416
I/O write ops: 4171

While for cache size of 2097151 bytes:
Clock time (sec): 4.580
CPU time (sec): 4.578
I/O read (MB): 8455.717
I/O read ops: 258048
I/O write (MB): 8590.132
I/O write ops: 262219

I would expect a more gradual change in speed as the cache size goes
from 0 bytes to 2MB. Doesn't this look like a performance problem?
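
One possible explanation for the sharp cliff is ordinary least-recently-used thrashing: a cyclic sweep over 64 chunks through a cache that holds only 63 evicts each chunk just before it is needed again. The sketch below is a simplified model under that assumption, not HDF5's actual implementation (the real chunk cache also involves a hash table and a preemption policy, which are ignored here). The function `simulate` and its parameters are illustrative.

```python
from collections import OrderedDict


def simulate(cache_bytes, rows=4096, cols=4096, chunk=(64, 64), elem_bytes=8):
    """Count whole-chunk reads and writes for row-by-row writes via an LRU cache."""
    chunk_bytes = chunk[0] * chunk[1] * elem_bytes
    capacity = cache_bytes // chunk_bytes   # whole chunks that fit in the cache
    if capacity < 1:
        raise ValueError("this model needs room for at least one chunk")
    cache = OrderedDict()   # chunk index -> True, least recently used first
    on_disk = set()         # chunks that have already been written out
    reads = writes = 0
    for r in range(rows):
        for c in range(0, cols, chunk[1]):
            key = (r // chunk[0], c // chunk[1])
            if key in cache:
                cache.move_to_end(key)      # hit: mark as most recently used
                continue
            if key in on_disk:
                reads += 1                  # partial update needs the old contents
            if len(cache) >= capacity:
                victim, _ = cache.popitem(last=False)   # evict the LRU chunk
                on_disk.add(victim)
                writes += 1
            cache[key] = True
    writes += len(cache)                    # final flush of chunks still cached
    return reads, writes


print(simulate(2097152))   # cache holds 64 chunks
print(simulate(2097151))   # one byte short: cache holds 63 chunks
```

In this model the 2097152-byte cache yields 0 reads and 4096 chunk writes, while the 2097151-byte cache yields 258048 reads and 262144 chunk writes. The read counts agree with the measured I/O read ops in both runs, and the write counts differ from the measured ones by fewer than 100 operations, which would be consistent with a small amount of metadata traffic.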

Best wishes,
Andrey Paramonov
