puzzled by chunking, storage, and performance.


puzzled by chunking, storage, and performance.

John Knutson-2
I'm still trying to milk as much as I can, performance-wise, out of HDF5...

My latest bit of confusion comes from the following seeming paradox.  I
have two files, poo2.h5 and poo3-d-1.h5.  Both files contain exactly the
same data, though poo2, being the larger data set, has more blank-filled
elements.  Also, because of the larger size of the data set in poo2, the
data starts at (0, 0, 694080) or thereabouts, vs. (0,0,0) for poo3-d-1.  
My question is: why does the smaller data set take roughly 10x more storage (in bytes)
than the larger data set, given the same data and chunking?

Is there any way to look at the details of what data is stored in the
file, i.e. how many and maybe which chunks are stored, etc.?


HDF5 "poo2.h5" {
DATASET "/Data/IS-GPS-200 ID 2 Ephemerides" {
   DATATYPE  "/Types/Ephemeris IS-GPS-200 id 2"
   DATASPACE  SIMPLE { ( 26, 160, 1051200 ) / ( H5S_UNLIMITED,
H5S_UNLIMITED, 1051200 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 1, 15, 250 )
      SIZE 23228 (52713869.468:1 COMPRESSION)
    }

vs.

HDF5 "poo3-d-1.h5" {
DATASET "/Data/IS-GPS-200 ID 2 Ephemerides" {
   DATATYPE  "/Types/Ephemeris IS-GPS-200 id 2"
   DATASPACE  SIMPLE { ( 1, 160, 2880 ) / ( H5S_UNLIMITED,
H5S_UNLIMITED, 2880 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 1, 15, 250 )
      SIZE 251461 (513.097:1 COMPRESSION)
    }



Re: puzzled by chunking, storage, and performance.

Francesc Alted-2
On Wednesday 22 September 2010 21:36:57, John Knutson wrote:

> I'm still trying to milk as much as I can, performance-wise, out of
> HDF5...
>
> My latest bit of confusion comes from the following seeming paradox.
> I have two files, poo2.h5 and poo3-d-1.h5.  Both files contain
> exactly the same data, though poo2, being the larger data set, has
> more blank-filled elements.  Also, because of the larger size of the
> data set in poo2, the data starts at (0, 0, 694080) or thereabouts,
> vs. (0,0,0) for poo3-d-1. My question is: "why is the smaller data
> set 10x larger in size (bytes?) than the larger data set with the
> same data and chunking?"
[clip]

Maybe compression has something to do with it?  poo2 shows a 5e7:1 compression
ratio, while poo3-d-1 shows 5e2:1.  While I can understand the latter
figure, the former ratio (5e7) seems far too high.  Maybe poo2 is only
made of zeros?

--
Francesc Alted


Re: puzzled by chunking, storage, and performance.

Quincey Koziol
In reply to this post by John Knutson-2
Hi John,

On Sep 22, 2010, at 2:36 PM, John Knutson wrote:

> I'm still trying to milk as much as I can, performance-wise, out of HDF5...
>
> My latest bit of confusion comes from the following seeming paradox.  I have two files, poo2.h5 and poo3-d-1.h5.  Both files contain exactly the same data, though poo2, being the larger data set, has more blank-filled elements.  Also, because of the larger size of the data set in poo2, the data starts at (0, 0, 694080) or thereabouts, vs. (0,0,0) for poo3-d-1.  My question is: "why is the smaller data set 10x larger in size (bytes?) than the larger data set with the same data and chunking?"

        The "compression ratio" reported accounts for the sparseness of the chunks in the dataset.  You probably have written more data to the smaller dataset.

> Is there any way to look at the details of what data is stored in the file, i.e. how many and maybe which chunks are stored, etc.?

        We don't have a way to return a "map" of the chunks for a dataset currently (although it is in our issue tracker).

        Quincey
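For readers of this thread with a newer library: HDF5 1.10.5 and later do expose a chunk map through H5Dget_num_chunks() and H5Dget_chunk_info(), which report each allocated chunk's logical offset, filter mask, file address, and on-disk size; "h5stat -d" also prints per-dataset chunk storage statistics.  A minimal sketch against such a release, reusing the dataset name from the dump above (error checking omitted):

#include <stdio.h>
#include "hdf5.h"

/* List every chunk actually allocated for a dataset: its logical offset and
 * its on-disk (post-filter) size.  Requires HDF5 1.10.5 or newer. */
int main(void)
{
    hid_t   file  = H5Fopen("poo2.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t   dset  = H5Dopen2(file, "/Data/IS-GPS-200 ID 2 Ephemerides", H5P_DEFAULT);
    hid_t   space = H5Dget_space(dset);
    hsize_t nchunks = 0;

    H5Dget_num_chunks(dset, space, &nchunks);
    printf("%llu chunks allocated\n", (unsigned long long)nchunks);

    for (hsize_t i = 0; i < nchunks; i++) {
        hsize_t  offset[3];                 /* this dataset has rank 3 */
        unsigned filter_mask;
        haddr_t  addr;
        hsize_t  nbytes;

        H5Dget_chunk_info(dset, space, i, offset, &filter_mask, &addr, &nbytes);
        printf("chunk %llu at (%llu, %llu, %llu): %llu bytes on disk\n",
               (unsigned long long)i, (unsigned long long)offset[0],
               (unsigned long long)offset[1], (unsigned long long)offset[2],
               (unsigned long long)nbytes);
    }

    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}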




Re: puzzled by chunking, storage, and performance.

John Knutson-2
In reply to this post by John Knutson-2
Setting aside the strange sizing issues in the earlier messages for a
moment...

Let's say I have a data set, dimensioned ( 26, 160, 1051200 )
and chunked ( 1, 15, 240 )

As I understand it, each individual chunk in the file will be in the
following order:
[ 0, 0, 0-239 ] - [ 0, 14, 0-239 ]

and the chunks will be ordered thus:
[ 0, 0, 0 ], [ 0, 0, 240 ] ... [ 0, 0, 1051200 ], [ 0, 15, 0 ], [ 0, 15,
240 ] ... [ 0, 15, 1051200 ]
and so on...

Is that correct?

Should I expect peak read performance by reading one chunk at a time in
that order, assuming each chunk is 1MB in size, as is the cache?
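Reading in whole-chunk units does at least guarantee that each chunk is fetched and decompressed only once, whatever the cache ends up doing.  A rough sketch of chunk-aligned reads for a dataset shaped like the one above (dataset name and loop bounds are illustrative, error checking omitted):

#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hid_t   file   = H5Fopen("poo2.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t   dset   = H5Dopen2(file, "/Data/IS-GPS-200 ID 2 Ephemerides", H5P_DEFAULT);
    hid_t   ftype  = H5Dget_type(dset);
    hid_t   fspace = H5Dget_space(dset);

    hsize_t dims[3]  = {26, 160, 1051200};
    hsize_t chunk[3] = {1, 15, 240};
    void   *buf = malloc((size_t)(1 * 15 * 240) * H5Tget_size(ftype));

    hsize_t start[3], count[3];
    for (start[0] = 0; start[0] < dims[0]; start[0] += chunk[0])
      for (start[1] = 0; start[1] < dims[1]; start[1] += chunk[1])
        for (start[2] = 0; start[2] < dims[2]; start[2] += chunk[2]) {
            for (int d = 0; d < 3; d++)   /* clamp partial chunks at the edges */
                count[d] = (start[d] + chunk[d] <= dims[d]) ? chunk[d]
                                                            : dims[d] - start[d];
            hid_t mspace = H5Screate_simple(3, count, NULL);
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
            H5Dread(dset, ftype, mspace, fspace, H5P_DEFAULT, buf);
            /* ... consume one chunk's worth of elements from buf ... */
            H5Sclose(mspace);
        }

    free(buf);
    H5Sclose(fspace);
    H5Tclose(ftype);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}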

I notice there are functions for examining the hit % of the metadata
cache... any chance of equivalent functions for the raw data chunk cache?




Re: puzzled by chunking, storage, and performance.

Ruth Aydt
Administrator
You may find some of the chunking discussions in this paper of interest:

        http://www.hdfgroup.org/pubs/papers/2008-06_netcdf4_perf_report.pdf 

in particular, Section 3.2 and portions of Sections 4 and 5.


On Sep 27, 2010, at 4:19 PM, John Knutson wrote:

> Setting aside the strange sizing issues in the earlier messages for a moment...
>
> Let's say I have a data set, dimensioned ( 26, 160, 1051200 )
> and chunked ( 1, 15, 240 )
>
> As I understand it, each individual chunk in the file will be in the following order:
> [ 0, 0, 0-239 ] - [ 0, 14, 0-239 ]
>
> and the chunks will be ordered thus:
> [ 0, 0, 0 ], [ 0, 0, 240 ] ... [ 0, 0, 1051200 ], [ 0, 15, 0 ], [ 0, 15, 240 ] ... [ 0, 15, 1051200 ]
> and so on...
>
> Is that correct?

Chunks are not necessarily ordered on the disk,  so the sequence in which you read the chunks shouldn't impact performance.

>
> Should I expect peak read performance by reading one chunk at a time in that order, assuming each chunk is 1MB in size, as is the cache?
>
> I notice there are functions for examining the hit % of the metadata cache... any chance of equivalent functions for the raw data chunk cache?



Re: puzzled by chunking, storage, and performance.

John Knutson-2
Thanks. I've read the pertinent sections, and what I'm coming away with
is that the chunk *sizes* should be designed around the I/O bandwidth of
your disk subsystem, while the *shapes* should be designed around the
access patterns for the data and around the data set itself (avoiding
mostly empty chunks and so on, as per the Section 5.1.2 guidelines)...

What this doesn't really get into, it seems to me, is the role of the
raw data chunk cache in all of this.
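For reference, the raw data chunk cache can be sized per dataset at open time through the dataset access property list (H5Pset_chunk_cache, available since 1.8.3); H5Pset_cache on the file access property list sets the file-wide default.  A minimal sketch with illustrative numbers:

#include "hdf5.h"

/* Sketch: size the raw data chunk cache for one dataset at open time.
 * The numbers are illustrative, not a recommendation. */
int main(void)
{
    hid_t file = H5Fopen("poo2.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl,
                       10007,            /* rdcc_nslots: hash slots, ideally prime and    */
                                         /*   much larger than the number of cached chunks */
                       16 * 1024 * 1024, /* rdcc_nbytes: room for sixteen 1 MB chunks      */
                       0.75);            /* rdcc_w0: preemption policy (library default)   */

    hid_t dset = H5Dopen2(file, "/Data/IS-GPS-200 ID 2 Ephemerides", dapl);

    /* ... reads against dset now use the 16 MB per-dataset chunk cache ... */

    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}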

I don't think contiguous data is even an option for us, as we would have
several multi-terabyte data sets which take quite some time just to
initialize on disk.

Ruth Aydt wrote:
> You may find some of the chunking discussions in this paper of interest:
>
> http://www.hdfgroup.org/pubs/papers/2008-06_netcdf4_perf_report.pdf 
>
> in particular, Section 3.2 and portions of Sections 4 and 5.
>
>
>  


New "Chunking in HDF5" document

Frank Baker

Several users have raised questions regarding chunking in HDF5.  Partly in response to these questions, the initial draft of a new "Chunking in HDF5" document is now available on The HDF Group's website:
    http://www.hdfgroup.org/HDF5/doc/_topic/Chunking/

This draft includes sections on the following topics:
    General description of chunks
    Storage and access order
    Partial I/O
    Chunk caching
    I/O filters and compression
    Pitfalls and errors to avoid
    Additional Resources
    Future directions
Several suggestions for tuning chunking in an application are provided along the way.

As a draft, this remains a work in progress; your feedback will be appreciated and will be very useful in the document's development.  For example, let us know if there are additional questions that you would like to see treated.

Regards,
-- Frank Baker
   HDF Documentation
   [hidden email]




Re: New "Chunking in HDF5" document

Biddiscombe, John A.
Does The HDF Group have any kind of plan/schedule for enabling compression of chunks when using parallel I/O?

The use case being that each process compresses its own chunk at write time and the overall file size is reduced.
(I understand that chunks are preallocated, and this makes it hard to implement compressed chunking with parallel I/O.)

Thanks

JB



Re: New "Chunking in HDF5" document

Quincey Koziol
Hi John,

On Feb 22, 2011, at 4:16 AM, Biddiscombe, John A. wrote:

> Does the hdfgroup have any kind of plan/schedule for enabling compression of chunks when using parallel IO?

        It's on my agenda for the first year of work that we will be starting soon for LBNL.  I think it's feasible for independent I/O, with some work.  I think collective I/O will probably require a different approach, however.  At least with collective I/O, all the processes are available to communicate and work on things together...

        The problem with the collective I/O [write] operations is that multiple processes may be writing into each chunk, which MPI-I/O can handle when the data is not compressed, but since compressed data is context-sensitive, straightforward collective I/O won't work for compressed chunks.  Perhaps a two-phase approach would work, where the data for each chunk is shipped to a single process, which updates the data in the chunk and compresses it, followed by one or more passes of collective writes of the compressed chunks.

        The problem with independent I/O [write] operations is that compressed chunks [almost always] change size when the data in the chunk is written (either initially, or when the data is overwritten), and since all the processes aren't available, communicating the space allocation is a problem.  Each process needs to allocate space in the file, but since the other processes aren't "listening", it can't let them know that some space in the file has been used.  A possible solution to this might involve just appending data to the end of the file, but that's prone to race conditions between processes (although maybe the "shared file pointer" I/O mode in MPI-I/O would help this).  Also, if each process moves a chunk around in the file (because it resized it), how will other processes learn where that chunk is, if they need to read from it?

> The use case being that each process compresses its own chunk at write time and the overall file size is reduced.
> (I understand that chunks are preallocated and this makes it hard to implement compressed chunking with Parallel IO).

        Some other ideas that we've been kicking around recently are:

- Using a lossy compressor (like a wavelet encoder) to put a fixed upper limit on the size of each chunk, making them all the same size.  This will obviously affect the precision of the data stored and thus may not be a good solution for restart dumps, although it might be fine for visualization/plot files.  It's great from the perspective that it completely eliminates the space allocation problem, though.

- Use a lossless compressor (like gzip), but put an upper limit on the compressed size of a chunk, something that's likely to be achievable, like 2:1 or so.  Then, if each chunk can't be compressed to that size, have the I/O operation fail.  This eliminates the space allocation issue, but at the cost of possibly not being able to write compressed data at all.

- Alternatively, use a lossless compressor with an upper limit on the compressed size of a chunk, but also allow for chunks that aren't able to be compressed to the goal ratio to be stored uncompressed.  So, the dataset will only have two sizes of chunks: full-size chunks and half-size (or third-size, etc) chunks, which limits the space allocation complexities involved.  I'm not certain this buys much in the way of benefits, since it doesn't eliminate space allocation, and probably wouldn't address the space allocation problems with independent I/O.

        Any other ideas or input?

                Quincey




Re: New "Chunking in HDF5" document

Werner Benger
On Tue, 22 Feb 2011 21:29:03 +0100, Quincey Koziol <[hidden email]>  
wrote:

> Any other ideas or input?
>
Maybe HDF5 could allocate some space for the uncompressed data and, if the
compressed data doesn't use all that space, re-use the leftover space for other
purposes within the same process, similar to a sparse matrix. This would
not reduce the file size when writing the first dataset, but subsequent
writes could benefit from it, as would an h5copy of the final dataset
later (if copying is an option).

          Werner






--
___________________________________________________________________________
Dr. Werner Benger                Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809                        Fax.: +1 225 578-5362


Re: New "Chunking in HDF5" document

Biddiscombe, John A.
In reply to this post by Quincey Koziol
Replying to multiple comments at once.

Quincey : "multiple processes may be writing into each chunk, which MPI-I/O can handle when the data is not compressed, but since compressed data is context-sensitive"
My initial use case would be much simpler. A chunk would be aligned with the boundaries of the domain decomposition, and each process would write one chunk, one at a time. A compression filter would be applied by the process owning the data, and then it would be written to disk (much like Mark's suggestion).
a) Lossless: problem understood; chunks varying in size, nasty metadata synchronization, sparse files, issues.
b) Lossy: seems feasible. We were in fact considering a wavelet-type compression as a first pass (pun intended). "It's great from the perspective that it completely eliminates the space allocation problem". Absolutely. All chunks are known to be of size X beforehand, so nothing changes except for the indexing and the actual chunk storage/retrieval + de/compression.

I also like the idea of using lossless compression and having the I/O operation fail if the data doesn't fit. It would give the user the chance to try their best to compress, with some knowledge of the data type, and to abort if it doesn't fit the allocated space.

Mark : Multi-pass VFD. I like this too. It potentially allows a very flexible approach where, even if collective IO is writing to the same chunk, the collection/compression phase can do the sums and transmit the info into the HDF5 metadata layer. We'd certainly need to extend the chunking interface to handle variable sized chunks to allow for more/less compression in different areas of the data (actually this would be true for any option involving lossless compression). I think the chunk hashing relies on all chunks being the same size, so any change to that is going to be a huge compatibility breaker. Also, the chunking layer sits on top of the VFD, so I'm not sure if the VFD would be able to manipulate the chunks in the way desired. Perhaps I'm mistaken and the VFD does see the chunks. Correct me anyway.

Quincey : One idea I had and which I think Mark also expounded on is ... each process takes its own data and compresses it as it sees fit, then the processes do a synchronization step to tell each other how much (new compressed) data they have got - and then a dataset create is called - using the size of the compressed data. Now each process creates a hyperslab for its piece of compressed data and writes into the file using collective IO. We now add an array of extent information and compression algorithm info to the dataset as an attribute where each entry has a start and end index of the data for each process.

Now the only trouble is that reading the data back requires a double step of reading the attributes and decompressing the desired piece, which is quite nasty when odd slices are being requested.

Now I start to think that Marks double VFD suggestion would do basically this (in one way or another), but maintaining the normal data layout rather than writing a special dataset representing the compressed data.
step 1 : Data is collected into chunks (if already aligned with domain decomposition, no-op), chunks are compressed.
step 2 : Sizes of chunks are exchanged and space is allocated in the file for all the chunks.
step 3 : chunks of compressed data are written
Not sure two passes are actually needed, as long as the three steps are followed (a rough sketch follows below).

...but variable chunk sizes are not allowed in HDF5 (true or false?) - this seems like a showstopper.
Aha. I understand. The actual written data can/could vary in size, as long as the chunk indices, as they refer to the original dataspace, stay regular. Yes?
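A rough sketch of those three steps in plain MPI-I/O and zlib, just to make the data flow concrete (this is outside HDF5; the per-chunk offsets and sizes would still have to be recorded as metadata somewhere so readers can locate and decompress each chunk):

#include <stdlib.h>
#include <mpi.h>
#include <zlib.h>

/* Sketch only: each rank compresses its own chunk, ranks exchange compressed
 * sizes to derive file offsets, then everyone writes collectively.  zlib's
 * compress() stands in for whatever filter would really be used. */
void write_compressed_chunks(MPI_Comm comm, const char *path,
                             const unsigned char *chunk, uLong nbytes)
{
    /* step 1: compress the local chunk */
    uLongf csize = compressBound(nbytes);
    unsigned char *cbuf = malloc(csize);
    compress(cbuf, &csize, chunk, nbytes);

    /* step 2: exchange sizes and derive this rank's offset in the file */
    long long mine = (long long)csize, sum = 0;
    MPI_Scan(&mine, &sum, 1, MPI_LONG_LONG, MPI_SUM, comm);
    MPI_Offset offset = (MPI_Offset)(sum - mine);   /* bytes written by lower ranks */

    /* step 3: collective write of all the compressed chunks */
    MPI_File fh;
    MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, offset, cbuf, (int)csize, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(cbuf);
}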

JB
Please forgive my thinking out loud





-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Mark Miller
Sent: 22 February 2011 23:43
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] New "Chunking in HDF5" document

On Tue, 2011-02-22 at 14:06, Quincey Koziol wrote:

>
> Well, as I say above, with this approach, you push the space
> allocation problem to the dataset creation step (which has it's own
> set of problems),

Yeah, but those 'problems' aren't new to parallel I/O issues. Anyone
that is currently doing concurrent parallel I/O with HDF5 has had to
already deal with this part of the problem -- space allocation at
dataset creation -- right? The point is the caller of HDF5 then knows
how big it will be after its been compressed and HDF5 doesn't have to
'discover' that during H5Dwrite. Hmm puzzling...

I am recalling my suggestion of a '2-pass-planning' VFD where the caller
executes slew of HDF5 operations on a file TWICE. The first pass, HDF5
doesn't do any of the actual raw data I/O but just records all the
information about it for a 'repeat performance' second pass. In the
second pass, HDF5 knows everything about what is 'about to happen' and
then can plan accordingly.

What about maybe doing that on a dataset-at-a-time basis? I mean, what
if you set dxpl props to indicate either 'pass 1' or 'pass 2' of a
2-pass H5Dwrite operation. On pass 1, between H5Dopen and H5Dclose,
H5Dwrites don't do any of the raw data I/O but do apply filters and
compute sizes of things it will eventually write. On H5Dclose of pass 1,
all the information of chunk sizes is recorded. Caller then does
everything again, a second time but sets 'pass' to 'pass 2' in dxpl for
H5Dwrite calls and everything 'works' because all processors know
everything they need to know.

>   Maybe HDF5 could expose an API routine that the application could
> call, to pre-compress the data by passing it through the I/O filters?

I think that could be useful in any case. Like its now possible to apply
type conversion to a buffer of bytes, it probably ought to be possible
to apply any 'filter' to a buffer of bytes. The second half of this
though would involve smartening HDF5 then to 'pass-through' pre-filtered
data so result is 'as if' HDF5 had done the filtering work itself during
H5Dwrite. Not sure how easy that would be ;) But, you asked for
comments/input.

>
> Quincey
>
>
--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[hidden email]      urgent: [hidden email]
T:8-6 (925)-423-5901    M/W/Th:7-12,2-7 (530)-753-8511



Re: New "Chunking in HDF5" document

Quincey Koziol
In reply to this post by Quincey Koziol

On Feb 23, 2011, at 4:12 PM, Quincey Koziol wrote:

> Hi Mark,
>
> On Feb 22, 2011, at 4:42 PM, Mark Miller wrote:
>
>> On Tue, 2011-02-22 at 14:06, Quincey Koziol wrote:
>>
>>>
>>> Well, as I say above, with this approach, you push the space
>>> allocation problem to the dataset creation step (which has it's own
>>> set of problems),
>>
>> Yeah, but those 'problems' aren't new to parallel I/O issues. Anyone
>> that is currently doing concurrent parallel I/O with HDF5 has had to
>> already deal with this part of the problem -- space allocation at
>> dataset creation -- right? The point is the caller of HDF5 then knows
>> how big it will be after its been compressed and HDF5 doesn't have to
>> 'discover' that during H5Dwrite. Hmm puzzling...
>
> True, yes.
>
>> I am recalling my suggestion of a '2-pass-planning' VFD where the caller
>> executes slew of HDF5 operations on a file TWICE. The first pass, HDF5
>> doesn't do any of the actual raw data I/O but just records all the
>> information about it for a 'repeat performance' second pass. In the
>> second pass, HDF5 knows everything about what is 'about to happen' and
>> then can plan accordingly.
>
> Ah, yes, that may be a good segue into this two-pass feature.  I've been thinking about this feature and wondering about how to implement it.  Something that occurs to me would be to construct it like a "transaction", where the application opens a transaction,  the HDF5 library just records those operations performed with API routines, then when the application closes the transaction, they are replayed twice: once to record the results of all the operations, and then a second pass that actually performs all the I/O.  That would help to reduce the overhead from the collective metadata modification overhead also.

        BTW, if we go down this "transaction" path, it allows the HDF5 library to push the fault tolerance up to the application level - the library could guarantee that the atomicity of what was "visible" in the file was an entire checkpoint, rather than the atomicity being on a per-API call basis.

        Quincey

>> What about maybe doing that on a dataset-at-a-time basis? I mean, what
>> if you set dxpl props to indicate either 'pass 1' or 'pass 2' of a
>> 2-pass H5Dwrite operation. On pass 1, between H5Dopen and H5Dclose,
>> H5Dwrites don't do any of the raw data I/O but do apply filters and
>> compute sizes of things it will eventually write. On H5Dclose of pass 1,
>> all the information of chunk sizes is recorded. Caller then does
>> everything again, a second time but sets 'pass' to 'pass 2' in dxpl for
>> H5Dwrite calls and everything 'works' because all processors know
>> everything they need to know.
>
> Ah, I like this also!
>
>>> Maybe HDF5 could expose an API routine that the application could
>>> call, to pre-compress the data by passing it through the I/O filters?
>>
>> I think that could be useful in any case. Like its now possible to apply
>> type conversion to a buffer of bytes, it probably ought to be possible
>> to apply any 'filter' to a buffer of bytes. The second half of this
>> though would involve smartening HDF5 then to 'pass-through' pre-filtered
>> data so result is 'as if' HDF5 had done the filtering work itself during
>> H5Dwrite. Not sure how easy that would be ;) But, you asked for
>> comments/input.
>
> Yes, that's the direction I was thinking about going.
>
> I think the transaction idea I mentioned above might be the most general and have the highest payoff.  It could even be implemented with poor man's parallel I/O, when the transaction concluded.
>
> Quincey
>
>



Re: New "Chunking in HDF5" document

Quincey Koziol
In reply to this post by Biddiscombe, John A.
Hi John,

On Feb 23, 2011, at 6:54 AM, Biddiscombe, John A. wrote:

> Replying to multiple comments at once.
>
> Quincey : "multiple processes may be writing into each chunk, which MPI-I/O can handle when the data is not compressed, but since compressed data is context-sensitive"
> My initial use case would be much simpler. A chunk would be aligned with the boundaries of the domain decomposition and each process would write one chunk - one at a time - A compression filter would be applied by the process owning the data and then it would be written to disk (much like Marks' suggestion).
> a) lossless. Problem understood, chunks varying in size, nasty metadata synchronization, sparse files, issues.
> b) lossy. Seems feasible. We were in fact considering a wavelet type compression as a first pass (pun intended). "It's great from the perspective that it completely eliminates the space allocation problem". Absolutely. All chunks are known to be of size X beforehand, so nothing changes except for the indexing and actual chunk storage/retrieval + de/compression.

        Yup.  (Although it's not impossible for collective I/O)

> I also like the idea of using a lossless compression and having the IO operation fail if the data doesn't fit. Would give the user the chance to try their best to compress with some knowledge of the data type and if it doesn't fit the allocated space, to abort.

        OK, at least one other person thinks this is reasonable. :-)

> Mark : Multi-pass VFD.
> I like this too. It potentially allows a very flexible approach where even if collective IO is writing to the same chunk, the collection/compression phase can do the sums and transmit the info into the hdf5 metadata layer. We'd certainly need to extend the chunking interface to handle variable sized chunks to allow for more/less compression in different areas of the data (actually this would be true for any option involving lossless compression). I think the chunk hashing relies on all chunks being the same size, so any change to that is going to be a huge compatibility breaker. Also, the chunking layer sits on top of the VFD, so I'm not sure if the VFD would be able to manipulate the chunks in the way desired. Perhaps I'm mistaken and the VFD does see the chunks. Correct me anyway.

        If we go with the multi-pass/transaction idea, I don't think we need to worry about the chunks being different sizes.

        You are correct in that the VFD layer doesn't see the chunk information.  (And I think it would be bad to make it so :-)

> Quincey : One idea I had and which I think Mark also expounded on is ... each process takes its own data and compresses it as it sees fit, then the processes do a synchronization step to tell each other how much (new compressed) data they have got - and then a dataset create is called - using the size of the compressed data. Now each process creates a hyperslab for its piece of compressed data and writes into the file using collective IO. We now add an array of extent information and compression algorithm info to the dataset as an attribute where each entry has a start and end index of the data for each process.
>
> Now the only trouble is that reading the data back requires a double step of reading the attributes and decompressing the desired piece- quite nasty when odd slices are being requested.

        Maybe.  (Icky if so)

> Now I start to think that Marks double VFD suggestion would do basically this (in one way or another), but maintaining the normal data layout rather than writing a special dataset representing the compressed data.
> step 1 : Data is collected into chunks (if already aligned with domain decomposition, no-op), chunks are compressed.
> step 2 : Sizes of chunks are exchanged and space is allocated in the file for all the chunks.
> step 3 : chunks of compressed data are written
> not sure two passes are actually needed, as long as the 3 steps are followed.
>
> ...but variable chunk sizes are not allowed in hdf (true or false?) - this seems like a showstopper.
> Aha. I understand. The actual written data can/could vary in size, as long as the chunk indices as referring to the original dataspace are regular. yes?

        Yes.

> JB
> Please forgive my thinking out aloud

        Not a problem - please continue to participate!

                Quincey




Re: New "Chunking in HDF5" document

Rhys Ulerich
In reply to this post by Quincey Koziol
>>       BTW, if we go down this "transaction" path, it allows the HDF5
>> library to push the fault tolerance up to the application level - the
>> library could guarantee that the atomicity of what was "visible" in
>> the file was an entire checkpoint, rather than the atomicity being on
>> a per-API call basis.

> Hmm. That's only true if 'transaction' is whole-file scope, right? I mean
> aren't you going to allow the application to decide what 'granularity' a
> transaction should be; a single dataset, a bunch of datasets in a group
> in the file, etc.

Careful fellas... you'll end up implementing a good part of
conventional database transactions and their ACID guarantees before
you're done.  And you won't have the benefit of SQL as a lingua
franca.  If you want fancy transaction semantics why not just use a
database vendor with a particularly rich BLOB API?

99% tongue-in-cheek,
Rhys


Re: New "Chunking in HDF5" document

Quincey Koziol
Hi Rhys,

On Feb 23, 2011, at 9:17 PM, Rhys Ulerich wrote:

> Careful fellas... you'll end up implementing a good part of
> conventional database transactions and their ACID guarantees before
> you're done.  And you won't have the benefit of SQL as a lingua
> franca.  If you want fancy transaction semantics why not just use a
> database vendor with a particularly rich BLOB API?

        I'm definitely not advocating going whole-hog for ACID semantics, but I think there are certain useful pieces of ACID that can be leveraged.  :-)

        Quincey

