Re: [**EXTERNAL**] Re: first non-fill-value in the sparse chunked dataset


Dave Allured - NOAA Affiliate
Efim,

Can you simply add a scalar integer attribute that keeps track of the lower bound index value of the slower dimension?  Just update this attribute every time you write to the data set, or at least every time the lower bound goes lower.  This would be an application level solution, rather than something provided by the library.

This resembles a minimal version of Gerd's suggestion #1.
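A minimal sketch of that idea, assuming Python with h5py and NumPy. The attribute name `lower_bound`, the helper `write_rows`, the dataset shape, and the fill value are all made up for illustration; none of them come from the thread.

```python
import h5py
import numpy as np

FILL = -1.0
ATTR = "lower_bound"  # hypothetical attribute name, not an HDF5 convention


def write_rows(dset, start, rows):
    """Write `rows` at index `start` of the slow axis and track the lower bound."""
    stop = start + rows.shape[0]
    if stop > dset.shape[0]:
        dset.resize(stop, axis=0)  # the upper bound is simply the dataset extent
    dset[start:stop, :] = rows
    # Update the attribute only when it is missing or the lower bound goes lower.
    if ATTR not in dset.attrs or start < dset.attrs[ATTR]:
        dset.attrs[ATTR] = np.int64(start)


# In-memory file (core driver, no backing store) so nothing is left on disk.
with h5py.File("attr_sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("data", shape=(0, 4), maxshape=(None, 4),
                            chunks=(16, 4), dtype="f8", fillvalue=FILL)
    write_rows(dset, 500, np.ones((10, 4)))  # first write lands in the "middle"
    write_rows(dset, 510, np.ones((5, 4)))   # grow to the right
    write_rows(dset, 480, np.ones((20, 4)))  # grow to the left

    lower = int(dset.attrs[ATTR])
    upper = dset.shape[0]
    populated = dset[lower:upper, :]  # read only the populated bounding box
```

The attribute is only as reliable as the discipline of routing every write through the helper; any write that bypasses it leaves the recorded bound stale.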

--Dave


On Thu, Apr 20, 2017 at 8:39 AM, Efim Dyadkin <[hidden email]> wrote:

Sorry, I should have specified what "first" means. I have a 2D dataset whose slower dimension is sparse and unlimited, and whose fast dimension is non-sparse and of fixed length. Typically for my data, information is first written in the "middle" of the slower dimension of the dataset and then grows incrementally in either direction (to the left and to the right). I need to keep track of the current bounding box in order to access only the populated part of the dataset. The upper boundary of the slower dimension is basically the extent of the dataset, so I do not need to store it on my own. As to the lower boundary, I hoped I could find it by getting access to the first available chunk with the smallest index along the slower dimension.

I think exposing at least a boolean grid of existing chunks could be helpful for sparse data handling.

Thanks,

Efim


From: Hdf-forum <[hidden email]> on behalf of Gerd Heber <[hidden email]>
Sent: Thursday, April 20, 2017 7:20 AM
To: HDF Users Discussion List
Subject: [**EXTERNAL**] Re: [Hdf-forum] first non-fill-value in the sparse chunked dataset
 

The “first non-fill-value” in which order? (chronological, C-order, …)

 

Short answer: No chance.

 

Slightly longer: (Apart from H5DOwrite_chunk…) There is currently no API that gives you direct control over/introspection into chunks. You can control certain aspects of chunk allocation time and policy (via dataset creation properties), but the rest is pretty opaque and a side-effect of H5D[read,write].

I think you have at least two options:

 

1. Create an auxiliary structure where you maintain that type of log information. (This is dangerous/illusionary because you’ll be making assumptions about how the HDF5 library writes/updates chunks, and what happens in the underlying storage.)

2. Create a proper sparse structure and don’t use chunking to mimic one. (You might still struggle with the definition of ‘first.’)
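As a rough illustration of option 1, an auxiliary structure could be an application-side log of which chunk rows have been written. The sketch below is plain Python; the class `ChunkLog` and its methods are hypothetical names, not part of HDF5. It records only what the application asked to write, not what the library actually allocated, which is exactly the assumption the caveat above warns about.

```python
class ChunkLog:
    """Application-side record of chunk indices touched along the slow axis."""

    def __init__(self, chunk_rows):
        self.chunk_rows = chunk_rows  # chunk length along the slow axis
        self.touched = set()          # indices of chunks that received a write

    def record_write(self, start, stop):
        """Mark every chunk overlapped by the row range [start, stop)."""
        if stop <= start:
            return
        first = start // self.chunk_rows
        last = (stop - 1) // self.chunk_rows
        self.touched.update(range(first, last + 1))

    def lowest_row(self):
        """First row of the lowest touched chunk, or None if nothing was written."""
        if not self.touched:
            return None
        return min(self.touched) * self.chunk_rows

    def row_ranges(self):
        """Return (start, stop) row ranges, merging runs of adjacent chunks."""
        ranges = []
        for idx in sorted(self.touched):
            start = idx * self.chunk_rows
            stop = start + self.chunk_rows
            if ranges and ranges[-1][1] == start:
                ranges[-1] = (ranges[-1][0], stop)  # extend the previous run
            else:
                ranges.append((start, stop))
        return ranges


log = ChunkLog(chunk_rows=16)
log.record_write(500, 510)    # first write in the "middle"
log.record_write(480, 500)    # grow to the left
log.record_write(2000, 2010)  # a second, distant populated region
```

In practice the log would have to be persisted alongside the data (for example in a small companion dataset or attribute) and updated on every write to stay meaningful.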

 

G.

 

 From: Hdf-forum [mailto:[hidden email]] On Behalf Of Efim Dyadkin

Sent: Wednesday, April 19, 2017 5:04 PM
To: [hidden email]
Subject: [Hdf-forum] first non-fill-value in the sparse chunked dataset

 

Hi,

 

I am using a sparse chunked dataset with a certain fill value. I’d like to find the first non-fill-value element in the dataset. Can I narrow down my search to the first available chunk? How can I do it?

 

Thank you,

Efim Dyadkin


_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
Re: [**EXTERNAL**] Re: first non-fill-value in the sparse chunked dataset

Efim Dyadkin-2

Thank you Gerd and Dave. 


Solution #1 is okay for my current task. Ultimately, however, for the performance of my app, I would like to visit only those areas of the sparse dataset where data really exists. From your answers and the documentation, I gather that this information is available at chunk granularity in the B-tree but is apparently not exposed in the API.
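For readers finding this thread later: HDF5 releases newer than this discussion (1.10.5 onward) added chunk query functions (H5Dget_num_chunks, H5Dget_chunk_info), and h5py 3.x exposes them on the low-level dataset ID. The following is only a sketch under the assumption that such versions are installed; the dataset layout and values are made up for illustration.

```python
import h5py

# In-memory file (core driver, no backing store) so nothing is left on disk.
with h5py.File("chunk_query_sketch.h5", "w", driver="core",
               backing_store=False) as f:
    dset = f.create_dataset("data", shape=(4096, 4), maxshape=(None, 4),
                            chunks=(16, 4), dtype="f8", fillvalue=-1.0)
    dset[500:510, :] = 1.0    # falls inside the chunk starting at row 496
    dset[2000:2010, :] = 1.0  # falls inside the chunk starting at row 2000
    f.flush()  # make sure cached chunks are reflected in the chunk index

    dsid = dset.id  # low-level DatasetID object
    # Each StoreInfo carries the chunk's logical offset within the dataset.
    offsets = sorted(dsid.get_chunk_info(i).chunk_offset
                     for i in range(dsid.get_num_chunks()))
    first_chunk_row = offsets[0][0]
```

This reports allocated chunks only; finding the first non-fill element still requires reading the chunk at `first_chunk_row`, since a chunk can be allocated and still contain fill values.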


