[hdf-forum] Practical limit on number of objects?

[hdf-forum] Practical limit on number of objects?

Darryl Okahata
Hi,

     Sorry, this question has probably been asked before, but I couldn't
find anything in the docs, and there doesn't seem to be an archive of
this mailing list.

     Are there known practical limitations on the number of objects
(e.g., groups or datasets)?  I'm asking because I've written some test
programs, and HDF5 performance seems to degrade non-linearly once the
number of objects grows above roughly 50,000-100,000.  Are there
parameter settings that can improve this?

     I have two test programs that I'm using to test HDF5:

Program 1:
        Create a new HDF5 file, and write 100000 chunked datasets of
        size (6,2,2) (native double) into the top level.  In this test,
        the chunk dimensions are the same as the entire dataset.

Program 2:
        Create a new HDF5 file, create 1000 groups, and write 100
        chunked datasets of size (6,2,2) (native double) into each
        group.  In this test, the chunk dimensions are the same as the
        entire dataset.

I'm using chunked datasets, because the next test after this would
extend the dataset sizes from (6,2,2) to (N,2,2), for varying values of
N.  I've tried using the split file driver, but the performance of that
is comparable.

     Also, to get better performance, I've had to twiddle various
symbol-table and storage parameters, though I really have no idea what
I'm doing here:

        status = H5Pset_istore_k(fcpl, 1);
        status = H5Pset_sym_k(fcpl, 20, 50);

[ What I'm really trying to do is figure out a reasonable way of storing
  ragged arrays of ragged arrays of ragged arrays of ....  The nesting
  can go pretty deep, and so I was wondering if I could use groups to
  help with the nesting.  Unfortunately, with this approach, the number
  of groups used by my program could be on the order of a trillion or
  more, worst-case.  I could use alternative encodings (e.g.,
  concatenate all my datasets), but, at that point, I don't know if it's
  worthwhile to use HDF5 any more.  ;-(  ]

--
        Darryl Okahata
        darrylo at soco.agilent.com

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.


----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe at hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe at hdfgroup.org.





[hdf-forum] Practical limit on number of objects?

Quincey Koziol
Hi Darryl,

On Aug 6, 2008, at 2:49 PM, Darryl Okahata wrote:

> Hi,
>
>     Sorry, this question has probably been asked before, but I  
> couldn't
> find anything in the docs, and there doesn't seem to be an archive of
> this mailing list.
>
>     Are there known practical limitations on the number of objects
> (e.g., groups or datasets)?  I'm asking because I've written some test
> programs, and the HDF5 performance seems to start non-linearly  
> degrading
> once the number of objects grows above approximately 50000-100000
> objects.  Are there parameter settings that can improve this?

        I would suggest trying the enhancements that come with the
latest version of the file format, which can be enabled by calling
H5Pset_libver_bounds() with both bounds set to H5F_LIBVER_LATEST.
This should force the library to use the newer data structures for
storing links in groups.  We are continuing work that will speed
things up further, but it hasn't made it into a public release yet.

>     I have two test programs that I'm using to test HDF5:
>
> Program 1:
> Create a new HDF5 file, and write 100000 chunked datasets of
> size (6,2,2) (native double) into the top level.  In this test,
> the chunk dimensions are the same as the entire dataset.
>
> Program 2:
> Create a new HDF5 file, create 1000 groups, and write 100
> chunked datasets of size (6,2,2) (native double) into each
> group.  In this test, the chunk dimensions are the same as the
> entire dataset.
>
> I'm using chunked datasets, because the next test after this would
> extend the dataset sizes from (6,2,2) to (N,2,2), for varying values  
> of
> N.  I've tried using the split file driver, but the performance of  
> that
> is comparable.
>
>     Also, to get better performance, I've had to twiddle various  
> symbol
> and storage parameters, but I really have no idea what I'm doing,  
> here:
>
> status = H5Pset_istore_k(fcpl, 1);
> status = H5Pset_sym_k(fcpl, 20, 50);
>
> [ What I'm really trying to do is figure out a reasonable way of  
> storing
>  ragged arrays of ragged arrays of ragged arrays of ....  The nesting
>  can go pretty deep, and so I was wondering if I could use groups to
>  help with the nesting.  Unfortunately, with this approach, the number
>  of groups used by my program could be on the order of a trillion or
>  more, worst-case.  I could use alternative encodings (e.g.,
>  concatenate all my datasets), but, at that point, I don't know if  
> it's
>  worthwhile to use HDF5 any more.  ;-(  ]

        You can nest HDF5's variable-length datatypes arbitrarily deep --
does that give you what you are looking for?

        Quincey


> --
> Darryl Okahata
> darrylo at soco.agilent.com
>
>
>







[hdf-forum] Practical limit on number of objects?

Darryl Okahata
> I would suggest trying the enhancements that come with using the  
> latest version of the file format, which can be enabled by calling  
> H5Pset_libver_bounds() with both bounds settings to  
> H5F_LIBVER_LATEST.

     Thanks.  I tried this, but the difference is minimal, for my test
program.

> You can nest HDF5's variable-length datatypes arbitrarily deep - does  
> that give you what you are looking for?

     One of the issues is that the data won't fit into memory.  Worst
case, the entire pile of data is 10-100+ terabytes in size.

     I'm now trying plan B: concatenating the original datasets,
end-to-end, into one big dataset.  A second dataset keeps track of the
locations and sizes of the original datasets within the big one.  So
far, this seems reasonably fast and scalable (the write times seem to
scale roughly linearly).  I still need to do read timings, though.
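The bookkeeping behind this plan B can be sketched in plain C.  This
is a hypothetical illustration (the names SubIndex and index_append
are mine, not HDF5 API): each original dataset becomes an
(offset, length) entry in the index dataset, and the concatenated
dataset simply grows by that length.

```c
#include <stddef.h>

/* One entry of the index dataset: where a sub-array starts in the big
 * concatenated dataset, and how many elements it holds. */
typedef struct {
    size_t offset;  /* element offset of the sub-array */
    size_t length;  /* number of elements in the sub-array */
} SubIndex;

/* Record the next sub-array of `length` elements and return its offset.
 * `next_free` tracks the current end of the big dataset. */
size_t index_append(SubIndex *idx, size_t *next_free, size_t length)
{
    idx->offset = *next_free;
    idx->length = length;
    *next_free += length;
    return idx->offset;
}
```

In a real program, each SubIndex record would be written to the index
dataset and the sub-array written to the big dataset at that offset.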

     On a different note: does h5dump support split files?  I can't seem
to get h5dump to recognize them.  I wrote some test code that uses the
split driver to write two files:

        ext.h5.meta
        ext.h5.raw

$ h5dump ext.h5.meta
h5dump error: unable to open file "ext.h5.meta"
$ h5dump ext.h5.raw
h5dump error: unable to open file "ext.h5.raw"
$ h5dump --filedriver split ext.h5.meta
h5dump error: unable to open file "ext.h5.meta"
$ h5dump --filedriver split ext.h5.raw
h5dump error: unable to open file "ext.h5.raw"

I don't think it's a problem with the test code, as it produces a
dumpable "ext.h5" file if I comment out the call to H5Pset_fapl_split()
(the test code originated from the h5_extend.c example).

--
        Darryl Okahata
        darrylo at soco.agilent.com






[hdf-forum] Dataset overhead (HDF5 1.6 & Java)

Jim Robinson
In reply to this post by Quincey Koziol

Hi,  I have a Java application using HDF5 through the JNI bindings.  I
know from previous discussions on this forum that opening a dataset has
some significant overhead and that, in general, reducing the number of
datasets is a good idea.  I have done that to the degree possible, but
opening datasets is still, by far, the top hit on my performance
profiles.  It takes significantly longer than reading the actual data.

The structure of the program makes it likely that once a dataset is
opened it will be revisited later.  I have been closing datasets after
each read operation because I am not sure what the consequences are of
simply leaving them open.  I am considering implementing a cache to
keep some number of datasets open for the life of the program, or
until my cache limit is reached.  Does anyone know how many I can
safely keep open without worrying about running out of some resource,
for example memory, in the C code that is actually managing them?
Would 100 be safe, or 1,000?

The program, by the way, is a genomics visualizer, and was just
released to the public at www.broad.mit.edu.

Thanks for any help.  I like HDF5; it saves me a lot of time, but I've
got to solve this problem somehow to continue using it.

Regards

Jim







[hdf-forum] Dataset overhead (HDF5 1.6 & Java)

Francesc Alted
On Saturday, 9 August 2008, Jim Robinson wrote:
> Hi,  I have a java application using HDF through the jni bindings.  
> I know from previous discussions on this forum that opening datasets
> have some signifcant overhead and that, in general, reducing the
> number of datasets is a good idea.   I have done that to the degree
> possible but opening datasets are still, by far, the top hit on my
> performance profiles.  It is taking significantly longer than reading
> the actual data.

If this is the case, then I think you are using HDF5 in a scenario it
was not designed for.  HDF5 is mainly meant for keeping large amounts
of data in relatively few containers.  If what you are trying to do is
to keep your data spread across a lot of containers, perhaps an object
database (or something else) would be your best bet.

Having said that, the HDF5 1.8.x series implements a much more
optimized metadata cache that should help you somewhat.  See below.

> The structure of the program makes it likely that once a dataset is
> opened it is likely to be revisted later.   I have been closing
> datasets after each read operation because I am not sure what the
> consequences are of simply leaving them open.    I am considering
> implementing a cache to keep some number of datasets open for the
> life of the program, or until my cache limit is reached.

My experience in this area is that the metadata cache built into the
HDF5 library should be enough, especially the one in HDF5 1.8.0 and
later -- just don't stress it too much.  As a reference, in a small
benchmark I ran with PyTables (the results should carry over to an
equivalent C benchmark), here is the memory used for creating and
reading a file with 5000 and 10000 datasets:

HDF5 1.6.7:

Create:
File with 5000 datasets:  45 MB
File with 10000 datasets: 73 MB

Read a subset of 100 datasets:
File with 5000 datasets:  20 MB
File with 10000 datasets: 25 MB

HDF5 1.8.0:

Create:
File with 5000 datasets:  30 MB
File with 10000 datasets: 31 MB

Read a subset of 100 datasets:
File with 5000 datasets:  18 MB
File with 10000 datasets: 19 MB

So, clearly, the 1.8.x series has improved a lot in this area.

> Does
> anyone know how many I can safely keep open without worrying about
> running out of some resource, for example memory, int the C code that
> is actually managing them? Would 100 be safe,  or 1,000?

If you are still interested in implementing a sort of LRU cache for
your nodes (datasets), you should be aware that the algorithm for
determining the least recently used node also takes time (perhaps a
lot more than the algorithm HDF5 uses to evict from its metadata
cache), so you may want to create a cache with a fairly small number
of nodes (my recommendation is not to exceed 256) so as not to add too
much overhead in your implementation.

Just as a reference, in PyTables Pro I have implemented such an LRU
cache (for other reasons than yours) with carefully optimized C code
and, for an LRU cache size of 256 nodes, we see a performance loss of
between 2x and 3x with respect to the metadata cache code in HDF5.  Of
course, we get our *own* eviction algorithm for the cache, but we had
to pay a price for that.
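A minimal sketch of the kind of small, bounded handle cache being
discussed.  To sidestep the LRU bookkeeping cost mentioned above, this
sketch deliberately uses the crudest possible eviction (dump
everything when full) instead of true LRU; keys and handles are plain
integers standing in for dataset paths and hid_t handles, so the names
and policy here are illustrative only:

```c
#include <stddef.h>

#define CACHE_SLOTS 256  /* stay small, per the advice above */

/* Illustrative entry: in a real HDF5 program, `key` might be a hash of
 * the dataset path and `handle` an open hid_t. */
typedef struct { long key; long handle; } CacheEntry;
typedef struct { CacheEntry slots[CACHE_SLOTS]; size_t used; } HandleCache;

/* Look a key up; returns its handle, or -1 if absent.
 * Linear scan is O(n), which is fine for n <= 256. */
long cache_get(const HandleCache *c, long key)
{
    for (size_t i = 0; i < c->used; i++)
        if (c->slots[i].key == key)
            return c->slots[i].handle;
    return -1;
}

/* Insert an entry; when the cache is full, dump everything and start
 * over (a real program would close the evicted handles here). */
void cache_put(HandleCache *c, long key, long handle)
{
    if (c->used == CACHE_SLOTS)
        c->used = 0;  /* simplest possible eviction policy */
    c->slots[c->used].key = key;
    c->slots[c->used].handle = handle;
    c->used++;
}
```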

> The program by the way is a genomics visualizer, and was just
> released to the public at www.broad.mit.edu.
>
> Thanks for any help.  I like hdf5, it saves me a lot of time,  but
> I've got to solve this problem by some means to continue using it.

HTH,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249






[hdf-forum] Dataset overhead (HDF5 1.6 & Java)

Jim Robinson
Maybe you're correct; the logical model of HDF5 is well suited to my
problem, though, and I'm committed to it in the near term.  This
design bias towards large datasets in relatively few containers is not
obvious from the documentation.  At any rate, I'm not talking about a
huge number of datasets, just 30-100 or so typically.

HDF5 1.8.x is not an option because, as far as I know, there are no
Java JNI bindings available yet.  I will try your suggestion of a
relatively small cache.  Thanks very much for the numbers and advice.

Jim



Francesc Alted wrote:

> A Saturday 09 August 2008, Jim Robinson escrigu?:
>  
>> Hi,  I have a java application using HDF through the jni bindings.  
>> I know from previous discussions on this forum that opening datasets
>> have some signifcant overhead and that, in general, reducing the
>> number of datasets is a good idea.   I have done that to the degree
>> possible but opening datasets are still, by far, the top hit on my
>> performance profiles.  It is taking significantly longer than reading
>> the actual data.
>>    
>
> If this is the case, then I think that you are using HDF5 in an scenario
> that it is not designed for.  HDF5 is mainly meant for keeping large
> amounts of data in relatively few containers.  If what you are trying
> to do is to keep all your data spread in a lot of containers perhaps
> using an object database (or something else) would be your best bet.
>
> Having said that, the HDF5 1.8.x series implements a much more optimized
> cache for metadata that should help you somewhat.  See below.
>
>  
>> The structure of the program makes it likely that once a dataset is
>> opened it is likely to be revisted later.   I have been closing
>> datasets after each read operation because I am not sure what the
>> consequences are of simply leaving them open.    I am considering
>> implementing a cache to keep some number of datasets open for the
>> life of the program, or until my cache limit is reached.
>>    
>
> My experience in that field is that you should have enough with the
> included metadata cache integrated in the HDF5 library, specially that
> included in HDF5 1.8.0 on -- just that you should not stress it too
> much.  Just as a reference, on a small benchmark that I've done with
> PyTables (the results should be extensible to an equivalent C benchmark
> too), here it is the memory taken for creating and reading a file with
> 5000 and 10000 datasets:
>
> HDF5 1.6.7:
>
> Create:
> File with 5000 datasets:  45 MB
> File with 10000 datasets: 73 MB
>
> Read a subset of 100 datasets:
> File with 5000 datasets:  20 MB
> File with 10000 datasets: 25 MB
>
> HDF5 1.8.0:
>
> Create:
> File with 5000 datasets:  30 MB
> File with 10000 datasets: 31 MB
>
> Read a subset of 100 datasets:
> File with 5000 datasets:  18 MB
> File with 10000 datasets: 19 MB
>
> So, clearly, the 1.8.x series have improved a lot in this area.
>
>  
>> Does
>> anyone know how many I can safely keep open without worrying about
>> running out of some resource, for example memory, int the C code that
>> is actually managing them? Would 100 be safe,  or 1,000?
>>    
>
> If you are still interested in implementing a sort of LRU cache for your
> nodes (datasets), you should be aware that the algorithm for
> determining the least recently used node also takes time (perhaps a lot
> more than the chosen algorithm for evict the metadata cache in HDF5),
> so you may want to create a cache with quite small number of nodes (my
> recomendation is not to exceed 256) so as to not add too much overhead
> in your implementation.
>
> Just as a reference, in PyTables Pro I have implemented such a LRU cache
> (for other reasons than yours) with a carefully optimized C code and,
> for a LRU cache size of 256 nodes, we are getting a loss in performance
> between 2x and 3x with respect to the metadata cache code in HDF5.  Of
> course, we got our *own* eviction algorithm for cache, but we had to
> pay a price for that.
>
>  
>> The program by the way is a genomics visualizer, and was just
>> released to the public at www.broad.mit.edu.
>>
>> Thanks for any help.  I like hdf5, it saves me a lot of time,  but
>> I've got to solve this problem by some means to continue using it.
>>    
>
> HTH,
>
>  






[hdf-forum] Dataset overhead (HDF5 1.6 & Java)

Francesc Alted
On Saturday, 9 August 2008, you wrote:
> Maybe you're correct,  the logical model of HDF is well suited to my
> problem though and I'm committed to it in the near term.    This
> design bias towards large datasets in relatively few containers is
> not obvious from the documentation.

Perhaps this fact is not obvious from reading the docs, but I've clearly
read it from the book of my own experience ;-)

> At any rate I'm not talking
> about a huge number of datasets,  just 30-100 or so typically.

30-100 datasets should not be a problem at all, even with the HDF5
1.6.x series.

> HDf 1.8x is not an option because, as far as I know,  there are no
> java jni bindings available yet.  I will try your suggestion for a
> relatively small cache.   Thanks very much for the numbers and
> advice.

Glad that I helped you.

--
Francesc Alted
Freelance developer
Tel +34-964-282-249






[hdf-forum] Dataset overhead (HDF5 1.6 & Java)

Jim Robinson

[hdf-forum] Dataset overhead (HDF5 1.6 & Java)

Francesc Alted
Hi Jim,

If your issue is the time to access the different datasets, then you
will not see much difference between 1.6.x and 1.8.x.  In my
benchmarks (made on my pretty old laptop), I'm opening a dataset in
about 200 microseconds when the dataset is not in the HDF5 metadata
cache and in about 35 microseconds when it is, regardless of whether I
use 1.6.7 or 1.8.0.  Incidentally, when the dataset metadata is in the
cache there is a small advantage of about 25% in favor of 1.6.7, but
this should not be too important in your setup.

Most importantly, my experiments show that the time needed to open a
single dataset is approximately *independent* of the number of
datasets in a file (at least in the range of 100 to 10000 datasets).
At any rate, I don't find 35 microseconds (this is with PyTables; the
time should certainly be better in a C program) to be an excessive
figure for reopening a dataset.  I'd recommend you double-check where
your bottleneck really is (perhaps it is in another piece of code that
depends on the number of datasets visited).

If you really need much more speed than 35 microseconds, and you want
to keep using HDF5, then I'd think of a way to consolidate more
information in your datasets, by adding more dimensions or appending
more data to the existing datasets.  Then you can perform direct I/O
by doing hyperslab selections of the interesting parts of your
datasets.  Of course, that implies setting up a sort of index to
quickly locate those interesting sub-datasets.  Whether your new index
is faster than the cost of re-opening datasets in HDF5 will depend on
its complexity -- so you should look for a simple enough
implementation.

BTW, after seeing the effectiveness of the new HDF5 1.8.x series, I'm
starting to change my mind about HDF5 not being useful when there are
*a lot* of datasets in the same file.  The 1.8.x versions seem to work
really nicely in this scenario.  Many thanks to the HDF5 team for
this :-)

Francesc

On Monday, 11 August 2008, Jim Robinson wrote:

> Hi Francesc,  I should have defined "problem" more clearly.  In my
> application HDF5 serves up data in small chunks to support an
> interactive visualization of genomic data.  As users zoom and pan,
> more reads are triggered as additional "tiles" come into view (it is
> modeled closely on Google Maps).  There are many datasets spread
> across multiple HDF5 files.  As long as the user is hitting only a
> few datasets, zooming and panning are fairly smooth, but when the
> numbers get into the "10s" the lag from opening datasets becomes
> noticeable.
>
> The datasets are organized by zoom level, so by caching datasets
> panning is now smooth.  Delays when zooming are more acceptable;
> somehow you expect a delay when zooming in, or at least visually it
> is less disturbing than jerky panning.  I don't really need an LRU
> cache; simply dumping everything and starting over when the cache is
> full is good enough.
>
> I would really like to try 1.8.x.  Are there any plans to develop a
> Java interface for that version?
>
> Thanks
>
> Jim
>
> Francesc Alted wrote:
>> Perhaps this fact is not obvious from reading the docs, but I've
>> clearly read it from the book of my own experience ;-)
>>
>> 30-100 datasets should not be a problem at all, even with the HDF5
>> 1.6.x series.
>>
>> Glad that I helped you.



--
Francesc Alted
Freelance developer
Tel +34-964-282-249






[hdf-forum] Practical limit on number of objects?

Quincey Koziol
In reply to this post by Darryl Okahata
Hi Darryl,

On Aug 8, 2008, at 1:22 PM, Darryl Okahata wrote:

>     On a different note: does h5dump support split files?  I can't  
> seem
> to get h5dump to recognize them.  I wrote some test code that uses the
> split driver to write two files:
>
> ext.h5.meta
> ext.h5.raw
>
> $ h5dump ext.h5.meta
> h5dump error: unable to open file "ext.h5.meta"
> $ h5dump ext.h5.raw
> h5dump error: unable to open file "ext.h5.raw"
> $ h5dump --filedriver split ext.h5.meta
> h5dump error: unable to open file "ext.h5.meta"
> $ h5dump --filedriver split ext.h5.raw
> h5dump error: unable to open file "ext.h5.raw"
>
> I don't think it's a problem with the test code, as it produces a
> dumpable "ext.h5" file if I comment out the call to  
> H5Pset_fapl_split()
> (the test code originated from the h5_extend.c example).

        I believe this should work if you say "h5dump --filedriver split ext.h5".

        Quincey







[hdf-forum] Practical limit on number of objects?

Darryl Okahata
Quincey Koziol <koziol at hdfgroup.org> wrote:

>       I believe this should work if you say "h5dump --filedriver split  
> ext.h5"

     Thanks.  It turns out that the expected file suffixes are "-m.h5"
and "-r.h5" (this is for 1.8.1).

     I tracked down the problem to the use of H5Pset_libver_bounds():

        status = H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

If I don't use this, h5dump works.  However, the use of the above line
results in:

        $ h5dump ext.h5
        h5dump error: internal error (file h5dump.c:line 4150)

h5dump works if I don't use H5Pset_libver_bounds().  I've attached a
short diff to the example, "h5_extend.c", program.  If you compile and
run it, the resulting file will not work with h5dump.

[ Also, note that the patch is for 1.8.1, as I had to modify the example
  to work with the 1.8.1 API.  ]

--
        Darryl Okahata
        darrylo at soco.agilent.com


-------------- next part --------------
--- h5_extend.c.orig 2008-08-13 14:57:18.000000000 -0700
+++ h5_extend.c 2008-08-13 15:22:34.000000000 -0700
@@ -19,7 +19,7 @@
 int
 main (void)
 {
-    hid_t       file;                          /* handles */
+    hid_t       file, fapl;                    /* handles */
     hid_t       dataspace, dataset;  
     hid_t       filespace;                  
     hid_t       cparms;                    
@@ -52,7 +52,13 @@
     dataspace = H5Screate_simple (RANK, dims, maxdims);
 
     /* Create a new file. If file exists its contents will be overwritten. */
-    file = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
+    fapl = H5Pcreate(H5P_FILE_ACCESS);
+    status = H5Pset_fapl_split(fapl,
+       "-m.h5", H5P_DEFAULT, "-r.h5", H5P_DEFAULT);
+#if 1
+    status = H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
+#endif
+    file = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
 
     /* Modify dataset creation properties, i.e. enable chunking  */
     cparms = H5Pcreate (H5P_DATASET_CREATE);
@@ -61,7 +67,7 @@
     /* Create a new dataset within the file using cparms
        creation properties.  */
     dataset = H5Dcreate (file, DATASETNAME, H5T_NATIVE_INT, dataspace,
-                         cparms);
+                         H5P_DEFAULT, cparms, H5P_DEFAULT);
 
     /* Extend the dataset. This call assures that dataset is 3 x 3.*/
     size[0]   = 3;
@@ -109,8 +115,9 @@
     Read the data back
  ***************************************************************/
 
+#if 0
     file = H5Fopen (FILE, H5F_ACC_RDONLY, H5P_DEFAULT);
-    dataset = H5Dopen (file, DATASETNAME);
+    dataset = H5Dopen (file, DATASETNAME, H5P_DEFAULT);
     filespace = H5Dget_space (dataset);
     rank = H5Sget_simple_extent_ndims (filespace);
     status_n = H5Sget_simple_extent_dims (filespace, dimsr, NULL);
@@ -138,4 +145,5 @@
     status = H5Sclose (filespace);
     status = H5Sclose (memspace);
     status = H5Fclose (file);
+#endif
 }    


[hdf-forum] Practical limit on number of objects?

Quincey Koziol
Hi Darryl,

On Aug 13, 2008, at 5:38 PM, Darryl Okahata wrote:

> Quincey Koziol <koziol at hdfgroup.org> wrote:
>
>>      I believe this should work if you say "h5dump --filedriver split
>> ext.h5"
>
>     Thanks.  It turns out that the expected file suffixes are "-m.h5"
> and "-r.h5" (this is for 1.8.1).
>
>     I tracked down the problem to the use of H5P_set_libver_bounds():
>
>        status = H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST,  
> H5F_LIBVER_LATEST);
>
> If I don't use this, h5dump works.  However, the use of the above line
> results in:
>
>        $ h5dump ext.h5
>        h5dump error: internal error (file h5dump.c:line 4150)

        Hmm, this is odd.  What version of h5dump are you using?  It
doesn't look like it's 1.8.1...

        Quincey

> h5dump works if I don't use H5Pset_libver_bounds().  I've attached a
> short diff to the example, "h5_extend.c", program.  If you compile and
> run it, the resulting file will not work with h5dump.
>
> [ Also, note that the patch is for 1.8.1, as I had to modify the  
> example
>  to work with the 1.8.1 API.  ]
>
> --
>        Darryl Okahata
>        darrylo at soco.agilent.com
>
>
> <h5_extend.c.diffs>







[hdf-forum] Practical limit on number of objects?

Darryl Okahata
Quincey Koziol <koziol at hdfgroup.org> wrote:

> Hmm, this is odd.  What version of h5dump are you using?  It doesn't  
> look like it's 1.8.1...

     Well, it came from a 1.8.1 tarball, downloaded on July 24th:

        $ ll hdf5-1.8.1.tar.bz2
        -rw-r--r--  1 darrylo eesofrd 5083921 Jul 24 16:39 hdf5-1.8.1.tar.bz2

I just re-extracted the tarball and compared it with the 1.8.1 source
tree from which my h5dump came; the two trees are identical except for
hdf5-1.8.1/Makefile (it appears to have been regenerated by automake).

     I'll try to rebuild HDF with -g and without optimization (I'm using
an old gcc 4.1.1 compiler).  Perhaps that will help.  It'll be a few
days, though.

     Thanks.

--
        Darryl Okahata
        darrylo at soco.agilent.com






[hdf-forum] Practical limit on number of objects?

Quincey Koziol
Hi Darryl,

On Aug 14, 2008, at 12:13 PM, Darryl Okahata wrote:

> Quincey Koziol <koziol at hdfgroup.org> wrote:
>
>> Hmm, this is odd.  What version of h5dump are you using?  It doesn't
>> look like it's 1.8.1...
>
>     Well, it came from a 1.8.1 tarball, downloaded on July 24th:
>
> $ ll hdf5-1.8.1.tar.bz2
> -rw-r--r--  1 darrylo eesofrd 5083921 Jul 24 16:39 hdf5-1.8.1.tar.bz2
>
> I just re-extracted the tarball, and compared it with the 1.8.1 source
> tree from which my h5dump came, and the two source trees are  
> identical,
> except for hdf5-1.8.1/Makefile (it appears to have been regenerated by
> automake).
>
>     I'll try to rebuild HDF with -g and without optimization (I'm  
> using
> an old gcc 4.1.1 compiler).  Perhaps that will help.  It'll be a few
> days, though.

        Hmm, if you are going to be rebuilding, could you try the current
stable 1.8 release code from our public subversion repository:
http://svn.hdfgroup.uiuc.edu/hdf5/branches/hdf5_1_8

        It's got some related fixes and might work better for you,
                Quincey

