H5Aopen_name() is taking almost 11 minutes to read 56000 attributes and 3200 groups.

H5Aopen_name() is taking almost 11 minutes to read 56000 attributes and 3200 groups.

Deepak 8 Kumar
Hello!

I have an HDF5-based application that reads an HDF5 file with almost 3200 groups and 56000 attributes. The application uses the standard HDF5 API, and it took almost 11 minutes to read only the groups and attributes. Timing with a StopWatch showed that H5Aopen_name() accounts for almost 97 percent of the total time. I am using HDF5 1.10.1 on Windows 10 x64.

My question: is this the expected behavior of H5Aopen_name(), or am I not reading the file properly?
What approach should we take for this kind of file with a large number of attributes?
Any insight is greatly appreciated.

Thanks,
Deepak Kumar
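
For a sense of scale, the figures in the post above imply an average cost per H5Aopen_name() call. The following is only a back-of-the-envelope sketch: the helper name per_call_ms is made up for illustration, and the inputs are the rounded numbers from the post, assuming one call per attribute.

```c
#include <stdio.h>

/* Average cost of one call in milliseconds, given the total wall time in
 * seconds, the fraction of that time spent in the call, and the call count. */
static double per_call_ms(double total_seconds, double fraction, long calls)
{
    return total_seconds * fraction * 1000.0 / (double)calls;
}

/* Prints the estimate for the numbers reported in the post:
 * 11 minutes total, 97 percent in H5Aopen_name(), 56000 attributes. */
static void print_estimate(void)
{
    printf("%.1f ms per call\n", per_call_ms(11.0 * 60.0, 0.97, 56000L));
}
```

That works out to roughly 11 ms per attribute open, which is far more than a name lookup in an already open group would normally be expected to cost.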
_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: H5Aopen_name() is taking almost 11 minutes to read 56000 attributes and 3200 groups.

Miller, Mark C.

Hmm. That *does* sound really bad.

 

I don't have a lot of experience with attributes but am wondering about H5Aiterate.

 

Would using that approach to accessing your attribute data improve performance?

 

The reason I think it possibly could is that the library wouldn't constantly have to map an attribute name to the attribute, and if you are opening most of the attributes on a given object, iterating over them might go faster. But that is only a guess.

 

Mark

 

 


Re: H5Aopen_name() is taking almost 11 minutes to read 56000 attributes and 3200 groups.

Deepak 8 Kumar
Hello Mark,

Thanks for the quick reply. I am indeed using H5Aiterate2. Here is the sample code for reference.
H5Aiterate2(id, H5_INDEX_CRT_ORDER, H5_ITER_INC, NULL, iterate_attributes, &parent_group);

Here is the callback implementation.
herr_t iterate_attributes(hid_t group_id, const char *attribute_name, const H5A_info_t *ainfo, void *pgroup) {
    hid_t attribute_id = H5Aopen_name(group_id, attribute_name);
    /* ... read the attribute, then H5Aclose(attribute_id) ... */
    return 0;  /* zero continues the iteration */
}

The only way I can see to open the attribute is to call H5Aopen_name().

Thanks,
Deepak Kumar



In reply to "Miller, Mark C." <[hidden email]>, 07/13/2017 12:20 PM





Re: H5Aopen_name() is taking almost 11 minutes to read 56000 attributes and 3200 groups.

Miller, Mark C.

Hi Deepak,

My apologies. My response was somewhat misleading.

First, you are using >= 1.10, right? Does H5Aopen() perform faster than H5Aopen_name()? It would be silly if it did, but you never know.

Next, can't you use something like this (a bit of a guess)...

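herr_t iterate_attributes(hid_t group_id, const char *attribute_name, const H5A_info_t *ainfo, void *pgroup) {
    hid_t attr_id;

    if (ainfo->corder_valid)
        /* H5Aopen_by_idx also takes an object name; "." means group_id itself.
           Using corder as the index assumes no attribute has been deleted. */
        attr_id = H5Aopen_by_idx(group_id, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, (hsize_t) ainfo->corder, H5P_DEFAULT, H5P_DEFAULT);
    else
        attr_id = H5Aopen_name(group_id, attribute_name);
    if (attr_id < 0)
        return -1;    /* negative stops the iteration with an error */

    /* ... read the attribute ... */
    H5Aclose(attr_id);
    return 0;    /* a positive return value would end the iteration early */
}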

 

The only problem is that if ainfo->corder_valid is always false, it will just be doing what you're doing now.

The H5Aget_info documentation says "... Note that if creation order is not being tracked, no creation order data will be valid." I briefly looked and didn't see anywhere that documents the conditions under which attribute creation order is or is not tracked.

 

 

 

 

 


Re: H5Aopen_name() is taking almost 11 minutes to read 56000 attributes and 3200 groups.

Werner Benger
In reply to this post by Deepak 8 Kumar

Hi,

I had a similarly bad experience with many groups and many attributes. Several things can be done (ideally all of them):

1. Make sure you set H5Pset_libver_bounds(,H5F_LIBVER_LATEST,H5F_LIBVER_LATEST) on the file access property list when creating the file, so that the newest library features are used.

2. Compress the group metadata, which may reduce disk reading times (a high-speed compressor such as LZ4 is recommended).

3. Use the split file driver to place metadata into different physical files than the raw data, so that the metadata is always compact on disk (this might be less of an issue for SSDs).

4. Reorganize your attributes into datasets instead. When appending data it may be less efficient to update the datasets, so during an incremental data update you can still write attributes, then do some postprocessing that builds a "cache" of the attributes in a dataset, and during reading you read that dataset instead of the attributes. This postprocessing to build the "attribute cache" takes some time, of course, but if it needs to be done only once while reading happens frequently, it is worth the effort. It depends on your use case.

Cheers,

           Werner

 
On 13.07.2017 18:33, Deepak 8 Kumar wrote:
Hello!

I have an HDF5-based application that reads an HDF5 file with almost 3200 groups and 56000 attributes. The application uses the standard HDF5 API, and it took almost 11 minutes to read only the groups and attributes. Timing with a StopWatch showed that H5Aopen_name() accounts for almost 97 percent of the total time. I am using HDF5 1.10.1 on Windows 10 x64.

My question: is this the expected behavior of H5Aopen_name(), or am I not reading the file properly?
What approach should we take for this kind of file with a large number of attributes?
Any insight is greatly appreciated.

Thanks,
Deepak Kumar


-- 
___________________________________________________________________________
Dr. Werner Benger                Visualization Research
Center for Computation & Technology at Louisiana State University (CCT/LSU)
2019  Digital Media Center, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809                        Fax.: +1 225 578-5362 
