Re: Data Chunking parallel IO problem with Lustre 2.10 and HDF5 1.10.x
My apologies for taking so long to respond.
We do have a reported JIRA issue with HDF5 1.10.x and OpenMPI, and your issue is probably related. We do not think this is a Lustre issue. This is from a technical lead at Intel:
This isn’t a Lustre issue. From the trace, this definitely seems like a bug in openmpi / romio.
The romio in openmpi is pretty outdated, and we had bugs fixed in the mpich repo that haven't carried over to the ompi repo.
Meanwhile, depending on the ompi version he is using, he can try to use --mca io ompio, or report this to the openmpi mailing list.
I tried your test program using Cray MPI on Lustre (2.6), with both HDF5 1.10.1 and the develop branch, and I did not have any issues. So if you need to use 1.10.x, then for now you will need to use a different MPI implementation.
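For reference, the OMPIO suggestion above could be tried roughly as follows. This is only a sketch: the process count and the executable name ./your_test_program are placeholders, and whether the ompio component is available depends on your OpenMPI version and how it was built.

```shell
# Select the OMPIO component instead of ROMIO for MPI-IO on the command line.
# "-np 4" and "./your_test_program" are placeholders, not values from this thread.
mpirun --mca io ompio -np 4 ./your_test_program

# Equivalent selection through the environment, using OpenMPI's
# OMPI_MCA_<parameter> convention.
export OMPI_MCA_io=ompio
mpirun -np 4 ./your_test_program
```

If the crash goes away with ompio selected, that would point at the ROMIO layer rather than at HDF5 or Lustre.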
On Dec 7, 2017, at 3:44 AM, Bertini, Denis Dr. <[hidden email]> wrote:
I am facing a problem with data chunking on a Lustre 2.10 (and 2.6) filesystem, using HDF5 1.10.1 in parallel mode.
I have attached to my mail a simple C program that crashes immediately at line 94, when trying
to create the dataset collectively.
I observed the crash when I simply set the chunk size to be the same as the dataset size. I know that this is one
of the "non recommended" setups according to your documentation ("PitFalls").