Data Chunking parallel IO problem with Lustre 2.10 and HDF5 1.10.x

I am facing a problem with data chunking on a Lustre 2.10 (and 2.6) filesystem when using HDF5 1.10.1 in parallel mode.
I have attached a simple C program that reproduces the crash immediately: it fails at line 94, where it tries
to create the dataset collectively.
The crash occurs when I simply set the chunk size equal to the dataset size. I know that this is one
of the "not recommended" setups according to your documentation ("Pitfalls"),
but, leaving aside the performance penalty, it should not make the program crash outright.
Furthermore, running the same program with the older HDF5 version 1.8.16 does NOT cause any crash on the same
Lustre 2.10 (or 2.6) filesystem. So it seems that something has changed in the data chunking implementation
between the two major HDF5 versions, 1.8.x and 1.10.x.
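
A chunk that covers the whole dataset is discouraged but still a legal layout. The sketch below is a hypothetical pure-C check (not an HDF5 function and not taken from the attached mytest.c) that flags this single-chunk case before the shape is handed to H5Pset_chunk. It assumes fixed-size (non-extendible) dataset dimensions, and it assumes a per-chunk ceiling of 2^32 - 1 bytes, based on the roughly 4 GB chunk limit described in the HDF5 documentation. It uses uint64_t instead of hsize_t only so that it compiles without the HDF5 headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Outcome of checking one chunk shape against a fixed-size dataset shape. */
typedef enum {
    CHUNK_OK = 0,         /* valid, and smaller than the whole dataset       */
    CHUNK_SINGLE = 1,     /* valid, but one chunk covers the whole dataset   */
    CHUNK_BAD_DIM = -1,   /* bad rank or element size, a zero chunk dim, or
                             a chunk dim larger than the dataset dim         */
    CHUNK_TOO_LARGE = -2  /* one chunk would exceed the assumed byte limit   */
} chunk_check_t;

/* Hypothetical helper: classify a chunk shape and, on success, store the
 * size of one chunk in bytes in *chunk_bytes. */
chunk_check_t check_chunk_shape(int rank, const uint64_t *dset_dims,
                                const uint64_t *chunk_dims, size_t elem_size,
                                uint64_t *chunk_bytes)
{
    /* Assumed limit: 2^32 - 1 bytes per chunk (the "4 GB" HDF5 ceiling). */
    const uint64_t max_bytes = UINT64_C(4294967295);
    uint64_t bytes = (uint64_t)elem_size;
    int single = 1;

    if (rank < 1 || elem_size == 0)
        return CHUNK_BAD_DIM;

    for (int i = 0; i < rank; i++) {
        if (chunk_dims[i] == 0 || chunk_dims[i] > dset_dims[i])
            return CHUNK_BAD_DIM;
        if (chunk_dims[i] != dset_dims[i])
            single = 0;
        /* Equivalent to bytes * chunk_dims[i] > max_bytes, without overflow. */
        if (chunk_dims[i] > max_bytes / bytes)
            return CHUNK_TOO_LARGE;
        bytes *= chunk_dims[i];
    }

    *chunk_bytes = bytes;
    return single ? CHUNK_SINGLE : CHUNK_OK;
}
```

For example, a 1000 x 100 dataset of 8-byte values with a 1000 x 100 chunk is reported as CHUNK_SINGLE (800000 bytes), while a 250 x 100 chunk is reported as CHUNK_OK.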

Could you please tell me what should be changed in the program's chunk size setting when using the new HDF5 1.10.x?
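
One commonly tried alternative to a single dataset-sized chunk is to give each MPI rank its own slab along the slowest-varying dimension. The sketch below computes such a chunk shape; the function name chunk_dims_per_rank and the one-slab-per-rank policy are hypothetical choices for illustration, and nothing in this thread establishes that this layout avoids the crash.

```c
#include <stdint.h>

/* Hypothetical helper: split dimension 0 of a fixed-size dataset into
 * nparts slabs (e.g. one per MPI rank) and keep every other dimension
 * whole. Returns 0 on success, -1 on invalid input. */
int chunk_dims_per_rank(int rank, const uint64_t *dset_dims, int nparts,
                        uint64_t *chunk_dims)
{
    if (rank < 1 || nparts < 1)
        return -1;
    for (int i = 0; i < rank; i++)
        if (dset_dims[i] == 0)
            return -1;

    uint64_t n = (uint64_t)nparts;
    /* Ceiling division: nparts slabs of this height cover all of dim 0.
     * If dim 0 is not divisible by nparts, the last slab is partly empty. */
    chunk_dims[0] = dset_dims[0] / n + (dset_dims[0] % n != 0);
    for (int i = 1; i < rank; i++)
        chunk_dims[i] = dset_dims[i];
    return 0;
}
```

With 4 ranks and a 1000 x 100 dataset this yields 250 x 100 chunks. In a real program the result would be copied into an hsize_t array and passed to H5Pset_chunk(dcpl, rank, dims) on the dataset creation property list.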

Thanks in advance,
Denis Bertini

Here is the crash output (segmentation fault backtrace) I observe as soon as I run with more than one MPI process:
H5Pcreate access succeed
H5Pcreate access succeed
-I- Chunk size 176000:
-I- Chunk size 176000:
[lxbk0341:39368] *** Process received signal ***
[lxbk0341:39368] Signal: Segmentation fault (11)
[lxbk0341:39368] Signal code: Address not mapped (1)
[lxbk0341:39368] Failing at address: (nil)
[lxbk0341:39368] [ 0] /lib/x86_64-linux-gnu/[0x7f7742122890]
[lxbk0341:39368] [ 1] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/[0x7f772e8ac657]
[lxbk0341:39368] [ 2] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/[0x7f772e8ad363]
[lxbk0341:39368] [ 3] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/[0x7f772e8a2f5d]
[lxbk0341:39368] [ 4] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/[0x7f772e889e06]
[lxbk0341:39368] [ 5] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/[0x7f772e883802]
[lxbk0341:39368] [ 6] /lustre/hebe/rz/dbertini/plasma/softw/lib/[0x7f77423bfb2d]

Hdf-forum is for HDF software users discussion.

Attachment: mytest.c (3K)