Data Chunking parallel IO problem with Lustre 2.10 and HDF5 1.10.x


DenisBertini
Hi,
I am facing a problem with data chunking on a Lustre 2.10 (and 2.6) filesystem using HDF5 1.10.1 in parallel mode.
I attached to my mail a simple C program that crashes immediately at line 94, when it tries
to create the dataset collectively.
I observe the crash when I simply set the chunk size to be the same as the dataset size. I know that this is one
of the "non-recommended" setups according to your documentation ("Pitfalls").
But leaving aside the performance penalty, it should not cause a complete crash of the program.
Furthermore, the same program tested with the older HDF5 version 1.8.16 does not crash on the same
Lustre 2.10 (or 2.6) version. So it seems that something has changed in the data chunking implementation
between the two major HDF5 versions 1.8.x and 1.10.x.

Could you please tell me what should be changed for the data chunk size in the program when using the new version HDF5 1.10.x?
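
For illustration only, one alternative to "chunk size = dataset size" would be to give each MPI process its own chunk. The sketch below is plain C with no HDF5 calls; the helper name chunk_len_per_rank and the 1-D layout are assumptions, not taken from mytest.c. The returned length would be the value passed to H5Pset_chunk. Whether this avoids the segmentation fault with HDF5 1.10.x is untested.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of mytest.c): pick a 1-D chunk length so
 * that the dataset is split into roughly one chunk per MPI rank instead of
 * a single chunk covering the whole dataset.
 *
 * dataset_len : number of elements in the dataset
 * nranks      : number of MPI processes (a value of 0 is treated as 1)
 *
 * Returns 0 for an empty dataset; HDF5 requires chunk dimensions > 0,
 * so the caller should not enable chunking in that case.
 */
size_t chunk_len_per_rank(size_t dataset_len, size_t nranks)
{
    if (dataset_len == 0)
        return 0;
    if (nranks == 0)
        nranks = 1;

    /* Ceiling division: the last chunk may be only partially filled. */
    return (dataset_len + nranks - 1) / nranks;
}
```

Assuming the 176000 printed in the log below is the dataset length in elements, two processes would give a chunk length of 88000, while a single process falls back to the full 176000.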

Thanks in advance,
Denis Bertini

PS:
Here is the core dump that I observe as soon as I use more than one MPI process:
H5Pcreate access succeed
H5Pcreate access succeed
-I- Chunk size 176000:
-I- Chunk size 176000:
[lxbk0341:39368] *** Process received signal ***
[lxbk0341:39368] Signal: Segmentation fault (11)
[lxbk0341:39368] Signal code: Address not mapped (1)
[lxbk0341:39368] Failing at address: (nil)
[lxbk0341:39368] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f7742122890]
[lxbk0341:39368] [ 1] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten+0x1577)[0x7f772e8ac657]
[lxbk0341:39368] [ 2] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten_datatype+0xe3)[0x7f772e8ad363]
[lxbk0341:39368] [ 3] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIO_Set_view+0x1fd)[0x7f772e8a2f5d]
[lxbk0341:39368] [ 4] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(mca_io_romio314_dist_MPI_File_set_view+0x2f6)[0x7f772e889e06]
[lxbk0341:39368] [ 5] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(mca_io_romio314_file_set_view+0x22)[0x7f772e883802]
[lxbk0341:39368] [ 6] /lustre/hebe/rz/dbertini/plasma/softw/lib/libmpi.so.40(MPI_File_set_view+0xdd)[0x7f77423bfb2d]


mytest.c (3K) Download Attachment