"File too large" error, seemingly related to MPI


fffred
Hi,

While writing a significant amount of data in parallel, I get the
following error stack:

HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 66:
  #000: H5D.c line 194 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 453 in H5D__create_named(): unable to create and
link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1638 in H5L_link_object(): unable to create new
link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1685 in H5L_link_cb(): unable to create object
    major: Object header
    minor: Unable to initialize object
  #007: H5O.c line 3016 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: H5Doh.c line 293 in H5O__dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: H5Dint.c line 1060 in H5D__create(): can't update the metadata cache
    major: Dataset
    minor: Unable to initialize object
  #010: H5Dint.c line 852 in H5D__update_oh_info(): unable to update
layout/pline/efl header message
    major: Dataset
    minor: Unable to initialize object
  #011: H5Dlayout.c line 238 in H5D__layout_oh_create(): unable to
initialize storage
    major: Dataset
    minor: Unable to initialize object
  #012: H5Dint.c line 1713 in H5D__alloc_storage(): unable to
initialize dataset with fill value
    major: Dataset
    minor: Unable to initialize object
  #013: H5Dint.c line 1805 in H5D__init_storage(): unable to allocate
all chunks of dataset
    major: Dataset
    minor: Unable to initialize object
  #014: H5Dchunk.c line 3575 in H5D__chunk_allocate(): unable to write
raw data to file
    major: Low-level I/O
    minor: Write failed
  #015: H5Dchunk.c line 3745 in H5D__chunk_collective_fill(): unable
to write raw data to file
    major: Low-level I/O
    minor: Write failed
  #016: H5Fio.c line 171 in H5F_block_write(): write through metadata
accumulator failed
    major: Low-level I/O
    minor: Write failed
  #017: H5Faccum.c line 825 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #018: H5FDint.c line 260 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #019: H5FDmpio.c line 1846 in H5FD_mpio_write(): MPI_File_write_at_all failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #020: H5FDmpio.c line 1846 in H5FD_mpio_write(): Other I/O error ,
error stack:
ADIOI_NFS_WRITESTRIDED(672): Other I/O error File too large
    major: Internal error (too specific to document in detail)
    minor: MPI Error String


It basically claims that the file I am creating is too large. But I
have verified that the filesystem can handle files of that size. In my
case, the file is around 4 TB when it crashes. Where could this
problem come from? I thought HDF5 had no problem with very large
files. Also, I am dividing the file into several datasets, and the
write operations work perfectly until, at some point, it crashes with
the errors above.

Could it be an issue with HDF5? Or could it be an MPI limitation? I am
skeptical about the latter: at the beginning, the program writes
several datasets inside the file successfully (all the datasets being
the same size). If MPI were to blame, why wouldn't it crash at the
first write?

Thank you for your help.
Fred

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: "File too large" error, seemingly related to MPI

Quincey Koziol-3
Hi Frederic,
        Could you give us some more details about your file and the call(s) you are making to HDF5?   I can’t think of any reason that it would crash when creating a file like this, but something interesting could be going on…   :-)

        Quincey


> On Aug 7, 2017, at 5:28 AM, Frederic Perez <[hidden email]> wrote:
> [...]



Re: "File too large" error, seemingly related to MPI

Rob Latham
On Mon, 2017-08-07 at 09:14 -0500, Quincey Koziol wrote:
> Hi Frederic,
> Could you give us some more details about your file and the
> call(s) you are making to HDF5?   I can’t think of any reason that it
> would crash when creating a file like this, but something interesting
> could be going on…   :-)


Depending on how new his MPI implementation is, it might not have all
the 64-bit cleanups in the NFS path.

The final error in the trace says "File too large", but what it might
actually mean is "I/O request too big".

If you write to something that is not NFS, I think you'll find this
problem goes away:

http://press3.mcs.anl.gov/romio/2013/07/03/large-transfers-in-romio/
and
http://press3.mcs.anl.gov/romio/2014/07/11/more-headaches-with-2-gib-io/

have a bit more information. I neglected NFS back then and did not
update that driver until earlier this year.

==rob


> [...]