hdf5 crashing when opening and closing files inside a loop

guido
Hello,
I am new to HDF5 and recently ran into a problem. I am using Fortran 90 with a Fortran 90 HDF5 wrapper (https://github.com/galtay/sphray/tree/master/hdf5_wrapper) to read from and write to many HDF5 files. The reading, modifying, and writing of the data is done inside a do loop.
The code runs fine up to a certain point, and then it aborts with the error message:

 ***Abort HDF5 : Unable to open HDF5 file in open_file()!
  
 file name  : /mnt/su3ctm/ggranda/cp_test2/rep_-1_-1_-2/ivol31/galaxies.hdf5


I have checked, and the file exists, so that is not the problem. I think it is something related to HDF5.


The code is the following:


    subroutine replications_translation(n_a,nsub,lbox,directory)
    ! subroutine to do the translations along all the replications
    ! it makes use of the 
    ! n_a: number of replications per axis
    ! nsub: number of subvolumes
    ! lbox: box size
    ! x: x coordinate
    ! y: y coordinate
    ! z: z coordinate
    ! directory: folder that contains the replications
    ! redshift: character that specifies the redshift (e.g. iz200)
        integer, intent(in) :: n_a,nsub
        real, dimension(:),allocatable :: x,y,z,dc,decl,ra
        real, intent(in) :: lbox
        character(*),  intent(in) :: directory
        !character(5),  intent(in) ::redshift
        character(2) :: temp_i,temp_j,temp_k,temp_subv
        integer :: i,j,k,l,ifile,dims(1),rank,count_l
        count_l=0
        do i=-n_a,n_a
            write(temp_i,"(I2)") i
            do j=-n_a,n_a
                write(temp_j,"(I2)") j
                do k=-n_a,n_a
                    write(temp_k,"(I2)") k
                    do l=30,nsub-1

                        write(temp_subv,"(I2)") l
                        call hdf5_open_file(ifile,directory//'rep_'//trim(adjustl(temp_i))//'_'//trim(adjustl(temp_j))//'_'//trim(adjustl(temp_k))//'/ivol'//trim(adjustl(temp_subv))//'/galaxies.hdf5',readonly=.false.)

                        call hdf5_get_dimensions(ifile,'Output001/mhalo',rank,dims)
                        allocate(x(dims(1)),y(dims(1)),z(dims(1)),dc(dims(1)),ra(dims(1)),decl(dims(1)))
                        call hdf5_read_data(ifile,'Output001/xgal',x)
                        call hdf5_read_data(ifile,'Output001/ygal',y)
                        call hdf5_read_data(ifile,'Output001/zgal',z)
                        x   =x+i*lbox
                        y   =y+j*lbox
                        z   =z+k*lbox
                        dc  =sqrt(x**2.0+y**2.0+z**2.0)
                        decl=asin(z/dc)
                        ra  =atan2(y,x)

                        call hdf5_write_data(ifile,'Output001/xgal_t',x,overwrite=.true.)
                        call hdf5_write_attribute(ifile,'Output001/xgal_t/Comment','X(lightcone) coordinate of this galaxy [Mpc/h]')

                        call hdf5_write_data(ifile,'Output001/ygal_t',y,overwrite=.true.)
                        call hdf5_write_attribute(ifile,'Output001/ygal_t/Comment','Y(lightcone) coordinate of this galaxy [Mpc/h]')

                        call hdf5_write_data(ifile,'Output001/zgal_t',z,overwrite=.true.)
                        call hdf5_write_attribute(ifile,'Output001/zgal_t/Comment','Z(lightcone) coordinate of this galaxy [Mpc/h]')

                        call hdf5_write_data(ifile,'Output001/dc',dc,overwrite=.true.)
                        call hdf5_write_attribute(ifile,'Output001/dc/Comment','Comoving distance  [Mpc/h]')
                        !print *, "check hdf5"
                        call hdf5_write_data(ifile,'Output001/ra',ra,overwrite=.true.)
                        call hdf5_write_attribute(ifile,'Output001/ra/Comment',"Right ascension")

                        call hdf5_write_data(ifile,'Output001/decl',decl,overwrite=.true.)
                        call hdf5_write_attribute(ifile,'Output001/decl/Comment',"Declination")

                        call hdf5_close_file(ifile)
                        print *, "Done with "//directory//'rep_'//trim(adjustl(temp_i))//'_'//trim(adjustl(temp_j))//'_'//trim(adjustl(temp_k))//'/ivol'//trim(adjustl(temp_subv))//'/galaxies.hdf5'
                        deallocate(x,y,z,dc,ra,decl)
                        count_l=count_l+1
                        print *, "number =",count_l
                    enddo
                enddo
            enddo
        enddo
    end subroutine replications_translation


Could you please help me with this? The number of files that I need to open, read, and write is huge, so I don't know whether this is a limitation of HDF5 or whether my code follows bad practices that are causing the crash.

Thanks in advance,

--
Guido

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
Re: hdf5 crashing when opening and closing files inside a loop

Mark Koennecke
Guido,

In reply to the post by Guido granda muñoz of 26.02.2017, 17:11:

Have you checked for resource leakage? HDF5 is special in that closing a file does not necessarily release all handles associated with that file. So if you leave a handle to a dataset, attribute, datatype, etc. dangling for each file, resources are eventually exhausted and HDF5 will crash on you in a weird way. Been there, done that.

As you have not provided all your subroutines, I cannot see whether this is the case in your code.
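
A minimal sketch of how such a leak could be checked for, assuming direct use of the native HDF5 Fortran API (`use hdf5`) rather than the wrapper. The module and routine names (hdf5_leak_check, count_leaked_handles, open_strong) are made up for illustration, and the caller is assumed to have called h5open_f already. With the default close degree, h5fclose_f only releases the file identifier, and the file itself stays open until every object in it has been closed; the "strong" close degree closes those objects together with the file.

```fortran
module hdf5_leak_check
  ! Hypothetical helper module for illustration; not part of HDF5 or the wrapper.
  ! Assumes the caller has already called h5open_f.
  use hdf5
  implicit none
contains

  ! Count identifiers (datasets, groups, datatypes, attributes) that are still
  ! open on the file, excluding file identifiers themselves.
  ! Returns -1 if either query fails.
  function count_leaked_handles(file_id) result(n_leaked)
    integer(hid_t), intent(in) :: file_id
    integer :: n_leaked
    ! Assumption: recent releases use SIZE_T here; very old ones used INTEGER.
    integer(size_t) :: n_all, n_files
    integer :: hdferr

    n_leaked = -1
    call h5fget_obj_count_f(file_id, H5F_OBJ_ALL_F, n_all, hdferr)
    if (hdferr < 0) return
    call h5fget_obj_count_f(file_id, H5F_OBJ_FILE_F, n_files, hdferr)
    if (hdferr < 0) return
    n_leaked = int(n_all - n_files)
  end function count_leaked_handles

  ! Open an existing file read-write with the "strong" close degree, so that
  ! h5fclose_f also closes any objects still open in the file.
  subroutine open_strong(filename, file_id, hdferr)
    character(*),   intent(in)  :: filename
    integer(hid_t), intent(out) :: file_id
    integer,        intent(out) :: hdferr
    integer(hid_t) :: fapl_id
    integer :: close_err

    file_id = -1
    call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
    if (hdferr < 0) return
    call h5pset_fclose_degree_f(fapl_id, H5F_CLOSE_STRONG_F, hdferr)
    if (hdferr >= 0) then
      call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdferr, access_prp=fapl_id)
    end if
    ! The property list is itself a handle and must be released too.
    call h5pclose_f(fapl_id, close_err)
  end subroutine open_strong

end module hdf5_leak_check
```

Printing count_leaked_handles(file_id) just before each close would show whether handles accumulate from one iteration to the next. The wrapper's ifile is a plain integer and may not be the HDF5 identifier itself, so a check like this might have to go inside the wrapper's own close routine.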

Regards,

     Mark Koennecke


Re: hdf5 crashing when opening and closing files inside a loop

Scot Breitenfeld
In reply to this post by guido
Out of curiosity, do you have the same issue if you comment out everything except the open and close routines?

Also, I’m not 100% sure why you chose this file layout and looping scheme. It seems that you are creating a directory structure for different files/datasets. Could you move this directory/file structure inside a single HDF5 file using groups? That way you can eliminate the file open/close in the middle of the nested loop. This is not just an HDF5 issue; I don’t think you would want to do it this way with POSIX writes either if you want good I/O performance.
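
A rough sketch of the single-file layout suggested above, assuming the native HDF5 Fortran API (`use hdf5`) rather than the wrapper, and assuming the caller has already called h5open_f. The module and routine names (single_file_layout, build_layout) and the argument names are hypothetical; the group names simply mirror the rep_<i>_<j>_<k>/ivol<l> directories from the original post.

```fortran
module single_file_layout
  ! Hypothetical sketch; module, routine, and argument names are illustrative.
  ! Assumes the caller has already called h5open_f.
  use hdf5
  implicit none
contains

  ! Create one file holding rep_<i>_<j>_<k>/ivol<l> groups in place of the
  ! directory tree. The file is opened once, outside the nested loops.
  subroutine build_layout(filename, n_a, first_sub, last_sub, hdferr)
    character(*), intent(in)  :: filename
    integer,      intent(in)  :: n_a, first_sub, last_sub
    integer,      intent(out) :: hdferr
    integer(hid_t) :: file_id, rep_id, vol_id
    integer :: i, j, k, l, close_err
    character(len=64) :: rep_name, vol_name

    ! Note: H5F_ACC_TRUNC_F overwrites an existing file of the same name.
    call h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, hdferr)
    if (hdferr < 0) return

    outer: do i = -n_a, n_a
      do j = -n_a, n_a
        do k = -n_a, n_a
          ! I0 uses as many characters as needed, so long indices are not cut off.
          write(rep_name, '("rep_",I0,"_",I0,"_",I0)') i, j, k
          call h5gcreate_f(file_id, trim(rep_name), rep_id, hdferr)
          if (hdferr < 0) exit outer
          do l = first_sub, last_sub
            write(vol_name, '("ivol",I0)') l
            call h5gcreate_f(rep_id, trim(vol_name), vol_id, hdferr)
            if (hdferr < 0) exit
            ! Datasets for this subvolume would be written under vol_id here.
            call h5gclose_f(vol_id, hdferr)   ! close every handle before moving on
            if (hdferr < 0) exit
          end do
          call h5gclose_f(rep_id, close_err)
          if (hdferr < 0) exit outer
        end do
      end do
    end do outer

    call h5fclose_f(file_id, close_err)
    if (hdferr >= 0) hdferr = close_err
  end subroutine build_layout

end module single_file_layout
```

A call such as build_layout('galaxies_all.hdf5', n_a, 30, nsub-1, hdferr) would then replace the per-directory files with one file (the file name here is only an example). Each group handle is closed as soon as it is no longer needed, which also avoids the kind of handle build-up discussed earlier in the thread.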

Scot

