fletcher32 checksum error with szip compression and 64bit data


fletcher32 checksum error with szip compression and 64bit data

Paul Müller
Dear All,

I am using h5py to store 2D and 3D data sets. When combining the
fletcher32 checksum with szip compression and 64 bit data (float, int
numpy arrays), I get a checksum error when I try to read the data.

I created an issue with h5py, but it seems like this could be a bug in HDF5:

https://github.com/h5py/h5py/issues/953

Here is the example code:

--------------
import h5py
import numpy as np

with h5py.File("test.h5", "w") as h5:
    h5.create_dataset("image_A",
                      data=np.zeros(10000, dtype=np.float64),
                      fletcher32=True,
                      compression="szip",
                      )

with h5py.File("test.h5", "r") as h5:
    print(h5["image_A"][0])
--------------

and here is the error message:

--------------
Traceback (most recent call last):
  File "error.py", line 12, in <module>
    print(h5["image_A"][0])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "/usr/lib/python3/dist-packages/h5py/_hl/dataset.py", line 482,
in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2577)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_objects.c:2536)
  File "h5py/h5d.pyx", line 181, in h5py.h5d.DatasetID.read
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/h5d.c:3123)
  File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_proxy.c:1769)
  File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
(/build/h5py-nQFNYZ/h5py-2.6.0/h5py/_proxy.c:1411)
OSError: Can't read data (Data error detected by fletcher32 checksum)
--------------

and this is my setup:

 * Operating System: Ubuntu 16.04.3 LTS
 * Python versions: 2.7.12 and 3.5.2
 * Where Python was acquired: system Python (apt-get)
 * h5py version: 2.7.1
 * HDF5 version: 1.8.18
 * The full traceback/stack trace (Python 3) is shown above


I was not able to find anything in previous posts in the mailing list
archive.

I would really like to use szip compression, because the resulting file
sizes are considerably smaller.

Thanks for your help.

Regards,
Paul

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: fletcher32 checksum error with szip compression and 64bit data

Elena Pourmal
Hi Paul,

I believe you need to specify szip first and then fletcher32. I just tried a C example (a modified https://support.hdfgroup.org/ftp/HDF5/examples/examples-by-api/hdf5-examples/1_10/C/H5D/h5ex_d_szip.c) with HDF5 1.10.1, and it works when szip is the first filter in the filter pipeline. I get an error when the filters are reversed.

HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 0:
  #000: /home/hdftest/snapshots-hdf5_1_10_1/current/src/H5Dio.c line 171 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: /home/hdftest/snapshots-hdf5_1_10_1/current/src/H5Dio.c line 544 in H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #002: /home/hdftest/snapshots-hdf5_1_10_1/current/src/H5Dchunk.c line 2050 in H5D__chunk_read(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: /home/hdftest/snapshots-hdf5_1_10_1/current/src/H5Dchunk.c line 3405 in H5D__chunk_lock(): data pipeline read failed
    major: Data filters
    minor: Filter operation failed
  #004: /home/hdftest/snapshots-hdf5_1_10_1/current/src/H5Z.c line 1374 in H5Z_pipeline(): filter returned failure during read
    major: Data filters
    minor: Read failed
  #005: /home/hdftest/snapshots-hdf5_1_10_1/current/src/H5Zszip.c line 324 in H5Z_filter_szip(): szip_filter: decompression failed
    major: Resource unavailable
    minor: No space available for allocation

And it makes sense: with the filters reversed, the buffer handed to szip carries 4 extra bytes of checksum, and szip is confused.
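The ordering idea can be sketched outside of HDF5 with a short, stdlib-only Python example. This is purely illustrative, not HDF5's actual filter code: zlib stands in for szip (which is not in the standard library), and the Fletcher-32 below is a hand-rolled version of the checksum. With compression first, the checksum wraps the *compressed* bytes, so on read the checksum can be verified and stripped before the decompressor sees exactly the stream it produced:

```python
import struct
import zlib

def fletcher32(data: bytes) -> int:
    # Fletcher-32 over little-endian 16-bit words (odd-length data is
    # zero-padded) -- a simple stand-in for HDF5's checksum filter.
    if len(data) % 2:
        data += b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1

def write_pipeline(chunk: bytes) -> bytes:
    # Recommended order: compress first, then append the 4-byte checksum.
    comp = zlib.compress(chunk)
    return comp + struct.pack("<I", fletcher32(comp))

def read_pipeline(stored: bytes) -> bytes:
    # Reading runs the filters in reverse: strip and verify the checksum,
    # then hand the decompressor exactly the bytes it originally produced.
    comp, (chk,) = stored[:-4], struct.unpack("<I", stored[-4:])
    if fletcher32(comp) != chk:
        raise IOError("Data error detected by fletcher32 checksum")
    return zlib.decompress(comp)

chunk = bytes(80000)  # e.g. a chunk of 10000 float64 zeros
assert read_pipeline(write_pipeline(chunk)) == chunk
```

With the order reversed (checksum appended to the raw data, then compressed), the reader's decompression step would have to produce 4 bytes more than the chunk size it allocated for, which is the kind of failure the szip trace above reports.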

I am attaching the C example that failed. If you comment out the first call to H5Pset_fletcher32 and uncomment the second one, the program will work.

Hope it will be an easy fix for your Python script.

Thank you!

Elena



On Dec 13, 2017, at 10:55 AM, Paul Müller <[hidden email]> wrote:

Attachment: szip_example.c (7K)