New HDF5 compression plugin

New HDF5 compression plugin

Miller, Mark C.
Hi All,

Just wanted to mention a new HDF5 floating point compression plugin available on GitHub...

https://github.com/LLNL/H5Z-ZFP

This plugin will come embedded in the next release of the Silo library as well.

-- 
Mark C. Miller, LLNL

_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: New HDF5 compression plugin

Nelson, Jarom

Nice. Thanks!

 


Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Miller, Mark C.
2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[hidden email]>:
> Hi All,
>
> Just wanted to mention a new HDF5 floating point compression plugin
> available on github...
>
> https://github.com/LLNL/H5Z-ZFP
>
> This plugin will come embedded in the next release of the Silo library as
> well.

Thanks for the pointer. That's very interesting. I had not heard about
ZFP before. The ability to set a bound on the error in the lossless
case seems very useful.

Do you know if there have been any comparative benchmarks of ZFP
against other compressors?

After some basic benchmarking, we recently settled on Blosc_LZ4HC at
level 4 for our datasets (3D float tomography data), but maybe it
would be worthwhile to look at ZFP as well..

Best regards,
Elvis


Re: New HDF5 compression plugin

Peter Steinbach
I second this request big time and would add zstd, if we are already
trying out various encoders. ;)

P


Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Elvis Stansvik
2016-10-28 13:12 GMT+02:00 Elvis Stansvik <[hidden email]>:

> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[hidden email]>:
>> Hi All,
>>
>> Just wanted to mention a new HDF5 floating point compression plugin
>> available on github...
>>
>> https://github.com/LLNL/H5Z-ZFP
>>
>> This plugin will come embedded in the next release of the Silo library as
>> well.
>
> Thanks for the pointer. That's very interesting. I had not heard about
> ZFP before. The ability to set a bound on the error in the lossless
> case seems very useful.

Here I meant the lossy case of course.. :)

Elvis


Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Peter Steinbach
2016-10-28 13:23 GMT+02:00 Peter Steinbach <[hidden email]>:
> I second this request big time and would add zstd, if we are already trying
> out various encoders. ;)

This may not be of interest, and does not include zstd, but I'm
attaching an excerpt from some of the results I got back when I was
doing our basic benchmarking of some algorithms (all lossless).

It was based on those results that we settled on Blosc_LZ4HC at level 4,
since we were looking for very fast decompression times, while longer
compression times and slightly larger file sizes were acceptable up to
a point. The gzip results are included mostly because that's
what we were using at the time and I wanted them as a comparison, but
we knew we wanted something else. The input for those benchmarks was a
500x300x300 float dataset containing a tomographic 3D image.
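
For readers who want to run a comparison of this kind, a minimal sketch follows. It is not the attached script: it uses only codecs from the Python standard library (zlib, bz2, lzma) as stand-ins, since Blosc and LZ4HC need third-party packages, and a synthetic random-walk array in place of the tomography volume. The names make_sample, benchmark and pick_fastest_decompression are hypothetical.

```python
import bz2
import lzma
import random
import struct
import time
import zlib


def make_sample(n_values=20_000, seed=0):
    """Synthetic stand-in for float32 volume data: a random walk."""
    rng = random.Random(seed)
    value = 0.0
    values = []
    for _ in range(n_values):
        value += rng.uniform(-1.0, 1.0)
        values.append(value)
    return struct.pack(f"<{n_values}f", *values)


# Hypothetical candidate list (stdlib only); swap in the codecs you care about.
CODECS = {
    "zlib-1": (lambda b: zlib.compress(b, 1), zlib.decompress),
    "zlib-6": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bz2-9": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma-0": (lambda b: lzma.compress(b, preset=0), lzma.decompress),
}


def benchmark(raw, codecs=CODECS, repeats=3):
    """Return one row per codec: compression ratio and best-of-N timings."""
    rows = []
    for name, (compress, decompress) in codecs.items():
        c_times, d_times = [], []
        for _ in range(repeats):
            t0 = time.perf_counter()
            packed = compress(raw)
            c_times.append(time.perf_counter() - t0)
            t0 = time.perf_counter()
            restored = decompress(packed)
            d_times.append(time.perf_counter() - t0)
            assert restored == raw  # all candidates here are lossless
        rows.append({
            "codec": name,
            "ratio": len(raw) / len(packed),
            "compress_s": min(c_times),
            "decompress_s": min(d_times),
        })
    return rows


def pick_fastest_decompression(rows, min_ratio=1.0):
    """Fastest decompressor among codecs that reach at least min_ratio."""
    eligible = [r for r in rows if r["ratio"] >= min_ratio]
    return min(eligible, key=lambda r: r["decompress_s"]) if eligible else None


if __name__ == "__main__":
    rows = benchmark(make_sample())
    for row in rows:
        print(f"{row['codec']:8s} ratio={row['ratio']:.2f} "
              f"compress={row['compress_s'] * 1e3:.2f} ms "
              f"decompress={row['decompress_s'] * 1e3:.2f} ms")
    best = pick_fastest_decompression(rows, min_ratio=1.0)
    print("fastest decompression:", best["codec"] if best else "none")
```

The min_ratio argument expresses the trade-off described above: decompression speed is the primary criterion, and file size only has to stay above an acceptable floor. Results on synthetic data say little about real volumes, so the sample should be replaced with representative data before drawing conclusions.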

I might try to dig up the script I used for the benchmark and see if
we still have the input I used, and do a test with lossy ZFP. It could
be very interesting for creating 3D "thumbnails" in our application.

Elvis


compression_benchmarks.png (73K) Download Attachment

Re: New HDF5 compression plugin

Andrey Paramonov
In reply to this post by Miller, Mark C.
28.10.2016 2:53, Miller, Mark C. wrote:
> Hi All,
>
> Just wanted to mention a new HDF5 floating point compression plugin
> available on github...
>
> https://github.com/LLNL/H5Z-ZFP
>
> This plugin will come embedded in the next release of the Silo library
> as well.

Hello Mark!

I've downloaded the library from
https://github.com/LLNL/zfp
and even managed to compile it under Windows.
I'd like now to compare it on my data against other options (e.g.
http://freearc.org/) but I couldn't grok how to specify lossless mode
for zfp. What are the parameters?

Best wishes,
Andrey Paramonov



Re: New HDF5 compression plugin

Peter Steinbach
In reply to this post by Elvis Stansvik
Hi Elvis,

Interesting. I am mostly looking at 3D optical tomography images (which
exclusively use voxels represented by integers).

>
> This may not be of interest, and does not include zstd, but I'm
> attaching an excerpt from some of the results I got when back when
> doing our basic benchmarking of some algorithms (all lossless).

We've also seen a compression factor of roughly 2 +/- 0.5 with lz4 r131
on unfiltered data. In my case we are mostly interested in high
compression bandwidth and a high compression ratio. So far lz4 gives
compression bandwidths of up to 1 GB/s, depending on the quality setting
(of course the compression ratios tend to be lower then).

>
> It was based on those that we settled on Blosc_LZ4HC at level 4, since
> we were looking for very fast decompression times, while longer
> compression times and slightly larger file size was acceptable up to
> certain points. The gzip results are included mostly because that's
> what we were using at the time and I wanted them as a comparison, but
> we knew we wanted something else. The input for those benchmarks was a
> 500x300x300 float dataset containing a tomographic 3D image.

To be honest, I am still surprised that HDF5 doesn't contain these
state-of-the-art encoders, but rather ships bzip2 et al., which are
painfully slow and don't take computer architecture into account (lz4
is cache aware AFAIK). But hey, coming up with an HDF5 compressor is
straightforward once one has wrangled with the docs. I just don't know how
contributing to HDF5 works.

>
> I might try to dig up the script I used for the benchmark and see if
> we still have the input I used, and do a test with lossy ZFP. It could
> be very interesting for creating 3D "thumbnails" in our application.

indeed, that would be interesting to see.
Best,
Peter


Re: New HDF5 compression plugin

Elvis Stansvik
2016-10-28 14:17 GMT+02:00 Peter Steinbach <[hidden email]>:

> to be honest, I am still surprised that hdf5 doesn't contain these
> state-of-the-art encoders, but rather ships bzip2 et al. which are painfully
> slow and don't make any account of computer architectures (lz4 is cache
> aware AFAIK). But hey, coming up with a hdf5 compressor is straight forward
> after one wrangled with the docs. I just don't know how contributing to hdf5
> works.

Yeah, me too, but I believe The HDF Group has a goal of opening up
development a bit more, which would be very welcome. So let's hope for
that.

Elvis


Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Andrey Paramonov
2016-10-28 14:11 GMT+02:00 Андрей Парамонов <[hidden email]>:

> 28.10.2016 2:53, Miller, Mark C. wrote:
>>
>> Hi All,
>>
>> Just wanted to mention a new HDF5 floating point compression plugin
>> available on github...
>>
>> https://github.com/LLNL/H5Z-ZFP
>>
>> This plugin will come embedded in the next release of the Silo library
>> as well.
>
>
> Hello Mark!
>
> I've downloaded library from
> https://github.com/LLNL/zfp
> and even managed to compile it under Windows.
> I'd like now to compare it on my data against other options (e.g.
> http://freearc.org/) but I couldn't grok how to specify lossless mode for
> zfp. What are the parameters?

From what I understand, zfp is always lossy (but the error can be
bounded in various ways). fpzip seems to be the lossless variant, but
this filter plugin is for zfp.
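
To illustrate what an absolute error bound means for a lossy compressor, here is a minimal sketch. It is not ZFP's algorithm and does not use the zfp or H5Z-ZFP APIs; it simply quantizes values to a uniform grid of step 2*tolerance and deflates the integer codes, so that each restored value is within tolerance of the original, up to floating-point rounding. The function names are hypothetical.

```python
import math
import struct
import zlib


def compress_bounded(values, tolerance):
    """Quantize to a grid of step 2*tolerance, then deflate the codes."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    step = 2.0 * tolerance
    # Rounding to the nearest grid point moves a value by at most step / 2.
    codes = [round(v / step) for v in values]
    payload = struct.pack(f"<{len(codes)}q", *codes)
    return zlib.compress(payload, 6)


def decompress_bounded(blob, tolerance):
    """Invert compress_bounded; the same tolerance must be supplied."""
    step = 2.0 * tolerance
    payload = zlib.decompress(blob)
    codes = struct.unpack(f"<{len(payload) // 8}q", payload)
    return [c * step for c in codes]


def max_abs_error(original, restored):
    """Largest pointwise deviation between the two sequences."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    data = [math.sin(i * 0.01) for i in range(1000)]
    for tol in (1e-2, 1e-4, 1e-6):
        blob = compress_bounded(data, tol)
        back = decompress_bounded(blob, tol)
        print(f"tolerance={tol:g} bytes={len(blob)} "
              f"max_error={max_abs_error(data, back):.3g}")
```

A looser tolerance leaves fewer distinct codes and therefore a smaller payload, which is the basic size-versus-accuracy trade-off that any error-bounded mode exposes.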

The parameters for the zfp filter plugin seem to be documented in
README_MORE in the GitHub repo:

    https://github.com/LLNL/H5Z-ZFP/blob/master/README_MORE

Elvis


Re: New HDF5 compression plugin

Francesc Alted-3
In reply to this post by Elvis Stansvik
2016-10-28 13:59 GMT+02:00 Elvis Stansvik <[hidden email]>:
> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[hidden email]>:
>> I second this request big time and would add zstd, if we are already trying
>> out various encoders. ;)
>
> This may not be of interest, and does not include zstd, but I'm
> attaching an excerpt from some of the results I got when back when
> doing our basic benchmarking of some algorithms (all lossless).
>
> It was based on those that we settled on Blosc_LZ4HC at level 4, since
> we were looking for very fast decompression times, while longer
> compression times and slightly larger file size was acceptable up to
> certain points. The gzip results are included mostly because that's
> what we were using at the time and I wanted them as a comparison, but
> we knew we wanted something else. The input for those benchmarks was a
> 500x300x300 float dataset containing a tomographic 3D image.

Zstd was included in Blosc a while ago:

http://blosc.org/blog/zstd-has-just-landed-in-blosc.html

and its performance really shines, even on real data:

http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
 
(although there, since the data are only 1-byte integers, only the BITSHUFFLE filter is used, not the faster SHUFFLE).

As Blosc offers the same API for a number of codecs, trying it in combination with Zstd should be really easy.
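
As background on the SHUFFLE filter mentioned above: it regroups the bytes of fixed-size items so that bytes of equal significance sit next to each other, which often helps the codec that runs afterwards on numeric data. The sketch below is a plain-Python, byte-level illustration of that idea only. It is not Blosc's optimized implementation, it does not cover BITSHUFFLE (which works on bits), zlib stands in for the codec, and the function names are hypothetical.

```python
import struct
import zlib


def shuffle(raw, itemsize):
    """Byte-transpose: all first bytes, then all second bytes, and so on."""
    if len(raw) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def unshuffle(shuffled, itemsize):
    """Inverse of shuffle for the same itemsize."""
    if len(shuffled) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    n_items = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        # Plane i holds byte i of every item; scatter it back with a stride.
        out[i::itemsize] = shuffled[i * n_items:(i + 1) * n_items]
    return bytes(out)


if __name__ == "__main__":
    # Slowly varying float32 ramp as a stand-in for real numeric data.
    values = [1000.0 + 0.001 * i for i in range(10_000)]
    raw = struct.pack(f"<{len(values)}f", *values)
    plain = zlib.compress(raw, 6)
    shuffled = zlib.compress(shuffle(raw, 4), 6)
    print("raw bytes:            ", len(raw))
    print("zlib only:            ", len(plain))
    print("byte shuffle + zlib:  ", len(shuffled))
```

Whether shuffling pays off depends on the data, so it is worth measuring with and without it on a representative sample.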


> I might try to dig up the script I used for the benchmark and see if
> we still have the input I used, and do a test with lossy ZFP. It could
> be very interesting for creating 3D "thumbnails" in our application.

It would be nice if your benchmark code (and dataset) could be made publicly available, so that it can serve others as a good point of comparison.
 




--
Francesc Alted


Re: New HDF5 compression plugin

Miller, Mark C.
In reply to this post by Elvis Stansvik


On Friday, October 28, 2016 at 4:12 AM, Elvis Stansvik wrote:

> Do you know if there has been any comparative benchmarks of ZFP
> against other compressors?

Yes, see here...

http://computation.llnl.gov/projects/floating-point-compression/zfp-compression-ratio-and-quality




Re: New HDF5 compression plugin

Miller, Mark C.
In reply to this post by Andrey Paramonov
On Friday, October 28, 2016 at 5:11 AM, Andrey Paramonov wrote:

> I've downloaded library from
> https://github.com/LLNL/zfp
> and even managed to compile it under Windows.
> I'd like now to compare it on my data against other options (e.g.
> http://freearc.org/) but I couldn't grok how to specify lossless mode
> for zfp. What are the parameters?

Have a look at the test_write.c example. It demonstrates all 4 modes.

The ZFP library's compression controls are described in the ZFP release notes, here...


The filter's default behavior, if you specify nelmts==0, is best quality (least loss).

If you still have questions about using it after reviewing the above refs, please let me know.

Mark



Re: New HDF5 compression plugin

Miller, Mark C.
In reply to this post by Peter Steinbach

> To be honest, I am still surprised that HDF5 doesn't contain these
> state-of-the-art encoders, but rather ships bzip2 et al., which are
> painfully slow and don't take computer architecture into account (lz4
> is cache aware AFAIK). But hey, coming up with an HDF5 compressor is
> straightforward once one has wrangled with the docs. I just don't know how
> contributing to HDF5 works.

FWIW, I think the whole point of the plugin design in HDF5 is to *enable* the community to develop
and support what can potentially be a large variety of complex filters. 

I don't think any one team, focused on supporting the core library, could possibly have the resources to
also support a wide variety of compression filters.

The key thing The HDF Group *is* doing is managing the filter IDs and information about
the filters, here...


Mark




Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Francesc Alted-3
2016-10-28 16:33 GMT+02:00 Francesc Alted <[hidden email]>:

> 2016-10-28 13:59 GMT+02:00 Elvis Stansvik <[hidden email]>:
>>
>> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[hidden email]>:
>> > I second this request big time and would add zstd, if we are already
>> > trying
>> > out various encoders. ;)
>>
>> This may not be of interest, and does not include zstd, but I'm
>> attaching an excerpt from some of the results I got when back when
>> doing our basic benchmarking of some algorithms (all lossless).
>>
>> It was based on those that we settled on Blosc_LZ4HC at level 4, since
>> we were looking for very fast decompression times, while longer
>> compression times and slightly larger file size was acceptable up to
>> certain points. The gzip results are included mostly because that's
>> what we were using at the time and I wanted them as a comparison, but
>> we knew we wanted something else. The input for those benchmarks was a
>> 500x300x300 float dataset containing a tomographic 3D image.
>
>
> Zstd was included in Blosc a while ago:
>
> http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>
> and its performance really shines, even on real data:
>
> http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>
> (although here, being only integers of 1 byte, only the BITSHUFFLE filter is
> used, but not the faster SHUFFLE).
>
> As Blosc offers the same API for a number of codecs, trying it in
> combination with Zstd should be really easy.

Zstd indeed looks very well-balanced. The reason I didn't include it
back when I did those benchmarks was that we were really focused on
decompression speed in our application; compression speed was very
much secondary. So I included mostly LZ4 codecs.

>
>>
>> I might try to dig up the script I used for the benchmark and see if
>> we still have the input I used, and do a test with lossy ZFP. It could
>> be very interesting for creating 3D "thumbnails" in our application.
>
>
> It would be nice if your benchmark code (and dataset) can be made publicly
> available so as to serve to others as a good comparison.

The dataset is unfortunately confidential and not something I can
release. I'm attaching the script I used though, it's very simple.

But, a disclaimer: The benchmarks I did were not really thorough. They
were also internal and never really meant to be published. It was
mostly a quick and dirty test to see which of these LZ4 codecs would
be in the right ballpark for us.

Elvis



compression-benchmark.py (5K) Download Attachment

Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Miller, Mark C.
2016-10-28 17:07 GMT+02:00 Miller, Mark C. <[hidden email]>:

>
>
> From: Hdf-forum <[hidden email]> on behalf of Elvis
> Stansvik <[hidden email]>
> Reply-To: HDF Users Discussion List <[hidden email]>
> Date: Friday, October 28, 2016 at 4:12 AM
> To: HDF Users Discussion List <[hidden email]>
> Subject: Re: [Hdf-forum] New HDF5 compression plugin
>
>
> Do you know if there has been any comparative benchmarks of ZFP
> against other compressors?
>
>
> Yes, see here...
>
> http://computation.llnl.gov/projects/floating-point-compression/zfp-compression-ratio-and-quality

Thanks! Should have found that one myself.

Elvis



Re: New HDF5 compression plugin

Elvis Stansvik
In reply to this post by Elvis Stansvik
2016-10-28 17:53 GMT+02:00 Francesc Alted <[hidden email]>:

>
>
> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik <[hidden email]>:
>>
>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[hidden email]>:
>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik
>> > <[hidden email]>:
>> >>
>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[hidden email]>:
>> >> > I second this request big time and would add zstd, if we are already
>> >> > trying
>> >> > out various encoders. ;)
>> >>
>> >> This may not be of interest, and does not include zstd, but I'm
>> >> attaching an excerpt from some of the results I got back when
>> >> doing our basic benchmarking of some algorithms (all lossless).
>> >>
>> >> It was based on those that we settled on Blosc_LZ4HC at level 4, since
>> >> we were looking for very fast decompression times, while longer
>> >> compression times and slightly larger file size was acceptable up to
>> >> certain points. The gzip results are included mostly because that's
>> >> what we were using at the time and I wanted them as a comparison, but
>> >> we knew we wanted something else. The input for those benchmarks was a
>> >> 500x300x300 float dataset containing a tomographic 3D image.
>> >
>> >
>> > Zstd was included in Blosc a while ago:
>> >
>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>> >
>> > and its performance really shines, even on real data:
>> >
>> >
>> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>> >
>> > (although here, being only integers of 1 byte, only the BITSHUFFLE
>> > filter is
>> > used, but not the faster SHUFFLE).
>> >
>> > As Blosc offers the same API for a number of codecs, trying it in
>> > combination with Zstd should be really easy.
>>
>> Zstd indeed looks very well-balanced. The reason I didn't include it
>> back when I did those benchmarks was that we were really focused on
>> decompression speed in our application, compression speed was very
>> much secondary. So I included mostly LZ4 codecs.
>
>
> Yes, that makes sense, but I think you should give a try at least at the
> lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too).  For
> these low compression levels Blosc chooses a block size that comfortably
> fits in L2.  Also, note that the benchmarks above were for in-memory data,
> so for a typical disk-based workflow using HDF5, Blosc+Zstd can still
> perform well enough.

Alright, thanks for the tip. I read the benchmarks too fast and didn't
realize it was all in-memory. I should definitely look at Zstd.

In our use case it's always from disk (or well, SSD), and sometimes
even slow-ish network mounts.

Elvis

>
>
>>
>>
>> >
>> >>
>> >> I might try to dig up the script I used for the benchmark and see if
>> >> we still have the input I used, and do a test with lossy ZFP. It could
>> >> be very interesting for creating 3D "thumbnails" in our application.
>> >
>> >
>> > It would be nice if your benchmark code (and dataset) can be made
>> > publicly
>> > available so as to serve to others as a good comparison.
>>
>> The dataset is unfortunately confidential and not something I can
>> release. I'm attaching the script I used though, it's very simple.
>>
>> But, a disclaimer: The benchmarks I did were not really thorough. They
>> were also internal and never really meant to be published. It was
>> mostly a quick and dirty test to see which of these LZ4 codecs would
>> be in the right ballpark for us.
>
>
> Ok.  Thanks anyway.
>


Re: New HDF5 compression plugin

Francesc Alted-3


2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[hidden email]>:
> Alright, thanks for the tip. I read the benchmarks too fast and didn't
> realize it was all in-memory. I should definitely look at Zstd.
>
> In our use case it's always from disk (or well, SSD), and sometimes
> even slow-ish network mounts.

Cool.  Keep us informed.  I am definitely interested.
 




--
Francesc Alted


Re: New HDF5 compression plugin

Elvis Stansvik
2016-10-28 18:14 GMT+02:00 Francesc Alted <[hidden email]>:

>
>
> 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[hidden email]>:
>>
>> 2016-10-28 17:53 GMT+02:00 Francesc Alted <[hidden email]>:
>> >
>> >
>> > 2016-10-28 17:20 GMT+02:00 Elvis Stansvik
>> > <[hidden email]>:
>> >>
>> >> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[hidden email]>:
>> >> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik
>> >> > <[hidden email]>:
>> >> >>
>> >> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[hidden email]>:
>> >> >> > I second this request big time and would add zstd, if we are
>> >> >> > already
>> >> >> > trying
>> >> >> > out various encoders. ;)
>> >> >>
>> >> >> This may not be of interest, and does not include zstd, but I'm
>> >> >> attaching an excerpt from some of the results I got back when
>> >> >> doing our basic benchmarking of some algorithms (all lossless).
>> >> >>
>> >> >> It was based on those that we settled on Blosc_LZ4HC at level 4,
>> >> >> since
>> >> >> we were looking for very fast decompression times, while longer
>> >> >> compression times and slightly larger file size was acceptable up to
>> >> >> certain points. The gzip results are included mostly because that's
>> >> >> what we were using at the time and I wanted them as a comparison,
>> >> >> but
>> >> >> we knew we wanted something else. The input for those benchmarks was
>> >> >> a
>> >> >> 500x300x300 float dataset containing a tomographic 3D image.
>> >> >
>> >> >
>> >> > Zstd was included in Blosc a while ago:
>> >> >
>> >> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>> >> >
>> >> > and its performance really shines, even on real data:
>> >> >
>> >> >
>> >> >
>> >> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>> >> >
>> >> > (although here, being only integers of 1 byte, only the BITSHUFFLE
>> >> > filter is
>> >> > used, but not the faster SHUFFLE).
>> >> >
>> >> > As Blosc offers the same API for a number of codecs, trying it in
>> >> > combination with Zstd should be really easy.
>> >>
>> >> Zstd indeed looks very well-balanced. The reason I didn't include it
>> >> back when I did those benchmarks was that we were really focused on
>> >> decompression speed in our application, compression speed was very
>> >> much secondary. So I included mostly LZ4 codecs.
>> >
>> >
>> > Yes, that makes sense, but I think you should give a try at least at the
>> > lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too).  For
>> > these low compression levels Blosc chooses a block size that comfortably
>> > fits in L2.  Also, note that the benchmarks above were for in-memory
>> > data,
>> > so for a typical disk-based workflow using HDF5, Blosc+Zstd can still
>> > perform well enough.
>>
>> Alright, thanks for the tip. I read the benchmarks too fast and didn't
>> realize it was all in-memory. I should definitely look at Zstd.
>>
>> In our use case it's always from disk (or well, SSD), and sometimes
>> even slow-ish network mounts.
>
>
> Cool.  Keep us informed.  I am definitely interested.

I found the old input file and very quickly I ran the benchmark again
with Blosc_ZSTD with byte-based shuffling at compression levels 1, 2
and 3:

compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
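
For readers unfamiliar with byte-based shuffling: the idea is to regroup the
bytes of fixed-size elements so that all first bytes come first, then all
second bytes, and so on, which tends to give the codec longer runs of similar
bytes on float data. A minimal pure-Python sketch of the idea follows. Blosc's
own SHUFFLE filter is an optimized implementation; `shuffle` and `unshuffle`
here are hypothetical helper names for illustration, not Blosc API.

```python
import struct


def shuffle(data: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1 of every element, and so on."""
    if len(data) % typesize:
        raise ValueError("data length must be a multiple of typesize")
    return b"".join(data[i::typesize] for i in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Invert shuffle(): interleave the byte planes back into elements."""
    if len(data) % typesize:
        raise ValueError("data length must be a multiple of typesize")
    count = len(data) // typesize
    return bytes(
        data[plane * count + i]
        for i in range(count)
        for plane in range(typesize)
    )


# Four little-endian 32-bit floats, so typesize is 4.
floats = struct.pack("<4f", 1.0, 1.5, 2.0, 2.5)
mixed = shuffle(floats, 4)
assert unshuffle(mixed, 4) == floats
```

The shuffle itself does not shrink anything; it only reorders bytes so that
the codec that runs afterwards (Zstd, LZ4HC, ...) has an easier job.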

Unfortunately I can't find the spreadsheet where I made those
diagrams, so can't make a new updated one (at least not easily right
now).

But this shows that Zstd is very competitive. It achieves slightly
better compression ratio than Blosc_LZ4HC at level 4 (the original
file size was 189378052 bytes), which is what we picked, and the
compression is much faster. But Blosc_LZ4HC still wins out in the
decompression time, so I think in the end we picked the right one.
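
To put numbers on the compression ratios, they can be worked out from the
sizes in the table above and the original file size. A small sketch that just
redoes that arithmetic (the variable names are mine; nothing here comes from
the benchmark script):

```python
import csv
import io

ORIGINAL_SIZE = 189378052  # bytes, original file size quoted above

RESULTS = """\
compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
"""

# Compression ratio = original size / compressed size.
ratios = {
    row["compressor"]: ORIGINAL_SIZE / int(row["size(B)"])
    for row in csv.DictReader(io.StringIO(RESULTS))
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# blosc_zstd_1: 1.62x
# blosc_zstd_2: 1.65x
# blosc_zstd_3: 1.67x
```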

Our use case is essentially compress once, decompress many many times.
And during the decompression the user will sit there and wait. That's
why decompression time was so important to us.

Anyway, thanks a lot for making me have a look at Zstd, we may yet use it
somewhere else.

And I now remember the real reason I didn't include it the first time
around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
the packaged version (1.10 is where Zstd support was added), so I
lazily just skipped it :)

Elvis



Re: New HDF5 compression plugin

Miller, Mark C.
Can I just clarify some of this discussion...

It reads like you are talking about compression ratios around 1.6x, less than 2:1. Is that correct?

FYI, ZFP demonstrates results far beyond that (10-30x and better) at the expense of (some) loss.

However, current efforts indicate that losses are tolerable in many post-processing analysis workflows.

We think the key to achieving good compression on floating point data, going forward, is to allow for some well controlled loss.
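
To illustrate what "well controlled loss" can buy, here is a toy sketch. It is not ZFP and does not use the H5Z-ZFP plugin: it swaps in plain uniform quantization to a chosen absolute tolerance followed by zlib, on synthetic data, purely to show how a bounded error can coexist with a much smaller output. The tolerance value and all names are assumptions made for the example.

```python
import math
import random
import struct
import zlib

random.seed(0)
n = 20000
tolerance = 1e-2  # hypothetical absolute error bound

# Smooth signal plus a little noise, round-tripped through float32 so the
# values match what would actually be stored.
raw = [math.sin(i / 200.0) + random.gauss(0.0, 1e-3) for i in range(n)]
packed = struct.pack(f"{n}f", *raw)
values = struct.unpack(f"{n}f", packed)

# Baseline: lossless compression of the float32 bytes.
lossless_size = len(zlib.compress(packed, 6))

# Lossy: snap each value to the nearest multiple of 2 * tolerance and
# compress the integer codes instead of the floats.
step = 2.0 * tolerance
codes = [round(v / step) for v in values]
lossy_size = len(zlib.compress(struct.pack(f"{n}i", *codes), 6))

# Reconstruct and measure the worst-case error against the stored values.
restored = [c * step for c in codes]
max_error = max(abs(a - b) for a, b in zip(values, restored))

print(f"lossless: {len(packed) / lossless_size:.2f}x")
print(f"lossy:    {len(packed) / lossy_size:.2f}x, max error {max_error:.4f}")
```

The point is only the shape of the trade-off: the reconstruction error stays within the chosen tolerance while the compressed size drops well below the lossless baseline. ZFP itself works quite differently and gets far better results on real data.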

See this page on the effect of ZFP losses on, for example, taking derivatives...


as compared to other compression methods.

We already face loss-like noise in floating point results when dealing with system differences either between current systems and software stacks or over time as systems and software evolve.

Mark

-- 
Mark C. Miller, LLNL
