Avoiding corruption of the HDF5 File


Avoiding corruption of the HDF5 File

makepeace@jawasoft.com
Dear Experts,

We are building a data acquisition and processing system on top of an HDF5 file store. Generally we have been very pleased with HDF5 - great flexibility in data structure, good performance, small file size, availability of third-party data access tools, etc.

However, our system needs to run for 36-48 hours at a time, and we are finding that if we (deliberately or accidentally) stop the process while it is running (and writing data), the file is corrupted and we lose all our work.

We are in C# and wrote our access routines on top of HDF5.net (which I understand is deprecated). We tend to keep all active pointer objects open for the duration of the process that reads or writes them (file, group and dataset handles in particular).

1) Is there a full-featured replacement for HDF5.net now that I was unaware of? Previous contenders were found to be missing support for features we depend on. If so, will it address the corruption issue?

2) Should we be opening and closing all the entities on every write? I would have thought that would dramatically slow access, but perhaps not. Guidance?

3) Are there any other tips for making the file less susceptible to corruption if writing is abandoned unexpectedly?

Please help - this issue could be serious enough to make us reconsider our storage choice, which would be expensive now.

rgds,
Ewan

Re: Avoiding corruption of the HDF5 File

Miller, Mark C.

"Hdf-forum on behalf of Ewan Makepeace" wrote:

> 1) Is there a full-featured replacement for HDF5.net now that I was unaware of? Previous contenders were found to be missing support for features we depend on. If so, will it address the corruption issue?

Apologies, but I only ever use the HDF5 C interface on Linux-like systems.

> 2) Should we be opening and closing all the entities on every write? I would have thought that would dramatically slow access, but perhaps not. Guidance?

Well, I think it is best to close datasets, dataspaces, types, and groups as soon as possible once you know you no longer need them; that should help minimize memory usage. Also, can you add a call to H5Fflush() (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5F.html#File-Flush) so that it happens relatively regularly? On Linux you could also *catch* a signal and call H5Fclose() on the file as part of the signal handler. And are you by chance calling H5dont_atexit() (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html#Library-DontAtExit) somewhere, preventing HDF5 from closing the file gracefully on exit? (FYI, these are all Linux-isms, so I don't know how useful they will be in your context.)
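
In C#, those suggestions might translate roughly into the sketch below. This is a minimal sketch, not code from this thread: it assumes the HDF.PInvoke binding mentioned later (whose H5F.flush/H5F.close mirror the C H5Fflush/H5Fclose, with handles as long values), uses Console.CancelKeyPress as the nearest .NET analogue of catching SIGINT, and the file name and flush interval are made up.

    using System;
    using System.Threading;
    using HDF.PInvoke;   // assumed binding; H5F.* mirrors the C API

    static class FlushingWriter
    {
        static long fileId;                          // hid_t handle
        static readonly object sync = new object();

        static void Main()
        {
            fileId = H5F.open("capture.h5", H5F.ACC_RDWR);   // hypothetical file

            // Nearest .NET analogue of catching SIGINT: close the file on
            // Ctrl-C so cached data and metadata reach disk before exit.
            Console.CancelKeyPress += (s, e) =>
            {
                lock (sync) H5F.close(fileId);
            };

            // Flush "relatively regularly" (every 30 s here) so an abrupt kill
            // loses at most the last interval of cached writes.
            using var timer = new Timer(_ =>
            {
                lock (sync) H5F.flush(fileId, H5F.scope_t.GLOBAL);
            }, null, 30_000, 30_000);

            // ... acquisition loop writing datasets goes here ...
        }
    }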

> 3) Are there any other tips for making the file less susceptible to corruption if writing is abandoned unexpectedly?

One of the DOE labs invested in a 'journaling metadata' enhancement to HDF5. I think that work was nearly completed, but it has since gone stale on a private branch and has yet to be merged into the mainline of the code. It might be worth making a pitch for it if you think it could be useful in this context. Again, I am not sure, because all my experience is Linux-centric.

Hope that helps.



Re: Avoiding corruption of the HDF5 File

Jager, Gerco de
I'm not an expert at all (yet) so please be kind...

I recently started writing a converter from our proprietary measurements format to HDF5 in C#, using the HDF.PInvoke NuGet distribution. I've read that HDF.PInvoke is the way forward; hopefully it exposes all the features you need.

Maybe too trivial to mention, but it is important to implement IDisposable properly on classes that hold pointers to HDF entities, to ensure the handles are released both during normal operation and when managed objects are finalized after a software failure. But this won't save the file if your system halts in such a way that disposers, finalizers, and garbage collection no longer run.
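
For illustration, a minimal sketch of that pattern for a dataset handle, assuming HDF.PInvoke-style bindings where H5D.close mirrors the C H5Dclose and handles are long values (a production version would more likely derive from SafeHandle):

    using System;
    using HDF.PInvoke;   // assumed binding

    // Owns a raw HDF5 dataset handle; closes it deterministically via Dispose()
    // (e.g. from a 'using' block) and, as a last resort, from the finalizer.
    sealed class H5DatasetHandle : IDisposable
    {
        private long id;   // hid_t; negative means "already closed"

        public H5DatasetHandle(long datasetId) { id = datasetId; }

        public long Id => id;

        public void Dispose()
        {
            Release();
            GC.SuppressFinalize(this);   // no finalization needed once disposed
        }

        ~H5DatasetHandle() { Release(); }

        private void Release()
        {
            if (id >= 0)
            {
                H5D.close(id);   // release the native handle exactly once
                id = -1;
            }
        }
    }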

Kind regards,
Gerco





Re: Avoiding corruption of the HDF5 File

Quincey Koziol
Hi Ewan,
There are two things you can do to address file corruption issues:

- For the near term, use the techniques and code for managing the metadata cache described here: https://support.hdfgroup.org/HDF5/docNewFeatures/FineTuneMDC/RFC%20H5Ocork%20v5%20new%20fxn%20names.pdf

- In the next year or so, we will be finishing the “SWMR” feature, described here:  https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesSwmrDocs.html

The metadata cache techniques are rather unsubtle, but will avoid corrupted files until the “full” SWMR feature is finished.

Quincey
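
The core technique in that RFC is "corking" an object's metadata so it only reaches the file at points the application chooses. A rough sketch of the idea (not code from the RFC), assuming your binding exposes the HDF5 1.10 calls — in C they are H5Odisable_mdc_flushes/H5Oenable_mdc_flushes; the H5O.* names below are the assumed HDF.PInvoke equivalents and should be checked against your binding's version:

    using System;
    using HDF.PInvoke;   // H5O.*_mdc_flushes assumed present in the binding

    static class CorkedWriter
    {
        // "Cork" the dataset so its metadata cannot be partially flushed while
        // it is being updated, then flush everything at a consistent point.
        public static void WriteCorked(long fileId, long datasetId, Action doWrites)
        {
            H5O.disable_mdc_flushes(datasetId);        // cork: pin metadata in cache
            try
            {
                doWrites();                            // perform the dataset writes
            }
            finally
            {
                H5O.enable_mdc_flushes(datasetId);     // uncork
                H5F.flush(fileId, H5F.scope_t.GLOBAL); // file is consistent here
            }
        }
    }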



Re: Avoiding corruption of the HDF5 File

Miller, Mark C.

Hi Quincey,

One question, though: is it possible to produce the bytes-on-disk format from a wholly different code base that is nonetheless compatible with HDF5 proper?

Your answer seems to suggest it is NOT possible without using (some of) the HDF5 code base.

That would be a shame, as it suggests there is no longer a well-defined bytes-on-disk format apart from whatever the HDF5 implementation produces.

Mark


Re: Avoiding corruption of the HDF5 File

Dana Robinson

Hi all,

(I'm jumping in to the middle of this discussion while still on vacation, so please excuse me if I'm missing something.)

Those documents describe cache flushing. Flush ordering should not affect the file format – any transient 'file corruption' would be due to a second reader inspecting an incomplete file, which techniques like SWMR are designed to address.

Incompatibilities between the official HDF5 library and third-party HDF5 libraries should probably be considered bugs in one or the other (or maybe even both!), as long as they are truly holding to the published HDF5 file format.

Dana


Re: Avoiding corruption of the HDF5 File

Miller, Mark C.

My apologies; I have obviously confused two different threads of discussion.

One thread, from several weeks ago, was about file corruption when a wholly different code base produces HDF5 bytes-on-disk and a subsequent read/write by an HDF5 tool corrupts the file.

This thread, from last week, is about file corruption due to a crash or shutdown before H5Fclose().

Again, sorry for the confusion.

Mark


Re: Avoiding corruption of the HDF5 File

makepeace@jawasoft.com
Thank you to all of you who have replied to my query on this.

To summarise the replies (with my feedback inline):

From: "Miller, Mark C." <[hidden email]>

> Well, I think it is best to close datasets, dataspaces, types, and groups as soon as possible once you know you no longer need them. Also, can you add a call to H5Fflush() so that it happens relatively regularly? On Linux you could also *catch* a signal and call H5Fclose() on the file as part of the signal handler.

We have written a completely object-oriented layer to manage the references; all the objects get disposed correctly (and in the right order) in normal operation. The problem (as others have pointed out) is that the caching in HDF5 leaves the file in an unpredictable and often invalid state when we terminate unexpectedly.

We will try adding a call to H5Fflush() after every write, which may solve the issue, although at what cost in performance I do not know.

> One of the DOE labs invested in a 'journaling metadata' enhancement to HDF5. I think that work was nearly completed, but it has since gone stale on a private branch and has yet to be merged into the mainline of the code.

This does sound like a problem that would be solved by file journaling, but in the absence of library support it is not an option.

From: "Jager, Gerco de" <[hidden email]>

> I recently started writing a converter from our proprietary measurements format to HDF5 in C#, using the HDF.PInvoke NuGet distribution. I've read that HDF.PInvoke is the way forward; hopefully it exposes all the features you need.

I am aware that HDF5.net is deprecated and that P/Invoke-based systems are recommended, but we have had almost no issues with it so far; as I said, if the system does not stop while writing (due to exceptions in other code unrelated to the persistence layer), the file is never corrupted. In fact, I suspect the problem is in the caching of data, and I doubt HDF.PInvoke will solve that.

From: Quincey Koziol <[hidden email]>

> There are two things you can do to address file corruption issues:
>
> - For the near term, use the techniques and code for managing the metadata cache described here: https://support.hdfgroup.org/HDF5/docNewFeatures/FineTuneMDC/RFC%20H5Ocork%20v5%20new%20fxn%20names.pdf
>
> - In the next year or so, we will be finishing the "SWMR" feature, described here: https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesSwmrDocs.html
>
> The metadata cache techniques are rather unsubtle, but will avoid corrupted files until the "full" SWMR feature is finished.

This is fascinating stuff. The SWMR features have a lot of application for us, but seem to be taking longer than originally expected. The metadata management tools are of interest, but I am not sure we need fine-grained control here; we basically need the file to be valid after every write, so I think the first thing to try is simply flushing the whole file on every write.
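
A flush-on-every-write wrapper is little more than a flush call after the write; a sketch below, again assuming HDF.PInvoke names (H5D.write/H5F.flush mirroring the C H5Dwrite/H5Fflush). The per-write cost is exactly what would need measuring.

    using System;
    using HDF.PInvoke;   // assumed binding

    static class SafeWrites
    {
        // Write, then force file contents (data + metadata) to disk so the
        // file stays valid even if the process dies immediately afterwards.
        public static int WriteAndFlush(long fileId, long datasetId, long memType,
                                        long memSpace, long fileSpace, IntPtr buffer)
        {
            int status = H5D.write(datasetId, memType, memSpace,
                                   fileSpace, H5P.DEFAULT, buffer);
            if (status >= 0)
                status = H5F.flush(fileId, H5F.scope_t.GLOBAL);
            return status;   // negative signals an HDF5 error
        }
    }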

The other option I am considering is to move our HDF5 code out of the main assembly and run it as a standalone service, so that in the event of a crash in our application the HDF5 service is still running and is hopefully able to flush and close the file gracefully.

rgds,
Ewan


Re: Avoiding corruption of the HDF5 File

Ger van Diepen

I fear that a flush after each write can be quite expensive. Furthermore, I do not know whether HDF5 guarantees that the file is uncorrupted if the failure occurs during the write of data (in between flushes).


Another option is to write the data into an external raw data file and make links to that file (or segments of it) in the HDF5 file, as described in section 5.5.4 of the HDF5 user's guide. In case of an unexpected failure, it is always possible to make the links afterwards.

We make use of it in our LOFAR data writer.
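
Concretely, this can be done with HDF5's external dataset storage: the raw samples live in a flat binary file written with ordinary file I/O, and the HDF5 file only records where to find them. A sketch under the assumption that HDF.PInvoke's H5P.set_external mirrors the C H5Pset_external; the file and dataset names here are hypothetical.

    using System;
    using HDF.PInvoke;   // assumed binding; calls mirror the C API

    static class ExternalStorage
    {
        // Create a 1-D double dataset whose raw data lives in an external flat
        // file ("raw.bin", hypothetical). The acquisition process can write
        // raw.bin with ordinary file I/O; the HDF5 file itself is only touched
        // when the dataset is defined, so a crash mid-acquisition leaves it
        // intact and the raw bytes remain readable.
        public static long CreateExternalDataset(long fileId, ulong nElements)
        {
            long dcpl = H5P.create(H5P.DATASET_CREATE);   // property list class id, assumed name
            H5P.set_external(dcpl, "raw.bin", 0,
                             nElements * sizeof(double)); // offset 0, size in bytes

            long space = H5S.create_simple(1, new[] { nElements }, null);
            long dset = H5D.create(fileId, "/signal", H5T.NATIVE_DOUBLE,
                                   space, H5P.DEFAULT, dcpl, H5P.DEFAULT);

            H5S.close(space);
            H5P.close(dcpl);
            return dset;   // caller closes with H5D.close
        }
    }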

Re: Avoiding corruption of the HDF5 File

Miller, Mark C.

So, just to be clear: my suggestion was to call H5Fflush() "relatively regularly". I guess I was intentionally vague, because I think it's obvious that there definitely *will be* a performance hit, and you then have to trade off the risk of losing data against the loss of performance.

This thread does make me wonder about something, though (and this may be a question for THG): is the corruption "localized" (or is there a way of guaranteeing that it will be localized) to only the most recently written objects? Or is it the case that *all* data written to the file in the past is at risk of corruption if a failure occurs in the "current" operation?

Obviously, if it's just the most recently written stuff that is at risk of corruption, that is much more tolerable. And I thought that one of the command-line tools could help to 'fix' a broken HDF5 file, but I can't remember which now (h5debug, maybe?).

Mark
