[hdf-forum] Is locking a file possible?


Francesc Alted
Hi,

I'm trying to support locking of an HDF5 file in a multi-process
environment, with no success at the moment.  What I have tried so far
is to call the flock system call right after opening the HDF5 file.
Something like this:

"""
        # Cython code; file_handle is declared as `cdef void *file_handle`
        self.file_id = H5Fopen(name, H5F_ACC_RDWR, H5P_DEFAULT)
        # Get the low-level file descriptor from the sec2 (POSIX) driver
        H5Fget_vfd_handle(self.file_id, H5P_DEFAULT, &file_handle)
        fd = (<int *>file_handle)[0]
        # Take an exclusive advisory lock on the whole file
        flock(fd, LOCK_EX)
"""

Then, I launch several processes that try to access the same file,
write some dataset, and then close the file (hence releasing the
lock).  When I run a single instance of the hosting program, it runs
well.  However, whenever I try to run more than one instance
simultaneously, a lot of errors happen (see attachment).

If I use a separate lock file, everything works fine. In that case, I
lock my lockfile, then open the HDF file, write, close the HDF file,
and unlock the lock file.
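
For reference, here is a rough sketch of that working pattern (written
with h5py and the standard fcntl module for illustration; my actual
code uses the PyTables internals shown above):

"""
import fcntl
import os

import h5py

def locked_write(h5name, lockname, dsetname, value):
    lock_fd = os.open(lockname, os.O_CREAT | os.O_RDWR)
    try:
        # 1. Take an exclusive lock on the separate lock file
        #    (blocks until we own it).
        fcntl.flock(lock_fd, fcntl.LOCK_EX)
        # 2. Open the HDF5 file only *after* the lock is held, so no
        #    stale metadata is ever read.
        with h5py.File(h5name, "a") as f:
            f.create_dataset(dsetname, data=value)
        # 3. Closing the file flushes all metadata back to disk.
    finally:
        # 4. Release the lock only once the HDF5 file is closed.
        fcntl.flock(lock_fd, fcntl.LOCK_UN)
        os.close(lock_fd)
"""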

I have tried both HDF5 1.6.7 and 1.8.1 on a Linux box, with the same result.

Any hints on why the above code is not working properly?

Thanks,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249
-------------- next part --------------
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #000: H5Gdeprec.c line 777 in H5Giterate(): group iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5G.c line 1657 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 681 in H5G_obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 522 in H5G_stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1218 in H5B_iterate(): iterator function failed
    major: B-Tree node
    minor: Unable to list node
  #005: H5Gnode.c line 1425 in H5G_node_iterate(): unable to load symbol table node
    major: Symbol table
    minor: Unable to load metadata into cache
  #006: H5AC.c line 1970 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #007: H5C.c line 5928 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #008: H5C.c line 10567 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: H5Gnode.c line 384 in H5G_node_load(): unable to read symbol table node
    major: Symbol table
    minor: Read failed
  #010: H5F.c line 3014 in H5F_block_read(): file read failed
    major: Low-level I/O
    minor: Read failed
  #011: H5FD.c line 2046 in H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #012: H5FDsec2.c line 725 in H5FD_sec2_read(): addr overflow
    major: Invalid arguments to routine
    minor: Address overflowed
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #000: H5Gdeprec.c line 247 in H5Gcreate1(): unable to create group
    major: Symbol table
    minor: Unable to initialize object
  #001: H5G.c line 266 in H5G_create_named(): unable to create and link to group
    major: Symbol table
    minor: Unable to initialize object
  #002: H5L.c line 1633 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1856 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 691 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1713 in H5L_link_cb(): unable to create new link for object
    major: Links
    minor: Unable to initialize object
  #007: H5Gobj.c line 577 in H5G_obj_insert(): unable to insert entry into symbol table
    major: Symbol table
    minor: Unable to insert object
  #008: H5Gstab.c line 294 in H5G_stab_insert(): unable to insert the name
    major: Datatype
    minor: Unable to initialize object
  #009: H5Gstab.c line 249 in H5G_stab_insert_real(): unable to insert entry
    major: Symbol table
    minor: Unable to insert object
  #010: H5B.c line 635 in H5B_insert(): unable to insert key
    major: B-Tree node
    minor: Unable to initialize object
  #011: H5B.c line 1011 in H5B_insert_helper(): can't insert maximum leaf node
    major: B-Tree node
    minor: Unable to insert object
  #012: H5Gnode.c line 1055 in H5G_node_insert(): unable to protect symbol table node
    major: Symbol table
    minor: Unable to load metadata into cache
  #013: H5AC.c line 1970 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #014: H5C.c line 5928 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #015: H5C.c line 10567 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #016: H5Gnode.c line 391 in H5G_node_load(): bad symbol table node signature
    major: Symbol table
    minor: Unable to load metadata into cache
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #000: H5G.c line 699 in H5Gclose(): not a group
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #000: H5Gdeprec.c line 777 in H5Giterate(): group iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5G.c line 1657 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 681 in H5G_obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 522 in H5G_stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1218 in H5B_iterate(): iterator function failed
    major: B-Tree node
    minor: Unable to list node
  #005: H5Gnode.c line 1425 in H5G_node_iterate(): unable to load symbol table node
    major: Symbol table
    minor: Unable to load metadata into cache
  #006: H5AC.c line 1970 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #007: H5C.c line 5928 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #008: H5C.c line 10567 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: H5Gnode.c line 391 in H5G_node_load(): bad symbol table node signature
    major: Symbol table
    minor: Unable to load metadata into cache
Traceback (most recent call last):
  File "test_lock.py", line 32, in <module>
    main()
  File "test_lock.py", line 29, in main
    pool.map(work, range(500), chunksize=1)
  File "/usr/local/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/local/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
tables.exceptions.HDF5ExtError: Problems closing the Group group1


[hdf-forum] Is locking a file possible?

Rob Latham
On Mon, Sep 08, 2008 at 08:44:09PM +0200, Francesc Alted wrote:
> I'm trying to support the locking of a HDF5 file in a multi-process
> environment, with no success at the moment.  What I have tried so far
> is to call the flock system call right after the opening of an HDF5
> file.  Something like this:

It sounds like you should be using MPI and the parallel HDF5
interface.  
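
For instance, a minimal sketch (assuming h5py built against parallel
HDF5 plus mpi4py; run with something like `mpiexec -n 4 python
script.py`):

"""
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
# Every rank opens the same file collectively through the MPI-IO driver.
with h5py.File("parallel.h5", "w", driver="mpio", comm=comm) as f:
    # Dataset creation is collective: all ranks make the same call.
    dset = f.create_dataset("data", (comm.size,), dtype="i")
    # Each rank then writes its own disjoint slice; no locking needed.
    dset[comm.rank] = comm.rank
"""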

==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B

[hdf-forum] Is locking a file possible?

Quincey Koziol
In reply to this post by Francesc Alted
Hi Francesc,

On Sep 8, 2008, at 1:44 PM, Francesc Alted wrote:

> [clip]
>
> Any hints on why the above code is not working properly?

        You are fighting the metadata cache in HDF5.  Unfortunately there's  
currently no way to evict all the entries from the cache, even if you  
call H5Fflush(), so it's very likely that one or more of the processes  
will be dealing with stale metadata.  I've added a new feature request  
to our bugzilla database and maybe we'll be able to act on it at some  
point.

        Quincey



[hdf-forum] Is locking a file possible?

Francesc Alted
On Monday 08 September 2008, Quincey Koziol wrote:

> [clip]
>
> You are fighting the metadata cache in HDF5.  Unfortunately there's
> currently no way to evict all the entries from the cache, even if you
> call H5Fflush(), so it's very likely that one or more of the
> processes will be dealing with stale metadata.  I've added a new
> feature request to our bugzilla database and maybe we'll be able to
> act on it at some point.

I see.  At any rate, I find it curious that locking using a regular file
works flawlessly in the same scenario.

Thanks,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


[hdf-forum] Is locking a file possible?

Quincey Koziol
Hi Francesc,

On Sep 9, 2008, at 5:36 AM, Francesc Alted wrote:

> [clip]
>
> I see.  At any rate, I find it curious that locking using a regular
> file works flawlessly in the same scenario.

        Locking using a regular file works because you are closing and
re-opening the HDF5 file for each process (which flushes all the
metadata changes to the file on closing and re-reads them on
re-opening the file).

        Quincey



[hdf-forum] Is locking a file possible?

Francesc Alted
On Tuesday 09 September 2008, Quincey Koziol wrote:
[clip]

> Locking using a regular file works because you are closing and
> re-opening the HDF5 file for each process (which flushes all the
> metadata changes to the file on closing and re-reads them on
> re-opening the file).

So, when using the HDF5 file itself for locking, the lock is acquired
after the library has already opened the file, so it may already have
read bits of stale metadata.  Now I definitely see it.

Thanks for the explanation!

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


[hdf-forum] Is locking a file possible?

Francesc Alted
On Tuesday 09 September 2008, Francesc Alted wrote:
> [clip]
>
> So, when using the HDF5 file itself for locking, the lock is acquired
> after the library has already opened the file, so it may already have
> read bits of stale metadata.  Now I definitely see it.

Hmm, not quite.  After thinking a bit more about this issue, I now
think that the problem is not in the metadata cache, but is a more
fundamental one: I'm effectively opening a file (and hence reading
metadata, either from cache or from disk) *before* locking it, and
that will always lead to wrong results, regardless of whether a cache
exists.

I can devise a couple of solutions for this.  The first one is to add
a new parameter to H5Fopen to tell it that we want to lock the file as
soon as the file descriptor is allocated and before reading any
metadata (either from disk or cache), but that implies an API change.

The other solution is to increase the laziness of the metadata-reading
process, deferring it until the metadata is absolutely needed by other
functions.  In essence, H5Fopen() would then only have to open the
underlying file descriptor; this descriptor could be manually locked,
and the file metadata would be read later on, when it is really
needed.

All in all, both approaches seem to need too many changes in HDF5.
Perhaps a better avenue is to find alternatives for doing the locking
on the application side instead of including the functionality in HDF5
itself.

Cheers,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


[hdf-forum] Is locking a file possible?

Quincey Koziol
Hi Francesc,

On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:

> [clip]
>
> All in all, both approaches seem to need too many changes in HDF5.
> Perhaps a better avenue is to find alternatives for doing the locking
> on the application side instead of including the functionality in
> HDF5 itself.

        Those are both interesting ideas that I hadn't thought of.  What I
was thinking was to evict all the metadata from the cache and then
re-read it from the file.  This could be done at any point after the
file was opened, although it would require that all objects in the
file be closed when the cache entries were evicted.

        Quincey



[hdf-forum] Is locking a file possible?

Francesc Alted
Quincey,

On Wednesday 10 September 2008, you wrote:
> [clip]
>
> Those are both interesting ideas that I hadn't thought of.  What I
> was thinking was to evict all the metadata from the cache and then
> re-read it from the file.  This could be done at any point after the
> file was opened, although it would require that all objects in the
> file be closed when the cache entries were evicted.

Well, I suppose that my ignorance of the internals of HDF5 is
preventing me from understanding your solution.  Let's suppose that we
have two processes on a multi-processor machine, process 'a' and
process 'b'.  Both processes do the same thing: from time to time they
open an HDF5 file, lock it, write something to it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open* the
file and block until the file becomes unlocked.  While process 'b' is
waiting, process 'a' writes a bunch of data to the file.  When 'a'
finishes writing and unlocks the file, process 'b' unblocks and gets
the lock.  But by then (and this is the main point), process 'b' has
already read internal information about the opened file that is now
outdated.

The only way that I see to avoid the problem would be for the
information about the opened file in process 'b' to reside exclusively
in the metadata cache, so that by refreshing (or evicting) it the new
processes could get the correct information.  However, that solution
seems to imply that the HDF5 metadata cache has to be *shared* between
both processes, and I don't think this is the case.

Cheers,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249




[hdf-forum] Is locking a file possible?

Dimitris Servis
2008/9/10 Francesc Alted <faltet at pytables.com>

> [clip]
>
> The only way that I see to avoid the problem would be for the
> information about the opened file in process 'b' to reside
> exclusively in the metadata cache, so that by refreshing (or
> evicting) it the new processes could get the correct information.
> However, that solution seems to imply that the HDF5 metadata cache
> has to be *shared* between both processes, and I don't think this is
> the case.

Hi Francesc,

I have similar issues, and I think you're right when you say that this
should be solved at the application layer.  It is pretty difficult for
the library to do efficient locking when it cannot manage its own
space.  What if, for example, the user deletes the file?  Or another
process wants to move the file?  For the moment I think it is
difficult to deal with this effectively, so I will try to solve it
with the old hack of the flag: when a process enters the root (in my
case, also other major nodes in the tree) I set an attribute, and the
last thing the process does is unset the attribute.  This way I also
know whether there was an issue and writing failed.
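
In rough terms, the hack looks like this (a sketch with h5py; the
`in_use` attribute name is made up, and note that the check-and-set
below is not atomic on its own):

"""
import h5py

def guarded_write(h5name, do_write):
    with h5py.File(h5name, "a") as f:
        # A still-set flag means the file is busy, or a previous
        # writer died before unsetting it.
        if f.attrs.get("in_use", 0):
            raise RuntimeError("file busy or last write failed")
        f.attrs["in_use"] = 1
        f.flush()
        try:
            do_write(f)
        finally:
            # Unsetting the flag is the last thing the process does.
            f.attrs["in_use"] = 0
            f.flush()
"""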

Just a thought...

Regards,

-- dimitris

[hdf-forum] Is locking a file possible?

Quincey Koziol
In reply to this post by Francesc Alted
Hi Francesc,

On Sep 10, 2008, at 11:28 AM, Francesc Alted wrote:

> [clip]
>
> The only way that I see to avoid the problem would be for the
> information about the opened file in process 'b' to reside
> exclusively in the metadata cache, so that by refreshing (or
> evicting) it the new processes could get the correct information.
> However, that solution seems to imply that the HDF5 metadata cache
> has to be *shared* between both processes, and I don't think this is
> the case.

        No, the metadata cache doesn't need to be shared between both
processes.  As long as each process evicts all its metadata from the
cache after it acquires the lock, and flushes its cache after it's
done modifying the file but before releasing the lock, everything will
work fine.  Since each process has no knowledge of the contents of the
file after evicting everything in its cache, it will always get the
most recent information from the file and therefore see all the
changes from previous lock owners.

        Quincey



[hdf-forum] Is locking a file possible?

Quincey Koziol
In reply to this post by Dimitris Servis
Hi Dimitris,

On Sep 10, 2008, at 12:35 PM, Dimitris Servis wrote:

> [clip]
>
> For the moment I think it is difficult to deal with this effectively,
> so I will try to solve it with the old hack of the flag: when a
> process enters the root (in my case, also other major nodes in the
> tree) I set an attribute, and the last thing the process does is
> unset the attribute.  This way I also know whether there was an issue
> and writing failed.

        Setting a flag in the file is not sufficient.  It's easy to imagine
race conditions where two processes simultaneously check for the
presence of the flag, determine it doesn't exist, set it, and then
proceed to modify the file.  Some other mechanism which guarantees
exclusive access must be used.  (And even then, you'll have to use the
cache management strategies I mentioned in an earlier mail.)

        Note that we've given this a fair bit of thought at The HDF Group
and have some good solutions, but we would need funding/patches for
this to get into the HDF5 library.

        Quincey



[hdf-forum] Is locking a file possible?

Dimitris Servis
Hi Quincey,

2008/9/10 Quincey Koziol <koziol at hdfgroup.org>

> [clip]
>
> Setting a flag in the file is not sufficient.  It's easy to imagine
> race conditions where two processes simultaneously check for the
> presence of the flag, determine it doesn't exist, set it, and then
> proceed to modify the file.  Some other mechanism which guarantees
> exclusive access must be used.  (And even then, you'll have to use
> the cache management strategies I mentioned in an earlier mail.)


I know it is not sufficient, but locking will work only at the
application level, and AFAIK a portable solution will try to use
whole-file locks or separate lock files, which happens at a different
level than the HDF5 library.  For a process to decide what to do, it
has to check both the locks and the special attributes.  Note also
that linking HDF5 statically means each process's cache is different
anyway.

I am sure you've given it a good deal of thought, but for the benefit
of having single files and no services/daemons, efficient file locking
is sacrificed and becomes cumbersome.

Best Regards,

-- dimitris

[hdf-forum] Is locking a file possible?

Francesc Alted
In reply to this post by Quincey Koziol
On Wednesday 10 September 2008, Quincey Koziol wrote:

> [clip]
>
> No, the metadata cache doesn't need to be shared between both
> processes.  As long as each process evicts all its metadata from the
> cache after it acquires the lock, and flushes its cache after it's
> done modifying the file but before releasing the lock, everything
> will work fine.  Since each process has no knowledge of the contents
> of the file after evicting everything in its cache, it will always
> get the most recent information from the file and therefore see all
> the changes from previous lock owners.

Ah, I think I finally got what you meant.  So, as I understand it, here
it is a small workflow of actions that reproduces your schema:

1. <open_file>
2. <acquire_lock>
3. <evict_metadata>
4. <write_things>
5. <close_file & release_lock & flush_metadata>

However, actions 2 and 3 have to be added manually by the developer.
Hence, I presume that you were thinking of adding some function for
doing those actions at the same time, right?  In that case, maybe it
would be worthwhile to ponder adding a sort of 'lock' parameter to the
H5Fopen call instead.  That way, actions 1, 2 and 3 can be done in just
one single step, much the same as action 5, which would close the file,
release the lock and flush the metadata.  The action diagram would then
look like:

<open_file & acquire_lock & evict_metadata>
<write_things>
<close_file & release_lock & flush_metadata>

which seems quite handy to my eyes.  Well, just a thought.
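In the meantime, and just for the archive, here is a minimal sketch of
the variant that is working for me today: a separate lock file, with
the five actions above collapsed into a per-transaction open/close of
the HDF5 file.  The file names and the appendable /data node are made
up for illustration, and I'm using the PyTables API, but the same
pattern would apply to plain HDF5 calls:

"""
import fcntl
import tables  # PyTables

def locked_transaction(h5_path, lock_path, payload):
    # Serialize the whole transaction on a separate lock file.  Opening
    # the HDF5 file *after* acquiring the lock (and closing it before
    # releasing) rebuilds the metadata cache from disk on every
    # transaction, so no process ever sees stale metadata.
    lock_file = open(lock_path, "w")
    fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # block until we own the lock
    try:
        h5file = tables.openFile(h5_path, mode="a")
        try:
            h5file.root.data.append(payload)  # assumes an appendable /data node
            h5file.flush()
        finally:
            h5file.close()  # flushes and drops all cached metadata
    finally:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
        lock_file.close()
"""

Of course, this pays the full close/reopen price on every transaction;
it is exactly that overhead which the combined open+lock+evict call
would avoid.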

Thanks,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249



Reply | Threaded
Open this post in threaded view
|

[hdf-forum] Is locking a file possible?

Dimitris Servis
Hi Francesc,


2008/9/10 Francesc Alted <faltet at pytables.com>

> On Wednesday 10 September 2008, Quincey Koziol wrote:
> [clip]
>
> <open_file & acquire_lock & evict_metadata>
> <write_things>
> <close_file & release_lock & flush_metadata>
>
> which seems quite handy to my eyes.  Well, just a thought.

do you support NFS?

BR

-- dimitris

Reply | Threaded
Open this post in threaded view
|

[hdf-forum] Is locking a file possible?

Ger van Diepen
In reply to this post by Dimitris Servis
Hi Francesc, Dimitris, Quincey,

The type of transaction determines the type of locking you want to do.
Databases typically have small transactions, so they need fine-grained
locking: hold the lock for a short period of time, and only for the
part of the database that needs to be changed.

I think HDF5 does not fall into this category. Usually a lot of data is
read or written, so a lock on the entire file is fine. Furthermore, a
lock is held for a longer period of time, so the overhead of having to
close and reopen the file can be acceptable.
It is, however, somewhat cumbersome that you also have to close and
reopen all the groups, datasets, etc., so it would be nice if you could
use lock/unlock instead of having to open and close the file. But I
fear there is not much you can do about that. You just cannot be sure
that another process did not change the data structures in the file
unless HDF5 uses some clever (but probably very hard to implement)
schemes.

Maybe Francesc and Dimitris can explain what kind of lock granularity
they would like to have and what scenarios they are thinking of. I can
imagine that Francesc would like some finer-grained locking for
PyTables.
One must also consider the overhead of doing unnecessary locking and
unlocking. I.e., if a process only does lock/unlock because there might
be another process accessing the file, you may do a lot of unnecessary
flushing.

Note that file locking is supported over NFS, but AFAIK NFS does not
fully guarantee that the remote cache is updated when a file gets
changed.
Also note that Unix/Linux does not remove a file until all file handles
accessing it are closed. So if one process deletes the file, the other
one can still access it. I don't know about Windows.

Cheers,
Ger

>>> "Dimitris Servis" <servisster at gmail.com> 09/10/08 8:16 PM >>>
Hi Quincey,

2008/9/10 Quincey Koziol <koziol at hdfgroup.org>

> Hi Dimitris,
>
>
> On Sep 10, 2008, at 12:35 PM, Dimitris Servis wrote:
>
>
>> [clip]
>>
>> Hi Francesc,
>>
>> I have similar issues and I think you're right when you say that this
>> should be solved at the application layer. It is pretty difficult when
>> the library cannot manage its own space to have efficient locking.
>> What if for example the user deletes the file? Or another process
>> wants to move the file? For the moment I think it is difficult to deal
>> with this effectively so I will try to solve it with the old hack of
>> the flag: when a process enters the root (in my case also other major
>> nodes in the tree) I set an attribute and the last thing the process
>> does is to unset the attribute. This way I also know if there was an
>> issue and writing failed.
>>
>
>        Setting a flag in the file is not sufficient.  It's easy to
> imagine race conditions where two processes simultaneously check for
> the presence of the flag, determine it doesn't exist and set it, then
> proceed to modify the file.  Some other mechanism which guarantees
> exclusive access must be used.  (And even then, you'll have to use
> the cache management strategies I mentioned in an earlier mail.)
>
>        Note that we've given this a fair bit of thought at the HDF
> Group and have some good solutions, but would need to get
> funding/patches for this to get into the HDF5 library.
>
>        Quincey


I know it is not sufficient, but locking will work only at the
application level, and AFAIK a portable solution will try to use whole
file locks or separate lock files; that is done at a different level
than the HDF library. For a process to decide what to do, it has to
check the locks and the special attributes. Note also that linking
HDF5 statically means each process' cache is different anyway.

I am sure you've given it a good deal of thought, but for the benefit
of having single files and no services/daemons, efficient file locking
is sacrificed and becomes cumbersome.

Best Regards,

-- dimitris






Reply | Threaded
Open this post in threaded view
|

[hdf-forum] Is locking a file possible?

Francesc Alted
In reply to this post by Dimitris Servis
Hi Dimitris,

On Wednesday 10 September 2008, Dimitris Servis wrote:
[clip]

> > [clip]
>
> do you support NFS?

Sorry, but I don't know what you are referring to exactly.  Could you
be more explicit, please?

--
Francesc Alted
Freelance developer
Tel +34-964-282-249





Reply | Threaded
Open this post in threaded view
|

[hdf-forum] Is locking a file possible?

Dimitris Servis
Hi Francesc,

2008/9/11 Francesc Alted <faltet at pytables.com>

> Hi Dimitris,
>
> On Wednesday 10 September 2008, Dimitris Servis wrote:
> [clip]
> >
> > do you support NFS?
>
> Sorry, but I don't know what you are referring to exactly.  Could you
> be more explicit, please?
>

My apologies :-). My point was that if you want your app to work over
NFS, you simply cannot rely on processes sharing metadata. Processes
may reside on different machines.

Regards,

-- dimitris

Reply | Threaded
Open this post in threaded view
|

[hdf-forum] Is locking a file possible?

Francesc Alted
In reply to this post by Ger van Diepen
Hi Ger,

On Thursday 11 September 2008, Ger van Diepen wrote:

> Hi Francesc, Dimitris, Quincey,
>
> The type of transaction determines the type of locking you want to
> do. Databases typically have small transactions, so they need
> fine-grained locking: hold the lock for a short period of time, and
> only for the part of the database that needs to be changed.
>
> I think HDF5 does not fall into this category. Usually a lot of data
> is read or written, so a lock on the entire file is fine.
> Furthermore, a lock is held for a longer period of time, so the
> overhead of having to close and reopen the file can be acceptable.

Yeah, this is my impression too.

> It is, however, somewhat cumbersome that you also have to close and
> reopen all the groups, datasets, etc., so it would be nice if you
> could use lock/unlock instead of having to open and close the file.
> But I fear there is not much you can do about that. You just cannot
> be sure that another process did not change the data structures in
> the file unless HDF5 uses some clever (but probably very hard to
> implement) schemes.

Cumbersome?  In what sense?  For example, PyTables keeps track of all
its opened nodes (they live in its own internal metadata LRU cache),
and when the user asks to close the file, all the opened nodes (groups,
datasets) are closed automatically (at both the PyTables and HDF5
levels).  I don't know about HDF5 itself, but if it doesn't do the
same, that would be a handy thing to implement (bar side effects that I
don't see right now).
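
For instance, a minimal sketch of what I mean (file and node names are
made up):

"""
import tables

h5file = tables.openFile("example.h5", mode="w")
group = h5file.createGroup("/", "results")
array = h5file.createArray(group, "x", [1, 2, 3])

# Closing the file also closes `group` and `array` behind the scenes;
# there is no need to close each node individually.
h5file.close()
"""

(By the way, if I'm not mistaken, HDF5 can be asked for something
similar through the file close degree property, i.e.
H5Pset_fclose_degree() with H5F_CLOSE_STRONG, which closes any objects
still open when the file itself is closed.)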

> Maybe Francesc and Dimitris can explain what kind of lock granularity
> they would like to have and what scenarios they are thinking of. I
> can imagine that Francesc would like some finer-grained locking for
> PyTables.
> One must also consider the overhead of doing unnecessary locking and
> unlocking. I.e., if a process only does lock/unlock because there
> might be another process accessing the file, you may do a lot of
> unnecessary flushing.

I'm mainly looking into the locking functionality because a user
requested it:

http://www.pytables.org/trac/ticket/185

And well, locking at the file level would be enough for the time being,
yes.  More fine-grained locking would require direct HDF5 support, and
I am afraid that would imply too many changes to it.

> Note that file locking is supported over NFS, but AFAIK NFS does not
> fully guarantee that the remote cache is updated when a file gets
> changed.

Yeah, I don't know about the latest state of things, but fighting with
locking and NFS has always been a difficult subject, to say the least.

Cheers,
Francesc

> [clip]



--
Francesc Alted
Freelance developer
Tel +34-964-282-249





Reply | Threaded
Open this post in threaded view
|

[hdf-forum] Is locking a file possible?

Dimitris Servis
Hi Ger, Francesc,

2008/9/11 Francesc Alted <faltet at pytables.com>

> Hi Ger,
>
> [clip]

I also agree that the transaction scheme is what Ger describes, and
that this is actually the way to go. I cannot think of a way that HDF5
could deal with the complexity of locking a file without FS locks, but
maybe it's just me. In my case I allow only one process to write to the
file. This process has to acquire the lock first and set an attribute
on the particular object. Other processes can read, but at least they
know that some objects are being changed at the time, and the
application can decide to read the current state or wait. This also
gives me an indication of whether a transaction was completed
successfully, as I only support a finite number of write transactions
through my top-level API.
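
To make the idea concrete, here is a rough sketch in PyTables-flavoured
Python (the helper and the attribute name are made up, and, as Quincey
pointed out, the flag alone is not a lock; it only lets readers detect
an in-progress or failed transaction):

"""
def guarded_write(h5file, node, do_write):
    # Mark `node` busy while a write transaction is in flight.
    node._v_attrs.write_in_progress = True
    h5file.flush()
    try:
        do_write(node)
        node._v_attrs.write_in_progress = False  # clean completion
    finally:
        # If do_write() raised, the flag stays set, so readers can tell
        # that a transaction failed half-way through.
        h5file.flush()
"""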

Other approaches could include lock files or diff files, but I like
depending on a single file, and I have large datasets. In general I
(like, I guess, most other users of HDF5) have few, large transactions;
even if I could write in parallel, the writes would end up serialized
at the disk anyway, so overall performance would be more or less the
same and not worth the effort of implementing something more complex.

Thanks a lot!

-- dimitris
