Re: Hdf-forum Digest, Vol 102, Issue 25

Re: Hdf-forum Digest, Vol 102, Issue 25

Friedhelm Matten
Hi John,

I have been a Java developer for 20 years and am relatively new to Python,
but I would like to help with this effort.
How and where could I start?

Thanks and regards


-----Original Message-----
From: Hdf-forum [mailto:[hidden email]] On Behalf Of
[hidden email]
Sent: Wednesday, 20 December 2017 15:03
To: [hidden email]
Subject: Hdf-forum Digest, Vol 102, Issue 25

Today's Topics:

   1. Re: h5serv, Tablet, compound (John Readey)
   2. Re: Errors when install HDF5 (??)
   3. [RFC] [PATCH] Windows Unicode Filename support (Christian Seiler)


Message: 1
Date: Tue, 19 Dec 2017 22:25:11 +0000
From: John Readey <[hidden email]>
To: HDF Users Discussion List <[hidden email]>
Subject: Re: [Hdf-forum] h5serv, Tablet, compound
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi Friedhelm,

    I'd love to have pytables and pandas integrated with HDF Server, but
that would take some work.  A lot of effort went into h5pyd and it's
working pretty well as an h5py drop-in.

    So, I think the best approach would be to first refactor pytables to
use h5py.  This is what Andy Scopatz proposed earlier.  Unfortunately, not
much progress has been made in this effort.

    Anyway, once we have pytables on h5py, it would be easy to have
pytables use h5py or h5pyd depending on whether the content is a local POSIX
file or HDF Server-based.  And from there, pandas support should come
for free as well.

    For the question about production environments: the project is
relatively new, so please exercise reasonable precautions.  Back up any data
you put into h5serv, follow best practices for security, don't use it for
nuclear power plant control systems, etc.


On 12/8/17, 5:45 AM, "Hdf-forum on behalf of ISCaD GmbH"
<[hidden email] on behalf of
[hidden email]> wrote:

    I have started using h5serv and have two questions.
    1. I would like to use pytables or pandas. Is this
        possible with h5pyd, or must I use compound
        datatypes? Does anybody have an example?
    2. Is the reference setup with Docker safe for production-like
        environments?
    Hdf-forum is for HDF software users discussion.
    [hidden email]


Message: 2
Date: Wed, 20 Dec 2017 21:46:45 +0800
From: ?? <[hidden email]>
To: HDF Users Discussion List <[hidden email]>
Subject: Re: [Hdf-forum] Errors when install HDF5
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="gb2312"

Hello, Barbara,

Thank you so much for your reply, and sorry for replying so late. I
installed HDF5 by running "yum install hdf5 hdf5-devel". I don't know
which version was installed, but the program I need runs well. Since I am
focused on that program, I have ignored the errors I encountered. Thank you
for your concern.

Have a nice day.
Jie Jiang

> On 7 December 2017, at 12:06, <[hidden email]> wrote:
> <config.log>


Message: 3
Date: Wed, 20 Dec 2017 14:05:10 +0100
From: Christian Seiler <[hidden email]>
To: [hidden email]
Subject: [Hdf-forum] [RFC] [PATCH] Windows Unicode Filename support
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear all,


I'd like to contribute a patch to HDF5, and this appears to be the
appropriate place to send it. If I am mistaken, I'd appreciate a pointer
to where I should go instead. (It would be great if the website could have
some prominent information about how to contribute to HDF5. Also, I couldn't
find one: does HDF5 have a version control repository, such as Git or SVN?)

Problem description

Windows has two different representations of filenames: 8-bit fixed-width
"ANSI" and 16-bit "Unicode" (effectively UTF-16). The 8-bit representation
depends on the locale settings of the computer: the lower 128 values
correspond to ASCII, while the meaning of the upper 128 values is set by the
active code page. In Germany, for example, code page 1252 is typically used
(very similar, but not identical, to ISO-8859-1).

When using standardized C / POSIX functions as HDF5 does (open, fopen,
etc.), which accept 8-bit strings, they will always assume the local 8-bit
encoding. The problem is that a fixed 8-bit encoding offers only 256 values,
so it can never encode all Unicode characters, and therefore not all
filenames that the operating system supports. Some languages alone have more
characters than any fixed 8-bit encoding can represent.

This in turn means that on Windows systems it is possible to have HDF5 fail
to open a file if the file name (or the directory that contains it) contains
characters that are not representable in the local 8-bit encoding of the
system. For example, on a typical US Windows installation it is not possible
to use HDF5 to store files with names that contain e.g. Japanese characters,
even though the operating system itself does support these.

To actually access all possible files, Microsoft offers alternatives to the
standard functions that accept UTF-16 filenames in the form of wchar_t
strings. There is _wopen() instead of open(), and _wfopen() instead of
fopen().

(For reference: other operating systems, such as Linux and Mac OS X, always
represent filenames as 8-bit strings; the operating system often does not
care about the precise encoding and leaves it up to the software itself
(though in practice this most likely will be UTF-8 nowadays), which means
that the standard 8-bit APIs can always be used to access any file on disk.)
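
To make the "UTF-16 in the form of wchar_t strings" point concrete, here is a minimal, portable sketch of the conversion that has to happen before a UTF-8 name can be handed to _wopen() or _wfopen(). The function name utf8_to_utf16 is hypothetical and is not taken from the patch. On Windows itself one would more likely call MultiByteToWideChar(), and this sketch does not reject overlong encodings or encoded surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of HDF5 or the patch): convert a
 * NUL-terminated UTF-8 string into UTF-16 code units.
 * Returns the number of units written, excluding the terminator,
 * or -1 on malformed input or if `cap` units are not enough. */
static long utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;  /* stray continuation byte or invalid lead byte */
        s++;

        for (; extra > 0; extra--, s++) {
            if ((*s & 0xC0) != 0x80) return -1;  /* also catches an early NUL */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair (two 16-bit units). */
            if (cp > 0x10FFFF || n + 2 >= cap) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return -1;
            dst[n++] = (uint16_t)cp;
        }
    }
    if (cap == 0) return -1;
    dst[n] = 0;
    return (long)n;
}
```

A Japanese character such as U+65E5 occupies three bytes in UTF-8 but a single 16-bit unit in UTF-16, while characters outside the Basic Multilingual Plane need two units, which is why "16-bit Unicode" is only "effectively" UTF-16.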

An example consequence of this problem: in a GUI application, the user
chooses a file from a "File Open" dialog, and the file name is converted
appropriately and passed to HDF5. HDF5 then cannot load the file (which the
user chose in the same application) because the file name (or a directory
containing it) contains characters that can't be represented in the local
code page.

Rejected solutions

The most obvious solution would be to simply provide additional functions in
HDF5 that also accept wchar_t filenames on Windows systems. However, HDF5
has a large number of methods that simply pass through file names (or maybe
even manipulate them a bit) and this would lead to a huge duplication of
existing code, which I don't believe is a good idea for the long-term
maintenance of HDF5.

An alternative suggestion (see e.g. [1]) would be to always assume on
Windows systems that the filename supplied is encoded in UTF-8 (which, due
to being variable-length, can represent all possible characters) and convert
it to UTF-16 before passing it to the wide functions (_wopen, _wfopen)
directly. This has the advantage that now all filenames can be
represented. However this has the huge disadvantage that most software does
not expect HDF5 to accept UTF-8-encoded file names, and if a program
converts a string that it got from a "File Open" dialog into the local 8bit
codepage (as many programs would do now), any character in the local code
page beyond ASCII would cease to work (as UTF-8 encodes them differently).
For example, since the German umlauts ä, ö, ü can be represented in the
local codepage, file names with these characters can actually be opened on
Windows systems with HDF5 at the moment (when using German locale settings,
at least), and this change would break existing programs if it were to be
added to HDF5 itself unconditionally.
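
The byte-level difference behind this breakage can be sketched as follows. The helper below is hypothetical (it is not part of HDF5 or the patch) and re-encodes a code page 1252 string as UTF-8. It only handles the bytes that CP1252 shares with ISO-8859-1, and rejects the range 0x80..0x9F where the two differ, since a full CP1252 table is beyond a short example.

```c
#include <stddef.h>

/* Hypothetical helper: re-encode a NUL-terminated CP1252 string as UTF-8.
 * Bytes below 0x80 are ASCII and are copied unchanged. Bytes 0xA0..0xFF
 * map to the Unicode code point of the same value and become two UTF-8
 * bytes. Bytes 0x80..0x9F are rejected (CP1252 differs from ISO-8859-1
 * there). Returns bytes written, excluding the terminator, or -1. */
static long cp1252_subset_to_utf8(const char *src, char *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    for (; *s; s++) {
        if (*s < 0x80) {
            if (n + 1 >= cap) return -1;
            dst[n++] = (char)*s;
        } else if (*s >= 0xA0) {
            if (n + 2 >= cap) return -1;
            dst[n++] = (char)(0xC0 | (*s >> 6));    /* lead byte */
            dst[n++] = (char)(0x80 | (*s & 0x3F));  /* continuation byte */
        } else {
            return -1;  /* 0x80..0x9F needs a real CP1252 table */
        }
    }
    if (cap == 0) return -1;
    dst[n] = '\0';
    return (long)n;
}
```

In CP1252 the letter ä is the single byte 0xE4, whereas in UTF-8 it is the two bytes 0xC3 0xA4. A program that keeps passing the single byte 0xE4 to a library that suddenly expects UTF-8 is handing over an invalid sequence, which is the incompatibility described above.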

Proposed solution

I'd like to propose the following solution instead. It is based on the
UTF-8 encoding idea, but keeps compatibility with existing software.

  - Default behavior: HDF5 behaves as it currently does and calls the
    standard "ANSI" open(), fopen(), etc. functions. It will hence
    continue to work with characters in the local code page.

  - Add a boolean to the file access property list that may be used to
    indicate that the file name is in UTF-8 on Windows systems (the
    boolean will be ignored on all other operating systems):

      H5Pset_windows_unicode_filenames(fapl, TRUE);

  - Update the filesystem drivers to check for this flag, and if it
    is set to actually do a conversion from UTF-8 to UTF-16 and then
    call the corresponding wide functions.
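
As a rough illustration of the third point, the flag check inside a driver might look like the sketch below. It is not taken from the patch: the function name open_maybe_utf8 and its parameters are hypothetical, and it wraps fopen() rather than any real HDF5 driver code. MultiByteToWideChar() and _wfopen() are the existing Windows calls. On other systems the flag is simply ignored.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <wchar.h>
#include <windows.h>
#endif

/* Hypothetical helper (not from the patch): open `name` like fopen().
 * If `utf8_flag` is non-zero on Windows, `name` is treated as UTF-8,
 * converted to UTF-16 and passed to _wfopen(). In every other case the
 * plain 8-bit fopen() is used, which keeps the existing behavior. */
static FILE *open_maybe_utf8(const char *name, const char *mode, int utf8_flag)
{
#ifdef _WIN32
    if (utf8_flag) {
        wchar_t wmode[16];
        wchar_t *wname;
        FILE *fp;
        /* First call: ask how many UTF-16 units are needed (incl. NUL). */
        int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      name, -1, NULL, 0);
        if (len <= 0)
            return NULL;  /* name is not valid UTF-8 */
        wname = malloc((size_t)len * sizeof *wname);
        if (wname == NULL)
            return NULL;
        if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                name, -1, wname, len) <= 0 ||
            MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) <= 0) {
            free(wname);
            return NULL;
        }
        fp = _wfopen(wname, wmode);
        free(wname);
        return fp;
    }
#else
    (void)utf8_flag;  /* the flag has no effect on non-Windows systems */
#endif
    return fopen(name, mode);
}
```

Because the default path is untouched, callers that still pass names in the local code page keep working, and only callers that opt in need to supply UTF-8.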

The advantage is that current code doesn't break, but users who want to
properly support Windows can actually do so; they just need to ensure they
encode their filenames in UTF-8. The other main advantage is that the patch
is not very invasive.

I've attached a patch (against 1.10.1) that implements this. The following
is currently supported:

  - Property list flag accessors:

         H5Pset_windows_unicode_filenames(fapl, value);
         H5Pget_windows_unicode_filenames(fapl, &value);

  - SEC2/Windows driver

  - Core driver

  - stdio driver

I've successfully tested this in the following configurations on a Windows
10 system with German locale (using MinGW-w64/gcc7.2.0 as the compiler,
64bit):

  - Flag not set, files with Umlauts, calling HDF5 with the file names
    encoded in the current codepage. (Compatibility check for existing
    software.)

  - Flag set, lots of different test cases (file names in pure ASCII,
    German Umlauts, Japanese characters, Hebrew characters, Arabic
    characters), calling HDF5 with the file names encoded in UTF-8
    and the flag set in the FAPL before calling the HDF5 functions.

I tested all three drivers (SEC2, Core, stdio) in both cases.

I also tested that the patch doesn't break on Linux (Debian 9, gcc 7.2.0,
64bit x86) to ensure that the patches don't harm non-Windows platforms.

What should work, but I haven't tested it:

  - The FAMILY driver, as that just passes through the FAPL to the
    underlying driver, and since UTF-8 is ASCII-compatible, any
    manipulation done in the driver should be safe as well.

What I believe doesn't make sense to implement:

  - The direct I/O driver. It appears to contain some Windows code, but
    the CMake build system will never build it on Windows, so I left
    that out. If that is wrong and the direct I/O driver should work on
    Windows, I'll be happy to update the patch.

What I didn't implement yet:

  - C++, Fortran and Java wrappers for the FAPL flag getters/setters

  - External File Lists (EFL) support (H5Defl.c)

  - HDF5 plugin libraries (H5PL.c)

  - Logging driver (H5FDlog.c)

  - Cache logging (H5Clog.c)

Feedback is appreciated, and it would be fantastic if this could be included
in a future version of HDF5. I would be willing to help out with the missing
pieces. I do think that those can be added incrementally, and the current
patch already improves the state of affairs on Windows quite a bit.

For the avoidance of doubt: my employer agrees to license these changes
under the same license that HDF5 1.10.1 is licensed under.

Best regards,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: windows_unicode_filenames.patch
Type: text/x-patch
Size: 25299 bytes
Desc: not available


End of Hdf-forum Digest, Vol 102, Issue 25

Re: Hdf-forum Digest, Vol 102, Issue 25

Hi Friedhelm,

    Sorry it took me so long to respond.   (I was busy with the holidays when you posted.)

     If you’d like to help (and improve your Python skills at the same time), I’d recommend contributing to one of the open source projects on GitHub.  h5py and PyTables both have a number of issues where you are welcome to make contributions.  Likewise with h5pyd, the Python client for h5serv/HSDS.


On 12/20/17, 9:52 AM, "Hdf-forum on behalf of Friedhelm Matten" <[hidden email] on behalf of [hidden email]> wrote:

    Hi John,
    I have been a Java developer for 20 years and am relatively new to Python,
    but I would like to help with this effort.
    How and where could I start?
    Thanks and regards
