Trim of strings on read in HDF-Java


Trim of strings on read in HDF-Java

Nelson, Jarom

I ran into an unexpected feature in the HDF-Java implementation. When a string (including each string within a string array) is read from a file, it is trimmed of any trailing character <= '\u0020', including spaces, tabs, newlines, and carriage returns. I thought this was strange.

Dataset.java method byteToString() has this code (jhdfobj-2.11.0):

 

            // trim only the end
            int end = str.length();
            while (end > 0 && str.charAt(end - 1) <= '\u0020')
                end--;

 

The full string is read from the file and converted from bytes before the above code runs. The loop then moves the end index back until it sits just past the last character above '\u0020', so everything after that character is dropped.
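
To make the effect concrete, here is a minimal sketch that wraps the same loop in a standalone helper. The class name, the helper name trimEnd, and the final substring call are my own assumptions for illustration; the quoted snippet only shows how the end index is computed, not what byteToString() does with it afterwards.

```java
final class TrailingTrimDemo {
    // Hypothetical helper: the loop is copied from the quoted snippet,
    // the substring call is an assumption about how "end" is then used.
    static String trimEnd(String str) {
        int end = str.length();
        while (end > 0 && str.charAt(end - 1) <= '\u0020')
            end--;
        return str.substring(0, end);
    }

    public static void main(String[] args) {
        // Leading spaces and the embedded newline survive; the trailing
        // tab, CR, LF, and NUL characters are all removed.
        String stored = "  line one\nline two\t\r\n\0\0";
        String read = trimEnd(stored);
        System.out.println("stored length: " + stored.length());
        System.out.println("read length:   " + read.length());
        System.out.println("ends with 'line two': " + read.endsWith("line two"));
    }
}
```

Note that a string consisting only of whitespace or control characters comes back empty under this loop.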

 

What’s the reason behind this?  My guess is that it makes sure there are no null characters at the end of the string, but this seems overly aggressive.
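
For comparison, if removing null padding really is the only goal, a narrower trim could look like the sketch below. This is purely hypothetical (the class and method names are mine, and it is not code from jhdfobj); it only shows what a less aggressive version of the same loop would do.

```java
final class NullPaddingTrimDemo {
    // Hypothetical alternative: strip only trailing NUL characters and
    // leave trailing spaces, tabs, and line breaks untouched.
    static String stripTrailingNuls(String str) {
        int end = str.length();
        while (end > 0 && str.charAt(end - 1) == '\0')
            end--;
        return str.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = "line two\t\r\n\0\0";
        String read = stripTrailingNuls(stored);
        // The trailing tab, CR, and LF are still present after the trim.
        System.out.println("keeps line break: " + read.endsWith("\t\r\n"));
    }
}
```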

 

Jarom Nelson; x33953

Computer Scientist, NIF, LLNL

 


_______________________________________________
Hdf-forum is for HDF software users discussion.
[hidden email]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Re: Trim of strings on read in HDF-Java

bljones

Hi Jarom,

 

Offhand, we are not sure of the reason for this. I entered issue JAVA-1959 so that we can investigate it further.

Thank you!

 

-Barbara

[hidden email]

 

From: Hdf-forum [mailto:[hidden email]] On Behalf Of Nelson, Jarom
Sent: Thursday, March 02, 2017 11:06 AM
To: [hidden email]
Subject: [Hdf-forum] Trim of strings on read in HDF-Java

 
