Opened 12 years ago

Closed 12 years ago

#1971 closed defect (fixed)

64bit LFS support for G_ftell() and PRI_OFF_T gis.h

Reported by: hamish Owned by: grass-dev@…
Priority: critical Milestone: 6.4.3
Component: LibGIS Version: svn-develbranch6
Keywords: r.in.bin, LFS, G_ftell(), PRI_OFF_T Cc:
CPU: x86-64 Platform: Linux

Description

Hi,

as reported on the users' ML, r.in.bin is failing for input files bigger than 2GB. thread: "ERROR: Bytes do not match file size with r.in.bin (but file size is correct!!)"

http://thread.gmane.org/gmane.comp.gis.grass.user/47202

It works in trunk, but not 6.x. Just after 6.4.1 was released r.in.bin got upgraded to off_t, but PRI_OFF_T in GRASS 6 set by gis.h remains as "ld" because

LFS_CFLAGS = -D_FILE_OFFSET_BITS=64

is missing from Grass.make (it is there in G7). That may just be the overflow in the error message though, I think the real problem is G_ftell() is always returning int, even when ftello() should be returning off_t.

fwiw in G6 only the flags for wxWidgets get D_FILE_OFFSET'd. In G7 PRI_OFF_T is "lld" and r.in.bin works on a >2GB test file.

here's a little Matlab/Octave code to make one:

%%%% make a >2GB binary data file
fd1 = fopen('lfs_test.bin', 'w');
rows=19450
cols=29404
for i = 1:rows
   row_content = [1:cols] * sqrt(i);
   fwrite(fd1, row_content, 'real*4');
end
fclose(fd1)
%%%%

and import command:

$ ls -l lfs_test.bin
-rw-r--r-- 1 hamish hamish 2287631200 May 10 12:43 lfs_test.bin

GRASS> r.in.bin -f input=lfs_test.bin output=outputmap bytes=4 \
         n=51:05:20.4N s=41:21:50.4N w=5:08:31.2W e=9:33:36E \
         r=19450 c=29404 anull=-9999.0

the error in GRASS 6 is:

WARNING: File Size -2007336096 ... Total Bytes 2287631200
ERROR: Bytes do not match file size

This is a 64bit system, and GRASS6> g.version -b | tr ' ' '\n' has the --enable-64bit option, but I'm not really sure why that's needed, as autoconf already knows the platform. (for cross-compiling from 32bit?)

thanks, Hamish

Change History (14)

comment:1 by hamish, 12 years ago

AFAICT r.in.bin is the only thing in GRASS 6 which uses G_ftell(), and nothing in the grass6-addons repo uses it. It was there in the 6.4.2 release though, so someone might be using it in a personal addon program. Could it harm backwards compatibility to change the function def'n, or would it be harmless?

Seeing that it is only r.in.bin, a less invasive fix to get the 6.4.3 release out the door might be to add the #ifdef HAVE_FSEEKO test into r.in.bin directly and avoid using G_ftell() altogether.

Hamish

in reply to:  1 ; comment:2 by glynn, 12 years ago

Replying to hamish:

AFAICT r.in.bin is the only thing in GRASS 6 which uses G_ftell(), and nothing in the grass6-addons repo uses it. It was there in the 6.4.2 release though, so someone might be using it in a personal addon program. Could it harm backwards compatibility to change the function def'n, or would it be harmless?

The worst that can happen is that off_t gets truncated to int when the result is assigned (which doesn't apply to r.in.bin, which assigns the result of G_ftell() to an off_t), rather than within G_ftell().

The truncation to int was done in r45468, but I don't know why. Possibly to avoid issues with off_t having different sizes in different files (didn't LFS have to be enabled on a module-by-module basis in 6.x?).

The irony is that having G_fseek() return int is even worse than using plain fseek(), which returns a long. This imposes a 32-bit limit even on "real" 64-bit platforms (i.e. not Windows, where even the 64-bit versions have a 32-bit long).

At the very least, G_fseek() should return a long, if not an off_t.

Seeing that it is only r.in.bin, a less invasive fix to get the 6.4.3 release out the door might be to add the #ifdef HAVE_FSEEKO test into r.in.bin directly and avoid using G_ftell() altogether.

7.0 has 31 uses of G_ftell() and 48 uses of G_fseek() (although a bunch of those are in r.viewshed and r.terraflow, which all come from the same AMI_STREAM template). The fact that 6.x only has one use indicate that everything else is using plain ftell(), which means that it will fail on files larger than 2GiB.

in reply to:  2 ; comment:3 by hamish, 12 years ago

Replying to glynn:

(didn't LFS have to be enabled on a module-by-module basis in 6.x?).

not any more:

r40620 | glynn | 2010-01-23 03:19:24 +1300 (Sat, 23 Jan 2010) | 4 lines

Enable LFS globally
Define, use HAVE_FSEEKO
Makefile cleanup

7.0 has 31 uses of G_ftell() and 48 uses of G_fseek() [...]. The fact that 6.x only has one use indicate that everything else is using plain ftell(), which means that it will fail on files larger than 2GiB.

outside of libgis, the module priorities would seem to me to be:

  • r.in.bin
  • r.in.ascii
  • r.in.poly
  • r.in.xyz (although there it's just used for a G_percent() estimate)
  • r3.in.ascii
  • r.out.mat
  • v.in.dxf

Hamish

comment:4 by hamish, 12 years ago

fwiw my glibc docs show vanilla fseek() returning int, it's ftell() that returns long:

int fseek(FILE *stream, long offset, int whence);
long ftell(FILE *stream);

in reply to:  3 comment:5 by glynn, 12 years ago

Replying to hamish:

Replying to glynn:

(didn't LFS have to be enabled on a module-by-module basis in 6.x?).

not any more:

r40620 | glynn | 2010-01-23 03:19:24 +1300 (Sat, 23 Jan 2010) | 4 lines

That's trunk.

7.0 has 31 uses of G_ftell() and 48 uses of G_fseek() [...]. The fact that 6.x only has one use indicate that everything else is using plain ftell(), which means that it will fail on files larger than 2GiB.

outside of libgis, the module priorities would seem to me to be:

FWIW, the modules and libraries which use G_fseek() and G_ftell() in 7.x are:

general/g.rename
imagery/i.find
include/iostream
lib/dspf
lib/gis
lib/iostream
lib/raster
lib/raster3d
lib/rst/interp_float
lib/sites
lib/vector/Vlib
lib/vector/diglib
ps/ps.map
raster/r.coin
raster/r.cut.mat
raster/r.in.arc
raster/r.in.ascii
raster/r.in.bin
raster/r.in.mat
raster/r.in.poly
raster/r.in.xyz
raster3d/r3.in.bin
vector/v.in.dxf
vector/v.support
vector/v.vol.rst

in reply to:  4 ; comment:6 by glynn, 12 years ago

Replying to hamish:

fwiw my glibc docs show vanilla fseek() returning int, it's ftell() that returns long:

Right. ftell() returns an offset, fseek() accepts one (and returns a status). The G_* versions should use at least a long for offsets.

in reply to:  6 comment:7 by hamish, 12 years ago

Glynn wrote:

That's trunk.

ah, the history was imported as the file was recently copied from trunk. ok, so 6.x is still LFS-on-demand. Added to devbr6 r.in.bin Makefile in r56207. formal audit of all modules which might need it can come later.

Replying to glynn:

The G_* versions should use at least a long for offsets.

if I undo r45468 and add

   #include <sys/types.h>	/* for off_t */

to include/gisdefs.h, then devbr6 builds ok and r.in.bin + the >2GB file imports ok for me on 64bit linux.

Glynn:

The truncation to int was done in r45468, but I don't know why. Possibly to avoid issues with off_t having different sizes in different files (didn't LFS have to be enabled on a module-by-module basis in 6.x?).

Martin, do you remember why the change to int was needed? I can't find anything in the archives about it.

thanks, Hamish

comment:8 by hamish, 12 years ago

in devbr6 I've committed r56422 which changes it back from int to off_t for testing.

Hamish

comment:9 by hamish, 12 years ago

Hi,

it would help testing if the nightly wingrass builds had the --enable-largefile ./configure switch enabled in the 6.5 build.

thanks, Hamish

in reply to:  9 ; comment:10 by hellik, 12 years ago

Replying to hamish:

it would help testing if the nightly wingrass builds had the --enable-largefile ./configure switch enabled in the 6.5 build.

changing there?:

http://trac.osgeo.org/grass/browser/grass/branches/develbranch_6/mswindows/osgeo4w/package.sh#L116

in reply to:  10 ; comment:11 by hamish, 12 years ago

Replying to hellik:

changing there?: [...]/develbranch_6/mswindows/osgeo4w/package.sh#L116

seems like it, thanks; change made in r56545. (see also grass-addons/tools/wingrass-packager/grass_compile.sh which calls that)

Hamish

in reply to:  11 ; comment:12 by mmetz, 12 years ago

Replying to hamish:

Replying to hellik:

changing there?: [...]/develbranch_6/mswindows/osgeo4w/package.sh#L116

seems like it, thanks; change made in r56545. (see also grass-addons/tools/wingrass-packager/grass_compile.sh which calls that)

Note that --enable-largefile has no effect with wingrass 6.x. Only trunk has LFS on windows. See also #1131 and the LFS-related changes to include/config.h.in [0].

[0] https://trac.osgeo.org/grass/browser/grass/trunk/include/config.h.in#L282

in reply to:  12 comment:13 by hamish, 12 years ago

Replying to mmetz:

Note that --enable-largefile has no effect with wingrass 6.x.

I guess the thing to do then is to delay worrying about it for wingrass, and just focus on get it working at least for 32bit Linux/Mac. I'll try to get to that tomorrow after a bit more testing, if all goes well I'll backport r56422 to relbr64 and then we should be ok to move the milestone to 6.4.4. (LFS audit for other modules as part of that)

Hamish

comment:14 by hamish, 12 years ago

Resolution: fixed
Status: newclosed

backported to relbr6 & tested ok on 64bit linux.

devbr6 tested ok in 32bit nightly wingrass build with a 1.1gb file (top half of the above sample).

closing ticket, moving LFS audit to #1990 and keeping an eye on #1131 for 6.4.4.

Hamish

Note: See TracTickets for help on using tickets.