Opened 8 years ago

Closed 6 years ago

#6540 closed defect (fixed)

2.2 GB GeoJSON file crashes ogrinfo

Reported by: strk
Owned by: warmerdam
Priority: normal
Milestone: 2.3.0
Component: default
Version: unspecified
Severity: normal
Keywords:
Cc:

Description

Attachments (1)

json.tar.xz (1.3 KB) - added by Kurt Schwehr 6 years ago.
json fuzzer corpus causing trouble with the new streaming parser

Change History (10)

comment:1 by Even Rouault, 8 years ago

This is a known issue: the driver ingests the entire file into memory, which can require up to 20 or 30 times the size of the serialized file in RAM.
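
For context, the driver builds on json-c (visible as the gdal_json_* / gdal_array_list_* symbols in the traces further down). A full in-memory load with stock json-c looks roughly like the sketch below; every value in the document becomes a separately allocated json_object node, which is where the 20-30x multiplier comes from. Illustrative only, not the driver's actual code:

#include <json.h>  /* json-c */

int main(void)
{
    /* json_object_from_file() reads and parses the whole file at once,
     * building a json_object node for every value in the document. */
    json_object *poObj = json_object_from_file("big.geojson");
    if( poObj == NULL )
        return 1;
    /* ... translate the tree into OGR features ... */
    json_object_put(poObj);  /* frees the entire tree */
    return 0;
}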

comment:2 by Jukka Rahkonen, 8 years ago

The sample file has 612173 rows, with all the OSM tags as properties. I attached a small 5-row excerpt to the ticket; those first 5 features have 207 different properties.

When I tried ogrinfo on my Windows machine I did not see a crash. However, the computer was so badly jammed that it took 10 minutes after hitting Ctrl-Alt-Del before Task Manager opened so that I could kill the process.

comment:3 by Even Rouault, 6 years ago

Resolution: fixed
Status: new → closed

In 40457:

GeoJSON: support loading (true) GeoJSON files of arbitrary size (fixes #6540)
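
From the API side, the effect of this change is that a large FeatureCollection can be iterated feature by feature without the whole file being ingested first. A usage sketch with the GDAL 2.3 C++ API (the filename is a placeholder):

#include "gdal_priv.h"
#include "ogrsf_frmts.h"

int main()
{
    GDALAllRegister();
    GDALDataset *poDS = static_cast<GDALDataset *>(
        GDALOpenEx("big.geojson", GDAL_OF_VECTOR, nullptr, nullptr, nullptr));
    if( poDS == nullptr )
        return 1;
    OGRLayer *poLayer = poDS->GetLayer(0);
    // With the streaming reader, features are materialized one at a time,
    // so peak memory no longer scales with the size of the whole file.
    OGRFeature *poFeature = nullptr;
    while( (poFeature = poLayer->GetNextFeature()) != nullptr )
    {
        OGRFeature::DestroyFeature(poFeature);
    }
    GDALClose(poDS);
    return 0;
}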

comment:4 by Even Rouault, 6 years ago

Milestone: 2.3.0

comment:5 by Kurt Schwehr, 6 years ago

Resolution: fixed
Status: closedreopened

Thanks for the new capabilities! However, I need a way to constrain the maximum amount of memory before declaring a JSON file unparseable (especially when fuzzing), so as to prevent DoS issues from massive allocations. In my environments, using too much RAM gets the process killed rather than erroring out on an alloc. The attached small files are from my fuzzer corpus and cause problems.
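
One harness-side mitigation, independent of any GDAL option, is to cap the process address space before parsing so oversized allocations fail instead of the process being OOM-killed. A minimal POSIX sketch (the 512 MB figure is an arbitrary example):

#include <sys/resource.h>

/* Cap the address space so huge allocations return NULL / throw
 * std::bad_alloc instead of getting the process killed. */
static void CapMemory(rlim_t nBytes)
{
    struct rlimit sLimit;
    sLimit.rlim_cur = nBytes;
    sLimit.rlim_max = nBytes;
    setrlimit(RLIMIT_AS, &sLimit);
}

/* e.g. call CapMemory(512 * 1024 * 1024) once before fuzzing starts. */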

There may also be other issues in these files, but cf. these stack traces:

#0  0x0000000000bca96b in realloc ()
#1  0x00000000005a49c5 in gdal_array_list_expand_internal ()
#2  0x00000000005a487d in gdal_array_list_put_idx ()
#3  0x00000000005a4a64 in gdal_array_list_add ()
#4  0x000000000047f4e1 in gdal_json_object_array_add ()
#5  0x0000000000485acf in OGRGeoJSONReaderStreamingParser::AppendObject(json_object*) ()
#6  0x0000000000487ecb in OGRGeoJSONReaderStreamingParser::Number(char const*, unsigned long) ()
#7  0x00000000009c5d09 in CPLJSonStreamingParser::Parse(char const*, unsigned long, bool) ()

and

#3  0x000000000082b876 in gdal_array_list_expand_internal () at arraylist.c:72
#4  0x000000000082b5ad in gdal_array_list_put_idx () at arraylist.c:83
#5  0x000000000117708b in CPLJSonStreamingParser::Parse(char const*, unsigned long, bool) () at cpl_json_streaming_parser.cpp:494
#6  0x00000000005dd731 in OGRGeoJSONReader::FirstPassReadLayer(OGRGeoJSONDataSource*, _IO_FILE*, bool&) () at ogrgeojsonreader.cpp:861
#7  0x00000000005cfae2 in OGRGeoJSONDataSource::LoadLayers(GDALOpenInfo*, GeoJSONSourceType) () at ogrgeojsondatasource.cpp:784
#8  0x00000000005ce9e0 in OGRGeoJSONDataSource::Open(GDALOpenInfo*, GeoJSONSourceType) () at ogrgeojsondatasource.cpp:144
#9  0x0000000000506233 in LLVMFuzzerTestOneInput ()

by Kurt Schwehr, 6 years ago

Attachment: json.tar.xz added

json fuzzer corpus causing trouble with the new streaming parser

comment:6 by Even Rouault, 6 years ago

In 40468:

GeoJSON: fix crash on empty key names (refs #6540, trunk only)
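
The actual triggering inputs are in the attached json.tar.xz; a hand-written input of the shape that exercises the empty-key path would look something like this (hypothetical reproducer, not one of the attached files):

{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"": 1}, "geometry": null}
]}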

comment:7 by Even Rouault, 6 years ago

@schwehr I fixed a crash in r40468. Those files shouldn't cause excessive memory allocations.

comment:8 by Even Rouault, 6 years ago

And there's the m_nCurObjMemEstimate > MAX_OBJECT_SIZE check that should make sure that a json_object* and its descendants do not take more than 100 MB.
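
A sketch of how such a guard operates in a streaming reader; m_nCurObjMemEstimate and MAX_OBJECT_SIZE are the names from the comment above, but the surrounding structure here is illustrative, not the actual ogrgeojsonreader.cpp code:

#include <cstddef>

// Illustrative guard, not the actual ogrgeojsonreader.cpp code.
class StreamingReaderSketch
{
    std::size_t m_nCurObjMemEstimate = 0;  // running size estimate
    static const std::size_t MAX_OBJECT_SIZE = 100 * 1024 * 1024;  // 100 MB cap
    bool m_bStopParsing = false;

public:
    // Called from every parser callback that materializes part of the
    // current json_object tree, with a rough size for the new node.
    void AccountForNode(std::size_t nApproxSize)
    {
        m_nCurObjMemEstimate += nApproxSize;
        if( m_nCurObjMemEstimate > MAX_OBJECT_SIZE )
            m_bStopParsing = true;  // abandon the object, report an error
    }
    bool StopParsing() const { return m_bStopParsing; }
};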

comment:9 by Kurt Schwehr, 6 years ago

Resolution: fixed
Status: reopened → closed

Even, r40468 seems to have fixed those for me. Thanks!

I am running a fuzzer on my existing corpus. I plan to run it for a few core hours and I'll report if I find anything.
