Opened 2 years ago

Closed 12 months ago

#6540 closed defect (fixed)

2.2 GB GeoJSON file crashes ogrinfo

Reported by: strk Owned by: warmerdam
Priority: normal Milestone: 2.3.0
Component: default Version: unspecified
Severity: normal Keywords:
Cc:

Attachments (1)

json.tar.xz (1.3 KB) - added by Kurt Schwehr 12 months ago.
json fuzzer corpus causing trouble with the new streaming parser


Change History (10)

comment:1 Changed 2 years ago by Even Rouault

This is a known issue: the driver ingests the whole file into memory, which may require an amount of RAM up to 20 or 30 times the size of the serialized file.
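A quick way to see why a parsed JSON tree dwarfs its serialized form is to compare the two sizes directly. The sketch below is plain Python (not GDAL's C++ json-c objects), and the feature layout is made up for illustration; it recursively sums `sys.getsizeof` over a parsed FeatureCollection:

```python
import json
import sys

def deep_sizeof(obj, seen=None):
    """Rough recursive in-memory size of a parsed JSON tree,
    counting each distinct object once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for k, v in obj.items():
            size += deep_sizeof(k, seen) + deep_sizeof(v, seen)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            size += deep_sizeof(item, seen)
    return size

serialized = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "f%d" % i, "value": i},
         "geometry": {"type": "Point", "coordinates": [i * 0.1, i * 0.2]}}
        for i in range(1000)
    ],
})
parsed = json.loads(serialized)
ratio = deep_sizeof(parsed) / len(serialized)
```

Per-object overhead (dict headers, boxed numbers, string headers) dominates the payload bytes, which is why multipliers of 20 to 30 times the serialized size are plausible for files with wide attribute tables.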

comment:2 Changed 2 years ago by Jukka Rahkonen

The sample file has 612173 rows with all the OSM tags as properties. I attached a small five-row excerpt to the ticket; those first 5 features alone have 207 different properties.

When I tried ogrinfo on my Windows machine I did not see a crash. However, the computer was jammed so badly that it took 10 minutes after hitting Ctrl-Alt-Del before the task manager opened and I could kill the process.

comment:3 Changed 12 months ago by Even Rouault

Resolution: fixed
Status: new → closed

In 40457:

GeoJSON: support loading (true) GeoJSON files of arbitrary size (fixes #6540)
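The actual fix (r40457) implements a dedicated C++ streaming parser (CPLJSonStreamingParser). As a language-agnostic illustration of the underlying idea, decoding incrementally while keeping only a bounded buffer in memory, here is a Python sketch for a stream of concatenated JSON values; the function name and chunking scheme are my own, not GDAL's API:

```python
import json

def iter_json_objects(stream, chunk_size=65536):
    """Yield JSON values one at a time from a stream of concatenated
    JSON texts, holding only a small buffer in memory.

    Works for streams of objects/arrays; bare top-level numbers are
    ambiguous at chunk boundaries and are not handled here.
    """
    dec = json.JSONDecoder()
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while buf:
            buf = buf.lstrip()
            try:
                obj, end = dec.raw_decode(buf)
            except json.JSONDecodeError:
                break  # incomplete value: read more data
            yield obj
            buf = buf[end:]
```

For example, feeding it `'{"a": 1} {"b": [2, 3]}'` yields the two dicts one at a time, regardless of where the chunk boundaries fall.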

comment:4 Changed 12 months ago by Even Rouault

Milestone: 2.3.0

comment:5 Changed 12 months ago by Kurt Schwehr

Resolution: fixed
Status: closed → reopened

Thanks for the new capabilities! I do, however, need a way to constrain the maximum amount of memory before declaring a JSON file unparseable (especially when fuzzing), so that I can prevent DoS issues with massive allocations. In my environments, using too much RAM gets the process killed rather than erroring out on an alloc. The attached small files are from my fuzzer corpus and cause problems.

There may also be other issues in these files, but cf. the following backtraces:

#0  0x0000000000bca96b in realloc ()
#1  0x00000000005a49c5 in gdal_array_list_expand_internal ()
#2  0x00000000005a487d in gdal_array_list_put_idx ()
#3  0x00000000005a4a64 in gdal_array_list_add ()
#4  0x000000000047f4e1 in gdal_json_object_array_add ()
#5  0x0000000000485acf in OGRGeoJSONReaderStreamingParser::AppendObject(json_object*) ()
#6  0x0000000000487ecb in OGRGeoJSONReaderStreamingParser::Number(char const*, unsigned long) ()
#7  0x00000000009c5d09 in CPLJSonStreamingParser::Parse(char const*, unsigned long, bool) ()

and

#3  0x000000000082b876 in gdal_array_list_expand_internal () at arraylist.c:72
#4  0x000000000082b5ad in gdal_array_list_put_idx () at arraylist.c:83
#5  0x000000000117708b in CPLJSonStreamingParser::Parse(char const*, unsigned long, bool) () at cpl_json_streaming_parser.cpp:494
#6  0x00000000005dd731 in OGRGeoJSONReader::FirstPassReadLayer(OGRGeoJSONDataSource*, _IO_FILE*, bool&) () at ogrgeojsonreader.cpp:861
#7  0x00000000005cfae2 in OGRGeoJSONDataSource::LoadLayers(GDALOpenInfo*, GeoJSONSourceType) () at ogrgeojsondatasource.cpp:784
#8  0x00000000005ce9e0 in OGRGeoJSONDataSource::Open(GDALOpenInfo*, GeoJSONSourceType) () at ogrgeojsondatasource.cpp:144
#9  0x0000000000506233 in LLVMFuzzerTestOneInput ()
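One generic way to get the "error out instead of getting OOM-killed" behavior requested above, independent of any GDAL option, is to put a POSIX address-space limit around the parse. A sketch follows; it is Linux-oriented (`resource.RLIMIT_AS` is ignored or behaves differently on some platforms) and the helper name is hypothetical:

```python
import resource

def run_with_memory_cap(func, cap_bytes):
    """Run func() under a temporary cap on the process address space
    (POSIX only), so oversized allocations raise MemoryError instead
    of triggering the kernel OOM killer."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (cap_bytes, hard))
    try:
        return func()
    finally:
        # Restore the original soft limit.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

A caller would wrap the parsing step, e.g. `run_with_memory_cap(lambda: json.load(f), 1024**3)`, and treat `MemoryError` as "file is unparseable within the budget".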

Changed 12 months ago by Kurt Schwehr

Attachment: json.tar.xz added

json fuzzer corpus causing trouble with the new streaming parser

comment:6 Changed 12 months ago by Even Rouault

In 40468:

GeoJSON: fix crash on empty key names (refs #6540, trunk only)

comment:7 Changed 12 months ago by Even Rouault

@schwehr I fixed a crash in r40468. Those files should no longer cause excessive memory allocations.

comment:8 Changed 12 months ago by Even Rouault

And there is the m_nCurObjMemEstimate > MAX_OBJECT_SIZE check, which should ensure that a json_object* and its descendants do not take more than 100 MB.
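The shape of such a check can be sketched as follows. This is an illustrative Python analogue, not GDAL's implementation; the class, the per-node overhead constant, and the method names are invented:

```python
class ObjectTooLargeError(Exception):
    """Raised when the object under construction exceeds its budget."""

class BoundedObjectBuilder:
    """Accumulate a running estimate of the in-memory size of the
    object being built, and abort as soon as it exceeds a cap,
    analogous to a m_nCurObjMemEstimate > MAX_OBJECT_SIZE check."""

    NODE_OVERHEAD = 64  # hypothetical per-node bookkeeping cost

    def __init__(self, max_object_size=100 * 1024 * 1024):
        self.max_object_size = max_object_size
        self.estimate = 0

    def _charge(self, nbytes):
        self.estimate += nbytes
        if self.estimate > self.max_object_size:
            raise ObjectTooLargeError(
                "estimate %d exceeds cap %d"
                % (self.estimate, self.max_object_size))

    def add_string(self, s):
        self._charge(self.NODE_OVERHEAD + len(s))

    def add_number(self):
        self._charge(self.NODE_OVERHEAD)

    def add_container(self):
        self._charge(self.NODE_OVERHEAD)
```

The key property is that the budget is enforced incrementally, as each token is materialized, so a hostile file is rejected long before the allocations themselves become dangerous.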

comment:9 Changed 12 months ago by Kurt Schwehr

Resolution: fixed
Status: reopened → closed

Even, r40468 seems to have fixed those for me. Thanks!

I am running a fuzzer on my existing corpus. I plan to run it for a few core hours and I'll report if I find anything.

Note: See TracTickets for help on using tickets.