wiki:FGDBSpecification

Version 2 (modified by Even Rouault, 11 years ago) ( diff )
fixes for presence flags and string content

Introduction
Conventions
Specification of .gdbtable files
Specification of .gdbtablx files

Introduction

This is a work-in-progress reverse-engineered specification of .gdbtable and .gdbtablx files found in FileGDB datasets.

Conventions

ubyte: unsigned byte
int16: little-endian 16-bit integer
int32: little-endian 32-bit integer
float64: little-endian 64-bit IEEE 754 floating point number
utf16: string in little-endian UTF-16 encoding
string: (UTF-8 ?) string

The terms row and feature are used as synonyms in this document.

Specification of .gdbtable files

.gdbtable files describe fields and contain row data.

They are made of a header, a section describing the fields, and a section describing the rows.

Header (40 bytes)

Format	Content
4 bytes	0x03 0x00 0x00 0x00 - unknown role. Constant among the files. Kind of signature ?
int32	number of (valid) rows
4 bytes	varying values - unknown role
4 bytes	0x05 0x00 0x00 0x00 - unknown role. Constant among the files
4 bytes	varying values - unknown role. Seems to be 0x00 0x00 0x00 0x00 for FGDB 10 files, but not for earlier versions
4 bytes	0x00 0x00 0x00 0x00 - unknown role. Constant among the files
int32	file size in bytes
4 bytes	0x00 0x00 0x00 0x00 - unknown role. Constant among the files
int32	offset in bytes at which the field description section begins, often (in FGDB 10) 0x28 0x00 0x00 0x00, i.e. 40
4 bytes	0x00 0x00 0x00 0x00 - unknown role. Constant among the files
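
As a minimal sketch of the 40-byte layout above, the header can be unpacked with Python's struct module. The function name and the dictionary keys are illustrative and not part of the format; reading the int32 values as signed is an assumption, and the fields of unknown role are simply skipped.

```python
import struct

def parse_gdbtable_header(buf):
    """Unpack the 40-byte .gdbtable header (sketch, names are hypothetical)."""
    if len(buf) < 40:
        raise ValueError("the header needs 40 bytes")
    # Ten little-endian 32-bit words; only a few have a known meaning.
    words = struct.unpack("<10i", buf[:40])
    return {
        "signature": words[0],      # 3 in the files examined
        "valid_rows": words[1],     # number of (valid) rows
        "file_size": words[6],      # file size in bytes
        "fields_offset": words[8],  # start of the field description section
    }
```

With real data, `buf` would be the first 40 bytes read from the .gdbtable file opened in binary mode.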

Field description section

Fixed part

Format	Content
int32	size of header in bytes (this field excluded)
int32	version of the file ? Seems to be 3 for FGDB 9.X files and 4 for FGDB 10.X files
ubyte	layer geometry type. 1 = point, 2 = multipoint, 3 = (multi)polyline, 4 = (multi)polygon
3 bytes	0x03 0x00 0x00 - unknown role
int16	number of fields (including geometry field and implicit OBJECTID field)
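
The fixed part occupies 14 bytes (4 + 4 + 1 + 3 + 2). The following sketch reads it from a buffer at a given offset; the function name and keys are hypothetical, and the 3 bytes of unknown role are skipped rather than checked.

```python
import struct

def parse_fields_fixed_part(buf, offset):
    """Read the fixed part of the field description section (sketch).

    Returns (info, offset of the first per-field description).
    """
    size, version, geom_type = struct.unpack_from("<iiB", buf, offset)
    # 3 bytes of unknown role follow the geometry type, then the field count.
    (number_of_fields,) = struct.unpack_from("<h", buf, offset + 12)
    info = {
        "size": size,
        "version": version,
        "geometry_type": geom_type,
        "number_of_fields": number_of_fields,
    }
    return info, offset + 14
```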

Repeated part (per field)

Following immediately: the description of the fields (repeated as many times as the number of fields)

Format	Content
ubyte	number of UTF-16 characters (not bytes) of the name of the field
utf16	name of the field
ubyte	number of UTF-16 characters (not bytes) of the alias of the field. Might be 0
utf16	alias of the field (omitted if the previous length is 0)
ubyte	0x00
ubyte	field type ( 0 = int16, 1 = int32, 2 = float32, 3 = float64, 4 = string, 5 = datetime, 6 = objectid, 7 = geometry, 8 = binary, 10 = ?, 11 = UUID, 12 = ? )
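
A possible way to read the common prefix of one field description (name, alias, type code) is sketched below. It assumes `buf` is a Python 3 bytes object; the function name is hypothetical, and the type-specific bytes that follow are left to the caller.

```python
def parse_field_prefix(buf, offset):
    """Read name, alias and type code of one field (sketch).

    Returns (name, alias, field_type, offset of the type-specific bytes).
    """
    nchars = buf[offset]            # count of UTF-16 characters, not bytes
    offset += 1
    name = buf[offset:offset + 2 * nchars].decode("utf-16-le")
    offset += 2 * nchars
    nchars = buf[offset]            # may be 0, in which case no alias follows
    offset += 1
    alias = buf[offset:offset + 2 * nchars].decode("utf-16-le")
    offset += 2 * nchars
    offset += 1                     # the 0x00 byte
    field_type = buf[offset]
    return name, alias, field_type, offset + 1
```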

The next bytes for the field description depend on the field type.

For field type = 4 (string),

Format	Content
int32	maximum length of string
ubyte	unknown role
ubyte	unknown role

For field type = 6 (objectid),

Format	Content
ubyte	unknown role = 4
ubyte	unknown role = 2

For field type = 7 (geometry),

Format	Content
ubyte	unknown role = 0
ubyte	unknown role = 6 or 7
int16	length (in bytes) of the WKT string describing the SRS.
string	WKT string describing the SRS, or "{B286C06B-0879-11D2-AACA-00C04FA33C20}" for no SRS
ubyte	"magic" (used below to decide which M-related values are present). Value is generally 5 or 7
float64	xorigin
float64	yorigin
float64	xyscale
float64	zorigin
float64	zscale
float64	morigin (omitted if magic = 5)
float64	mscale (omitted if magic = 5)
float64	xytolerance
float64	ztolerance
float64	mtolerance (omitted if magic = 5)
float64	xmin of layer extent
float64	ymin of layer extent
float64	xmax of layer extent
float64	ymax of layer extent

The organization of the following bytes is a bit messy and seems to follow this algorithm:

1) Store the current offset.
2) Skip one byte.
3) Read an int32 value "magic2".
a) If magic2 = 0, rewind to the stored offset and read 2 float64 values (that happen to be NaN), then go to 2).
b) Otherwise (generally magic2 = 1 or magic2 = 3), skip magic2 x float64 values.

For field type = 8 (binary),

Format	Content
ubyte	unknown role
ubyte	unknown role

For field type = 10, 11, 12,

Format	Content
ubyte	width : 38
ubyte	unknown role

For other field types,

Format	Content
ubyte	width in bytes (e.g. 2 for int16, 4 for int32, 4 for float32, 8 for float64, 8 for datetime)
ubyte	unknown role
ubyte	unknown role

Rows section

The rows section does not necessarily immediately follow the last field description. It generally starts a few bytes later, but not in a predictable way. Note: for FGDB layers created by the ESRI FGDB SDK API, there are 4 bytes between the end of the field description section and the beginning of the rows section: 0xDE 0xAD 0xBE 0xEF (!)

The rows section is a sequence of X rows (where X is the total number of features found in the .gdbtablx, which might be different from the number of valid rows found in the header of the .gdbtable). Each row starts at an offset indicated in the .gdbtablx file.

Row description

Format	Content
int32	length in bytes of the row blob (this field excluded)
ceil(number_fields / 8) * ubyte	flags describing if a field is null. See the explanation below
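
As a small sketch, a row blob can be extracted from the .gdbtable content once its offset is known from the .gdbtablx file. The function name is hypothetical, and the length is read as a signed int32 by assumption.

```python
import struct

def read_row_blob(buf, row_offset):
    """Return the row blob that starts at row_offset (sketch).

    The blob begins right after the int32 length, with the null flags.
    """
    (length,) = struct.unpack_from("<i", buf, row_offset)
    start = row_offset + 4          # the length field itself is excluded
    return buf[start:start + length]
```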

Null fields flags

Each bit of the flags field encodes the presence or absence of the field content for the row. The flag is set to 1 if the field is missing/null, or 0 if the field is present/non-null (0 is used as well for spare bytes). The flag for the first field, in the order of the fields of the field description section (typically the geometry), is the least significant bit of the first byte of the flags field.

Note: there's no explicit data for OBJECTID and no flag bit for it. It must be ignored when considering the list of fields (for number_fields value in particular).

For each non-null field, the field content is appended in the order of the fields of the field description section.
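
The flag lookup can be sketched as follows. The text only states where the bit of the first field lives; that the following fields use successively higher bits, then the next byte, is an assumption of this sketch. `number_fields` excludes OBJECTID, as noted above, and the function name is hypothetical.

```python
def null_flags(flags, number_fields):
    """Return a list of booleans, True where the field is null (sketch).

    `flags` holds the ceil(number_fields / 8) flag bytes of one row.
    Assumption: field i uses bit (i % 8) of byte (i // 8).
    """
    needed = (number_fields + 7) // 8
    if len(flags) < needed:
        raise ValueError("not enough flag bytes")
    return [bool(flags[i // 8] & (1 << (i % 8))) for i in range(number_fields)]
```

Only the fields whose entry is False have their content stored in the row, in field order.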

Field content

Geometry field

This field is generally called "SHAPE".

Geometry blobs use 2 new encoding schemes :

varuint: a sequence of bytes [b0, b1, ... bN]. All bytes except the last one have their msb (most significant bit) set to 1; a byte with msb = 0 marks the end of the sequence. The value of the varuint is (b0 & 0x7F) | ((b1 & 0x7F) << 7) | ((b2 & 0x7F) << 14) | ... | ((bN & 0x7F) << (7 * N)). Note that a valid sequence might be just 1 byte.
varint: same concept as varuint, but the 2nd most significant bit of b0 (i.e. the one obtained by masking with 0x40) indicates the sign of the result and must be ignored when computing the unsigned value. If the sign bit is set to 1, the value must be negated.
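
Both schemes can be sketched in a few lines of Python. The varuint decoder follows the formula above directly. For varint, the text does not say how the following bytes are shifted once the sign bit is removed from b0; the sketch assumes b0 contributes 6 bits and the next bytes are shifted by 6, 13, 20, ... bits. Function names are hypothetical.

```python
def read_varuint(buf, offset):
    """Decode a varuint at buf[offset]; return (value, next offset)."""
    value = 0
    shift = 0
    while True:
        b = buf[offset]
        offset += 1
        value |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):          # msb = 0 marks the last byte
            return value, offset

def read_varint(buf, offset):
    """Decode a varint at buf[offset]; return (value, next offset).

    Assumption: b0 carries 6 value bits, so later bytes start at shift 6.
    """
    b = buf[offset]
    offset += 1
    negative = bool(b & 0x40)       # sign bit of the first byte
    value = b & 0x3F
    shift = 6
    while b & 0x80:
        b = buf[offset]
        offset += 1
        value |= (b & 0x7F) << shift
        shift += 7
    return (-value if negative else value), offset
```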

Format	Content
varuint	length of the geometry blob in bytes (this field excluded)
ubyte	geometry_type. 1 = 2D point, 3 = 2D (multi)linestring, 5 = 2D (multi)polygon. Other values possible. See SHPT_ enumeration of ogrpgeogeometry.h

For point geometries (geometry type = 1, 9, 21, 11)

Format	Content
varuint	x = (varuint + xorigin * xyscale) / xyscale
varuint	y = (varuint + yorigin * xyscale) / xyscale
varuint ( present only if Z component )	z = (varuint + zorigin * zscale) / zscale
varuint ( present only if M component )	m = (varuint + morigin * mscale) / mscale
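
The conversion from a stored integer to a coordinate is the same for every axis, with the origin and scale taken from the geometry field description. A direct transcription of the formula above, with a hypothetical function name and made-up sample values:

```python
def decode_coordinate(raw, origin, scale):
    """Apply (raw + origin * scale) / scale to one decoded varuint (sketch)."""
    return (raw + origin * scale) / scale

# Made-up values: raw = 1500, xorigin = -400.0, xyscale = 10.0
x = decode_coordinate(1500, -400.0, 10.0)
```

The same helper would be called with (zorigin, zscale) or (morigin, mscale) for the Z and M values when they are present.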

For multipoint geometries (geometry type = 8, 20, 28, 18)

Format	Content
varuint	number of points

followed by points coordinates:

First point (i = 0):

Format	Content
varuint	x[0] = (varuint + xorigin * xyscale) / xyscale
varuint	y[0] = (varuint + yorigin * xyscale) / xyscale
varuint ( present only if Z component )	z[0] = (varuint + zorigin * zscale) / zscale
varuint ( present only if M component )	m[0] = (varuint + morigin * mscale) / mscale

For each next point (i > 0) (with dx = dy = dz = dm = 0 at initialization):

Format	Content
varint	dx = dx + varint. x[i] = x[0] + dx / xyscale
varint	dy = dy + varint. y[i] = y[0] + dy / xyscale
varint ( present only if Z component )	dz = dz + varint. z[i] = z[0] + dz / zscale
varint ( present only if M component )	dm = dm + varint. m[i] = m[0] + dm / mscale

For (multi)linestring (geometry type = 3, 10, 23, 13) or (multi)polygon (geometry type = 5, 19, 25, 15)

Format	Content
varuint	total number of points of all following parts
varuint	number of parts, i.e. number of rings for a (multi)polygon (inner and outer rings being at the same level), number of linestrings for a multilinestring, or 1 for a linestring
varuint	number of points of first part
...	...
varuint	number of points of the (number of parts - 1)th part. The number of points of the last part is not stored; it can be computed by subtracting the sum of the above numbers from the total number of points
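
Since the point count of the last part is implicit, a reader has to reconstruct it. A minimal sketch, with a hypothetical function name, taking the already decoded varuint values as input:

```python
def part_sizes(total_points, explicit_sizes):
    """Return the point count of every part, last part included (sketch).

    `explicit_sizes` holds the (number of parts - 1) counts that are stored.
    """
    last = total_points - sum(explicit_sizes)
    if last < 0:
        raise ValueError("part sizes exceed the total number of points")
    return list(explicit_sizes) + [last]
```

For a single linestring, `explicit_sizes` is empty and the only part holds all the points.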

followed by, for each part, points coordinates:

First point of first part :

Format	Content
varuint	x[0] = (varuint + xorigin * xyscale) / xyscale
varuint	y[0] = (varuint + yorigin * xyscale) / xyscale
varuint ( present only if Z component )	z[0] = (varuint + zorigin * zscale) / zscale
varuint ( present only if M component )	m[0] = (varuint + morigin * mscale) / mscale

For each next point (other points of the first part, or for all points of the following parts) :

Format	Content
varint	dx = dx + varint. x[i] = x[0] + dx / xyscale
varint	dy = dy + varint. y[i] = y[0] + dy / xyscale
varint ( present only if Z component )	dz = dz + varint. z[i] = z[0] + dz / zscale
varint ( present only if M component )	dm = dm + varint. m[i] = m[0] + dm / mscale

String

Number of bytes of the string as a varuint, followed by string content
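
A string value can be read as sketched below. The varuint length is decoded inline so that the example is self-contained; decoding the content as UTF-8 is an assumption, in line with the "(UTF-8 ?)" convention above, and the function name is hypothetical.

```python
def read_string(buf, offset, encoding="utf-8"):
    """Read a length-prefixed string; return (text, next offset) (sketch)."""
    nbytes = 0
    shift = 0
    while True:                     # varuint length in bytes
        b = buf[offset]
        offset += 1
        nbytes |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            break
    raw = buf[offset:offset + nbytes]
    return raw.decode(encoding), offset + nbytes
```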

Other types

An int16 value for an int16 field, an int32 value for an int32 field, etc.

Note: datetime values are the number of days since 30th Dec 1899 00:00:00, encoded as float64

Specification of .gdbtablx files

.gdbtablx files contain the offset of the rows of the associated .gdbtable file.

Header (16 bytes)

Format	Content
4 bytes	0x03 0x00 0x00 0x00 - unknown role. Constant among the files. Kind of signature ?
4 bytes	0x01 0x00 0x00 0x00 (for GDB 10?), 0x03 0x00 0x00 0x00 (for GDB 9?) - unknown role.
int32	number of rows, including deleted rows
4 bytes	0x05 0x00 0x00 0x00 - unknown role. Constant among the files. Kind of signature ?

Offset section

The section starts immediately after the header (at offset 16) and is made of 5 x number_rows bytes. For each row,

Format	Content
int32	offset of the beginning of the row in the .gdbtable file, or 0 if the row is deleted
ubyte	always 0 - unknown role
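
Putting the header and the offset section together, the row offsets can be listed as in the sketch below. The function name is hypothetical, deleted rows are reported as None by choice, and the int32 values are read as signed by assumption.

```python
import struct

def read_gdbtablx_offsets(buf):
    """Return the row offsets stored in a .gdbtablx buffer (sketch).

    Deleted rows (offset 0) are returned as None.
    """
    (number_rows,) = struct.unpack_from("<i", buf, 8)
    offsets = []
    for i in range(number_rows):
        # Each entry is 5 bytes: an int32 offset plus one byte of unknown role.
        (off,) = struct.unpack_from("<i", buf, 16 + 5 * i)
        offsets.append(off if off != 0 else None)
    return offsets
```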

Padding section

A long run of bytes set to 0.

Trailing section

The last few bytes look like 00 00 00 00 X 00 00 00 X 00 00 00 00 00 00 00 where X is nonzero (often 1). Unknown role.

Attachments (1)

dump_gdbtable.py (140 bytes ) - added by Even Rouault 11 years ago. Redirect to https://github.com/rouault/dump_gdbtable
