Opened 6 years ago

Closed 6 years ago

#7185 closed defect (fixed)

Wrong text size when converting DXF to PDF

Reported by: Alan Thomas
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: svn-trunk
Severity: normal
Keywords: pdf
Cc:

Description (last modified by Alan Thomas)

I converted autotest/ogr/data/leader-mleader.dxf to PDF using ogr2ogr:

ogr2ogr -overwrite -f "PDF" leader-mleader.pdf "C:\Projects\gdal\autotest\ogr\data\leader-mleader.dxf" -dsco STREAM_COMPRESS=NONE

The text sizes in the resulting PDF are excessively large.

Here's an example style string from ogrinfo -al on the original DXF (this is the style string for the big red text you see in the page):

LABEL(f:"Arial",t:"Apples",p:2,s:1g,c:#ff0000,a:10)

The font size in the resulting PDF is 1000.000000. Looking through the code, it's unclear to me where the multiplication by 1000 is taking place. I'm not sure what the correct size is; the text should be 1 DXF unit high, as implied by the style string, but a value of 1.0 in the PDF is too small.
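
For illustration only, here is a minimal Python sketch of why a ground-unit size cannot be written into the PDF unchanged: PDF font sizes are expressed in page units, so a size given in ground units (g) has to be multiplied by the map-to-page scale. This is not the GDAL code. The function names label_size and ground_to_page, the simplified parsing, and the 40-unit extent and 800-unit page width are all assumptions made up for the example.

```python
import re


def label_size(style):
    """Extract (value, unit) from the s: parameter of a LABEL(...) style string.

    Simplified sketch: it does not cope with ',s:' appearing inside a
    quoted t:"..." value.
    """
    m = re.search(r"[(,]s:([0-9.]+)([a-z]*)", style)
    return (float(m.group(1)), m.group(2)) if m else None


def ground_to_page(size_ground, extent_width_ground, page_width_units):
    """Scale a ground-unit length to page units, assuming a uniform scale."""
    return size_ground * page_width_units / extent_width_ground


style = 'LABEL(f:"Arial",t:"Apples",p:2,s:1g,c:#ff0000,a:10)'
value, unit = label_size(style)

# Hypothetical layout: a drawing 40 ground units wide on a page 800 units wide.
size_on_page = ground_to_page(value, 40.0, 800.0)
print(value, unit, size_on_page)
```

With these made-up numbers, text 1 ground unit high comes out 20 page units high. The real scale depends on the layer extent and the page size chosen by the writer.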

(As a side note, each text object is associated with a grey circle, which seems unwanted.)

See also #5910.

Change History (5)

comment:1 by Alan Thomas, 6 years ago

Description: modified (diff)

comment:2 by Alan Thomas, 6 years ago

Upon investigation, it became apparent that the PDF writer's text output code was in need of some attention. I've got a patch that

  • fixes calculation of font sizes, line weights, etc. that are given in ground units (g);
  • adds support for the three basic PDF fonts (Helvetica, Times, Courier) (f, bo, it);
  • switches to Helvetica as the default font, as I think this is more likely to be generally suitable for geospatial applications;
  • adds support for text anchor position (p) and stretch (w);
  • removes the rendered POINT geometry (gray disc) at the text's anchor point.

The patch has been uploaded at https://github.com/OSGeo/gdal/pull/288

Unfortunately it is problematic to get the DXF font size to exactly match PDF font size. CAD software tends to measure text size by the capital height, whereas most graphics and word processing applications, including PDF, use the em height. Converting between the two values is only possible if you know the metrics of the specific font in use. I can see several options here:

  1. Admit defeat, and add a comment to the feature style specification saying that the exact interpretation of "font size" depends on the driver. ogr2ogr conversions involving text will often result in wrongly-sized text, up to 50% too small or too big in some cases.
  2. Somewhere in the output of drivers that handle text features (in a metadata area?), store a flag saying whether the font size reflects cap height or em height. In situations where we do know font metrics, like the PDF writer, we can use this information to adjust font size appropriately. This will solve the issue for conversions to PDF, but not from PDF (because the font metrics in the PDF will not be available to the other driver).
  3. On the basis that CAD data is more in line with GDAL's purpose than ordinary vector graphics formats, specify that GDAL "font size" relates to cap height. Then in the PDF reader, use the FontDescriptor data to obtain font metrics and convert em height to cap height. In the PDF writer, use built-in metric data for the 14 standard fonts to perform the reverse conversion. This will cause problems if we ever have a driver for a format that uses em height but does not store text metrics (SVG being one that comes to mind - possibly even some of the existing drivers using style strings).
  4. Specify that GDAL "font size" refers to either cap height or em height (doesn't really matter), then declare that cap height is always two-thirds of the em height and use that conversion everywhere. Two-thirds is a fair average, but there is wide variation.
  5. Create two separate fields in the feature style string, one for em height and one for cap height. Maybe overkill.
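
To make the cap height versus em height conversion concrete, here is a rough Python sketch of what options 3 and 4 would involve. It is not GDAL code. The names CAP_HEIGHT, em_from_cap and cap_from_em are hypothetical, and the CapHeight values (per 1000 em units) are quoted from memory of the Adobe AFM metrics for the standard fonts, so they should be checked before anyone relies on them. Fonts without known metrics fall back to the two-thirds ratio from option 4.

```python
# CapHeight per 1000 em units. Assumed values, recalled from the Adobe AFM
# files for the standard PDF fonts; verify before use.
CAP_HEIGHT = {
    "Helvetica": 718,
    "Times-Roman": 662,
    "Courier": 562,
}

# Fallback ratio from option 4: cap height taken as two-thirds of em height.
DEFAULT_RATIO = 2.0 / 3.0


def cap_ratio(font=None):
    """Return cap height as a fraction of em height for the given font."""
    if font in CAP_HEIGHT:
        return CAP_HEIGHT[font] / 1000.0
    return DEFAULT_RATIO


def em_from_cap(cap_height, font=None):
    """Convert a CAD-style cap height to a PDF-style em height (font size)."""
    return cap_height / cap_ratio(font)


def cap_from_em(em_height, font=None):
    """Convert a PDF-style em height (font size) to a CAD-style cap height."""
    return em_height * cap_ratio(font)


# Text 1 unit high in cap-height terms needs a larger nominal font size.
print(em_from_cap(1.0, "Helvetica"))  # uses the Helvetica metric
print(em_from_cap(1.0))               # unknown font: two-thirds fallback
```

The spread between the per-font ratios and the two-thirds fallback gives a feel for how far a fixed conversion can drift from the true size for a particular font.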

comment:3 by Even Rouault, 6 years ago

Your knowledge of typographic issues goes far beyond mine. My quick research into what 'em' means confused me (I couldn't really make sense of the Wikipedia page). I'll defer to your judgement on the best outcome. It is obvious that 1:1 equivalence between formats is somewhat utopian, and GDAL should only make a reasonable effort to preserve appearance during conversions. We probably don't want to implement typographic rules ourselves (if we went down that road, we'd be better off relying on third-party libraries specialized in that domain to get font metrics). Historically the OGR feature style spec was (I believe) developed mostly for the needs of the MapInfo file format, so it could make sense to use the definition of font size used by MapInfo (which I don't know), and do our best with it. Adding specific fields to override or refine those basic metrics could also be an option.

comment:4 by Alan Thomas, 6 years ago

This is a useful primer for the difference between cap height and font size (em height).

In the New Year I will post to the mailing list about a range of proposed updates to the feature style specification.

comment:5 by Alan Thomas, 6 years ago

Resolution: fixed
Status: new → closed

In 41198:

PDF: Improvements to text and dash pattern support in writer (fixes #7185)
