Web service method
/highlight-for-xml highlights a PDF file at the text positions specified in an XML file.
This PDF highlighting method requires the following parameters:
uri: PDF document location.
xml: Highlights file location. This URL is typically served by a search application, with the highlighting data provided by the search engine.
For all available options, see API documentation.
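As a rough illustration of the two parameters above, the sketch below assembles a request URL for /highlight-for-xml. The service address (highlighter.example.com), the sample document and highlights URLs, and the choice to pass uri and xml as query-string parameters are assumptions made for the example; check the API documentation for the actual request format.

```javascript
// Hypothetical service root; replace with the address of your Highlighter instance.
const SERVICE_BASE = "http://highlighter.example.com/";

// Build a /highlight-for-xml request URL.
// Assumption: `uri` and `xml` are passed as query-string parameters.
function buildHighlightForXmlUrl(serviceBase, pdfUrl, highlightsUrl) {
  const url = new URL("highlight-for-xml", serviceBase);
  // searchParams percent-encodes both values, so the nested URLs
  // (including their own "?" and "&") survive intact.
  url.searchParams.set("uri", pdfUrl);
  url.searchParams.set("xml", highlightsUrl);
  return url.toString();
}

// Sample values, both hypothetical.
const requestUrl = buildHighlightForXmlUrl(
  SERVICE_BASE,
  "http://example.com/docs/report.pdf",
  "http://example.com/search/hits.xml?doc=42&q=solar"
);
console.log(requestUrl);
```

Encoding the values matters because the highlights URL usually carries its own query string, which would otherwise be mistaken for additional service parameters.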
Earlier versions of Adobe Reader (up to version 8, and version 9 after an option change) supported PDF highlighting, taking as input an XML-like structure that contains term offsets and lengths. A document URL that relies on this Acrobat Reader feature specifies the highlight file location as the xml parameter after the hash (#).
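The link format described above can be sketched as a pair of helper functions: one that appends the highlight file location to a PDF URL, and one that splits such a link back into its two parts. The function names and the example.com URLs are hypothetical, and the parser assumes that xml is the only parameter in the fragment.

```javascript
// Compose an Adobe-style link: the highlight file location follows "#xml=".
function buildAdobeHighlightLink(pdfUrl, highlightsUrl) {
  return `${pdfUrl}#xml=${highlightsUrl}`;
}

// Split an Adobe-style link into the PDF location and the highlight file location.
// Assumption: the fragment holds only the xml parameter.
function parseAdobeHighlightLink(link) {
  const hashIndex = link.indexOf("#");
  if (hashIndex === -1) {
    return { uri: link, xml: null }; // plain PDF link, nothing to highlight
  }
  const uri = link.slice(0, hashIndex);
  const fragment = link.slice(hashIndex + 1);
  const marker = "xml=";
  const xml = fragment.startsWith(marker) ? fragment.slice(marker.length) : null;
  return { uri, xml };
}

// Sample values, both hypothetical.
const link = buildAdobeHighlightLink(
  "http://example.com/docs/report.pdf",
  "http://example.com/search/hits.xml?doc=42"
);
console.log(link);
console.log(parseAdobeHighlightLink(link));
```

The two values recovered by the parser correspond to the uri and xml parameters of /highlight-for-xml.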
Assuming that your search application already generates PDF document links in the above format, the simplest way to integrate Highlighter is to use our jQuery plugin. Include jquery.pdf-highlighter.js on the results page and use a jQuery selector to attach the highlighter to all PDF links below the results element.
The standard Adobe highlight file uses loc elements to specify highlight ranges. Each highlight range line has the following attributes:
pg: Specifies the page on which the highlight is located. Pages are numbered sequentially, with the first page in a file having a page number of zero.
pos: Specifies the offset of the highlight on the page. The offset of the first character on a page is zero. (While Adobe XML allows offsets to be specified either in words or characters, PDF Highlighter supports only character offsets.)
len: Specifies the number of words or characters to highlight.
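To make the three attributes concrete, here is a minimal sketch that serializes a list of ranges into a highlight file. Only the loc element and its pg, pos and len attributes come from the description above; the wrapper elements (XML, Body, Highlight), the units attribute, the self-closing loc syntax and the function name are assumptions based on the commonly documented Adobe layout and should be verified against a file your search engine actually produces.

```javascript
// Serialize highlight ranges into an Adobe-style highlight file.
// Assumed layout: <XML><Body units="characters"><Highlight><loc .../>...</Highlight></Body></XML>
function buildHighlightXml(ranges) {
  const locs = ranges.map(({ pg, pos, len }) => {
    // pg and pos are zero-based; len counts characters.
    for (const [name, value] of Object.entries({ pg, pos, len })) {
      if (!Number.isInteger(value) || value < 0) {
        throw new RangeError(`${name} must be a non-negative integer, got ${value}`);
      }
    }
    return `      <loc pg="${pg}" pos="${pos}" len="${len}"/>`;
  });
  return [
    "<XML>",
    '  <Body units="characters">',
    "    <Highlight>",
    ...locs,
    "    </Highlight>",
    "  </Body>",
    "</XML>",
  ].join("\n");
}

// Two ranges: 5 characters on the first page, 12 characters on the third page.
const highlightXml = buildHighlightXml([
  { pg: 0, pos: 10, len: 5 },
  { pg: 2, pos: 0, len: 12 },
]);
console.log(highlightXml);
```

Because page numbers and offsets both start at zero, the second range above begins at the very first character of the third page.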
For simpler integration with other tools, Highlighter supports some non-standard extensions to the Adobe highlight file.
Highlighter supports an extended highlight file format that simplifies document highlighting for NLP output data. The differences from the standard file format are described below.
Additional body element attributes:
positions: By default, pos offsets used in highlight ranges are Adobe compatible. This attribute can be set so that pos offsets instead match positions in the text extracted by Highlighter.
Highlight range line (the loc element) attributes:
pg: The page index attribute is not required. If pg is missing, the pos offset specifies the text position in the document instead of in the page.
color: Specifies RGB color code to use for the highlight range.
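The extended loc attributes lend themselves to NLP output, where entity mentions are usually reported as offsets into the whole document text. The sketch below locates two sample phrases in a text, then writes loc elements without pg and with a color. The wrapper elements, the hex notation for color, the function names and the sample text are assumptions; any body attributes (such as positions) are passed in by the caller, with their values taken from the API documentation.

```javascript
// Escape characters that are unsafe inside a double-quoted XML attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Find every occurrence of `term` in `text` as document-level character ranges.
function findRanges(text, term, color) {
  const ranges = [];
  if (term.length === 0) return ranges;
  let from = 0;
  while (true) {
    const pos = text.indexOf(term, from);
    if (pos === -1) break;
    ranges.push({ pos, len: term.length, color });
    from = pos + term.length;
  }
  return ranges;
}

// Serialize ranges; pg and color are optional, as in the extended format.
function buildExtendedHighlightXml(ranges, bodyAttributes = {}) {
  const attrs = Object.entries(bodyAttributes)
    .map(([name, value]) => ` ${name}="${escapeAttr(value)}"`)
    .join("");
  const locs = ranges.map((r) => {
    const pg = r.pg === undefined ? "" : ` pg="${r.pg}"`;
    const color = r.color === undefined ? "" : ` color="${escapeAttr(r.color)}"`;
    return `      <loc${pg} pos="${r.pos}" len="${r.len}"${color}/>`;
  });
  return ["<XML>", `  <Body${attrs}>`, "    <Highlight>", ...locs,
    "    </Highlight>", "  </Body>", "</XML>"].join("\n");
}

// Sample text standing in for the text extracted from a document.
const text = "Acme Corp hired Jane Doe. Jane Doe joined Acme Corp in May.";
const entityRanges = [
  ...findRanges(text, "Jane Doe", "#ffd54f"),  // people (hypothetical color)
  ...findRanges(text, "Acme Corp", "#81d4fa"), // organizations (hypothetical color)
];
console.log(buildExtendedHighlightXml(entityRanges));
```

Offsets computed this way line up with the PDF only if they are measured against the same text that Highlighter extracts, which is the situation the positions attribute is meant for.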
The standard highlight file format does not support phrase definition. See dtSearch Engine integration for an extended format that allows phrase highlighting with dtSearch.