The web service method /highlight-for-query takes the user's query string and finds hits to highlight using an internal search engine.
A typical use case is highlighting documents listed on a search results page. (The first-level search may be provided by any search solution, e.g. Elasticsearch, Apache Solr, or a database full-text search.)
This highlighting method requires the following parameters:
- uri - PDF document location. The value can be combined with other options on the server too.
- query - Search string containing words, phrases, etc.
There are also a number of parameters that affect document delivery. Check the service API documentation for details.
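As an illustration of passing the two required parameters, here is a minimal sketch that builds a request URL. It assumes the parameters are sent as URL query-string parameters and that the service runs at the hypothetical address https://highlighter.example.com. Neither assumption is confirmed on this page, so check the service API documentation before relying on it.

```javascript
// Hypothetical base URL (assumption): replace with the address of your Highlighter server.
const HIGHLIGHTER_BASE = "https://highlighter.example.com";

// Builds a /highlight-for-query URL from the two required parameters.
// Assumption: uri and query are passed as URL query-string parameters.
function buildHighlightUrl(uri, query) {
  const url = new URL("/highlight-for-query", HIGHLIGHTER_BASE);
  // searchParams.set percent-encodes quotes, spaces, and other reserved characters.
  url.searchParams.set("uri", uri);
  url.searchParams.set("query", query);
  return url.toString();
}
```

Letting URLSearchParams do the encoding means a phrase query in quotes cannot corrupt the URL.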
Because the search engine used to find the document may differ in features from the highlighter's internal engine, the highlighted PDF may mark more or fewer words than that search engine matched. (Internally, the highlighter uses the Apache Solr search engine, which has highly customizable text analysis and search options.)
The simplest way to integrate Highlighter is to use our jQuery plugin. The approach involves:
- Adding data attributes to the results page HTML, and
- Including and initializing the pdfHighlighter plugin.
Adding data attributes to page HTML
To a common ancestor element of document links, add data attributes for the query and, optionally, language:
The query attribute may contain the complete search string including phrases (in quotes) and Boolean operators.
When rendering the HTML page on the server, make sure to HTML-encode the query string. Otherwise, quotes from a phrase search could break your markup.
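The encoding step can be sketched as follows for a server that renders its pages with JavaScript. The attribute name data-query and the element id results are hypothetical placeholders used only for illustration. Use the attribute names that the plugin documentation specifies.

```javascript
// Escapes the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Renders the common ancestor element of the document links.
// "data-query" and id="results" are hypothetical names (assumptions), not the plugin's documented ones.
function renderResultsContainer(query, linksHtml) {
  return `<div id="results" data-query="${escapeHtml(query)}">${linksHtml}</div>`;
}
```

With this in place, a phrase search such as "annual report" is written into the attribute as &quot;annual report&quot;, so the quotes cannot terminate the attribute value early.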
Initialize PDF Highlighter plugin
In the scripts section of your results page, include jQuery (if you don't already use it) and the plugin script jquery.pdf-highlighter.js, then initialize the plugin:
In the above example, a jQuery selector attaches the highlighter to all PDF links below the results element.
The search syntax for finding keywords and phrases is largely universal and similar to searching on Google. The advanced query syntax supported by PDF Highlighter is most similar to that of the Apache Solr and Elasticsearch search engines.
Search for an exact phrase
To look up a phrase, enclose the words in quotation marks.
Proximity search allows you to find words within a specified distance of each other.
For example, a proximity search can find the words acknowledgment and message within 5 words of each other.
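A minimal sketch of composing phrase and proximity queries as strings is given here. The proximity form, a quoted phrase followed by ~ and a number, is borrowed from the Apache Solr query syntax. That PDF Highlighter accepts exactly this form is an assumption based on the stated similarity to Solr, and the helper names are hypothetical.

```javascript
// Joins words into a quoted exact-phrase query, e.g. "acknowledgment message".
function phraseQuery(words) {
  for (const word of words) {
    if (word.includes('"')) {
      throw new Error("Words must not contain quotation marks: " + word);
    }
  }
  return `"${words.join(" ")}"`;
}

// Builds a Solr-style proximity query: the phrase followed by ~distance.
// Assumption: the highlighter accepts the same "..."~N form that Solr does.
function proximityQuery(words, distance) {
  if (!Number.isInteger(distance) || distance < 0) {
    throw new Error("Distance must be a non-negative integer");
  }
  return `${phraseQuery(words)}~${distance}`;
}
```

For example, proximityQuery(["acknowledgment", "message"], 5) produces the string "acknowledgment message"~5.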
Search using wildcards
Use * to match any group of characters, or ? to match a single character.
For example, qual* searches for documents containing any word starting with the letters qual, such as qualify, quality, qualification, qualifier, and so forth.
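To make the meaning of the two wildcard characters concrete, the sketch below translates a wildcard term into a JavaScript regular expression and tests single words against it. It only illustrates the matching semantics described above and is not how the highlighter's search engine is implemented. It also assumes that * may match zero characters, as it does in Solr.

```javascript
// Converts a wildcard term into an anchored regular expression.
// "*" becomes ".*" (any group of characters), "?" becomes "." (exactly one character).
function wildcardToRegExp(term) {
  // Escape regex metacharacters other than the two wildcards.
  const escaped = term.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp(`^${source}$`);
}

// Returns the words from the list that the wildcard term would match.
function matchWords(term, words) {
  const pattern = wildcardToRegExp(term);
  return words.filter((word) => pattern.test(word));
}
```

For instance, matchWords("qual*", ["quality", "equal", "qualifier"]) keeps quality and qualifier but drops equal, because the term must match from the start of the word.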
To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term.
For example, the search roam~ will match terms like roams, foam, and foams. It will also match the word "roam" itself.
An optional distance parameter specifies the maximum number of edits allowed, between 0 and 2, defaulting to 2. For example, roam~1 will match terms like roams and foam, but not foams, since foams has an edit distance of 2.
NOTE: If fuzzy search is enabled globally, it applies to all keyword searches without the need for the tilde symbol.
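The edit distances quoted above can be checked with a short sketch. It computes the plain Levenshtein distance (insertions, deletions, and substitutions). The metric the search engine actually uses may differ, for example by counting a transposition as a single edit. The helper fuzzyTerm is a hypothetical convenience for appending the tilde and optional distance.

```javascript
// Plain Levenshtein distance between two strings, using two rolling rows.
function editDistance(a, b) {
  // prev[j] is the distance between the first i-1 characters of a and the first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Appends the fuzzy operator to a single-word term, with an optional distance of 0 to 2.
function fuzzyTerm(term, distance) {
  if (/\s/.test(term)) throw new Error("Fuzzy search applies to single-word terms");
  if (distance === undefined) return `${term}~`;
  if (![0, 1, 2].includes(distance)) throw new Error("Distance must be 0, 1, or 2");
  return `${term}~${distance}`;
}
```

Under this metric, roams and foam are each one edit away from roam, while foams is two edits away, which is why a distance of 1 excludes it.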
Boolean expressions, positive and negative terms
PDF Highlighter supports Boolean expressions, as well as positive and negative terms, with some specifics. Because a typical use case for PDF Highlighter is highlighting documents that already matched the user's search request, it's generally safe to pass the same query to PDF Highlighter for document processing. All keywords that are not explicitly excluded (with "NOT" or "-") will be highlighted.
If a language is defined, PDF Highlighter automatically enables stemming. Stemming is a text analysis technique that allows you to find word variations. For example, if you search for the term qualification, the search engine will also find documents containing the terms qualify, qualifier, etc.
To search using a regular expression, prefix your query with
Regular expression search works with page text as is, including white space.
Multi Query Highlighting
The /highlight-for-query web service endpoint accepts multiple queries (even thousands) for PDF processing, while also allowing a greater level of control over the highlighting process.
To annotate multiple PDF documents for a predefined set of phrases, see the batch highlighting tool.
To send multiple queries to Highlighter, send a POST request to /highlight-for-query with a payload that contains a query array.
The query array may contain one or more items where:
- The only required field of a query item is the
- Unless the type of the item is phrase, the query can be any search string. There's no need to put query terms in quotes if the query type is set to phrase.
- color is the desired highlighting color for the query item, specified as an RGB value.
- tag is the client's identifier for the query item. If the color was not specified but the tag is defined, all queries with the same tag will be assigned the same color.
The payload object can also contain other parameters accepted by the PDF highlighting service.
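A possible shape for such a request is sketched below. Several details are assumptions rather than documented facts: that the payload is sent as JSON, that the array is stored under a field named query, that each item carries its search string in a field also named query, and that uri travels in the same payload object. The function names are hypothetical. Check the service API documentation for the actual field names and the color format.

```javascript
// Builds the payload object for a multi-query highlighting request.
// Assumptions: top-level fields "uri" and "query"; per-item fields "query", "type", "color", "tag".
function buildMultiQueryPayload(uri, items) {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error("At least one query item is required");
  }
  const query = items.map((item) => {
    if (typeof item.query !== "string" || item.query === "") {
      throw new Error("Each query item needs a non-empty search string");
    }
    const entry = { query: item.query };
    // Copy the optional fields only when the caller supplied them.
    for (const key of ["type", "color", "tag"]) {
      if (item[key] !== undefined) entry[key] = item[key];
    }
    return entry;
  });
  return { uri, query };
}

// Sends the payload with a POST request (assumption: the service accepts a JSON body).
// baseUrl is the address of your Highlighter server.
async function sendMultiQuery(baseUrl, payload) {
  const response = await fetch(new URL("/highlight-for-query", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!response.ok) throw new Error(`Highlighting failed: ${response.status}`);
  return response;
}
```

In this sketch, giving two items the same tag and no color relies on the service to assign them a shared color, as described above, which avoids hard-coding a color format on the client.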