dtSearch Engine
Because the dtSearch engine can generate a PDF highlight file, the recommended approach for integrating it with PDF Highlighter is the "highlight-for-xml" service method.
Follow the dtSearch guide for web search application development and hit highlighting. The client requirements mentioned in the "Highlighting hits in PDF files" article do not apply when the PDF Highlighter server is used. You should still implement the callback URL that serves "xml" as described in the dtSearch guide, but when combined with PDF Highlighter, any standard web browser can be used (no Acrobat Reader or additional plugins required).
Phrase Highlighting
Because the PDF highlight file format contains no hints about the search query or the relations between found words, PDF Highlighter cannot on its own mark the terms of a matching phrase as a single hit. However, PDF Highlighter supports phrase recognition when the additional HitsByWord data from the dtSearch results is provided.
HitsByWord data can be sent to Highlighter either as part of the highlight file or as a request parameter.
Option 1: Including HitsByWord with highlight file
To include HitsByWord with the highlight file, add a dtsearch_hitsByWord element to the generated XML, with the HitsByWord data as its text node.
With more recent dtSearch APIs that provide the SearchResults.UrlEncodeItemWithIndexId method, the implementation is straightforward.
When collecting search results, get the URL-encoded item from dtSearch using the SearchResults.UrlEncodeItemWithIndexId method, not SearchResults.UrlEncodeItem, and add it to your search results page.
In the controller action that returns the highlight file from dtSearch, extend the generated file with the HitsByWord data.
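The extension step can be sketched roughly as follows. This is a minimal illustration in Python, not the dtSearch or PDF Highlighter API: the function name add_hits_by_word is hypothetical, and placing the element directly before the closing root tag is an assumption that should be checked against the Highlighter documentation. It works on the raw text because generated highlight files are not guaranteed to parse as strict XML.

```python
from xml.sax.saxutils import escape


def add_hits_by_word(highlight_xml: str, hits_by_word: str) -> str:
    """Insert a dtsearch_hitsByWord element before the last closing tag.

    Assumption: the last closing tag in the file is the root element's,
    and the new element is accepted as a direct child of the root.
    """
    close_at = highlight_xml.rfind("</")
    if close_at == -1:
        raise ValueError("highlight file has no closing tag")
    # Escape &, < and > so the data stays a plain text node.
    element = (
        "<dtsearch_hitsByWord>"
        + escape(hits_by_word)
        + "</dtsearch_hitsByWord>\n"
    )
    return highlight_xml[:close_at] + element + highlight_xml[close_at:]
```

The controller action would call this on the text produced by dtSearch just before writing the response.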
Option 2: Add dtsearch_hitsByWord parameter to highlighting request
In the dtSearch application that executes the SearchJob, enable the dtsSearchWantHitsByWord and dtsSearchWantHitsArray flags.
Collect HitsByWord when reading the search results.
Add the HitsByWord data as the dtsearch_hitsByWord parameter to the highlight request, along with uri and xml.
Depending on the programming language and dtSearch API you use to retrieve HitsByWord, you may get a List<String> (a list of strings) instead of a single String.
In that case, join the strings with a newline character ('\n') as the separator and send the resulting string as the dtsearch_hitsByWord parameter to Highlighter.
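The joining rule can be sketched as a small helper. This is an illustration in Python under the assumption that the HitsByWord value has already been read from the dtSearch results; the function name to_hits_by_word_param is hypothetical and not part of any dtSearch API.

```python
from typing import List, Union


def to_hits_by_word_param(hits_by_word: Union[str, List[str]]) -> str:
    """Return HitsByWord as the single string expected by Highlighter."""
    if isinstance(hits_by_word, str):
        # Already a single string: pass it through unchanged.
        return hits_by_word
    # A list of strings: join the entries with a newline separator.
    return "\n".join(hits_by_word)
```

Accepting both shapes in one helper keeps the calling code the same whichever dtSearch API variant is in use.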
For common words, HitsByWord may contain more data than web servers allow in an HTTP GET request. When sending HitsByWord, it is recommended either to submit the highlighting request using POST or to set the dtsearch_hitsByWord parameter to a callback URL from which PDF Highlighter can fetch the data.
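The POST variant could look roughly like the sketch below, which only builds the request and does not send it. It is written in Python with the standard library. The function name build_highlight_request is hypothetical, the service address must be supplied by the caller, and the assumption that Highlighter accepts these parameters as a URL-encoded form body should be verified against the Highlighter documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_highlight_request(service_url: str, document_uri: str,
                            xml_url: str, hits_by_word: str) -> Request:
    """Build a POST request carrying uri, xml and dtsearch_hitsByWord."""
    # Form-encode the parameters; newlines in HitsByWord become %0A.
    form = urlencode({
        "uri": document_uri,
        "xml": xml_url,
        "dtsearch_hitsByWord": hits_by_word,
    })
    return Request(
        service_url,
        data=form.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Because the data travels in the request body rather than the query string, the URL length limits that apply to GET requests do not constrain the size of HitsByWord.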