Because the dtSearch search engine can generate a PDF highlight file, the recommended approach for integrating it with PDF Highlighter is the "highlight-for-xml" service method.
Follow the dtSearch guide for web search application development and hit highlighting. The client requirements listed in the Highlighting hits in PDF files article do not apply when the PDF Highlighter server is used! You still need to implement the callback URL that serves the "xml" highlight file, as described in the dtSearch guide. With PDF Highlighter, however, any standard web browser can display the results (no Acrobat Reader and no additional plugins required).
Because the PDF highlight file format contains no information about the search query or the relationships between found words, PDF Highlighter cannot mark the terms of a matching phrase as a single hit. However, PDF Highlighter supports phrase recognition when the additional HitsByWord data from dtSearch results is provided.
HitsByWord data can be sent to PDF Highlighter either as part of the highlight file or as a request parameter.
Option 1: Include HitsByWord in the highlight file
To include HitsByWord in the highlight file, add a dtsearch_hitsByWord element to the generated XML, with the HitsByWord data as its text node.
With more recent dtSearch APIs that provide the SearchResults.UrlEncodeItemWithIndexId method, the implementation is straightforward.
When collecting search results, get the URL-encoded item from dtSearch using SearchResults.UrlEncodeItem and add it to your search results page.
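A minimal Java sketch of building the link for one search result. It assumes that highlight-for-xml is reachable as a path under the PDF Highlighter base URL and that your highlight-file controller reads the encoded item from a query parameter named `item`; `highlighterBase`, `highlightFileUrl` and `item` are hypothetical names, not part of either product's API. The `uri` and `xml` parameters are the ones the highlight request takes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public final class SearchResultLinks {

    private SearchResultLinks() {}

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /**
     * Builds a highlight-for-xml link for one search result.
     *
     * @param highlighterBase  HYPOTHETICAL base URL of the PDF Highlighter server
     * @param pdfUrl           URL of the PDF document (sent as "uri")
     * @param highlightFileUrl HYPOTHETICAL URL of your action that returns the highlight file
     * @param urlEncodedItem   value returned by SearchResults.UrlEncodeItem for this result
     */
    public static String highlightLink(String highlighterBase, String pdfUrl,
                                       String highlightFileUrl, String urlEncodedItem) {
        // The dtSearch item is treated as an opaque value and encoded once more, so the
        // web framework's query-string decoding hands the controller the original string.
        String xmlCallback = highlightFileUrl + "?item=" + enc(urlEncodedItem);
        // The callback URL is itself a parameter value, so it is encoded as a whole.
        return highlighterBase + "/highlight-for-xml?uri=" + enc(pdfUrl)
                + "&xml=" + enc(xmlCallback);
    }
}
```

Encoding the item as a plain parameter value, rather than splicing it into the URL raw, keeps the sketch correct whatever characters the dtSearch string contains.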
In the controller action that returns the highlight file from dtSearch, extend the generated file with the HitsByWord data.
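A minimal sketch of the extension step, assuming the element goes right before the closing `</XML>` tag of the highlight file; check the PDF Highlighter documentation for the exact placement it expects. The file is edited as text rather than parsed, because a highlight file is not necessarily well-formed XML (attribute values are often unquoted). The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class HighlightFileExtender {

    // Closing root tag of the highlight file, matched case-insensitively.
    private static final Pattern CLOSING_ROOT = Pattern.compile("</xml>", Pattern.CASE_INSENSITIVE);

    private HighlightFileExtender() {}

    /** Escapes characters that may not appear verbatim in an XML text node. */
    static String escapeXmlText(String text) {
        return text.replace("&", "&amp;") // must run first so later entities are not re-escaped
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /**
     * Returns a copy of the highlight file with a dtsearch_hitsByWord element added.
     * ASSUMPTION: the element is inserted before the last closing </XML> tag,
     * or appended at the end if the file has no such tag.
     */
    public static String addHitsByWord(String highlightFile, String hitsByWord) {
        String element = "<dtsearch_hitsByWord>" + escapeXmlText(hitsByWord) + "</dtsearch_hitsByWord>";
        Matcher matcher = CLOSING_ROOT.matcher(highlightFile);
        int insertAt = -1;
        while (matcher.find()) {
            insertAt = matcher.start(); // keep the position of the last match
        }
        if (insertAt < 0) {
            return highlightFile + element;
        }
        return highlightFile.substring(0, insertAt) + element + highlightFile.substring(insertAt);
    }

    public static void main(String[] args) {
        // Placeholder highlight file and HitsByWord value, for illustration only.
        String file = "<XML><Body units=characters color=#ffff00 mode=active version=2>\n"
                + "<Highlight>\n<loc pg=0 pos=10 len=5>\n</Highlight>\n</Body></XML>";
        System.out.println(addHitsByWord(file, "sample hits data"));
    }
}
```

In the controller action, call `addHitsByWord` on the text dtSearch generated just before returning it. If your API returns HitsByWord as a list, join the entries with '\n' first.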
Option 2: Add the dtsearch_hitsByWord parameter to the highlighting request
In your dtSearch application that executes the search, get HitsByWord when reading the search results.
Add the HitsByWord data as the dtsearch_hitsByWord parameter of the highlight request, along with uri and xml.
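A minimal Java sketch of a GET highlight request carrying all three parameters. The parameter names `uri`, `xml` and `dtsearch_hitsByWord` come from this guide; the `highlighterBase` URL and the `/highlight-for-xml` path shape are assumptions about your deployment.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public final class HighlightRequest {

    private HighlightRequest() {}

    /** Form-encodes parameters as name=value pairs joined with '&'. */
    public static String encodeParams(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Highlight request parameters; LinkedHashMap keeps them in insertion order. */
    public static Map<String, String> params(String pdfUrl, String xmlCallbackUrl, String hitsByWord) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("uri", pdfUrl);
        p.put("xml", xmlCallbackUrl);
        p.put("dtsearch_hitsByWord", hitsByWord);
        return p;
    }

    /** HYPOTHETICAL URL shape: {highlighterBase}/highlight-for-xml?... */
    public static String getUrl(String highlighterBase, String pdfUrl,
                                String xmlCallbackUrl, String hitsByWord) {
        return highlighterBase + "/highlight-for-xml?"
                + encodeParams(params(pdfUrl, xmlCallbackUrl, hitsByWord));
    }
}
```

The same `encodeParams` output can also serve as an `application/x-www-form-urlencoded` body if you send the request with POST.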
Depending on the programming language and the dtSearch API you use to retrieve HitsByWord, you may get a List<String> (list of strings) instead of a single String. In that case, join the strings with a newline character ('\n') as the separator and send the resulting string as the dtsearch_hitsByWord parameter to PDF Highlighter.
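In Java the join is a single standard-library call. The sketch below wraps it in a hypothetical helper; the entries it receives are whatever your dtSearch API returned.

```java
import java.util.List;

public final class HitsByWordJoiner {

    private HitsByWordJoiner() {}

    /** Joins HitsByWord entries into the single dtsearch_hitsByWord value, one entry per line. */
    public static String join(List<String> hitsByWord) {
        // String.join puts '\n' between entries only, with no trailing newline.
        return String.join("\n", hitsByWord);
    }
}
```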
For common words, HitsByWord may contain more data than web servers allow in an HTTP GET request. When sending HitsByWord, it is recommended either to submit the highlighting request using POST or to set the dtsearch_hitsByWord parameter to a callback URL from which PDF Highlighter can retrieve the data.
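One way to apply the callback alternative, sketched in Java: keep the data inline while the GET URL stays under a length budget, and otherwise pass a callback URL instead. The 2000-character budget, the `highlighterBase` URL shape and the `hitsByWordCallbackUrl` endpoint (which would return the joined HitsByWord text) are assumptions; real limits depend on your web servers and any proxies in between.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public final class HitsByWordParameter {

    private HitsByWordParameter() {}

    // ASSUMPTION: conservative URL length budget; adjust to your servers' actual limits.
    static final int MAX_GET_URL_LENGTH = 2000;

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Returns a GET URL with HitsByWord inline if it fits, else with a callback URL. */
    public static String highlightUrl(String highlighterBase, String pdfUrl, String xmlCallbackUrl,
                                      String hitsByWord, String hitsByWordCallbackUrl) {
        String prefix = highlighterBase + "/highlight-for-xml?uri=" + enc(pdfUrl)
                + "&xml=" + enc(xmlCallbackUrl) + "&dtsearch_hitsByWord=";
        String inline = prefix + enc(hitsByWord);
        if (inline.length() <= MAX_GET_URL_LENGTH) {
            return inline;
        }
        // Too long for GET: let PDF Highlighter fetch the data from the callback URL.
        return prefix + enc(hitsByWordCallbackUrl);
    }
}
```

Submitting the same parameters with POST avoids the length limit without needing a callback endpoint.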