Highlighter has an extensive set of options that control the caching of highlighting results. Caching lets Highlighter respond quickly when the same set of terms is highlighted again and the original document has not changed.
Highlighter has two types of caches:
- Instance cache is a short-term in-memory cache that holds recently highlighted documents.
- Disk cache is a persistent cache that can keep documents for longer periods (10 days by default). Highlighter automatically deletes expired items from the file system. The directory paths for highlighted documents and metadata are controlled by the "docsCacheDir" and "metaCacheDir" configuration options, respectively.
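The interplay of the two caches can be pictured with a small Python sketch. It is an illustration only, not Highlighter's implementation: the class name TwoTierCache, its methods, and the use of file modification time for expiry are all assumptions, and the in-memory tier is simplified to a plain dictionary without eviction. Keys are assumed to be safe to use as file names.

```python
import time
from pathlib import Path
from typing import Dict, Optional

# Ten days in seconds, mirroring the default disk cache lifetime quoted above.
TEN_DAYS = 10 * 24 * 60 * 60


class TwoTierCache:
    """Hypothetical two-level cache: a memory dict in front of a directory."""

    def __init__(self, cache_dir: Path, max_age: float = TEN_DAYS) -> None:
        self.cache_dir = cache_dir
        self.max_age = max_age
        self.memory: Dict[str, bytes] = {}  # stands in for the instance cache
        cache_dir.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        # Store in both tiers so a restart can still be served from disk.
        self.memory[key] = data
        (self.cache_dir / key).write_bytes(data)

    def get(self, key: str, now: Optional[float] = None) -> Optional[bytes]:
        # 1. Fast path: the in-memory tier.
        if key in self.memory:
            return self.memory[key]
        # 2. Slow path: the disk tier, ignoring files older than max_age.
        now = time.time() if now is None else now
        path = self.cache_dir / key
        if path.is_file() and now - path.stat().st_mtime <= self.max_age:
            data = path.read_bytes()
            self.memory[key] = data  # promote to memory for the next request
            return data
        return None

    def purge_expired(self, now: Optional[float] = None) -> int:
        # Delete expired files and report how many were removed.
        now = time.time() if now is None else now
        removed = 0
        for path in self.cache_dir.iterdir():
            if path.is_file() and now - path.stat().st_mtime > self.max_age:
                path.unlink()
                removed += 1
        return removed
```

In this sketch a lookup that misses memory but hits a fresh file on disk is promoted back into memory, and purge_expired plays the role of the automatic cleanup described above.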
The caching mechanism considers the following data:
- Highlighting request HTTP parameters. By default, only known parameters are taken into account.
- Document metadata such as file timestamp and length. Separate sets of options exist for the document being highlighted and for the PDF highlights XML file (if applicable). When documents are loaded via HTTP, Highlighter relies on the content of the response headers.
- Document content (bytes).
By default, document bytes are not taken into account, because Highlighter assumes that the file metadata is reliable.
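As a rough illustration of how these three inputs could be combined into a single cache key, consider the Python sketch below. It is not Highlighter's actual algorithm: the function cache_key, the KNOWN_PARAMS whitelist and its parameter names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Mapping, Optional

# Hypothetical whitelist; the real set of known parameters is defined by Highlighter.
KNOWN_PARAMS = frozenset({"doc", "terms", "xml"})


def cache_key(params: Mapping[str, str],
              mtime: float,
              length: int,
              content: Optional[bytes] = None) -> str:
    """Derive a key from request parameters, file metadata and, optionally, bytes."""
    h = hashlib.sha256()
    # Only known parameters contribute; sorting makes the key order-independent.
    for name in sorted(n for n in params if n in KNOWN_PARAMS):
        h.update(f"{name}={params[name]}\n".encode("utf-8"))
    # File metadata: timestamp and length.
    h.update(f"mtime={mtime}\nlength={length}\n".encode("utf-8"))
    # Document bytes are left out unless the caller passes them explicitly,
    # which corresponds to the default behaviour described above.
    if content is not None:
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()
```

With a scheme like this, two requests that differ only in an unknown parameter map to the same key, while a changed timestamp, length, or highlighted term produces a different key and therefore a cache miss.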
For all available options, check the
caching section of the sample configuration file and its
Example 1: dtSearch encodes term locations for highlighting in the URL of the "highlights XML file". In this case, it is probably unwise to rely on the HTTP headers returned by the script that handles the XML file request. Instead, to make caching work, Highlighter should consider only the XML file URL:
Example 2: If you use Highlighter as a back-end web service (one that does not handle user requests directly), you could tell Highlighter to cache document files in a shared directory served by the front-end web server (e.g. Apache or IIS):
Then, you could redirect the user to
/somepath is mapped by your web server to /shared/highlighter-output/, and <docRef> is a document key returned by the Highlighter service.