JObjects Highlighter is a web service for highlighting search terms in PDF documents. Depending on the received HTTP request, it will either:
- create a new PDF with highlights included ("burnt in"), or
- show the original PDF document in a specialized web-based viewer, with highlights on top of it.
Which method will be used for delivering the results depends on the Accept HTTP header.
If the header is not provided, JObjects Highlighter determines the method using server preferences.
In this guide, we'll show you how to consume the PDF highlighting service and walk you through the integration steps and the obstacles you may run into. For simplicity, most of the examples here use JObjects Highlighter Cloud Service so that you can try them right away. The same principles apply to JObjects Highlighter Server, which you can download and install on your own server, but you need to use your instance URL instead.
In the examples below we reference a PDF file named alice.pdf that you can download here, or you can run tests using your own PDF document.
Burning Highlights Into a PDF From the Command Line
If you're on a Linux system with curl available, you can run the following command to upload the local file alice.pdf to Highlight4me (JObjects' hosted highlighting service), highlight the word rabbit, and save the received output as alice-rabbit-highlighted.pdf.
Notice that by using the language=en parameter, we have instructed JObjects Highlighter to use English language rules. As a result, it highlights not only rabbit (singular) but also rabbits (plural).
For demo purposes, the Highlight4me service allows highlighting of PDF files of up to 1MB in size.
To try it with larger documents, you can start a trial, get your API key, and include it with your requests using either the apiKey parameter or the X-Api-Key HTTP header:
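As a rough illustration of the two options, the sketch below builds a request either way. Only the names apiKey, X-Api-Key and language come from this guide. The endpoint URL and the buildRequest helper are hypothetical placeholders, so substitute the service URL you actually use.

```javascript
// Hypothetical endpoint, used for illustration only; replace it with the real service URL.
const SERVICE_URL = "https://highlighter.example/highlight";

// Returns the URL and headers for a request, attaching the API key either
// as the apiKey query parameter (default) or as the X-Api-Key header.
function buildRequest(params, apiKey, useHeader = false) {
  const url = new URL(SERVICE_URL);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value);
  }
  const headers = {};
  if (useHeader) {
    headers["X-Api-Key"] = apiKey;
  } else {
    url.searchParams.set("apiKey", apiKey);
  }
  return { url: url.toString(), headers };
}

const viaQuery = buildRequest({ language: "en" }, "YOUR_API_KEY");
const viaHeader = buildRequest({ language: "en" }, "YOUR_API_KEY", true);
console.log(viaQuery.url);
console.log(viaHeader.url, viaHeader.headers);
```

Sending the key as a header keeps it out of the URL, which can be preferable when URLs end up in logs or browser history.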
Burning highlights into PDF documents is a rather specific use case. If you need to do this at scale, check out our batch highlighting tool.
Getting Highlighting Results as JSON
We'll get a response similar to:
This response provides just basic information about highlighted items. However, the field cacheKey contains a key that we can send to the hits method of the web service, to get the position of each found keyword:
That will return something like:
You don't need to deal with this response directly - this data is consumed by our Highlighting PDF Viewer instead. Let's see how to use this on a website...
Web Page Integration
Integration of JObjects Highlighter with a website or application comes down to:
- Creating and invoking highlighting method (e.g.
- Opening the Highlighting PDF Viewer in a frame or a window, and passing both the PDF and the hitsURL to it.
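The second step can be sketched as building a viewer URL that carries both values. This is only an illustration under assumptions: the viewer address, the hits URL, and the query parameter names file and hitsURL are hypothetical here, so check the viewer documentation for the names it really expects.

```javascript
// Builds the address to load in a frame or window.
// The parameter names "file" and "hitsURL" are assumptions made for this sketch.
function buildViewerUrl(viewerUrl, pdfUrl, hitsUrl) {
  const url = new URL(viewerUrl);
  // searchParams.set() percent-encodes the nested URLs, so their own
  // "?" and "&" characters cannot be confused with the viewer's parameters.
  url.searchParams.set("file", pdfUrl);
  url.searchParams.set("hitsURL", hitsUrl);
  return url.toString();
}

const viewerSrc = buildViewerUrl(
  "https://viewer.example/viewer.html",               // hypothetical viewer location
  "https://jobjects.com/examples/alice.pdf",
  "https://highlighter.example/hits?cacheKey=abc123"  // hypothetical hits URL
);
console.log(viewerSrc);

// In a browser, the result would be assigned to a frame, for example:
//   document.getElementById("viewer").src = viewerSrc;
```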
A simple jQuery based solution could look like this:
You can try a live example at https://jsfiddle.net/jobjects/quzf7dp1/28/
The above example shows the simplest case, where both the PDF and the viewer come from the same origin (i.e., have the same hostname and protocol) - from our CDN. In the real world, you would probably have to handle the case where they have different origins.
Run the above example after modifying the PDF URL to https://jobjects.com/examples/alice.pdf, or open https://jsfiddle.net/jobjects/quzf7dp1/29/
In the PDF viewer you will see the error message "An error occurred while loading the PDF" and in the web browser console you can see something like:
If you don't have prior experience with CORS, this requires an explanation...
Dealing With CORS
Cross-origin resource sharing (CORS) is a mechanism that allows restricted resources on a web page to be requested from a domain other than the one from which the page itself was served (see the explanations by Wikipedia or Mozilla).
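To check in advance whether a given viewer and PDF pair will be subject to CORS, you can compare their origins. The sketch below relies on the standard URL API, whose origin property combines protocol, hostname and port. The sameOrigin helper and the viewer path are illustrative assumptions.

```javascript
// True when both URLs share protocol, hostname and port.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Hypothetical viewer path on the CDN host mentioned in this guide.
const viewer = "https://cdn.highlight4.me/viewer/index.html";

console.log(sameOrigin(viewer, "https://cdn.highlight4.me/docs/alice.pdf"));
console.log(sameOrigin(viewer, "https://jobjects.com/examples/alice.pdf"));
```

When the helper returns false, the server hosting the PDF has to opt in to cross-origin access, or the viewer and the PDF have to be brought under one origin.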
Resolve Copying the Viewer
A simple workaround for the CORS issue is to copy the Highlighting PDF Viewer to the server hosting the PDF documents. If you opt for this approach, you can download the JObjects Highlighter asset files for use on your web server.
However, this may not always be possible or desirable, so let's see how to fix this on the web server configuration level.
Resolve Sending Access-Control Headers
To use a PDF viewer hosted on one domain but pull PDFs from another, you need to configure the web server serving the PDFs to send the Access-Control-Allow-Origin HTTP header.
For example, you could return Access-Control-Allow-Origin: * which would allow requests from any origin.
To try it live, open the updated demo at https://jsfiddle.net/jobjects/quzf7dp1/31/.
What we did is change the PDF URL to https://jobjects.com/examples/cors/all/alice.pdf. If you inspect the HTTP headers in your browser developer tools, you will notice that we're returning Access-Control-Allow-Origin: * for this file path.
Of course, you may want to limit such access only to a specific host. In fact, we highly recommend it. In the live example at https://jsfiddle.net/jobjects/quzf7dp1/32/ we're fetching the PDF from a path that returns the Access-Control-Allow-Origin: https://cdn.highlight4.me header.
Note that adding the Access-Control-Allow-Origin header does not circumvent any user access controls you may have on the website.
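If PDFs are served by application code rather than by static web server configuration, restricting access to a specific host could look roughly like the following sketch. It assumes a Node-style request handler. The names corsHeaders, makePdfHandler and loadPdf are hypothetical, and only the allowed origin https://cdn.highlight4.me is taken from the example above.

```javascript
// Origins that are allowed to fetch PDFs from this server.
const ALLOWED_ORIGINS = new Set(["https://cdn.highlight4.me"]);

// Returns the CORS headers for a request: the origin is echoed back only
// when it is on the allow-list, otherwise no CORS header is sent.
function corsHeaders(requestOrigin, allowed = ALLOWED_ORIGINS) {
  if (requestOrigin && allowed.has(requestOrigin)) {
    // "Vary: Origin" tells caches that the response depends on the Origin header.
    return { "Access-Control-Allow-Origin": requestOrigin, Vary: "Origin" };
  }
  return {};
}

// Handler with the (req, res) shape accepted by Node's http.createServer.
// loadPdf is a hypothetical function returning the PDF bytes for a path.
function makePdfHandler(loadPdf) {
  return (req, res) => {
    res.writeHead(200, {
      "Content-Type": "application/pdf",
      ...corsHeaders(req.headers.origin),
    });
    res.end(loadPdf(req.url));
  };
}
```

Any authentication or authorization checks would still run before the PDF is returned. The header only tells the browser which origins may read the response.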
Resolve Using Reverse Proxying
Another way to work around CORS is to use reverse proxying. With this approach, you set up virtual paths on your web server that the web server internally forwards to the remote service.
For example, you could configure your web server to proxy two paths:
Then, you could reference the highlighting service in your scripts at
and use the PDF Viewer at
If you have installed JObjects Highlighter Server on your server, you will need to set up proxying on the user-facing web server to make it accessible to your users.
Typically, you would proxy a path on the user-facing server to http://localhost:8998/. For details, see reverse proxying.
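In practice this mapping lives in your web server configuration, but the path rewriting a reverse proxy performs can be illustrated with a short sketch. The /highlighter/ prefix and the resolveProxyTarget helper are hypothetical. Only the target http://localhost:8998/ comes from the text above.

```javascript
// Hypothetical route table: a public path prefix mapped to the internal service.
const PROXY_ROUTES = [
  { prefix: "/highlighter/", target: "http://localhost:8998/" },
];

// Returns the internal URL a request path would be forwarded to,
// or null when the path is not proxied.
function resolveProxyTarget(requestPath, routes = PROXY_ROUTES) {
  for (const { prefix, target } of routes) {
    if (requestPath.startsWith(prefix)) {
      return target + requestPath.slice(prefix.length);
    }
  }
  return null;
}

console.log(resolveProxyTarget("/highlighter/hits?cacheKey=abc123"));
console.log(resolveProxyTarget("/docs/alice.pdf"));
```

Because the browser only ever talks to the user-facing server, the viewer, the PDFs and the highlighting service all appear to share one origin, and no CORS headers are needed.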