Does format really matter for technical documents? As long as you have access to the content you’re looking for, does it make any difference what kind of wrapper it arrives in? In fact it does. Document format can make a significant difference in how you find and digest content. Let me explain.

In the digital era, two formats dominate the delivery of technical documentation: HTML and PDF. One (PDF) packages content into individual files that contain a complete document description, independent of any specific software, hardware, or operating system. The other (HTML) operates more as a loosely structured container that, while limited in document definition capabilities, offers considerably more flexibility for content interaction and sharing.

Which format is best for me?

As it turns out, the differences between HTML and PDF strongly influence who prefers which format. A survey on that topic was recently carried out by Elsevier. Their research found that PDF is the format of choice for in-depth reading, since it offers users offline access to content in an environment that is easy to navigate, consume, print, and save. PDF also offers superior markup and review functions, but it lacks the ability to deliver updates instantly when a change is made. HTML, on the other hand, was preferred for discovering new content, immediate learning, and information sharing. HTML updates are more ‘universal’ in the sense that they can be published instantaneously to a URL, ensuring that all readers see the most current information at the same time.

What’s important to note is that HTML and PDF are by no means mutually exclusive. In fact, they coexist well. A PDF can easily be linked to or from an HTML page, and vice versa. The challenge is that the authoring environments for the two formats are usually different. Unfortunately, this means that maintaining content across both formats can be complex and time-consuming. Content authors must therefore decide which format will be most beneficial for the material they are creating.

A “Real World” Example

To see how problematic it can be to maintain both HTML and PDF formats for sharing information, let’s look at a hypothetical company. Our imaginary company, Tech Inc., is a provider of precision widgets. Their product line is extensive and includes numerous sub-products that are all developed from a core group of established platforms. As with most technology companies, the product lifecycle at Tech Inc. is short. New features and enhancements are added to existing products frequently. Updates to the core platform are less frequent, but they have implications that can cascade across full product lines.

With each product release, a large volume of technical documentation is released, ranging from high-level user guides to advanced technical installation and troubleshooting materials. While there are numerous similarities among the documents in each core product family, the document updates must all be developed and managed separately, a process that consumes a considerable amount of resources.

On the delivery side, all Tech Inc. documents are published to multiple locations:

  • Internal technical support database
  • External reseller support website
  • Customer-facing online product help files

Each of these locations is an HTML-based content repository; however, the documents themselves are all in PDF format. This creates challenges:

  • PDF document content can’t be easily searched from an HTML interface, limiting content “findability”.
  • Each repository is managed independently – requiring a considerable amount of upkeep.
  • HTML is only used to develop the interface since technical content in that format would not be available for offline use.
  • Once a PDF document is downloaded, there is no way to ensure updates are delivered.
  • A license of Adobe Acrobat is required for each end user needing to perform any editorial or markup actions.

These challenges raise the question: “Is it possible to develop a solution that provides the benefits of both formats?”


Have Your Cake and Eat it Too

XML holds the promise of overcoming this challenge. A document markup language, XML enables content to be authored in a single environment and then published to HTML, PDF, or both. This is possible because XML content can be structured according to a defined set of rules that can be customized for delivery in digital environments.

Breaking content down into individually defined XML elements gives content creators a considerably higher degree of flexibility in content delivery. Authors gain the benefits of HTML, with its convenience and suitability for discovering content, while retaining the ability to create PDF output for offline reference.
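
The single-source idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the element names (topic, title, para, step) and the two renderer functions are hypothetical, not part of any particular product or schema, and the "print" renderer produces a plain-text layout as a stand-in for a real PDF pipeline, which would normally rely on a dedicated formatting tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the element names are illustrative only.
SOURCE = """
<topic id="widget-install">
  <title>Installing the Widget</title>
  <para>Unpack the widget and check the contents.</para>
  <step>Attach the bracket.</step>
  <step>Tighten the bolts.</step>
</topic>
"""


def to_html(topic):
    """Render the topic as an HTML fragment for online delivery."""
    parts = [f"<h1>{escape(topic.findtext('title'))}</h1>"]
    for para in topic.findall("para"):
        parts.append(f"<p>{escape(para.text)}</p>")
    steps = topic.findall("step")
    if steps:
        items = "".join(f"<li>{escape(step.text)}</li>" for step in steps)
        parts.append(f"<ol>{items}</ol>")
    return "\n".join(parts)


def to_print_text(topic):
    """Render the same topic as a plain-text page (stand-in for PDF output)."""
    title = topic.findtext("title")
    lines = [title, "=" * len(title), ""]
    for para in topic.findall("para"):
        lines.append(para.text)
    lines.append("")
    for number, step in enumerate(topic.findall("step"), start=1):
        lines.append(f"{number}. {step.text}")
    return "\n".join(lines)


# One parsed source, two delivery formats.
topic = ET.fromstring(SOURCE)
html_output = to_html(topic)
print_output = to_print_text(topic)

print(html_output)
print()
print(print_output)
```

Because both renderers read the same parsed source, a correction made once in the XML appears in every output the next time it is published. That is the property that makes single-source authoring attractive when documents must be delivered in more than one format.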

When XML content is delivered in HTML environments, users can also be given the ability to submit comments and document markups. This information can be used in a variety of collaborative ways, from gathering product feature requests to collecting user feedback that keeps documentation in line with its use in the field. It can also support backend data collection to help track document use, or even procedural sign-off for tracking regulatory requirements.

TerraView supports this paradigm very nicely. It uses HTML for its convenience and its suitability for discovering content and determining its relevance, while still offering the ability to create PDF output for offline access and for storing content for later reference.