
Putting XML to Work: Advantages of Content Management

By Interleaf,
A Sponsor Member of OASIS

Introduction
As companies strive for success in an increasingly competitive global marketplace, content management has attracted attention as an information solution. Content management systems can help organizations leverage their tremendous investments in information throughout the enterprise. As with any new technology-based solution, there is confusion about what exactly constitutes content management. In this white paper we discuss the issues and technologies surrounding content management and examine its current state.

Content management is more than just new technology. At its core, content management allows companies to use information to build stronger relationships along the supply chain, consequently tying customers, distributors, suppliers and manufacturers together. It employs new technology to customize information for customers by re-purposing existing knowledge of a customer's product, order and maintenance history. Content management creates a powerful win-win situation between the producer and user of a product, by maximizing customer productivity, satisfaction and loyalty and generating new revenue streams through increased order size and cross-selling.

Content Management: An Example

The following example shows how an effective content management system can help a company improve customer satisfaction and loyalty, increase revenue and compete with third party and after-market parts suppliers.

Diagram 1 

A manufacturer of custom industrial robots services its products through distributors and uses suppliers to provide sub-assemblies. When a robot malfunctions at a customer site, the robot operator receives a warning indicating that maintenance is required and calls the distributor to diagnose the problem.

The maintenance representative uses the serial number of the robot to obtain real-time documentation and troubleshooting guides for the malfunctioning robot. After diagnosing the problem, the rep orders the necessary replacement parts via a form directly from the order-entry-enabled troubleshooting guide. This online ordering form also displays delivery and availability information, allowing the rep to schedule an installation date with the customer.

When the parts arrive, the maintenance rep returns and installs the parts, and the robot is returned to service. The content management system has benefited the entire supply chain: the manufacturer received the parts order, the distributor was able to schedule its maintenance reps, and the customer minimized system downtime and knew when the outage would be corrected. If necessary, the manufacturer could have continued on the value chain to procure the part from the original supplier.

This type of content management system has additional benefits. First, the documentation and diagnostic materials are specific to the unit needing repair. In any manufacturing environment with frequent revisions or where most units are custom, tailored documentation greatly improves both the accuracy of documentation and the ease of maintenance. It lowers costs for the manufacturer and distributor by allowing the maintenance rep to see the exact procedures for the robot being repaired. Sections of the manual that begin "For serial numbers 29850-29899 use the following procedure..." are replaced with instructions for a customer's specific environment. This custom documentation helps reduce repair time, increase system uptime, and keep the customer's business up and running.

Second, a content management system gives manufacturers an opportunity to increase revenue through the sale of replacement parts and supplies. This is accomplished with warranty tracking systems that trace mean-time-to-failure of components and interdependencies among components. When customers order a part, they can be notified of other parts that are likely to need replacing, either because of interdependencies (such as gaskets and seals) or because of a historical failure relationship between parts. This gives customers the opportunity to avoid future unscheduled outages, and lets the manufacturer and distributor increase order size.
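The cross-selling logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the part names, the `DEPENDENT_PARTS` table, the `REPAIR_HISTORY` records and the `min_co_failures` threshold are all hypothetical, and a real warranty tracking system would draw on far richer failure data.

```python
from collections import Counter

# Hypothetical data: parts that should be replaced together
# (interdependencies such as gaskets and seals).
DEPENDENT_PARTS = {
    "pump-housing": ["housing-gasket", "shaft-seal"],
    "servo-motor": ["motor-coupling"],
}

# Hypothetical warranty history: each entry lists the parts replaced in one repair.
REPAIR_HISTORY = [
    ["servo-motor", "encoder"],
    ["servo-motor", "encoder", "motor-coupling"],
    ["pump-housing", "housing-gasket"],
    ["servo-motor", "drive-belt"],
]


def suggest_related_parts(ordered_part, min_co_failures=2):
    """Return parts worth offering alongside ordered_part."""
    # Start with known interdependencies.
    suggestions = list(DEPENDENT_PARTS.get(ordered_part, []))

    # Count how often each other part was replaced in the same repair.
    co_failures = Counter()
    for repair in REPAIR_HISTORY:
        if ordered_part in repair:
            co_failures.update(p for p in repair if p != ordered_part)

    # Add parts with a historical failure relationship to the ordered part.
    for part, count in co_failures.most_common():
        if count >= min_co_failures and part not in suggestions:
            suggestions.append(part)
    return suggestions


print(suggest_related_parts("servo-motor"))
```

With this sample data, an order for a servo motor would prompt an offer of the coupling (an interdependency) and the encoder (which has failed alongside the motor twice).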

What is Content Management?

"Content management encompasses a set of processes and technologies, enabling the creation and packaging of content (documents, complex media, applets, components, etc.) as part of a dynamic and integrated Web-centric environment."

META Group

Let's examine this definition further. First, content management requires new, enabling technology. Second, content is not merely documents and words, but graphics, audio, video clips, live feeds and software components. This leads to two related questions that will be discussed later in this paper: What constitutes content? How can content be reused? Finally, the definition is decidedly web-oriented, bringing up two questions: Can content be shared between web and non-web uses? Are there new technologies that can assist in this process?

Enterprise content management emphasizes the need to address content management across all forms and formats of information stored throughout the extended enterprise. Within the enterprise, information is created using a wide variety of methods and tools, and this information is revised frequently. Enterprise content management takes the smallest, most appropriate units of information and allows them to be re-purposed and delivered in a personalized form to the individual requiring information.

What is Content?

In understanding content management, it may be helpful to distinguish it from document management and knowledge management. Document management deals with maintaining and storing documents. Knowledge management is concerned with making information accessible for decision making through index, query and search mechanisms. While content management shares some of the attributes of both document management (storing information) and knowledge management (accessing information), it goes beyond them to create a system for re-purposing and using information to drive business processes.

Because "content" encompasses a wide variety of information objects, an expanded repository and a new system of linking are required. The traditional method used to transfer a document into a document management repository won't work with a live feed. Nor will object linking and embedding (OLE), the standard method of linking. In fact, the most common method of reusing content is the familiar cut-and-paste technique, which does not involve linking at all. Content management requires a new enabling technology to accommodate the dynamic array of information.

One of the advantages of a content management system is that content can be created using the best tools for the job. Simple text documents can be created in Word while engineering drawings are built in sophisticated CAD/CAM software. Marketing literature may be created using Adobe Illustrator, complex technical documents using Interleaf 7, and so on. Users can employ the optimal tool for managing the creation and revision of a particular piece of content. Once the content is created, it is imported into Microsoft Word or Interleaf 7 to enforce structure.

Until now, the smallest reusable unit of content has been dictated by the output of the tool used to create it. The smallest unit was usually a whole file: a Word document, a PowerPoint presentation, an Excel spreadsheet and so on. In re-purposing information, however, this is much too coarse a level of granularity. It is like saying that, in a library, a book is the lowest level at which you can access information, with no tables of contents or indexes.

To make content management possible, users must be able to access content in smaller units. Since these pieces of content exist electronically it should be possible to tag, index, search and reuse them in a variety of contexts and for a variety of purposes while continuing to maintain them using the most appropriate tool. In order to enable meaningful enterprise content management, a consistent markup is required that tags content not by how it looks but by what it means.

Structure and Format

<P>The Phantom of the Opera</P>
<film>The Phantom of the Opera</film>
<silent-film>The Phantom of the Opera</silent-film>
<film sound="no">The Phantom of the Opera</film>
<film sound="yes">The Phantom of the Opera</film>

The idea of identifying content is not new; numerous authoring tools define information formatting. In the case of Microsoft Word or other text-oriented authoring tools, this markup identifies formatting such as paragraph, italics, justification, font and size. While this is helpful for defining the appearance of information on the page, it is not intended to apply meaning to the content; it is only concerned with its presentation. In addition, the way in which Word identifies formatting differs from that of other word processing programs. There is no standard way to represent, for example, italics across different applications.
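The difference between the markup samples above becomes concrete once software tries to use them. The following Python sketch, using the standard library's ElementTree module, queries a small document built from those samples; the enclosing `<catalog>` element is an assumption added only so the fragments form one well-formed document.

```python
import xml.etree.ElementTree as ET

# The <catalog> wrapper is hypothetical; the child elements come from
# the markup samples above.
CATALOG = """
<catalog>
  <P>The Phantom of the Opera</P>
  <film sound="no">The Phantom of the Opera</film>
  <film sound="yes">The Phantom of the Opera</film>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Presentational markup: all we can learn is that this text is a paragraph.
paragraphs = [p.text for p in root.findall("P")]

# Semantic markup: we can ask for silent films or sound films specifically.
silent_films = root.findall("film[@sound='no']")
sound_films = root.findall("film[@sound='yes']")

print(len(paragraphs), len(silent_films), len(sound_films))
```

The paragraph tag offers no way to tell a film title from any other line of text, while the `film` element and its `sound` attribute let a program select exactly the content it needs.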

The SGML Approach

The idea that markup should be standard and separate from format information led to the creation of Standard Generalized Markup Language (SGML) in 1978. Designated an ISO standard in 1986, SGML provided two key markup innovations. The first was to provide a language for describing markup, not just a particular set of markup elements. The second was to separate the tagging of content from its presentation or style. In other words, you do not mark up content according to SGML, you write an SGML application that tags the content according to the rules set forth in the Document Type Definition (DTD). These rules do not define whether or not the content is centered or bold. Instead they define the structural elements that the content represents. The DTD describes what these elements are and an application aware of the specific DTD tags the content accordingly.
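To illustrate what it means for rules to define structure rather than appearance, here is a deliberately simplified Python sketch. It is not a DTD processor: the `ALLOWED_CHILDREN` table is a hypothetical stand-in that records only which child elements each element may contain, whereas a real DTD also expresses order, repetition and attributes. The element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a DTD: each element name maps to the child
# elements it may contain. Nothing here says how anything should look.
ALLOWED_CHILDREN = {
    "procedure": {"title", "step"},
    "title": set(),
    "step": {"part"},
    "part": set(),
}


def structure_errors(element, path=""):
    """Return messages for elements that break the structural rules."""
    here = f"{path}/{element.tag}"
    if element.tag not in ALLOWED_CHILDREN:
        return [f"{here}: unknown element"]
    errors = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"{here}: <{child.tag}> not allowed here")
        else:
            errors.extend(structure_errors(child, here))
    return errors


good = ET.fromstring(
    "<procedure><title>Replace seal</title>"
    "<step>Remove <part>cover</part></step></procedure>"
)
bad = ET.fromstring(
    "<procedure><title>Replace seal</title><bold>Warning</bold></procedure>"
)

print(structure_errors(good))
print(structure_errors(bad))
```

The second document is rejected because `bold` describes appearance and has no place in the structural rules, which is the separation the DTD approach is meant to enforce.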

Since SGML seemed to solve the problems of tagging and reusing content in an open format, why didn't SGML become more widely used? For one thing, SGML and its companion language for writing style sheets, called DSSSL, are complex. SGML requires effort and expertise to be used effectively. While tools have been developed to assist in the process, because of SGML's complexity, they have been expensive. Today, there are no commercially available processors that implement the full DSSSL standard.

SGML did, however, lead to one of the most important developments in the history of computing: HTML. Without HTML, the Web as we know it would not exist. However, as the Web has expanded with video and live feeds, electronic commerce, and real-time customization and modification of Web pages, HTML has shown some weaknesses. One of the most serious is that, in spite of being an SGML application, the HTML DTD emphasizes presentation over structure. Also, because HTML was developed on a text-oriented model, its linking facilities are extremely limited. In addition, because of its limited style elements, HTML cannot render content identically in browsers on all platforms or faithfully to the original presentation. Finally, since HTML is oriented toward the static page, there is a heavy burden on the server whenever any type of ad hoc processing is required. These limitations are obvious to anyone who has experienced long download times or received hundreds of thousands of matches to a search.

Some attempts have been made to fix HTML. For example, the Dynamic HTML (DHTML) proposal introduced frames for downloading long pages and Cascading Style Sheets (CSS) for rendering. However, neither of these eliminated the structural deficiencies of the HTML DTD itself. Frames attempted to work around the problem HTML has with structure: the only top-level structural elements are 'head' and 'body'. CSS was designed to work around HTML's problem with layout: the layout is part of the markup.

In looking at DHTML with CSS, Microsoft made the following observation in a technical perspective on XML:

"CSS can still be used for simply structured XML data, and we anticipate that in such situations it will be useful. However, CSS does not provide a display structure that deviates from the structure of the data source. With XSL (eXtensible Style Language), it is possible to generate presentation structures (in HTML for instance) that are very different from the original XML data structure."

HTML also has content delivery limitations, especially when content is stored in databases, when there are complex interrelationships, and when the content is bound dynamically at the time of delivery. Without the ability to generate multiple, different presentations and to deliver a variety of content dynamically, enterprise content management is not possible.

Enter XML

Diagram 2 

In 1996 the World Wide Web Consortium and 80 SGML experts joined forces to develop a permanent solution to the problems of HTML. The result was a new language called XML (eXtensible Markup Language) together with a new style language called XSL (eXtensible Stylesheet Language) and, later, a new link language called XLink (XML Linking Language). XML is a simplified subset of SGML that is easy to use, designed specifically for the Web, and oriented toward content structure, not style.

XML technology has a number of advantages over SGML. First, XML is simpler to use and process than SGML, making it more likely that low cost tools that accept XML will be widely available. Second, because XML has been developed as an enhancement to the Web, it has broad industry support. XML has been adopted by Sun Microsystems and Microsoft, giving it a prominent place in Unix and Windows workplaces. Third, significant progress has already been made in defining standard XML DTDs for a variety of applications.

XML, combined with Java and object-oriented data technology, has become the enabling technology that makes enterprise content management possible.

True Enterprise Content Management

Enterprise content management binds customers, suppliers and manufacturers together, allowing information to flow back and forth along the supply chain, creating opportunities for success. To achieve these goals a content management system must allow content to be:

Created using familiar tools at any place within the enterprise
Structured and accessed in units appropriate to its meaning
Personalized and used in one-to-one marketing
Reused as often and in any combination desired
Easily updated and kept current
Faithfully rendered in a variety of presentation media.

Content Creation

The idea that the best available pieces of information should be easily reusable in any desired format is central to true content management. Too often the most useful illustrations are created by engineers using CAD/CAM software and are not readily available for use in maintenance manuals or product catalogs. Less detailed drawings are created and maintained for documentation purposes. To truly add value, an effective content management system must make "best-of-breed" text, graphics and multimedia available for use throughout the enterprise.

Content Repository

In order to store, maintain and manage content, XML/XSL, DTDs, etc., a true enterprise content management system contains a content repository. Here, content is broken down into reusable units, stored and managed with version control, check-in and check-out. Users can search on text, content, properties, structure and meta-data (where used). The repository also manages links and references to different information units.
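The repository behavior described above (version control, check-out and check-in, and search) can be sketched as a small in-memory Python class. The class and method names are hypothetical and chosen only for this illustration; a production repository would add persistence, link management and access control.

```python
class ContentRepository:
    """In-memory sketch of a repository with check-out, check-in and versions."""

    def __init__(self):
        self._versions = {}  # unit id -> list of content versions, oldest first
        self._locks = {}     # unit id -> user who has the unit checked out

    def add(self, unit_id, content):
        """Store a new reusable unit as version 1."""
        self._versions[unit_id] = [content]

    def check_out(self, unit_id, user):
        """Lock a unit for one user and return its latest content."""
        holder = self._locks.get(unit_id)
        if holder is not None and holder != user:
            raise RuntimeError(f"{unit_id} is checked out by {holder}")
        self._locks[unit_id] = user
        return self._versions[unit_id][-1]

    def check_in(self, unit_id, user, content):
        """Save a new version, release the lock, return the version number."""
        if self._locks.get(unit_id) != user:
            raise RuntimeError(f"{user} has not checked out {unit_id}")
        self._versions[unit_id].append(content)
        del self._locks[unit_id]
        return len(self._versions[unit_id])

    def latest(self, unit_id):
        return self._versions[unit_id][-1]

    def search(self, text):
        """Return ids of units whose latest version contains the text."""
        needle = text.lower()
        return sorted(
            uid for uid, versions in self._versions.items()
            if needle in versions[-1].lower()
        )


repo = ContentRepository()
repo.add("seal-proc", "Replace the shaft seal.")
repo.check_out("seal-proc", "ana")
new_version = repo.check_in("seal-proc", "ana", "Replace the shaft seal and gasket.")
print(new_version, repo.search("gasket"))
```

Keeping every version in a list, rather than overwriting, is what lets the repository reproduce the documentation that applied to a unit at any point in its history.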

When a document or other content is received, a number of actions occur as it is entered into the content repository. A final validation against the XML DTD is performed. This check-in detects any exceptions to the rules established in the template for the DTD. Once the content has been validated, it is broken down and entered into the repository. This process creates the smallest meaningful unit that can be subsequently exposed and recombined for information re-purposing. These reusable units can be searched, combined, referenced and published interactively.
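A minimal Python sketch of this check-in sequence follows. It makes several loud assumptions: the element names (`manual`, `procedure`, `warning`), the `id` attribute and the function name are hypothetical, and the validation step only confirms that the document is well-formed and has the expected root element. Python's standard library does not validate against a DTD, so a real system would use a validating parser at that step.

```python
import xml.etree.ElementTree as ET

# Assumption: each <procedure> or <warning> is a smallest meaningful unit.
UNIT_TAGS = {"procedure", "warning"}


def check_in_document(xml_text, expected_root="manual"):
    """Check a document, then split it into reusable units keyed by id."""
    # Step 1: validation (simplified). ET.fromstring raises ParseError
    # if the document is not well-formed.
    root = ET.fromstring(xml_text)
    if root.tag != expected_root:
        raise ValueError(f"expected <{expected_root}>, got <{root.tag}>")

    # Step 2: break the document down into reusable units.
    units = {}
    for element in root.iter():
        if element.tag in UNIT_TAGS:
            unit_id = element.get("id")
            if unit_id is None:
                raise ValueError(f"<{element.tag}> is missing an id attribute")
            units[unit_id] = ET.tostring(element, encoding="unicode").strip()
    return units


DOCUMENT = """
<manual>
  <procedure id="p1"><step>Remove cover</step></procedure>
  <warning id="w1">Disconnect power first.</warning>
</manual>
"""

units = check_in_document(DOCUMENT)
print(sorted(units))
```

Each stored unit keeps its own markup, so it can later be searched, referenced and recombined independently of the document it arrived in.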

Content Editing and Composition

True enterprise content management systems provide tools for editing and searching reusable units and for defining output presentation and style. Composition and editing tools provide the means for flexible and easy reuse of content.

Editing tools provide mechanisms for defining output from the repository, including identifying reusable units of information and how they are to be formatted and presented. This output can be conditionally assembled to produce custom documentation directed to a specific assembly or customer.
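Conditional assembly can be sketched as a simple filter over reusable units. In the Python example below, the unit texts, the serial number ranges and the function name are all hypothetical; the point is only that each unit carries the condition under which it applies, so the reader never sees instructions meant for a different unit.

```python
# Hypothetical units: each applies to an inclusive range of serial numbers.
UNITS = [
    {"text": "Loosen the four cover bolts.", "serials": (29000, 29999)},
    {"text": "Release the quick-latch cover.", "serials": (30000, 39999)},
    {"text": "Replace the shaft seal.", "serials": (29000, 39999)},
]


def assemble_for_serial(serial, units=UNITS):
    """Return only the steps that apply to the given serial number, in order."""
    return [
        unit["text"]
        for unit in units
        if unit["serials"][0] <= serial <= unit["serials"][1]
    ]


for step in assemble_for_serial(29850):
    print(step)
```

A robot with serial number 29850 gets the bolt-removal step and the seal replacement, while a later model gets the quick-latch step instead, with no "for serial numbers..." text in either document.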

Composition tools ensure that each reusable unit represents a meaningful piece of information. For instance, a six-step process to repair an assembly would be identified as a single reusable unit, as each of the six steps is required to complete the repair.

Content Publishing

The publishing component of a true enterprise content management system takes the XSL and XML document and produces a rendering. Content may be bound in a persistent form to make documents or CDs, or it can be linked and bound just prior to delivery for HTML documents. Combinations are possible, e.g., updates to a parts catalog might be posted daily to the Web and printed monthly. No matter what format is required, the reusable units are available to be published as often and in as many different formats and combinations as necessary. Customer-specific documents can be delivered in real time via the Web, pressed on CD or printed. Each form can have its own presentation style, optimized for the selected medium.
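The idea of one source rendered for several media can be sketched in Python. Note the substitution: the two rendering functions below stand in for XSL style sheets, which Python's standard library cannot process, and the element names and function names are hypothetical. The sketch shows only the principle that the same structured unit feeds each output.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical reusable unit, as it might come out of the repository.
UNIT = ET.fromstring(
    "<procedure><title>Replace shaft seal</title>"
    "<step>Remove the cover.</step><step>Fit the new seal.</step></procedure>"
)


def render_html(unit):
    """Render for the Web: a heading followed by an ordered list."""
    steps = "".join(f"<li>{escape(s.text)}</li>" for s in unit.findall("step"))
    return f"<h2>{escape(unit.findtext('title'))}</h2><ol>{steps}</ol>"


def render_text(unit):
    """Render for print or plain-text delivery: numbered lines."""
    lines = [unit.findtext("title").upper()]
    lines += [f"{n}. {s.text}" for n, s in enumerate(unit.findall("step"), start=1)]
    return "\n".join(lines)


print(render_html(UNIT))
print(render_text(UNIT))
```

Because the source records that the content is a title and a sequence of steps, each renderer can choose its own presentation without the content being touched.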

Conclusion

Content management systems provide powerful tools for reusing an organization's knowledge to increase revenue and profits. XML's ability to tag content by meaning rather than appearance lets users create just the document they want, just in time. True enterprise content management has four components:

  1. Create content, adding structure and intelligence to it.
  2. Manage and link information.
  3. Publish and reuse information on-demand for individuals and groups, in a variety of formats, expertise levels and media.
  4. Add value to market-specific technology with true business solutions.

Implemented correctly, an enterprise content management system can help an organization achieve its strategic and business goals. Content management can increase revenue and profits through integrated maintenance and ordering systems, strengthen customer/supplier relationships with customized information, and lower maintenance and documentation costs for discrete manufacturers.

"Putting XML to Work: Advantages of Content Management" was written by Interleaf (www.interleaf.com). Interleaf is a sponsor member of OASIS, the Organization for the Advancement of Structured Information Standards (www.oasis-open.org).

OASIS is a nonprofit, international consortium dedicated to accelerating the adoption of product-independent formats based on public standards. These standards include XML, SGML and HTML as well as others that are related to structured information processing. Members of OASIS are providers, users and specialists of the technologies that make these standards work in practice.

© 1998 Interleaf, Inc. All rights reserved.

The information in this document is subject to change without notice and does not represent a commitment on the part of Interleaf or OASIS. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or for any purpose without the express written consent of Interleaf.

Interleaf
62 Fourth Avenue
Waltham, MA  02154 USA
Tel: +1.781.768.1578
Fax: +1.781.290.4955

