XML.org
Whitepapers

The XML Cuisinart
Making Users Happier and Markup Better with XML and SGML Appliances

By Chet Ensign, Matthew Bender & Company, Inc., A Member of OASIS

Appliances: Tools To Change "Master" to "Mass Market"

Note: Throughout this paper, the terms "XML," "SGML" and "markup" are used interchangeably. Although the effect may sometimes be awkward, it has been done to avoid even more awkward grammatical constructions like "SG&X-ML." Readers should feel free to substitute whatever set of initials they prefer, because the ideas apply equally to both.

Once upon a time, you had to be a Thomas Alva Edison to do anything useful or practical with the electric motor. The device was novel and powerful and showed great promise for transforming the way people did their work. But it was not widely adopted into common use until bright product designers started wrapping plastic cases around it, sticking a few simple On/Off buttons on the front and giving the resulting gizmos catchy names like "Mix Master." Only then did the electric motor become a mass market, general-purpose technology.

XML and SGML are novel and powerful and have already shown us how to transform the way people produce information products. But all too often, to the end users (especially those who write the content), they look like something only a whiz kid could love, much less actually use. SGML tools have often faced an uphill battle for acceptance; XML, although it promises less technical complexity, will likely face the same struggle for end user mind share. Many who saw the point of the technology from the beginning, who intuitively understand it, grumble at the fact that people resist learning and using it. "It is simply not that hard," we say. "Once you try it, you'll like it."

Yet the end users have a point. General-purpose markup editors straight out of the box do not make structured writing easier or more intuitive to most writers. Making explicit something that was previously implied by formatting seems, at first, awkward, artificial and unnecessarily complicated. We can help make the process easier, ensure user acceptance, and get better quality data if we create the equivalent of the "XML Cuisinart": simple but powerful tools focused on accomplishing specific markup tasks and tailored to the ways that our content creators approach their work.

Many of us have been doing this intuitively all along. We have been developing dialog boxes and batch scripts to help users be more productive and ease the process of tagging data. This paper simply proposes a conceptual category for these sorts of tools: the "markup appliance." It will explain why we need appliances and where they fit in our overall systems. It will describe the advantages and benefits of their use (not the least of which is making XML editing more approachable and palatable). It will list the characteristics that these tools have in common and, along the way, present several real-life examples that have been applied to good effect.

Why Do We Need XML Appliances?

We should start by asking if there really is a need for tagging appliances. What can an appliance do for a user that a good structure-aware editor can't? How will it make our systems more successful, especially since developing appliances may increase our overall system development costs?

To answer these questions, let's look at the goal of structured editing. That goal is not to create valid SGML or XML documents. There may be some technical satisfaction in that (for us anyway), but it's certainly not the point of the exercise.

The real goal is to capture expertise in content rich with unambiguously identified information that can be used to drive processing and build useful products. The people who will create that rich content are not us, but the writers, editors and data technicians who will use the systems we field.

Writers et al. are active agents in an information food chain. They have content and information products to produce, refine or turn into a product. Their goal is to get their part of the work done and out the door. Often they view SGML as irrelevant to their task, or even an outright obstacle. In "Structuring XML Documents" (Prentice Hall PTR, 1998, p. 120), one of the books in The Charles F. Goldfarb Series on Open Information Management, David Megginson hits this nail on the head when he writes:

"Your authors - especially if they are new to structured documents - might think of markup not as part of writing but as something that they have to do in addition to writing. In other words, they might think of the time that they spend adding markup as overhead, and if so, they will want to reduce that time as much as possible."

We need to keep that thought firmly in mind if we are to build tools that they will use, and that, at the same time, will give us the structured content we need.

The Focus of Our Tools Needs to Be Their Expertise

Think of it this way. Everybody has their "sphere of complexity," a subject that they know in depth. Whether that subject is law, engineering, mathematics or medicine, organizations are willing to pay them quite well to put that expertise to use. Just as often, the people we want for their expertise could not care less about ours. Our job, as developers, is to make sure they don't need to. The more demands we make on them to learn an additional expertise, the more resistance we face, unless those added requirements are directly relevant to their knowledge.

Most of us get one or two complex subjects in which we can become skilled: civil litigation and Olympic one-design sailing; electrical engineering and Civil War reenactment; French cooking and brain surgery. One we get paid for, the other we get to do for fun. After that, we want things to be easy. We could all learn to start a cooking fire with a flint and steel, but few of us actually do. Most of us could learn to do our own auto maintenance, but just look at the success of "Jiffy Lube." Raising domesticated animals for food? Certainly feasible, but a shrink-wrapped Perdue oven-stuffer roaster is faster and doesn't leave all the feathers to clean.

Creators of content have at least one of their complex expertise slots already taken up. That's the one for which they get paid: law, engineering, physics, software development, etc. With that slot taken, we should consider ourselves lucky if they also turn out to be reasonably computer literate; asking for much more risks driving them over the "two-expertise" limit. Assuming that a foundation of computer skills is there, can we really expect them to take on a lot of extra work if they see it as "overhead"?

No. If we want to achieve our overall goal, then we have to provide our users with tools to do what they need to do (create the content) while giving us the result we need (valid, complete and accurately applied markup). Part of getting there starts with good DTD design. But another part of the job is developing tools that make adding their knowledge to the markup more straightforward: in other words, markup appliances.

It Takes More Than a Structured Editor

Doing that takes more than just giving them an SGML editor, no matter how good the formatting looks on screen.

Markup editors have, from their beginnings, chased the paradigm of the WYSIWYG word processor. Out of the box, their approach to structured writing tries to be just like unstructured writing. In one sense, that's not surprising. For years, WYSIWYG has been the prevailing model for how documents are constructed on computers. It is also what customers say they want. While many of us argued the "content not format" theory, our DTDs also continued to be print-centric. In the beginning, it was hard for any of us to think outside the WYSIWYG box. It took a long time, and the impact of the World Wide Web, to disrupt that paradigm and get a new brand of thinking happening.

The problem was, and is, that markup editors are not WYSIWYG tools (nor should they be). WYSIWYG tools provide direct access to look and feel. They offer no access to structure and identity and no way to identify and store expert knowledge.

The fit has been wrong from the start, and that has become the crux of user complaints. Here we were, the developers, trying to take their WYSIWYG away, but not providing them with anything new or different or better in its place.

As Megginson points out, structured writing makes the two acts of WYSIWYG writing, composing and formatting, explicitly different. One is now the act of typing content and the other is the act of applying structure. It also makes the structuring act more demanding, because you have more choices. Now, you have to state precisely what it is about this text that makes it italic instead of just deciding that you want it italicized. Smoothing over the rough edges of that split and making it easier on the end user is what the appliance idea is all about.

What Does An Appliance Look Like?

If we accept the notion that appliances will help make structured information easier to create, then what do these things look like? What are their characteristics? What makes an 'appliance' different from any other general purpose structured editing tool?

An SGML or XML appliance may be as simple as a batch script that allows the user to provide a few initial settings then executes on an entire collection of tagged files. It may be as involved as a series of interactive dialog boxes that walk the user through a specific task. The defining characteristic of an appliance is that it is focused on performing one or two small but specific sets of tasks. It does not try to be a general-purpose tool. Instead, it is designed to optimize some specific task and help its user execute that task quickly and efficiently. A good appliance will reduce the task to the fewest necessary steps, minimize distractions that interfere with the user's concentration, limit the choices or options to just those required, and help focus the user's concentration on the key parts of the task at hand.

An appliance will also carry out as many of the logical parts of the task as possible itself. There is no point in making users do something by hand if the tagging can be done automatically. If we can programmatically tag some of the input data (even if the experts still need to double-check the resulting markup), we ought to do it. Leaving users to carry out actions that they know the computer could do makes them justifiably frustrated, both with us and with the tools we give them. The goal of our efforts should be to build tools that let users do those tasks that only they can do, and require as little other effort from them as possible.

For example, suppose we want our experts to categorize hierarchical subdivisions of our content by assigning values to selected attributes. Certainly this can be done with an SGML editor. However, there are some drawbacks that make this less than ideal. For one thing, they have to scroll through the document, looking for each division-type element. Not necessarily efficient. For another, several steps will be needed to open the attribute value dialog box, select those attributes of interest and set their values. (One of our users called this a recipe for carpal tunnel carping to the HR department.) Further, those division elements may have a number of attributes, only a few of which do we want our experts to touch. Lastly, we may also want to control the universe of choices available to them, without providing either an unrestricted CDATA input box or hardwiring the valid values directly into the DTDs. (In fact, different teams may require different sets of values. In that case, we definitely want to avoid specifying them in the DTD.)

The problem is tailor-made for solving with an appliance. Using a programmable editor or a GUI-development tool, we can build a classification-tagging tool to help users perform the task. The tool can be set up so that it jumps directly from one subdivision element to the next and presents only the specific attributes that we want the user to set, leaving all the rest hidden from view. It can also be set up to present lists of valid attribute values, either by including them in its code or drawing them from external configuration files.
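
A minimal sketch of what such a classification-tagging tool might do under the hood, written in Python purely for illustration (the paper does not say how any real tool was implemented). The element name `division`, the attribute names `topic` and `audience`, and the `ALLOWED_VALUES` table are all hypothetical stand-ins for whatever a real DTD and configuration file would supply, and the `answers` list stands in for the dialog box.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: the only attributes the expert sees, and the
# values each may take. In practice this could be loaded from an external
# file so that different teams get different lists.
ALLOWED_VALUES = {
    "topic": ["contracts", "torts", "property"],
    "audience": ["practitioner", "student"],
}


def divisions_to_classify(root, tag="division"):
    """Yield each division element in document order, so the user never scrolls."""
    yield from root.iter(tag)


def classify(element, choices, allowed=ALLOWED_VALUES):
    """Set only the exposed attributes, rejecting values outside the configured lists."""
    for name, value in choices.items():
        if name not in allowed:
            raise KeyError(f"attribute {name!r} is not exposed by this appliance")
        if value not in allowed[name]:
            raise ValueError(f"{value!r} is not a valid {name}")
        element.set(name, value)


SAMPLE = """<doc>
  <division id="d1" rev="3"><title>Offer and Acceptance</title></division>
  <division id="d2" rev="1"><title>Negligence</title></division>
</doc>"""

root = ET.fromstring(SAMPLE)
# Stand-in for the dialog box: one set of answers per division, in order.
answers = [
    {"topic": "contracts", "audience": "practitioner"},
    {"topic": "torts", "audience": "student"},
]
for division, picked in zip(divisions_to_classify(root), answers):
    classify(division, picked)

result = ET.tostring(root, encoding="unicode")
```

Attributes such as `id` and `rev` are left untouched because they never appear in the configuration, which is one way to keep the rest of the markup hidden from the expert.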

In fact, Matthew Bender built such a tool for its legal editors. Their productivity jumped from being able to tag 10 or 12 elements in an hour to being able to tag over 100, with a significant increase in accuracy. Their satisfaction jumped too, because the tool was designed in cooperation with them. It reflected their feedback, matched their language and concept of the problem, and gave them only what they needed to do the job.

[Figure: Batch process appliance. Fewer than half the applicable attributes are included.]

So what does a markup appliance look like? In general, it will have many of the following characteristics:

It will not be a general-purpose tool. It will be limited to one or two focused functions, such as setting specific element or attribute values or fixing specific tagging problems;
Its objective will be simple, well defined and task-oriented, with the task being defined not by the developers but by the users. Typical objectives will be to capture expertise, streamline a fix, or support a forms-driven, rapid-reporting type of application;
The interface will be simple and tightly coupled to its purpose. In some of the appliances we have built, the interface simply provides one or two option choices and an email address where the results should be sent;
The design and language used in the tool will be the language and expertise of the users. An appliance will talk the experts' language, not the XML language. This also suggests that appliances for different groups of users will use different names and presentations for the same underlying data;
It will restrict choices to those of direct interest to what the user is doing. For example, an element may have 27 attributes, but if your task only requires setting values for two attributes, those two attributes are the only ones your appliance will show. Likewise, if a task is performed on one or two specific elements, the appliance's display will be designed to highlight those elements and isolate them from the rest;
It will handle 'no-brainer' markup automatically. If an element can be automatically selected or an attribute value automatically set, then the appliance will do that;
It will simplify the process of making choices for the user and assist their decision-making as much as possible. You cannot describe logical dependencies between elements or attribute values in a DTD, but you can program them into an application. If setting the value for the "State" attribute can reduce the number of valid choices in the "Jurisdiction" attribute from hundreds to a handful, then we owe it to our users to build our appliances to do that. If there is no way around the fact that a large number of elements will be valid at a certain point, then an appliance might be built to group the elements so that they can be picked from a series of cascading selection boxes.
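
The "State" and "Jurisdiction" dependency in the last item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `JURISDICTIONS_BY_STATE` table, its values, and both function names are hypothetical, and a real appliance would presumably draw the lists from a configuration file or database.

```python
# Hypothetical dependency table: which "Jurisdiction" values to offer
# for each "State" value. A DTD cannot express this; application code can.
JURISDICTIONS_BY_STATE = {
    "NY": ["Court of Appeals", "Appellate Division", "Supreme Court"],
    "NJ": ["Supreme Court", "Appellate Division", "Superior Court"],
}


def jurisdiction_choices(state):
    """Return the short list to show once the user has picked a state."""
    return JURISDICTIONS_BY_STATE.get(state, [])


def set_state_and_jurisdiction(attributes, state, jurisdiction):
    """Record both values, refusing combinations the table does not allow."""
    if jurisdiction not in jurisdiction_choices(state):
        raise ValueError(f"{jurisdiction!r} is not offered for state {state!r}")
    attributes["State"] = state
    attributes["Jurisdiction"] = jurisdiction
    return attributes


# The second selection box is filled only after the first is set.
offered = jurisdiction_choices("NY")
attrs = set_state_and_jurisdiction({}, "NY", offered[0])
```

Keeping the table outside the DTD means different teams can be given different lists without any change to the document type itself.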

[Figure: An interface to a batch process appliance. Note how few choices are given to the user.]

Again, the goal of an appliance is to isolate and simplify a markup task, eliminating its stigma as clerical "overhead." Technical proficiency, whether in the SGML design itself or in the programming logic, should be hidden behind the tool. Ideally, you have been successful if your users never realize enough about the tool's innards to even think to express their appreciation.

The Three Types of Markup Appliance

We can group appliances into three basic types:

  1. Batch scripts that perform some task automatically, perhaps with some user interaction;
  2. Interactive tools that help users solve a single task;
  3. Hybrid tools that combine elements of both.

Batch appliances will be those that run with little or no human intervention. That does not mean that they eliminate the need for people. Often, some user input will be needed in the beginning to set options. Often, the result produced by the appliance will need human review and double-checking. But the appliance itself can get access to all the rules, logic and data that it needs to run independently.

Take, for example, an automatic cross-referencing appliance. If you need to cross-reference large volumes of existing text, you don't want to do it by hand. Instead, you would build a batch appliance with rules for identifying text that looks like a reference, and actions for turning that text into cross-references. The appliance may have access to external files that define the universe of potential cross-reference targets. It will certainly have logic for handling ambiguous text strings that it cannot resolve itself. And it will have general housekeeping code for logging events, flagging ambiguous text strings, and reporting its completion, status and any errors to the user. The resulting XML output will still need human QC to ensure the accuracy and correctness of the markup. But the batch appliance relieves your production staff of hours of drudgery and puts their efforts to better, more productive, use.
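
The pieces of a batch cross-referencing appliance described above (a rule for spotting references, a list of known targets, handling of ambiguous strings, and housekeeping) can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the `Section N` pattern, the `TARGETS` table, and the `xref` element with its `target` attribute are hypothetical, not taken from any real system.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("xref_appliance")

# Hypothetical universe of cross-reference targets, keyed by the text a
# writer would type. Two ids for one key means the reference is ambiguous.
TARGETS = {
    "Section 12": ["sec-12"],
    "Section 14": ["sec-14a", "sec-14b"],
}

# Hypothetical rule for "text that looks like a reference".
REFERENCE_PATTERN = re.compile(r"Section \d+")


def tag_references(text, targets=TARGETS):
    """Return (tagged_text, flagged), where flagged lists unresolved strings."""
    flagged = []

    def replace(match):
        candidate = match.group(0)
        ids = targets.get(candidate, [])
        if len(ids) == 1:
            return f'<xref target="{ids[0]}">{candidate}</xref>'
        # Unknown or ambiguous: leave the text alone and flag it for review.
        flagged.append(candidate)
        log.info("needs review: %s (%d candidate targets)", candidate, len(ids))
        return candidate

    tagged = REFERENCE_PATTERN.sub(replace, text)
    log.info("done: %d reference(s) flagged", len(flagged))
    return tagged, flagged


tagged, flagged = tag_references("See Section 12 and Section 14; compare Section 99.")
```

This sketch works on plain text and does not check whether a match already sits inside markup; a production version would need to parse the document first. The `flagged` list is what would be handed to the people doing QC.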

Interactive appliances will contain a dialog box or forms-driven interactive tool, either built on top of a more general-purpose SGML editing tool or built as a stand-alone application. (Indeed, one of the benefits of the appliance concept is that it allows you to break the process of developing a robust XML authoring environment into a series of discrete problems to be solved one-by-one.)

For example, let's look again at that cross-referencing problem. After running a batch appliance over the content, production staff needs to check the results and correct any erroneous tagging. Assuming that speed is of the essence, and the production staff wants to attack this problem head-on, you could provide them with an interactive appliance customized to hammer through the references. Instead of their having to scroll through the content, looking for each cross-reference tag, it could scroll for them. It could provide them with a point-and-click way to check the reference, and it could provide them with a simple interface for correcting the reference if they find a mistake.

Matthew Bender built a pair of appliances like this to help its production staff quickly perform an upgrade to a specific set of elements. The combination of a batch appliance, drawing information from a configuration file and from our document management system, and an interactive tool to let them rapidly QC the resulting SGML made feasible a project that everyone was dreading. The prospect of going through hundreds of gigabytes of data, finding and changing markup, was not making the production staff happy. The development of a simple set of appliances to batch through the tedious parts of the task and quickly verify the results was what made it feasible to tackle the project at all.

Hybrid appliances combine aspects of both batch and interactive tools. They may first execute a batch process then invoke an interactive component for cleanup, verification, additional tagging, etc. They will very likely have interfaces that are dramatically different from the 'page paradigm' and they may well draw input from other applications, such as your document management system. In this, they provide a double benefit. They help solve practical markup problems and, at the same time, they help evolve everyone's thinking beyond the WYSIWYG document bias.

To take the cross-reference problem again, envision an appliance that would first run in batch mode, identifying and tagging references as best it could. But, once finished, it would extract from the document strings of text that contained cross-reference tags and display them, line upon line, in a database table or a spreadsheet format. If it were really slick, it would present the markup as icons to make reading easier. The user could then verify or correct tagging in a highly productive environment, where all the markup of interest was gathered together in a very terse display. Once the user had finished all the checking and fixing, he/she would click the "Apply Changes" button and a batch process would write the corrected markup back into the source file.
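
The extract, review and write-back cycle of such a hybrid appliance might be sketched as below. This is a simplified illustration, not a description of any existing tool: the `xref` element, its `target` attribute, and the row layout are hypothetical, and the direct edit to `rows` stands in for the user's work in the table view.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
  <para>See <xref target="sec-12">Section 12</xref> for details.</para>
  <para>Compare <xref target="sec-41">Section 14</xref>.</para>
</doc>"""


def extract_rows(root, tag="xref"):
    """Gather every cross-reference into one terse table of rows."""
    return [
        {"row": i, "text": el.text, "target": el.get("target")}
        for i, el in enumerate(root.iter(tag))
    ]


def apply_changes(root, rows, tag="xref"):
    """Write the reviewed targets back to the elements they came from."""
    elements = list(root.iter(tag))
    for row in rows:
        elements[row["row"]].set("target", row["target"])


root = ET.fromstring(SAMPLE)
rows = extract_rows(root)
# Stand-in for the user's review in the table view: fix the transposed id.
rows[1]["target"] = "sec-14"
apply_changes(root, rows)  # the "Apply Changes" step
```

Matching rows to elements by position assumes the document is not edited between extraction and write-back; a sturdier design would key each row on a stable element identifier.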

This is one appliance Matthew Bender has not yet built. But the company does have several systems to build and problems to solve where a hybrid appliance will be the perfect solution. In fact, for some of those challenges, the hybrid approach will be the only practical way to get the problem solved.

Conclusions

Several years ago, the SGML community watched with amazement as HTML and the World Wide Web took the world by storm. It was terrible SGML (or so many SGML veterans thought), yet it was more widely adopted in a few short years than SGML was before or since. We've all come to accept that simplicity and simple yet far-ranging hypertext capability were HTML's two most powerful and appealing qualities. The Web was, in a sense, the first SGML appliance.

An appliance is a situation where the parts, taken individually, may well be greater than the whole. By taking advantage of some, but not all, of the capabilities of different software products, we create "obvious" tools for our end users. And often (not always, but often) those are the best kinds of tools we could give them.

Appliances, by explicitly not trying to be general application tools, create configurations that are intuitively obvious to people with expertise that we value, but without the technical sophistication in an expertise that is ours. If we focus on giving these users "obvious" tools, we ultimately serve our common purpose: to make the information world richer, more useful and accessible to all.

"The XML Cuisinart: Making Users Happier and Markup Better with XML and SGML Appliances" was written by Chet Ensign, Matthew Bender & Company, Inc. (www.bender.com)

Matthew Bender is a member of OASIS, the Organization for the Advancement of Structured Information Standards (www.oasis-open.org).

OASIS is a nonprofit, international consortium dedicated to accelerating the adoption of product-independent formats based on public standards. These standards include XML, SGML and HTML as well as others that are related to structured information processing. Members of OASIS are providers, users and specialists of the technologies that make these standards work in practice.

Copyright 1998 Matthew Bender & Company, Inc. All rights reserved.

The information in this document is subject to change without notice and does not represent a commitment on the part of Matthew Bender or OASIS. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or for any purpose without the express written consent of Matthew Bender & Company.

Matthew Bender & Company, Inc.
Two Park Avenue
New York, NY 10016 USA
Email: censign@bender.com


Copyright © 1993-2008 OASIS ®. All rights reserved.
This site is hosted by OASIS