XML and Web Services In The News - 18 August 2006

Provided by OASIS | Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by Innodata Isogen


HEADLINES:

 Meet the Specs: SML Models Complex IT Systems
 Atom License Extension
 W3C Working Drafts for Compound Document Framework and WICD Profiles
 Review: Inside IBM DB2 Viper
 Comparing XML Office Document Formats: Using XML Metrics
 OASIS Discussion List for Unstructured Operation Markup Language (UOML)
 VoiceXML 2.1: The Upgrades Are Few, But Significant
 GPL 3 Lawyer Has His Regrets

Meet the Specs: SML Models Complex IT Systems
Kane Scarlett, IBM developerWorks
This article is part of a "Meet the specs" series which focuses on various components of the Service Modeling Language specification. The Service Modeling Language (SML) specification is a proposed open standard that defines a modeling language, complete with a set of constructs, to help you model complex system hierarchies for components that manage such elements as configuration, monitoring, policy, health, capacity planning, and Service Level Agreements (SLA). One of the effects of SML is to increase the automation of management tasks, thereby reducing the need for a human to intervene in necessary adjustments. In today's multivendor environment, customers demand open, standards-based methods to accelerate integration of management software technologies, methods that include the ability to speed software deployments and to reduce the overhead caused by needing human intervention. You can use SML to capture knowledge about the different parts of complex IT systems and the constraints that these parts must satisfy in order for the IT system to function properly. Using SML has two effects, one that addresses the needs of the autonomic computing adopter and one that is shared by everyone employing the language: (1) it increases the automation of some management tasks, because the knowledge is captured in a machine-readable way; (2) it allows those with different expertise, who touch the system at different points in the life cycle, to collaborate efficiently by sharing relevant expertise and to have these different contributions seamlessly integrated.
See also: SML

Atom License Extension
James M. Snell, IETF Internet Draft
The experimental "Atom License Extension" specification has been released as an IETF Internet Draft for Last Call review. The document defines how Atom feed publishers can associate licenses with the metadata of a feed or entry. Licenses associated using these mechanisms might be machine readable and are intended to communicate the various rights and obligations others may have with regard to the associated Atom Feed or Entry. For feed elements, the term 'metadata' refers to the values and attributes of the author, category, contributor, generator, icon, id, link, logo, rights, subtitle, title, and updated elements, as defined by RFC 4287, as well as all extension elements appearing as children of the feed element and all elements appearing as children of the author and contributor elements. It also includes the selection and arrangement of entry elements contained by the feed but not the metadata or content of the entries themselves. For entry elements, 'metadata' refers to the values and attributes of the author, category, content, contributor, id, link, published, rights, source, summary, title, and updated elements, as well as all extension elements appearing as children of the entry element and all elements appearing as children of the author and contributor elements. Multiple 'license' link relations specifying different href attribute values are considered to be mutually exclusive alternatives. For instance, if an entry specifies both a Creative Commons License and the General Public License (GPL), the entry is considered to be licensed as either Creative Commons OR GPL as opposed to Creative Commons AND GPL. If multiple license link relations are specified, each SHOULD contain a title attribute specifying a human-readable label for the license.
Because entries contained within a feed may originate from other sources, 'license' link relations appearing within a feed apply to the metadata of the containing feed element only and do not extend over the metadata or content of the contained entries.
See also: Atom references
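A consumer of such feeds has to apply two rules from the draft as summarized above: distinct 'license' hrefs on one element are alternatives (A OR B), and a feed-level license covers only the feed's own metadata, never the contained entries. The Python sketch below illustrates both rules with the standard library's ElementTree. It is not code from the draft; the feed, the entry ids, the license URLs and the helper name license_alternatives are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace from RFC 4287

# Illustrative feed: one feed-level license, one entry offering two
# alternative licenses, and one entry with no license link at all.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <updated>2006-08-18T00:00:00Z</updated>
  <link rel="license" href="http://creativecommons.org/licenses/by/2.5/"
        title="Creative Commons Attribution 2.5"/>
  <entry>
    <title>Example Entry</title>
    <id>urn:example:entry:1</id>
    <updated>2006-08-18T00:00:00Z</updated>
    <link rel="license" href="http://creativecommons.org/licenses/by-sa/2.5/"
          title="Creative Commons Attribution-ShareAlike 2.5"/>
    <link rel="license" href="http://www.gnu.org/copyleft/gpl.html"
          title="GNU General Public License"/>
  </entry>
  <entry>
    <title>Unlicensed Entry</title>
    <id>urn:example:entry:2</id>
    <updated>2006-08-18T00:00:00Z</updated>
  </entry>
</feed>"""


def license_alternatives(element):
    """Return the distinct 'license' hrefs declared directly on a feed or entry.

    Only direct <link> children are inspected, so a feed's licenses never
    pick up its entries' links (and vice versa). Distinct hrefs are
    mutually exclusive alternatives: license A OR license B.
    """
    hrefs = []
    for link in element.findall(ATOM + "link"):
        if link.get("rel") == "license":
            href = link.get("href")
            if href not in hrefs:
                hrefs.append(href)
    return hrefs


root = ET.fromstring(FEED)
feed_licenses = license_alternatives(root)
# Entries do not inherit the feed-level license.
entry_licenses = {
    entry.findtext(ATOM + "id"): license_alternatives(entry)
    for entry in root.findall(ATOM + "entry")
}

print("feed metadata:", " OR ".join(feed_licenses))
for entry_id, hrefs in entry_licenses.items():
    print(entry_id, "->", " OR ".join(hrefs) if hrefs else "(no license declared)")
```

Returning an ordered list rather than a set keeps the publisher's ordering, which a reader application could use when presenting the alternatives alongside their title labels.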

W3C Working Drafts for Compound Document Framework and WICD Profiles
Timur Mehrvarz, Daniel Appelquist et al., (eds), W3C Working Drafts
Addressing 'Last Call' comments, W3C's Compound Document Formats Working Group has released four updated Working Drafts: "Compound Document by Reference Framework", "WICD Core 1.0", "WICD Full 1.0", and "WICD Mobile 1.0". Web Integration Compound Document (WICD) is a device-independent Compound Document profile based on XHTML, CSS and SVG. The "Compound Document by Reference Framework 1.0" specification defines a language-independent processing model for combining arbitrary document formats. Combining content delivery formats can often be desirable in order to provide a seamless experience to the user. For example, XHTML-formatted content can be augmented by SVG objects to create a more dynamic, interactive and self-adjusting presentation. A set of standard rules is required in order to provide this capability across a range of user agents and devices, for combinations such as XHTML + SVG + MathML, XHTML + SMIL, XHTML + XForms, and XHTML + VoiceXML. The Compound Document Framework is language-independent. While it is meant to serve as the basis for integrating W3C's family of XML formats within its Interaction Domain (e.g., CSS, MathML, SMIL, SVG, VoiceXML, XForms, XHTML, XSL) with each other, it can also be used to integrate non-W3C formats with W3C formats or integrate non-W3C formats with other non-W3C formats. A Compound Document by inclusion combines XML markup from several namespaces into a single physical document. A number of standards exist, and continue to be developed, that are descriptions of XML markup within a single namespace. XHTML, XForms, VoiceXML, and MathML are some of the prominent examples of such standards, each having its own namespace. Each of these specifications focuses on one aspect of rich-content development. For example, XForms focuses on data collection and submission, VoiceXML on speech, and MathML on the display of mathematical notations.
See also: W3C Rich Web Clients Activity
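To make the "compound document by inclusion" idea concrete, the Python sketch below parses a single XHTML document that embeds SVG and MathML inline and reports where the element tree switches namespace. Those boundaries are the points at which a combining user agent would hand content to a different format handler. This is only an illustration of the concept; it is not a WICD or CDR processor, and the sample document and the helper names namespaces_used and foreign_islands are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical compound document by inclusion: XHTML host markup with an
# inline SVG figure and an inline MathML expression, in one physical file.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Compound document by inclusion</title></head>
  <body>
    <p>XHTML text with an inline SVG figure and a MathML expression:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40" fill="teal"/>
    </svg>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi><mo>+</mo><mn>1</mn>
    </math>
  </body>
</html>"""


def namespace_of(tag):
    """Extract the namespace URI from an ElementTree '{uri}local' tag."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def namespaces_used(root):
    """Namespaces in document order of first appearance."""
    seen = []
    for el in root.iter():
        ns = namespace_of(el.tag)
        if ns not in seen:
            seen.append(ns)
    return seen


def foreign_islands(root):
    """Tags of elements whose namespace differs from their parent's."""
    islands = []
    for parent in root.iter():
        for child in parent:
            if namespace_of(child.tag) != namespace_of(parent.tag):
                islands.append(child.tag)
    return islands


root = ET.fromstring(DOC)
print(namespaces_used(root))
print(foreign_islands(root))
```

The by-reference alternative described in the Working Draft would instead keep the SVG and MathML in separate files referenced from the XHTML host; the sketch covers only the by-inclusion case.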

Review: Inside IBM DB2 Viper
Sean McCown, InfoWorld
A technological marvel, IBM's new XML-powered server aims to change the face of database storage. IBM's newly released DB2 9.1 (previously code-named 'Viper') sheds many of the limitations of DB2 8, boosting performance, scalability, and security. But one feature in particular, the hybrid XML/relational engine, gives this Big Blue serpent its distinctive shape. For customers plunging into the new era of XML data management, Viper's innovations are tempting indeed. Native XML databases have been around for a while, but they require special libraries and aren't compatible with relational data. On the other hand, traditional relational databases have trouble dealing with hierarchical models and have only limited functionality in this area. So the major database vendors have been busy bolting XML capabilities onto their relational database products. IBM is no exception. IBM's technology outdoes its competitors, however, by preserving the native format of XML data. Five years in development, DB2's brand-new storage engine, dubbed pureXML, has one foot planted squarely in the world of relational databases and the other in that of XML databases. Instead of storing the XML as a BLOB (binary large object) or parsing it into relational key/value pairs, pureXML stores the XML file itself, with all its properties and hierarchical structure preserved. DB2 9.1 is an excellent database with groundbreaking features. Its new hybrid data engine offers true native XML storage, allowing the entire data store to be retrieved using either SQL or XQuery, interchangeably. In addition, the new Development Workbench wins big points with its support for building XQuery expressions. Other new features include row-level compression, scalability improvements, and advanced, granular access controls.

Comparing XML Office Document Formats: Using XML Metrics
Rick Jelliffe, O'Reilly Articles
Here are some XML metrics for a large document with almost 180,000 words, tables, lists, sidebars and some graphics. I chose a large document so that bootstrap effects would be minimized. I used the ODF v.1.0 specification, converting it from .SWX to .DOC and .ODT in Open Office 2.0, then converting the .DOC to .DOCX in Word 2007 beta. Then I used a COTS archiver to treat the ODT and DOCX files as ZIP archives, and extracted the XML files containing the basic text and markup: content.xml (ODF) and word/document.xml (MSOOX). I chose to use a .SWX format because I didn't want to have any MS-dependencies in the data, .DOC being proprietary. I also resaved the document to .DOC, re-opened it, re-exported it to .DOCX and extracted the word/document.xml file. Resaving data is a good trick when doing data conversion, because it removes extraneous information or structures from the source: the first .DOC is what Open Office thinks .DOC looks like; the second .DOC is how Microsoft does things. The numbers seem to support the interpretation that beta MSOOX may be quite a bit less complex than ODF 1.1 at this stage, at least in the sense of using fixed structures more, and simpler in the sense of using fewer elements and attributes. ODF is flatter and has a smaller file size but seems to include more style headers than MSOOX does. The metrics indicate that the use of attributes may differ significantly between the two formats, which matters, for example, to people estimating the effort of a data conversion. On the application level, Open Office loads the ODT file much faster than the Word 2007 beta loads the DOCX file. I wouldn't be surprised if MSOOX were easier to convert from (because of its regularity, scale and low complexity) while ODF were easier to convert into (because of its richness and flexibility), once the initial hurdle of converting anything to or from either format has been cleared.
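Jelliffe's own tooling is not shown, so the following Python sketch is a hypothetical way to reproduce this kind of measurement. It opens the package as a ZIP archive, reads the main XML part (content.xml for ODF, word/document.xml for MSOOX, as named above), and counts elements, distinct element names, attributes, distinct attribute names and maximum nesting depth. The particular metrics are assumptions about what "XML metrics" might include; they are not necessarily the ones Jelliffe reported. The self-test builds a tiny in-memory ODF-like package so the block runs without real files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def xml_metrics(xml_bytes):
    """Simple structural metrics for one XML part (assumed metric set)."""
    root = ET.fromstring(xml_bytes)
    elements = attributes = max_depth = 0
    element_names, attribute_names = set(), set()
    stack = [(root, 1)]  # iterative depth-first walk: (element, depth)
    while stack:
        el, depth = stack.pop()
        elements += 1
        element_names.add(el.tag)
        # ElementTree does not report xmlns declarations as attributes.
        attributes += len(el.attrib)
        attribute_names.update(el.attrib)
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in el)
    return {
        "elements": elements,
        "distinct_elements": len(element_names),
        "attributes": attributes,
        "distinct_attributes": len(attribute_names),
        "max_depth": max_depth,
    }


def metrics_from_package(path_or_file, member):
    """Read one XML member out of a ZIP-based office package."""
    with zipfile.ZipFile(path_or_file) as zf:
        return xml_metrics(zf.read(member))


# Tiny ODF-like sample; real use would be, hypothetically:
#   metrics_from_package("spec.odt", "content.xml")
#   metrics_from_package("spec.docx", "word/document.xml")
SAMPLE = b"""<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:p text:style-name="P1">Hello</text:p>
    <text:p text:style-name="P2">World</text:p>
  </office:text></office:body>
</office:document-content>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("content.xml", SAMPLE)
buf.seek(0)

sample_metrics = metrics_from_package(buf, "content.xml")
print(sample_metrics)
```

For a document the size of the ODF specification, ET.fromstring holds the whole tree in memory, which is acceptable for a one-off comparison like this one.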

OASIS Discussion List for Unstructured Operation Markup Language (UOML)
Staff, Announcement
OASIS announced that certain of its members have requested a new discussion list regarding a possible new OASIS Unstructured Operation Markup Language (UOML) TC. The Unstructured Operation Markup Language specification "defines a universally representative unstructured document operating language through the abstract description of unstructured documents. The application program can realize document-related operation through UOML application, including document organization, page description, information safety, index and search, content extraction, fonts management, storage management, plug-in mechanism, and script description etc. UOML is expressed with standard XML, featuring mighty compatibility and openness." The proposers plan to contribute a draft version of UOML to the TC when it is formed, for further review, discussion and refinement. The schema is suitable for operating on written documents, including creating, viewing, modifying and querying information that can be printed on paper (e.g., books, magazines, newspapers, office documents, maps, drawings and blueprints), but it is not restricted to these kinds of documents. There are several commercial and free applications available based on UOML, with more currently under development. A standard for document operation will be of great utility to many users and software companies developing applications, and should be made available as soon as possible.
See also: the UOML web site

VoiceXML 2.1: The Upgrades Are Few, But Significant
Jeff Kusnitz, IBM developerWorks
The intent in VoiceXML 2.1 was to include a small number of features that were not included in VoiceXML 2.0, but were deemed significant enough to warrant documenting and standardizing. VoiceXML 2.1 has met these goals: it contains just eight features, of which only two are completely new; the other six are enhancements to existing VoiceXML elements. The VoiceXML Forum has more or less moved out of the specification-writing arena and is now focused primarily on marketing and education in the VoiceXML industry. To this effect, the Forum has put together a pair of certifications, one for VoiceXML platforms to certify they are compliant with the VoiceXML 2.0 specification and one for developers, to demonstrate that they have a well-balanced understanding of the VoiceXML 2.0 language. At the time this article was written, 17 VoiceXML platforms had been certified by the VoiceXML Forum, and more than 100 developers had taken and passed the Forum's developer certification examination. For those who are not necessarily interested in developing voice applications, but are instead interested in building voice application platforms (that is, a VoiceXML browser), the CATS group in the IETF is working on Media Resource Control Protocol version 2 (MRCP v2). MRCP v2 builds on the MRCP specification jointly developed by Cisco, Nuance, and SpeechWorks several years ago, which is now supported by most, if not all, of the major players in the industry, either by providing MRCP-based speech resources, or by providing VoiceXML browsers that "sit on top of" MRCP-based speech resources. In the voice application space, VoiceXML and related standards (SRGS, SSML, and so on) have emerged as "the" standards for building voice applications and platforms. Judging by the usefulness of the features released in VoiceXML 2.1, there's no reason to think that VoiceXML won't continue to be the standard for building voice applications for a long time to come.
See also: VoiceXML Platform Implementations
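The two completely new features in VoiceXML 2.1 are the data element (fetch XML data without leaving the current page) and the foreach element (iterate over an ECMAScript array). This identification comes from the VoiceXML 2.1 specification, not from the article. Because the Forum's platform certification covers VoiceXML 2.0, a deployer might want to know whether a document actually depends on these 2.1-only elements. The Python sketch below checks for them; the helper name vxml21_elements, the sample document and its quotes.xml data source are hypothetical, and the check ignores the six enhancements to existing elements.

```python
import xml.etree.ElementTree as ET

# VoiceXML 2.0 and 2.1 documents share this namespace; the version
# attribute on <vxml> says which language level the author targets.
VXML_NS = "{http://www.w3.org/2001/vxml}"

# The two elements that are entirely new in VoiceXML 2.1.
NEW_ELEMENTS_21 = {"data", "foreach"}


def vxml21_elements(document):
    """Return (declared version, sorted list of 2.1-only elements used)."""
    root = ET.fromstring(document)
    used = set()
    for el in root.iter():
        if el.tag.startswith(VXML_NS):
            local_name = el.tag[len(VXML_NS):]
            if local_name in NEW_ELEMENTS_21:
                used.add(local_name)
    return root.get("version"), sorted(used)


# Hypothetical 2.1 document: <data> fetches an XML resource, and
# <foreach> plays one prompt per item of an ECMAScript array.
SAMPLE = """<vxml xmlns="http://www.w3.org/2001/vxml" version="2.1">
  <form>
    <block>
      <data name="quotes" src="quotes.xml"/>
      <script>var items = ['first', 'second'];</script>
      <foreach item="item" array="items">
        <prompt><value expr="item"/></prompt>
      </foreach>
    </block>
  </form>
</vxml>"""

version, needs_21 = vxml21_elements(SAMPLE)
print(version, needs_21)
```

A document that declares version="2.1" but uses neither element may still rely on one of the enhanced attributes, so a fuller check would also need to inspect those.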

GPL 3 Lawyer Has His Regrets
Sean Michael Kerner, InternetNews.com
Few topics are as contentious in the Linux world as GPL 3, the next version of the GNU General Public License, which governs how many open source software programs can be used. During a session at the LinuxWorld conference, Free Software Foundation counsel and GPL 3 draft co-author Eben Moglen explained the current status of the license draft discussion as well as some of the more contentious issues surrounding it. Three areas of disagreement remain and need to be sorted out within 65-80 days, according to Moglen: patent clauses, digital rights management (DRM) policy, and compatibility terms. HP has already made known its objections to the patent clauses in draft 2. "It is the position of those that hold patents whose claims we have to take into account," Moglen said. He also admitted to mistakes in the draft process, and called it exhausting. The GPL version 3 effort began in January with a first draft that marked the first significant attempt at revising the GPL version 2 license, which has sat unchanged since June of 1991. Among the contentious issues in the first draft were approaches to DRM and patents. The second draft was released last month, after much community input and discussion. In terms of license compatibility, the issue is whether a GPL program may contain a file with restrictive terms that do not apply to the program as a whole. Moglen said the power of the additional permission sections in the latest draft means that, for example, parties who object to a particular requirement in the license can grant additional simple permissions that will change the bearing of the license on their project. For DRM, Moglen said the goal is to include the minimum necessary to protect the freedoms with software that the FSF is trying to guarantee.
See also: the FSF GPLv3 web site


XML.org is an OASIS Information Channel sponsored by BEA Systems, Inc., IBM Corporation, Innodata Isogen, SAP AG and Sun Microsystems, Inc.

Use http://www.oasis-open.org/mlmanage to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml for the list archives.

