XML and Web Services In The News - 14 July 2006

Provided by OASIS | Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by SAP


HEADLINES:

 Streaming Techniques for XML Processing - Part 3
 The GML Simple Feature Profile and You
 Introducing DB2 9: Application Development Enhancements
 What's on O'Reilly's Open Source Executive Radar?
 W3C Semantic Web Activity to Include GRDDL, Deployment Working Groups
 Thinking XML: Manage XML Data Sets for Security
 Family Tree of Schema Languages for Markup Languages (2006)
 WS-I Basic Security Profile Enhanced Logging Specification Requirements

Streaming Techniques for XML Processing - Part 3
Tobias Trapp, SAP Blog
In the first part of this weblog I introduced STX and mentioned validation techniques beyond W3C XML Schema as an application. STX is an event-based transformation language in XML syntax. As in XSLT, we can define templates that represent rules; these rules are evaluated while the input XML document is processed in a single linear pass. There is a working draft of the STX specification, and Joost, an STX processor running under Java, implements most of its features. Now I want to put it all together for a data-exchange application. We exchange data to link electronic business processes by making the data of one system available to another. Usually we don't want to accept arbitrary data -- we only accept valid XML documents. Schema languages such as W3C XML Schema bring several advantages, but validation against a W3C XML Schema also has disadvantages: (1) W3C XML Schema cannot express many kinds of checks -- numerical constraints, for example. (2) An XML message can contain thousands of serialized business objects, and we may not want to reject a huge message just because a single business object is faulty. (3) The error report produced by validation is hard to interpret; we would like error codes that are human-readable or can be processed by programs. Validation languages like Schematron sometimes do better because we can code rules and assertions, but most Schematron implementations rely on XSLT, so they cannot check huge XML documents. In this weblog I present a self-made prototype of a validation language, STV (Streaming Validation for XML), that is based on STX, so I expect good performance. Compared to Schematron it lacks expressiveness, but combined with W3C XML Schema it is a powerful tool. An STV transformation defines a set of rules that consist of assertions. An assertion can be coded with variables that have to be assigned first, and within a rule we can initialize buffers that can be appended to and processed. We use STV to code the checks to be performed on a given XML document, and an XSLT 2.0 transformation then generates an STX program that performs those checks.
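For readers new to STX, a single streaming rule might look roughly like the following sketch (the element names and namespace follow the STX working draft as implemented by Joost; the order document and the error-record format are invented for illustration, not taken from the weblog):

  <stx:transform version="1.0"
                 xmlns:stx="http://stx.sourceforge.net/2002/ns">

    <!-- Illustrative rule: every <order> element must carry a positive
         amount attribute. The template fires as soon as the start tag is
         seen, so the check runs in one linear pass without building a
         document tree in memory. -->
    <stx:template match="order">
      <stx:if test="not(@amount &gt; 0)">
        <!-- emit a machine-readable error record into the result stream -->
        <error code="AMOUNT_NOT_POSITIVE">
          <stx:value-of select="@id" />
        </error>
      </stx:if>
      <stx:process-children />
    </stx:template>

  </stx:transform>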
See also: Part 1

The GML Simple Feature Profile and You
Sam Bacharach, Directions Magazine
Here's a problem that a growing number of geospatial software developers face: adding support for the Open Geospatial Consortium's (OGC) OpenGIS Geography Markup Language Encoding Specification (GML). Simply stated, GML is a standard for encoding geometry and attributes in XML. Once the marketing department and user input confirm that supporting this standard is worth doing, programmers have to make it happen. Sure, programmers can do that; they can do anything. What's the big deal? The big deal is that the current GML specification runs to 600 pages, details some 1,000 tags (named objects), defines many of the geometries for describing features on the earth, and also supports the ability to encode coverages (including imagery), topology, time, metadata, and dynamic features. GML was designed to be very broad and cover many needs. Recall, too, that to fully implement the specification, programmers have to create software that can not only write data out in this form but also read it back in. It's perhaps akin to requesting support for all 64 colors in the big crayon box. After some discussion, the group defining the profile decided to include just "simple features"; in essence, only the vocabulary of "simple features" is supported in the profile. Officially, the profile includes "points, lines, and polygons (and collections of these), with linear interpolation between vertices of lines, and planar (flat) surfaces within polygons." The GML Simple Feature Profile, and the other GML profiles that will appear in the coming months and years, offer ways to create the right tool for the job, making everyone's geospatial life not only more interoperable but also easier.
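For a sense of what the simple-features vocabulary looks like on the wire, a minimal hand-written fragment encoding a single point feature might resemble the following (the feature type, property names, and srsName value are invented for illustration; only the gml:Point/gml:pos encoding and the GML namespace come from the specification itself):

  <city:TrailMarker xmlns:city="http://example.com/city"
                    xmlns:gml="http://www.opengis.net/gml">
    <city:name>Marker 17</city:name>
    <city:location>
      <!-- a "simple feature" geometry: one point, coordinates in gml:pos -->
      <gml:Point srsName="urn:ogc:def:crs:EPSG:6.6:4326">
        <gml:pos>45.512 -122.681</gml:pos>
      </gml:Point>
    </city:location>
  </city:TrailMarker>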
See also: GML references

Introducing DB2 9: Application Development Enhancements
Rav Ahuja, IBM developerWorks
DB2 9 (formerly codenamed "Viper") provides numerous enhancements that simplify database application development, reduce development time, and improve developer productivity. In addition to providing a platform for robust enterprise applications, DB2 9 is also optimized for rapidly building a new breed of "Web 2.0" applications based on Web services, XML feeds, data syndication, and more. New enhancements for developers in IBM DB2 9 for Linux, UNIX, and Windows include a new Developer Workbench, deeper integration with .NET environments, rich support for XML and SOA environments, new drivers and adapters for PHP and Ruby on Rails, and new application samples. DB2 9 features pureXML technology, which provides a unique set of capabilities for managing and serving XML data in a highly efficient manner. pureXML technology consists of a true XML data type (which stores XML in its hierarchical form rather than as a large object or stuffed into relational columns), XML indexing, XML text search support, SQL/XML and XQuery support, schema evolution flexibility, and numerous other capabilities. The DB2 add-ins for Visual Studio contain full support for pureXML, including the ability to update, import, and export XML data; validate XML documents against a registered XML schema; register and unregister XML schemas; and generate sample data based on an XML schema. The DB2 driver for PHP is also included as part of Zend Core for IBM, a seamless, out-of-the-box, easy-to-install, and supported PHP development and production environment tailored for DB2, IBM Cloudscape, or Apache Derby data servers. This article, the last in a series introducing the features of DB2 9, provides an overview of these enhancements.
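As a rough sketch of how the pureXML pieces fit together (the table, column, and document shape are invented for illustration; XMLQUERY and XMLEXISTS are the SQL/XML functions the article refers to):

  -- store whole XML documents in a native XML column
  CREATE TABLE customers (id INTEGER NOT NULL PRIMARY KEY, info XML);

  -- pull a fragment out of each stored document with an embedded XQuery
  SELECT id,
         XMLQUERY('$d/customer/name' PASSING info AS "d") AS name
  FROM   customers
  WHERE  XMLEXISTS('$d/customer[city = "Toronto"]' PASSING info AS "d");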
See also: XML and Databases

What's on O'Reilly's Open Source Executive Radar?
Matt Asay, InfoWorld
From the Open Source Executive Briefing for presentation at the O'Reilly Open Source Convention (OSCON) in Portland, Oregon: (1) Open Source as Asymmetric Competition -- For years the software industry has largely competed on the basis of symmetry: Oracle versus IBM in databases, BEA versus IBM in application servers, and so on. Feature wars and price wars, but not truly asymmetric competition; that is, competing by playing a different game, with different rules. Open source enables an alternative battleground on which to compete, with community, code, and culture as the new competitive tools. (2) Operations as Advantage -- In a world where software is delivered as a service, the quality of a company's operational infrastructure is a key source of competitive advantage. This is a world where scale matters. (3) Open Data -- Tim O'Reilly has long believed that "data is the Intel Inside" of Web 2.0 applications, the source of competitive advantage and lock-in. As a consequence, he also believes that it won't be long before "open data" becomes as hot-button an issue as open source software has been. (4) Open Source and Web 2.0 -- Everyone knows that Google, Yahoo!, and many other "Web 2.0" companies are built on top of open source, but how exactly do they use it?
See also: the OSCON web site

W3C Semantic Web Activity to Include GRDDL, Deployment Working Groups
Staff, W3C Announcement
W3C has announced the renewal of the Semantic Web Activity with the chartering of three new groups. The new groups will work on Semantic Web deployment, extracting RDF from XML (e.g., to process microformats), and education and outreach. The W3C Advisory Committee also approved continuing work on RDF data access, rules interchange, and health care and life sciences. The mission of the GRDDL Working Group is to complement the concrete RDF/XML syntax with a mechanism to relate other XML syntaxes (especially XHTML dialects or "microformats") to the RDF abstract syntax via transformations identified by URIs. The goal of the Semantic Web initiative is as broad as that of the Web itself: to create a universal medium for the exchange of data, one that smoothly interconnects personal information management, enterprise application integration, and the global sharing of commercial, scientific, and cultural data. Semantic Web technologies allow data to be shared and reused across applications, enterprises, and communities. The principal technologies of the Semantic Web fit into a set of layered specifications. The current components are the Resource Description Framework (RDF) Core Model, the RDF Schema language, and the Web Ontology Language (OWL). Building on these core components is SPARQL, a standardized query language for RDF that enables the 'joining' of decentralized collections of RDF data. These languages all build on the foundation of URIs, XML, and XML namespaces.
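In rough outline, the GRDDL mechanism lets an XHTML or XML document point at a transformation that yields RDF. A minimal sketch for an XHTML page might look like the following (the profile URI and the rel="transformation" link follow the early GRDDL drafts; the stylesheet name and page content are illustrative):

  <html xmlns="http://www.w3.org/1999/xhtml">
    <head profile="http://www.w3.org/2003/g/data-view">
      <title>Contact page</title>
      <!-- a GRDDL-aware agent fetches this XSLT and applies it to the page
           to extract RDF triples from the embedded microformat markup -->
      <link rel="transformation" href="hcard2rdf.xsl" />
    </head>
    <body> ... contact details marked up as an hCard ... </body>
  </html>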
See also: the W3C Semantic Web Activity

Thinking XML: Manage XML Data Sets for Security
Uche Ogbuji, IBM developerWorks
This article discusses principles for managing XML deployment to avoid vulnerabilities. The principles are quite simple, and yet not discussed often enough among XML professionals. XML involves an interesting perspective on data management, one which many developers find new and strange at first. XML offers flexible support for loosely structured and hierarchical data, but it also comes with inevitable performance problems. Unfortunately, developers often don't consider the problems that can arise from XML's transparency. Many XML applications build on raw XML dumps from databases and legacy applications. Software vendors have encouraged this approach by making monolithic XML dumps the most prominent XML features in their repertoire. The promised ease with which you can transform one XML format to another using XSLT leads to a cavalier philosophy: "Throw it all out as XML, and pick through for what you need." The problem is that this leaves the door wide open to security issues, such as XPath injection attacks. Good design for security is not all that different from good design for software quality. The more you clump and tangle things together, the harder it is to spot and protect against problems. The increased transparency of XML data requires an increased transparency of application processing workflow in order to mitigate problems from security to state control. Applications that work with large dumps of XML data, and use complex processing to extract the information they need from these data sets, are vulnerable to a sophisticated attacker who takes advantage of your blind spots. If you design applications that package and exchange small, controlled chunks of XML data in manageable processing stages, you reduce these blind spots and make the application easier to maintain. Understanding the implications of transparent data flow is key to the security of XML-based applications.
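To make the XPath injection risk concrete, here is a classic illustration (not drawn from the article; the document layout is invented): a login check that splices user input directly into an XPath query can be subverted by a crafted value.

  Query as intended, with the user's input pasted in by string concatenation:
    //user[name='alice' and password='secret']

  The same template after the attacker submits  ' or '1'='1  as the password:
    //user[name='alice' and password='' or '1'='1']

Because "and" binds more tightly than "or" in XPath, the second predicate is true for every user node, so the naive check accepts the attacker; validating input or binding it through XPath variables instead of string concatenation avoids the problem.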
See also: the W3C XML Processing Model Working Group

Family Tree of Schema Languages for Markup Languages (2006)
Rick Jelliffe, O'Reilly Blog
This diagram from Rick Jelliffe presents an evolutionary view of "schema" languages from 1986 through 2006: "I've updated my 1999 diagram "Family Tree of Schema Languages for Markup Languages" to include the innovation coming from OASIS, ISO, W3C and other places since [W3C] XSD came out. I put ASL in, but left out things like ISO Topic Map Constraint Language, OASIS CAM and all the little toy languages that have fed into ISO DSDL. It's also rearranged to clarify where all the parts of DSDL fit. There is also activity at the next level up: RDF and business rules, that don't fit here but are good. The diagram was quite popular when it came out, I think largely so that people could figure out which abbreviations and acronyms to ignore."
See also: XML Schema Languages

WS-I Basic Security Profile Enhanced Logging Specification Requirements
Ram Poornalingam (ed), WS-I Working Group Draft
This specification defines the enhanced logging facilities used by the WS-I Test Tools to support the Basic Security Profile. Verifying Basic Security Profile conformance requires SOAP stack instrumentation, and this Enhanced Logging specification addresses why instrumentation is necessary and how it can be achieved. The document assumes that the reader understands the usage of the WS-I interoperability testing tools, version 2.0. The WS-I Testing Tools are designed to help developers determine whether their Web services conform to WS-I profile guidelines. Complete BSP verification of the encrypted SOAP messages emitted by an application is not possible, because Basic Profile verification (itself a requirement of the BSP) cannot be performed on encrypted messages; the unencrypted form of each message is needed for BP verification. Without adhering to this specification, profile conformance coverage can be achieved only at the surface level.
See also: the WS-I web site


XML.org is an OASIS Information Channel sponsored by BEA Systems, Inc., IBM Corporation, Innodata Isogen, SAP AG and Sun Microsystems, Inc.

Use http://www.oasis-open.org/mlmanage to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml for the list archives.

