XML and Web Services In The News - 25 May 2005

XML Databases Evolve: Open Source Apache Xindice, Berkeley DB XML Set Solid Base for Content Management
Rick Grehan, InfoWorld
Two XML database libraries that form a base for larger content management applications are reviewed in this article: the ASF's Xindice and Sleepycat's Berkeley DB XML. Both provide standards-compliant XML document manipulation. In addition, both are powerful developer tools that place eye-opening XML document storage, query, and retrieval capabilities into the hands of eager programmers. Xindice arranges its storage in the form of 'collections': think of collections as subfolders in file systems; collections contain 'subcollections' to an arbitrary depth. The 'files' in this analogy are the actual XML documents. Querying and updating are typically applied collectionwide, although you can adjust the granularity to manipulate individual documents. Xindice's command-line tool is useful for jump-starting your database. The tool creates new collections, feeds XML documents into the collections, and even feeds whole subdirectory hierarchies into Xindice. It uses XPath for querying collections and XUpdate for updating them; it would be nice if XQuery were supported, as it provides for much richer querying. Berkeley DB XML sits on top of the venerable Berkeley DB database and inherits Berkeley DB's transaction support, crash recovery, deadlock detection, encryption, and other features. In fact, you can freely intermix DB XML databases and 'ordinary' Berkeley DB databases in the same application without having to link additional libraries into that application. Sleepycat has improved this latest version greatly, adding XQuery support, the ability to manage large files with per-node storage, and new documentation to flatten the learning curve.
See also: XML and Databases
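The collection model described above can be sketched in a few lines of Python. This is only an illustration, not the Xindice API: the `Collection` class and its method names are hypothetical, and the standard library's ElementTree (which supports only a subset of XPath) stands in for a real XPath engine. It shows a query applied collection-wide, descending into subcollections.

```python
import xml.etree.ElementTree as ET


class Collection:
    """Hypothetical in-memory stand-in for a collection; not the Xindice API."""

    def __init__(self, name):
        self.name = name
        self.documents = {}       # document key -> parsed root element
        self.subcollections = {}  # subcollection name -> Collection

    def add_document(self, key, xml_text):
        self.documents[key] = ET.fromstring(xml_text)

    def child(self, name):
        # Create the subcollection on first use, much like `mkdir -p`.
        return self.subcollections.setdefault(name, Collection(name))

    def query(self, path):
        """Apply a path expression to every document in this collection
        and, recursively, to every document in its subcollections."""
        hits = []
        for key, root in self.documents.items():
            hits.extend((key, el) for el in root.findall(path))
        for sub in self.subcollections.values():
            hits.extend(sub.query(path))
        return hits


library = Collection("library")
library.add_document("b1", "<book><title>XML Basics</title></book>")
library.child("drafts").add_document("b2", "<book><title>Pipelines</title></book>")

# One query, applied across the whole collection tree.
titles = [(key, el.text) for key, el in library.query("./title")]
```

Querying a single document rather than the whole tree would simply mean calling `findall` on one entry of `documents`, which mirrors the adjustable granularity the review mentions.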

TAG Opinion on XML Binary Format
W3C Technical Architecture Group (TAG), Interim Summary
The TAG believes that more detailed analysis is needed before a W3C Binary XML Recommendation is sufficiently justified. In particular, we suggest that a quantitative analysis is necessary. For at least a few key use cases, concrete targets should be set for the size and/or speed gains that would be needed to justify the disruption introduced by a new format. We further suggest that representative binary technologies be benchmarked and analyzed to a sufficient degree that such speed or size improvements can be reasonably reliably predicted before we commit to a Recommendation. We feel that introduction of a binary format would be an important development for those who might benefit from its size or speed, but also for those who might be impacted by its impact on interoperability and perspicuity. Therefore, in order to justify a potential new format, the TAG would like to see the above issues addressed. As stated above, we make no prediction as to whether such an analysis will ultimately confirm the need for Binary XML; if it does, we will be glad to support development of a Recommendation at the W3C.
See also: W3C XML Binary Characterization WG

Threat Modeling Web Applications
J.D. Meier, Alex Mackman, and Blaine Wastell, Microsoft MSDN Library
This guidance presents the patterns & practices approach to creating threat models for Web applications. Threat modeling is an engineering technique you can use to help you identify threats, attacks, vulnerabilities, and countermeasures that could affect your application. You can use threat modeling to shape your application's design, meet your company's security objectives, and reduce risk. Using a pattern-based approach lets you organize vulnerabilities in a more systematic and repeatable manner. It also helps you leverage community knowledge and avoid reinventing wheels. The type of application you are building, along with its scenario and context, determines which vulnerabilities are relevant. For example, vulnerabilities for an Internet-facing Web application may not be the same as vulnerabilities for a reusable component in an intranet line-of-business application.
See also: Application Security Standards

XML Pipelines Version 1.0 Working Draft Specification
R. Alexander Milowski, Smallx XML Infoset and Pipelining Project
This working draft describes the semantics of the Smallx XML Infoset and Pipelining Technology. It presents the pipeline, the vocabulary used to write pipelines, and any vocabulary consumed by a pipeline step. It is a specification language for describing each step and how the steps are chained together. The information is coded as an XML document called a pipeline document. A pipeline is a sequence of steps chained together that process an infoset. Each step consumes an infoset as its input and produces an infoset as its output. The steps in the pipeline are chained together so that one step's output infoset is fed as the input infoset to another step. As such, it is easy to consider many of the W3C recommendations as steps that could be in a pipeline. For example, an XInclude processor consumes an infoset and replaces each 'include' element with certain XML content as specified in the recommendation and the instance. The output is another infoset which includes the referenced content. Similarly, many other recommendations like XSLT, XML Base, XML Schema, etc. can be considered steps. Smallx is a library and set of tools that is being developed to process XML infosets. The library contains a full complement of technologies, including XPath and XSLT.
See also: Smallx XML Infoset & Pipelining Technology
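The step-chaining idea can be illustrated with a minimal Python sketch. It does not use Smallx or its pipeline vocabulary: an ElementTree element stands in for an infoset, and the step names, the `include` handling, and the `status="draft"` convention are all assumptions made for the example. Each step takes a tree and returns a new tree, and the pipeline feeds one step's output to the next.

```python
import copy
import xml.etree.ElementTree as ET
from functools import reduce


def make_include_step(resources):
    """Return a step that replaces each <include href="..."/> element with the
    fragment registered under that href (a toy stand-in for XInclude)."""
    def step(root):
        out = copy.deepcopy(root)  # a step never mutates its input
        for parent in list(out.iter()):
            for i, child in enumerate(list(parent)):
                if child.tag == "include":
                    replacement = ET.fromstring(resources[child.get("href")])
                    replacement.tail = child.tail
                    parent[i] = replacement
        return out
    return step


def strip_drafts(root):
    """Step that drops every element marked status="draft" (hypothetical rule)."""
    out = copy.deepcopy(root)
    for parent in list(out.iter()):
        for child in list(parent):
            if child.get("status") == "draft":
                parent.remove(child)
    return out


def run_pipeline(steps, infoset):
    # Chain the steps: each step's output is the next step's input.
    return reduce(lambda doc, step: step(doc), steps, infoset)


source = ET.fromstring(
    '<doc><include href="intro"/><para status="draft">todo</para><para>done</para></doc>'
)
steps = [make_include_step({"intro": "<section>Hello</section>"}), strip_drafts]
result = run_pipeline(steps, source)
serialized = ET.tostring(result, encoding="unicode")
```

Because every step has the same shape (tree in, tree out), steps can be reordered or reused freely, which is the property that makes it natural to treat XInclude, XSLT, and similar processors as pipeline steps.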

XML Matters: Tips and Tricks for a Friendlier DOM
Dethe Elza, IBM developerWorks
The Document Object Model (DOM) is one of the most widely implemented tools for manipulating XML and HTML data, but it is rarely used to its full potential. By taking advantage of the DOM and extending it to be even easier to use, you gain a powerful tool for XML applications, including dynamic Web applications. The DOM has its limitations and faults, but it also has many advantages: It is built into many applications; it works the same whether you're using Java technology, Python, or JavaScript; it is a lot more convenient than using SAX; and with the massaging demonstrated in the article, it can be elegant and powerful to use. More applications are beginning to support the DOM, including Mozilla-based applications, OpenOffice, and XMetaL from Blast Radius. More specifications require and extend the DOM (like SVG), so it is not going away any time soon.
See also: W3C Document Object Model (DOM)
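As a rough illustration of what "extending the DOM to be easier to use" can look like, the sketch below wraps the standard DOM calls in one small helper. It uses Python's `xml.dom.minidom`; the helper name `el` and its signature are assumptions for this example and are not taken from the article.

```python
from xml.dom.minidom import getDOMImplementation


def el(doc, tag, attrs=None, *children):
    """Build an element, set its attributes, and append its children in one
    call. String children are wrapped in text nodes. (Hypothetical helper.)"""
    node = doc.createElement(tag)
    for name, value in (attrs or {}).items():
        node.setAttribute(name, value)
    for child in children:
        if isinstance(child, str):
            child = doc.createTextNode(child)
        node.appendChild(child)
    return node


doc = getDOMImplementation().createDocument(None, "list", None)
doc.documentElement.appendChild(
    el(doc, "item", {"id": "a1"}, "first ", el(doc, "em", None, "entry"))
)
markup = doc.documentElement.toxml()
```

Without the helper, the same structure takes a separate `createElement`, `setAttribute`, `createTextNode`, and `appendChild` call for each node, which is the verbosity such convenience layers aim to hide.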

Canadian Broadcasting in XML
John E. Simpson, XML.com
Most discussions of XML in the context of cell phones and similar devices include references to WML, WAP, and various web services. This month's "XML Tourist" column talks about XML-based descriptions of cellular and PCS broadcast systems. Industry Canada's Spectrum Management Services (similar to the FCC in the United States) requires broadcasters to periodically upload data about all cellular/PCS stations and antennas that they operate. The service providers can upload the data either in ASCII text form or as XML. The data file includes some general information about the company (account) providing this upload: contact name, telephone number, and so on. Each broadcaster might have only one "station" (the location of the broadcast facility) or more than one station (by far the norm); data is supplied about the antennas placed at each station. This Industry Canada data-collection effort, with its simple XML schemata and SPS stylesheet, fairly puts the lie to complaints that XML applications are too difficult to use and provide too little payback for small-scale data sets.

Tag Team: Tracking the Patterns of Supermarket Shoppers
Staff, Wharton Marketing Newsletter
Wharton marketing professor Peter S. Fader analyzes seemingly random zigzag pattern lines that represent a new dataset showing the paths taken by individual shoppers in an actual grocery store. The data, charted for the first time by radio frequency identification (RFID) tags located on consumers' shopping carts, has the potential to change the way retailers in general think about customers and their shopping patterns. Bradlow and doctoral candidate Jeffrey S. Larson analyze this RFID-captured grocery store data, focusing exclusively on travel patterns without regard to purchase behavior or merchandising tactics. The results, they conclude, challenge many long-standing perceptions of shopper travel behavior within a supermarket, including ideas related to aisle traffic, special promotional displays, and perimeter shopping patterns. PathTracker RFID tags were placed on the bottom of every grocery cart in a supermarket in the western U.S.; these tags emit a signal every five seconds that is received by receptors installed at various locations throughout the store. Once collected, the signals are used to chart the position of the grocery cart and record its route through the entire store. Linking specific travel patterns to individual purchase decisions may lead to an improved understanding of consumer motivations for purchasing certain items, and can shed light on the complementarity and substitutability of goods in ways that a more traditional 'market basket' analysis cannot capture.
See also: RFID Resources and Readings
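The step from raw tag signals to a charted route can be sketched as follows. This is a simplified illustration, not the PathTracker method: it assumes each five-second signal has already been resolved to a named store zone, and the zone names are invented for the example. Consecutive signals from the same zone are collapsed into one visit with a dwell time.

```python
from itertools import groupby

PING_INTERVAL_S = 5  # the summary states that tags emit a signal every five seconds


def route_with_dwell(pings):
    """Collapse a time-ordered list of zone names (one per signal) into a
    list of (zone, seconds_spent) visits, preserving the order of travel."""
    return [
        (zone, sum(1 for _ in run) * PING_INTERVAL_S)
        for zone, run in groupby(pings)
    ]


# Hypothetical signal log for one cart, already mapped to zones.
pings = ["entrance", "produce", "produce", "produce",
         "dairy", "dairy", "produce", "checkout"]
visits = route_with_dwell(pings)
```

Keeping repeat visits separate (here, two distinct stops in "produce") preserves the zigzag shape of the path, which is the kind of travel pattern the study examines independently of purchases.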

