XML and Web Services In The News - 25 May 2005
XML Databases Evolve: Open Source Apache Xindice, Berkeley DB XML Set Solid Base for Content Management
Rick Grehan, InfoWorld
Two XML database libraries that form a base for larger content management
applications are reviewed in this article: the ASF's Xindice and
Sleepycat's Berkeley DB XML. Both provide standards-compliant XML
document manipulation. In addition, both are powerful developer tools
that place eye-opening XML document storage, query, and retrieval
capabilities into the hands of eager programmers. Xindice arranges its
storage in the form of 'collections': think of collections as subfolders
in file systems; collections contain 'subcollections' to an arbitrary
depth. The 'files' in this analogy are the actual XML documents.
Querying and updating are typically applied collectionwide, although
you can adjust the granularity to manipulate individual documents.
Xindice's command-line tool is useful for jump-starting your database.
The tool creates new collections, feeds XML documents into the
collections, and even feeds whole subdirectory hierarchies into Xindice.
Xindice uses XPath for querying collections and XUpdate for updating them;
it would be nice if XQuery were supported, as it provides for much
richer querying. Berkeley DB XML sits on top of the venerable Berkeley
DB database and inherits Berkeley DB's transaction support, crash
recovery, deadlock detection, encryption, and other features. In fact,
you can freely intermix DB XML databases and 'ordinary' Berkeley DB
databases in the same application without having to link additional
libraries into that application. Sleepycat has improved this latest
version greatly, adding XQuery support, the ability to manage large
files with per-node storage, and new documentation to flatten the
learning curve.
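The collection model described above for Xindice can be pictured with a
small sketch. This is not the Xindice API: the Collection class, its
method names, and the sample documents are all hypothetical, and the
query uses the limited XPath subset of Python's standard
xml.etree.ElementTree module. It only illustrates a query being applied
collectionwide, descending into subcollections.

```python
import xml.etree.ElementTree as ET


class Collection:
    """Hypothetical in-memory stand-in for a collection; NOT the Xindice API."""

    def __init__(self, name):
        self.name = name
        self.documents = {}       # document key -> parsed root element
        self.subcollections = {}  # subcollection name -> Collection

    def add_document(self, key, xml_text):
        self.documents[key] = ET.fromstring(xml_text)

    def subcollection(self, name):
        # Create the subcollection on first use, then reuse it.
        return self.subcollections.setdefault(name, Collection(name))

    def query(self, path, recursive=True):
        """Yield (document key, element) for every match of path."""
        for key, root in self.documents.items():
            for hit in root.findall(path):
                yield key, hit
        if recursive:
            for child in self.subcollections.values():
                yield from child.query(path, recursive=True)


# Sample data invented for illustration.
library = Collection("library")
library.add_document("b1", "<book><title>XML Basics</title><year>2003</year></book>")
tech = library.subcollection("tech")
tech.add_document("b2", "<book><title>DOM Tips</title><year>2005</year></book>")

# One query runs over the collection and its subcollections.
titles = [(key, element.text) for key, element in library.query("./title")]
print(titles)
```

Passing recursive=False restricts the query to the documents held
directly in one collection, which loosely mirrors adjusting the
granularity of a query.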
See also: XML and Databases
TAG Opinion on XML Binary Format
W3C Technical Architecture Group (TAG), Interim Summary
The TAG believes that more detailed analysis is needed before a W3C
Binary XML Recommendation is sufficiently justified. In particular, we
suggest that a quantitative analysis is necessary. For at least a few
key use cases, concrete targets should be set for the size and/or speed
gains that would be needed to justify the disruption introduced by a new
format. We further suggest that representative binary technologies be
benchmarked and analyzed to a sufficient degree that such speed or size
improvements can be reasonably reliably predicted before we commit to a
Recommendation. We feel that introduction of a binary format would be
an important development not only for those who might benefit from its
size or speed, but also for those who might be affected by its impact
on interoperability and perspicuity. Therefore, in order to justify a
potential new format, the TAG would like to see the above issues
addressed. As stated above, we make no prediction as to whether such
an analysis will ultimately confirm the need for Binary XML; if it
does, we will be glad to support development of a Recommendation at
the W3C.
See also: W3C XML Binary Characterization WG
Threat Modeling Web Applications
J.D. Meier, Alex Mackman, and Blaine Wastell, Microsoft MSDN Library
This guidance presents the patterns & practices approach to creating
threat models for Web applications. Threat modeling is an engineering
technique you can use to help you identify threats, attacks,
vulnerabilities, and countermeasures that could affect your application.
You can use threat modeling to shape your application's design, meet
your company's security objectives, and reduce risk. Using a pattern-
based approach lets you organize vulnerabilities in a more systematic
and repeatable manner. It also helps you leverage community knowledge
and avoid reinventing wheels. The type of application you are building,
along with its scenario and context, determines which vulnerabilities
are relevant. For example, vulnerabilities for an Internet-facing Web
application may not be the same as vulnerabilities for a reusable
component in an intranet line-of-business application.
See also: Application Security Standards
XML Pipelines Version 1.0 Working Draft Specification
R. Alexander Milowski, Smallx XML Infoset and Pipelining Project
This working draft describes the semantics of the Smallx XML Infoset and
Pipelining Technology. It presents the pipeline, the vocabulary used to
write pipelines, and any vocabulary consumed by a pipeline step. It is
a specification language for describing each step and how the steps are
chained together. The information is coded as an XML document called a
pipeline document. A pipeline is a sequence of steps chained together
that process an infoset. Each step consumes an infoset as its input and
produces an infoset as its output. The steps are chained so that one
step's output infoset is fed as the input infoset to another step. As
such, it is easy to consider many of the W3C recommendations as steps
that could be in a pipeline. For example, an
XInclude processor consumes an infoset and replaces each 'include'
element with certain XML content as specified in the recommendation
and the instance. The output is another infoset that contains the
referenced content. Similarly, many other recommendations like XSLT,
XML Base,
XML Schema, etc. can be considered steps. Smallx is a library and set
of tools that is being developed to process XML infosets. The library
contains a full complement of technologies, including XPath and XSLT.
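The pipeline model described above (each step consumes an infoset and
produces an infoset for the following step) can be sketched in a few
lines. This is not Smallx and does not use the pipeline vocabulary from
the working draft: the step functions, the FRAGMENTS table, and the
simplified 'include' element with a 'ref' attribute are assumptions made
for illustration, and the include step is far simpler than real XInclude
processing. An ElementTree element stands in for the infoset.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for external resources.
FRAGMENTS = {"footer": "<footer>Example footer</footer>"}


def expand_includes(root):
    """Toy include step: replace <include ref="..."/> with a stored fragment."""
    root = copy.deepcopy(root)  # leave the input tree untouched
    # Snapshot the elements first so the tree can be edited while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                replacement = ET.fromstring(FRAGMENTS[child.get("ref")])
                replacement.tail = child.tail
                parent[index] = replacement
    return root


def add_word_count(root):
    """Toy annotation step: record the number of words on the root element."""
    root = copy.deepcopy(root)
    words = sum(len(text.split()) for text in root.itertext())
    root.set("words", str(words))
    return root


def run_pipeline(root, steps):
    """Feed each step's output tree to the next step as its input."""
    for step in steps:
        root = step(root)
    return root


doc = ET.fromstring('<page><p>Hello pipeline</p><include ref="footer"/></page>')
result = run_pipeline(doc, [expand_includes, add_word_count])
print(ET.tostring(result, encoding="unicode"))
```

Because every step has the same shape (tree in, tree out), steps can be
reordered or swapped without changing run_pipeline, which is the
property that makes it natural to treat many recommendations as steps.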
See also: Smallx XML Infoset & Pipelining Technology
XML Matters: Tips and Tricks for a Friendlier DOM
Dethe Elza, IBM developerWorks
The Document Object Model (DOM) is one of the most widely implemented
tools for manipulating XML and HTML data, but it is rarely used to its
full potential. By taking advantage of the DOM and extending it to be
even easier to use, you gain a powerful tool for XML applications,
including dynamic Web applications. The DOM has its limitations and
faults, but it also has many advantages: It is built into many
applications; it works the same whether you're using Java technology,
Python, or JavaScript; it is a lot more convenient than using SAX; and
with the massaging demonstrated in the article, it can be elegant and powerful
to use. More applications are beginning to support the DOM, including
Mozilla-based applications, OpenOffice, and XMetaL from Blast Radius.
More specifications require and extend the DOM (like SVG), so it is
not going away any time soon.
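As one illustration of what extending the DOM to be easier to use can
look like, the sketch below wraps the standard createElement,
setAttribute, createTextNode, and appendChild calls in a single helper.
The helper name make_element and the sample element names are
hypothetical and are not taken from the article; the example uses
Python's standard xml.dom.minidom module.

```python
from xml.dom.minidom import getDOMImplementation


def make_element(doc, tag, attrs=None, *children):
    """Create an element, set its attributes, and append children in one call.

    Hypothetical convenience wrapper; string children become text nodes.
    """
    node = doc.createElement(tag)
    for name, value in (attrs or {}).items():
        node.setAttribute(name, str(value))
    for child in children:
        if isinstance(child, str):
            child = doc.createTextNode(child)
        node.appendChild(child)
    return node


# Build a small document with one helper call per item.
doc = getDOMImplementation().createDocument(None, "list", None)
for label in ("first", "second"):
    item = make_element(doc, "item", {"class": "entry"}, label)
    doc.documentElement.appendChild(item)

print(doc.documentElement.toxml())
```

The same pattern carries over to other DOM bindings, since the helper
relies only on core DOM methods.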
See also: W3C Document Object Model (DOM)
Canadian Broadcasting in XML
John E. Simpson, XML.com
Most discussions of XML in the context of cell phones and similar
devices include references to WML, WAP, and various web services. This
month's "XML Tourist" column talks about XML-based descriptions of
cellular and PCS broadcast systems. Industry Canada's Spectrum
Management Services (similar to the FCC in the United States) requires
broadcasters to periodically upload data about all cellular/PCS
stations and antennas that they operate. The service providers can
upload the data either in ASCII text form or as XML. The data file
includes some general information about the company (account) providing
this upload: contact name, telephone number, and so on. Each broadcaster
might have only one "station" (the location of the broadcast facility)
or more than one station (by far the norm); data is supplied about the
antennas placed at any station. This Industry Canada data-collection
effort, with its simple XML schemata and SPS stylesheet, fairly puts
the lie to complaints that XML applications are too difficult to use
and provide too little payback for small-scale data sets.
Tag Team: Tracking the Patterns of Supermarket Shoppers
Staff, Wharton Marketing Newsletter
Wharton marketing professor Peter S. Fader analyzes seemingly random
zigzag pattern lines that represent a new dataset showing the paths
taken by individual shoppers in an actual grocery store. The data,
charted for the first time by radio frequency identification (RFID)
tags located on consumers' shopping carts, has the potential to change
the way retailers in general think about customers and their shopping
patterns. Fader, fellow Wharton professor Eric T. Bradlow, and doctoral
candidate Jeffrey S. Larson analyze this RFID-captured grocery store
data, focusing exclusively on travel
patterns without regard to purchase behavior or merchandising tactics.
The results, they conclude, challenge many long-standing perceptions
of shopper travel behavior within a supermarket, including ideas related
to aisle traffic, special promotional displays, and perimeter shopping
patterns. PathTracker RFID tags were placed on the bottom of every
grocery cart in a supermarket in the western U.S.; these tags emit a
signal every five seconds that is received by receptors installed at
various locations throughout the store. Once collected, the signals
are used to chart the position of the grocery cart and record its
route through the entire store. Linking specific travel patterns to
individual purchase decisions may lead to an improved understanding of
consumer motivations for purchasing certain items, and can shed light
on the complementarity and substitutability of goods in ways that a
more traditional 'market basket' analysis cannot capture.
See also: RFID Resources and Readings