XML and Web Services In The News - 19 May 2006

Provided by OASIS | Edited by Robin Cover

This issue of XML.org Daily Newslink is sponsored by SAP


HEADLINES:

 Search as the User Interface for the Rest of Us
 Search Considered Integral
 Web Inventor Sees His Brainchild Ready for Big Leap
 First Public Draft of Open XML is Published by Ecma
 SOA Product Review: ActiveBPEL 2.0 from Active Endpoints
 RDFa Primer 1.0: Embedding RDF in XHTML
 UnREST over WS-* and Other "Enterprisey" Things

Search as the User Interface for the Rest of Us
Guy Creese, DMReview.com
Google's announcement of Google OneBox on April 19, 2006, is one more tremor signaling a tectonic plate shift that will have an impact on the market landscape for years to come. In a sentence, OneBox lets employees use the same familiar Google interface that they use to search the Web to access information within business applications. For example, they can pull up a purchase order stored in an Oracle application via Google, rather than using the typical Oracle application interface. This ability is going to have a far-reaching impact on business intelligence (BI) applications and interfaces and is therefore worth discussing in detail. The Google OneBox search appliance is a physical box that enterprises install behind their firewall. The appliance indexes information by crawling corporate repositories and then lets users search for it via the Google search box. Users log on to the system as they do to any other business application and can therefore see only the information they're allowed to see. OneBox does this by supporting native LDAP authentication as well as the Google Search Authorization Service Provider Interface. The system can support up to 3 million documents on a single server and up to 25 queries per second. Users narrow their search via keywords. For example, if they are interested in purchase order number 060875, they type in the string "po 060875." This retrieves that PO's information (e.g., PO total, supplier name, buyer name, payment terms, carrier, freight terms) from an enterprise resource planning (ERP) system and displays it in the Google interface. Google OneBox can retrieve information from systems such as Oracle Financials, Cisco Call Manager, Cognos, SAS, salesforce.com, Employease and NetSuite. OneBox uses a REST-based application programming interface (API) to make a call to the application; the application needs to reply via XML.
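
To make the request/response pattern concrete, here is a minimal sketch of the application side: a function that receives a keyword query such as "po 060875" and answers with XML. It is not Google's actual OneBox provider interface. The function name handle_query, the in-memory PURCHASE_ORDERS table and all element names are hypothetical stand-ins for an ERP lookup and whatever results schema the appliance actually expects.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for an ERP purchase-order table.
PURCHASE_ORDERS = {
    "060875": {
        "total": "12500.00",
        "supplier": "Acme Supplies",
        "buyer": "J. Smith",
        "payment_terms": "Net 30",
    },
}

# Matches keyword-style queries such as "po 060875" (case-insensitive).
PO_QUERY = re.compile(r"^\s*po\s+(\d+)\s*$", re.IGNORECASE)


def handle_query(query: str) -> str:
    """Return an XML string describing the purchase order named in the query.

    The element names are illustrative only; they are NOT the real
    OneBox results schema.
    """
    root = ET.Element("results")
    match = PO_QUERY.match(query)
    if match:
        po_number = match.group(1)
        record = PURCHASE_ORDERS.get(po_number)
        if record is not None:
            po = ET.SubElement(root, "purchaseOrder", number=po_number)
            for name, value in record.items():
                ET.SubElement(po, name).text = value
    # An unknown PO or unrecognized query yields an empty <results /> element.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(handle_query("po 060875"))
```

In a real deployment this function would sit behind an HTTP endpoint that the appliance calls; the sketch leaves out the transport so the XML-reply step stays visible.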

Search Considered Integral
Ryan Barrows, Jim Traverso, Morgan Stanley; ACM Queue
A combination of tagging, categorization, and navigation can help end-users leverage the power of enterprise search. "Most corporations must leverage their data for competitive advantage. The volume of data available to a knowledge worker has grown dramatically over the past few years, and, while a good amount lives in large databases, an important subset exists only as unstructured or semi-structured data. Without the right systems, this leads to a continuously deteriorating signal-to-noise ratio, creating an obstacle for busy users trying to locate information quickly. Three flavors of enterprise search solutions help improve knowledge discovery: raw engines, intranet appliances, and desktop search. All three search solutions are likely to show up in an enterprise with massive information management challenges. At Morgan Stanley we have had a group working on intranet search and raw search engines for more than five years and have been experimenting with desktop search since 2004. A fourth piece of this puzzle has yet to be popularized: combining tagging, categorization, and navigation to improve the overall experience for the end user. This piece is needed, as machine-relevance algorithms alone are not good enough to produce high-quality intranet results. In this article we discuss what such a system looks like, with a particular emphasis on solving enterprise-scale problems."
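
One common way to combine keyword search with tags and categories is faceted navigation: run the keyword match, then show counts per category and tag so users can narrow the results. The sketch below illustrates that general idea only; it is not the Morgan Stanley system, and the Document class, search and facets functions, and sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    tags: set[str] = field(default_factory=set)
    category: str = "uncategorized"


def search(docs, keyword, selected_tags=frozenset()):
    """Keyword match on title/text, then keep only docs carrying every selected tag."""
    keyword = keyword.lower()
    hits = [d for d in docs if keyword in d.title.lower() or keyword in d.text.lower()]
    return [d for d in hits if set(selected_tags) <= d.tags]


def facets(results):
    """Count categories and tags across a result set to drive navigation links."""
    categories = Counter(d.category for d in results)
    tags = Counter(tag for d in results for tag in d.tags)
    return categories, tags


# Hypothetical sample corpus.
DOCS = [
    Document("Q1 bond outlook", "Fixed income research", {"bonds", "research"}, "Research"),
    Document("Bond desk wiki", "How the bond desk books trades", {"bonds", "howto"}, "Operations"),
    Document("Equity primer", "Intro to equities", {"equities"}, "Research"),
]

if __name__ == "__main__":
    results = search(DOCS, "bond")
    print([d.title for d in results])
    print(facets(results))
```

Clicking a tag facet amounts to calling search again with that tag added to selected_tags. Human-assigned tags narrow the result set in a way that a relevance score alone does not.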

Web Inventor Sees His Brainchild Ready for Big Leap
Lucas van Grinsven, eWEEK
The World Wide Web is on the cusp of making its next big leap to become an open environment for collaboration, and its inventor said he has not been so optimistic in years. Still, Tim Berners-Lee, the Briton who invented and then gave away the World Wide Web, warns that Internet crime and anti-competitive behavior need to be fought tooth and nail. Currently the director of the World Wide Web Consortium (W3C), a U.S.-headquartered forum of companies and organizations working to improve the Web, Berners-Lee is only now realising his early vision of a two-way Web where people can easily work together on the same page and where the content on a page can be recognized by computers. Google Maps, whose maps turn up on other sites combined with other services, and photo sharing site Flickr, where members comment on each other's postings and developers can use the pictures to create new applications, are early examples of how Web sites can combine data from different sources. A new query language, SPARQL (pronounced "Sparkle"), is designed to make Web pages easier for machines to read, allowing all sorts of different data to be put to work on the Web. Berners-Lee is also concerned that some Internet providers in the United States have started to filter data, giving priority to premium data for which the operator receives an additional fee. They can do this because they own the cables, the service, the portals and other key applications. "The public will demand an open Internet... I tried then to make the Web technology, in turn, a universal, neutral, platform... It is of the utmost importance that, if I connect to the Internet, and you connect to the Internet, that we can then run any Internet application we want, without discrimination as to who we are or what we are doing."
See also: on Net Neutrality

First Public Draft of Open XML is Published by Ecma
Andy Updegrove, Consortium Standards Bulletin
"The first draft of Open XML has been posted for public viewing at the Ecma Website, five months after Ecma accepted Microsoft's submission of what was then less-appealingly referred to as the XML Reference Schema. The most detailed source of information I've found so far is this page at Brian Jones' blog, which focuses heavily on XML in Office and the development work on Open XML file formats. Brian is a Microsoft Office Program Manager who has frequently provided public comments on the progress and purpose of Open XML. According to Jones, the specification is now 4,000 pages long, roughly twice its original size, and has been the subject of weekly two-hour conference calls and three-day F2F meetings about every two months. A key decision in the creation of any standard is the level of detail to standardize upon. If the level is too low, then interoperability will suffer, because much of what is needed to make the product useful is left up to the vendor, and those additional features will be proprietary. But if the level is too high, then only clones can be built, which is good for interoperability, but death to innovation. It can also be death to competition, since if (as in this case) the standard is based on an existing product, then no would-be competitor would ever expect to be able to catch up with the incumbent, much less compete on price. The specification may be fine and even perhaps very good for making it possible for end users and external developers to do more with Office documents, but it may be useless for creating true competition in the marketplace..."
See also: Brian Jones' blog

SOA Product Review: ActiveBPEL 2.0 from Active Endpoints
Paul Maurer, Enterprise OpenSource
Business Process Execution Language (BPEL) support is at the top of every enterprise SOA punch list. BPEL is an XML-based language designed to support long-running complex business transactions in the form of orchestrated Web Service interactions. As with most XML formats, you wouldn't want to construct and debug a process of any complexity by hand, and an "engine" is required to recognize and execute BPEL. This is where the tool vendors come in, and Active Endpoints, Inc. has a design tool and engine product combination that we'll cover in this review. ActiveBPEL Designer is a world-class visual environment for working with BPEL-based processes. ActiveBPEL Designer is built on the seemingly ubiquitous Eclipse extensible development platform and has an interface with a clear and logical layout. The "Navigator" tab in the upper left region displays a hierarchical view of projects, folders, and files in the workspace. To the right of the Navigator is the "Web References" tab. This tab contains a registry of namespaces, messages, type definitions, and sample data used in BPEL processes. It's populated automatically as WSDL files and XML schemas are added to the workspace. The "Web References" tab has many features for slicing and dicing the view, but my favorite is its ability to drag Web references and drop them in the process editor canvas. Active Endpoints has created an excellent BPEL design tool and execution engine that is freely downloadable, well documented and has good community support. There's virtually no cost of entry, and enterprise reliability features can be purchased for mission-critical applications.
See also: BPEL references

RDFa Primer 1.0: Embedding RDF in XHTML
Ben Adida and Mark Birbeck, Updated W3C Working Draft
W3C has announced the publication of a new working draft for "Embedding RDF in XHTML," produced by the RDF in XHTML Task Force (HTML) of the W3C Semantic Web Best Practices and Deployment Working Group (SWBPD) and the W3C HTML Working Group. "Current web pages, written in HTML, are chock-full of structured data. When publishers can express the document's metadata, and when tools can read it, a new world of user functionality becomes available, letting users copy and paste structured data between applications and web sites. An event on a web page can be directly imported into a user's desktop calendar. A license on a document can be automatically detected so that the user is informed of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published to enable structured search and sharing. RDFa is a syntax for expressing such metadata in XHTML. The rendered, hypertext data of XHTML is reused by the RDFa markup, so that publishers don't repeat themselves. The underlying abstract metadata representation is RDF, which lets publishers build their own metadata vocabulary, extend others, and evolve their vocabulary with maximal interoperability over time. The metadata is closely tied to the data it describes, so that rendered data can be copied and pasted along with its relevant structure."
See also: W3C Semantic Web
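
To show how markup can carry metadata without repeating the rendered text, the following simplified sketch reads RDFa-style attributes (about, property, content, rel, href) out of an XHTML fragment and emits (subject, predicate, object) triples. It is not a conforming RDFa processor. It does not expand prefixes such as "dc:" or "cc:", resolve relative URIs, or follow the draft's full processing rules. The function name extract_triples and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_triples(xhtml: str, base: str = ""):
    """Collect (subject, predicate, object) triples from RDFa-style attributes.

    Simplified: `about` sets the subject for an element and its descendants;
    `property` yields a literal taken from `content`, else the element's text
    (so rendered text is reused); `rel` yields a resource taken from `href`.
    Prefixes are left unexpanded.
    """
    triples = []

    def walk(elem, subject):
        subject = elem.get("about", subject)
        prop = elem.get("property")
        if prop:
            value = elem.get("content")
            if value is None:
                value = "".join(elem.itertext()).strip()
            triples.append((subject, prop, value))
        rel, href = elem.get("rel"), elem.get("href")
        if rel and href is not None:
            triples.append((subject, rel, href))
        for child in elem:
            walk(child, subject)

    walk(ET.fromstring(xhtml), base)
    return triples


# Hypothetical photo page annotated with creator, date and license metadata.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div about="#photo1">
  <span property="dc:creator">Jane Doe</span>
  <span property="dc:date" content="2006-05-16">May 16</span>
  <a rel="cc:license" href="http://creativecommons.org/licenses/by/2.5/">CC</a>
</div></body></html>"""

if __name__ == "__main__":
    for triple in extract_triples(SAMPLE_PAGE):
        print(triple)
```

The "Jane Doe" span illustrates the "don't repeat yourself" point. The displayed text doubles as the metadata value, while the content attribute covers cases where the machine-readable value differs from what is shown.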

UnREST over WS-* and Other "Enterprisey" Things
Anne Thomas Manes, Blog
The single, most important feature that inspires my enthusiasm about WS-* is that it has universal support from all the major vendors. The technology has become pretty much pervasive (although the industry is still struggling with interoperability issues), and there's a huge ecosystem of vendors and products and tools that support it. WS-* also has some really interesting innovations (separation of header and body, the composability of the various SOAP extensions, policy-based management and control via intermediaries, etc.), which I think make it particularly well-suited for enterprise-class service-oriented application systems. There. I've qualified it. WS-* is enterprisey. But is that really such a bad thing? If you need comprehensive enterprise-class semantics (security, reliability, session management, transactions, etc.), then it really helps to use an enterprisey middleware system. But I can't ignore the debate between REST and WS-*. I'm a huge proponent of the KISS principle. So I don't recommend using WS-* for all service interactions. If an application doesn't require enterprisey infrastructure semantics, then it's much more appropriate to use a simpler middleware system, such as "plain old XML" (POX) over HTTP. In fact, for applications that require Internet scalability (e.g., mass consumer-oriented services), POX is a much better solution than WS-*.
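
The contrast between POX and SOAP, and the "separation of header and body" the post credits to WS-*, can be seen by building the same request both ways. This sketch uses the SOAP 1.1 envelope namespace. The getOrderStatus payload, the pox_request and soap_request functions, and the urn:example:tx header block are hypothetical. No HTTP transport or WS-* processing is shown.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def make_payload(order_id: str) -> ET.Element:
    """Application payload shared by both styles (element names are hypothetical)."""
    req = ET.Element("getOrderStatus")
    ET.SubElement(req, "orderId").text = order_id
    return req


def pox_request(order_id: str) -> str:
    """Plain old XML: the payload itself is the entire HTTP request body."""
    return ET.tostring(make_payload(order_id), encoding="unicode")


def soap_request(order_id: str, header_blocks=()) -> str:
    """SOAP: payload goes in Body; infrastructure blocks (security, transactions,
    reliability) go in an optional Header that intermediaries can process."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    if header_blocks:
        header = ET.SubElement(env, f"{{{SOAP_ENV}}}Header")
        for block in header_blocks:
            header.append(block)
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    body.append(make_payload(order_id))
    return ET.tostring(env, encoding="unicode")


if __name__ == "__main__":
    print(pox_request("42"))
    tx = ET.Element("{urn:example:tx}TransactionId")
    tx.text = "t-1"
    print(soap_request("42", [tx]))
```

The application payload is identical in both cases. What WS-* adds is the envelope and header slot where composable extensions live, which is worth the extra machinery only when those enterprise semantics are actually needed.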


XML.org is an OASIS Information Channel sponsored by Innodata Isogen and SAP.

Use http://www.oasis-open.org/mlmanage to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml for the list archives.

