XML and Web Services In The News - 16 November 2006

Provided by OASIS | Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by Sun Microsystems, Inc.



HEADLINES:

 OASIS Forms Technical Committee to Standardize Content Analytics
 Google, Yahoo, Microsoft Partner on Open Source Search Protocol
 Grid Services Via Open Grid Services Architecture (OGSA)
 Managing SOA Semantics Using Ontologies and Supporting W3C Standards
 Timed Text Authoring Format: Distribution Format Exchange Profile (DFXP)


OASIS Forms Technical Committee to Standardize Content Analytics
Staff, OASIS Announcement
OASIS has announced the formation of a new effort aimed at standardizing semantic search and content analytics. The work of the OASIS Unstructured Information Management Architecture (UIMA) Technical Committee will advance a common method for meaningfully accessing data contained in text such as e-mails, blog entries, news feeds, and notes, as well as in audio recordings, images, and video. The OASIS work will be complemented by an Apache Software Foundation incubator project for developing UIMA-based open source software. "Unstructured information," according to the TC's charter, is "all the information that has not been carefully encoded in enterprise databases but rather exists as natural language text, speech or video. These applications rely on the rapid assignment of semantics to huge volumes of unstructured content exactly so that this content may be structured and exploited by traditional application infrastructure (e.g., database management systems, knowledgebase systems, information retrieval systems, etc.). Examples include natural language documents, email, speech, images and video. It is information that was not specifically encoded for machines to process but rather authored by humans for humans to understand. We say it is 'unstructured' because it lacks explicit semantics ("structure") required for applications to interpret the information as intended by the human author or required by the end-user application." David A. Ferrucci of IBM, convener of the OASIS UIMA Technical Committee: "UIMA will enable the productive use of content that exists as natural language text, speech, and video. By assigning semantics to this content, UIMA will allow information to be exploited by database management systems, information retrieval systems, and other traditional application infrastructure." 
OASIS will refine and finalize a set of UIMA specifications based on an initial contribution from IBM with input from DARPA, Carnegie Mellon University, Columbia University, Stanford University, University of Massachusetts-Amherst, MITRE Corporation, and Science Applications International Corporation (SAIC).
See also: the IBM announcement

Google, Yahoo, Microsoft Partner on Open Source Search Protocol
Juan Carlos Perez, InfoWorld
Strange bedfellows Google, Microsoft, and Yahoo have partnered to simplify how webmasters and online publishers submit their sites' content for indexing in the companies' search engines. In a rare collaborative effort, Google, Microsoft and Yahoo, which compete directly in Internet search and other online services, plan to announce on Thursday their support for the open source Sitemap Protocol, which is based on XML (Extensible Markup Language). This protocol, which Google created and has been using for about 18 months, will be adopted by Yahoo effective Thursday, and the three companies will collaborate to extend and enhance it. Yahoo has been using another protocol, which it will continue to support. Microsoft will stop using its current protocol after it implements Sitemap Protocol in its search engine in early 2007. A site map is a file that webmasters and publishers put on their sites to guide the search engines' automated Web crawlers in properly indexing their Web pages. Site maps are particularly useful in highlighting to crawlers the dynamic Web content that is served up on the fly. Crawlers generally index content contained in static Web pages without problems, but they often have difficulty with dynamic content, such as content generated as a result of a search query. A site map can be formatted using various protocols, but this means more work for webmasters and publishers, which is why Google, Microsoft and Yahoo are throwing their weight behind the Sitemap Protocol to promote it as a standard.
See also: XML Sitemap Format
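
For readers who have not seen a site map file, the following is a minimal sketch of generating one in Python with the standard library. It assumes the element names (urlset, url, loc, lastmod, changefreq) and the namespace URI published with the Sitemap Protocol 0.9 schema; the example.com URLs, the dates, and the build_sitemap helper are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the Sitemap Protocol 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML text for (loc, lastmod, changefreq) tuples."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: one static page and one dynamically generated page.
pages = [
    ("http://www.example.com/", "2006-11-16", "daily"),
    ("http://www.example.com/search?q=xml", "2006-11-15", "hourly"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The second entry illustrates the point made in the article: a dynamically generated URL that a crawler might not discover on its own can be listed explicitly.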

Grid Services Via Open Grid Services Architecture (OGSA)
Daniel Rubio, searchWebServices.com
With XML serving as a catalyst, Web services have allowed many organizations to expose as well as access many resources that in years past might have been considered difficult to share. Grids initially emerged in research and academic projects to confront the limitations of standalone computing power, allowing applications to tap processing cycles on machines located across a network. OGSA is the creation of the Globus Alliance, a community of organizations and individuals dedicated to advancing grid technologies. Since its inception more than a decade ago, the Globus Alliance has been a pioneer in enabling grid applications through its flagship toolkit Globus, an open-source project that is behind many successful grid projects, such as Open Science Grid, Earth System Grid, and TeraGrid. In its current state, OGSA's building blocks are based on the Web Services Resource Framework (WSRF) along with other WS-* specifications like WS-Notification, WS-Security and WS-Addressing. Version 3 of the Globus toolkit is the earlier OGSI-based implementation, while Version 4 of the Globus toolkit is based on the more recent WS-* standards, both providing support for Java-, C- and Python-based grid Web services. Like any other software application, grids come with their own share of unique requirements which have to be dealt with from the outset. Prior to the emergence of OGSA, such choices required organizations to take the plunge on a particular software suite to achieve results, with vendor or platform lock-in being the norm in grid development. On the other hand, WS-* standards have achieved a true consensus among Web services vendors: from ESB producers to platform initiatives like Project Tango for Java or WCF for .NET, many organizations have rallied around the use of such standards.

Managing SOA Semantics Using Ontologies and Supporting W3C Standards
Dave Linthicum, InfoWorld
[Part 2 on RDF and OWL] Resource Description Framework (RDF), a part of the XML story, provides interoperability between applications that exchange information. RDF is another Web standard that's finding use everywhere, including SOA. RDF was developed by the W3C to provide a foundation of metadata interoperability across different resource description communities and is the basis for the W3C's move toward ontologies, including the Web Ontology Language (OWL). RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise. The difference between the two is that XML is used to transport data using a common format, while RDF is layered on top of XML, defining a broad category of data. When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it. RDF benefits SOA in that it supports the concept of a common metadata layer that is sharable throughout an enterprise or between enterprises. Thus, RDF can be used as a common mechanism for describing data within the SOA problem domain. Using these Web-based standards as the jumping-off point for ontology and SOA, it's possible to define and automate the use of ontologies in both intra- and intercompany SOA domains: domains made up of thousands of systems, all with their own semantic meanings, bound together in a common ontology that makes short work of SOA and defines a common semantic meaning of data. Extending from the languages, we have several libraries available for a variety of vertical domains, including financial services and e-Business. We also have many knowledge editors that now exist to support the creation of ontologies, as well as the use of natural-language processing methodologies.
In other words, we have a standard set of tools to define, manage, and share application semantics from domain to domain, including from the enterprise to the Internet, and back. It's time we started to use them.
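
To make the "RDF layered on top of XML" point concrete, here is a minimal sketch in Python that reads a small RDF/XML fragment and lists its subject, predicate, object statements. The RDF namespace URI is the standard one; the ex: vocabulary, the example.com resources, and the extract_triples helper are hypothetical. The sketch handles only plain rdf:Description elements, not the full RDF/XML syntax, so it should not be mistaken for a complete parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical metadata about a customer record shared between services.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.com/schema#">
  <rdf:Description rdf:about="http://example.com/customer/42">
    <ex:name>Acme Corp</ex:name>
    <ex:accountManager rdf:resource="http://example.com/employee/7"/>
  </rdf:Description>
</rdf:RDF>
"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts to recover the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # The object is either another resource or a literal value.
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = extract_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Because every statement is identified by URIs rather than by the sender's private element names, a receiving application can interpret the data without knowing which system produced it.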

Timed Text Authoring Format: Distribution Format Exchange Profile (DFXP)
Mike Dolan, Geoff Freed, Sean Hayes (et al., eds), W3C Technical Report
W3C has announced the advancement of the "Timed Text (TT) Authoring Format 1.0 - Distribution Format Exchange Profile (DFXP)" specification to the level of Candidate Recommendation. Members of the Timed Text (TT) Working Group expect to request that the Director advance this document to Proposed Recommendation once the Working Group has, for each test in the DFXP 1.0 Test Suite, demonstrated support by two interoperable implementations. The Distribution Format Exchange Profile is intended to be used for the purpose of transcoding or exchanging timed text information among legacy distribution content formats presently in use for subtitling and captioning functions. "Timed Text" is textual information that is intrinsically or extrinsically associated with timing information. Typical applications of timed text are the real-time subtitling of foreign-language movies on the Web, captioning for people lacking audio devices or having hearing impairments, karaoke, scrolling news items or teleprompter applications. The Timed Text Authoring Format (TT AF) Distribution Format Exchange Profile (DFXP) provides a standardized representation of a particular subset of textual information with which stylistic, layout, and timing semantics are associated by an author or an authoring system for the purpose of interchange and potential presentation. DFXP is expressly designed to meet only a limited set of requirements established by the "Timed Text (TT) Authoring Format 1.0 Use Cases and Requirements" document. In particular, only those requirements which service the need of performing interchange with existing, legacy distribution systems are satisfied. In addition to being used for interchange among legacy distribution content formats, DFXP content may be used directly as a distribution format, providing, for example, a standard content format to reference from a 'text' or 'textstream' media object element in a SMIL 2.1 document. Comments are welcome through 16-February-2007.
W3C encourages developers to implement the specification and share their experience with the Synchronized Multimedia Working Group.
See also: the Interop report
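
As a rough illustration of text "associated with timing information", the sketch below parses a small DFXP-style document in Python and extracts each caption with its start and end time in seconds. The namespace URI is an assumption about the Candidate Recommendation draft and should be checked against the specification; the sample captions and the to_seconds and extract_cues helpers are hypothetical. Only the hh:mm:ss clock-time form of the begin and end attributes is handled here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the 2006 Candidate Recommendation; verify before use.
TT_NS = "http://www.w3.org/2006/10/ttaf1"

DFXP_SAMPLE = f"""\
<tt xmlns="{TT_NS}" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.00" end="00:00:03.50">Hello, and welcome.</p>
      <p begin="00:00:04.00" end="00:00:06.00">Today's topic is timed text.</p>
    </div>
  </body>
</tt>
"""


def to_seconds(clock):
    """Convert an hh:mm:ss(.fraction) clock time to seconds."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def extract_cues(xml_text):
    """Return (begin_seconds, end_seconds, text) for each <p> element."""
    root = ET.fromstring(xml_text)
    cues = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        text = "".join(p.itertext()).strip()
        cues.append((to_seconds(p.get("begin")), to_seconds(p.get("end")), text))
    return cues


cues = extract_cues(DFXP_SAMPLE)
for begin, end, text in cues:
    print(f"{begin:6.2f} -> {end:6.2f}  {text}")
```

A transcoder between legacy caption formats would perform essentially this extraction step before re-emitting the cues in the target format.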


XML.org is an OASIS Information Channel sponsored by BEA Systems, Inc., IBM Corporation, Innodata Isogen, SAP AG and Sun Microsystems, Inc.

Use http://www.oasis-open.org/mlmanage to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml for the list archives.

