XML and Web Services In The News - 26 October 2006

Provided by OASIS | Edited by Robin Cover

This issue of XML Daily Newslink is sponsored by IBM Corporation



HEADLINES:

 A Meaningful Web for Humans and Machines, Part 1
 Last Call for Pronunciation Lexicon Specification (PLS) Version 1.0
 CellML Media Type Published as IETF Informational RFC
 Sun CEO Sets Open Source Java Time Frame
 Ellison Says Oracle Knows What's Best for Linux
 RELAX NG, the XML Schema Alternative
 On Web Standards, Libertarian Candidates Win


A Meaningful Web for Humans and Machines, Part 1
Lee Feigenbaum and Elias Torres, IBM developerWorks
The World Wide Web empowers human beings like never before. The sheer amount and diversity of information you encounter on the Web is staggering. You can find recipes and sports scores; you share calendars and contact information; you read news stories and restaurant reviews. You can constantly consume data on the Web that's presented in a variety of appealing ways: charts and tables, diagrams and figures, paragraphs and pictures. The Semantic Web is a mesh of information linked up in such a way as to be easily processed by machines, on a global scale. The Semantic Web extends the Web by using standards, markup languages, and related processing tools. Yet this content-rich, human-friendly world has a shadowy underworld. It's a world in which machines attempt to benefit from this wealth of data that's so easily accessible to humans. It's the world of aggregators and agents, reasoners and visualizations, all striving to improve the productivity of their human masters. But the machines often struggle to interpret the mounds of information intended for human consumption. In this series of articles we'll examine the existing and emerging technologies that enable machines and humans to easily access the wealth of Web-published data. We'll discuss the need for techniques that derive the human and machine-friendly data from a single Web page. Using examples, we will explore the relationships between the different techniques and will evaluate the benefits and drawbacks of each approach. The series will examine, in detail: a parallel Web of data representations, algorithmic approaches to generating machine-readable data, microformats, GRDDL, embedded RDF, and RDFa. In this first article, you meet the human-computer conflict, learn the criteria used to evaluate different technologies, and find a brief description of the major techniques used today to enable machine-human coexistence on the Web.

Last Call for Pronunciation Lexicon Specification (PLS) Version 1.0
Paolo Baggia (ed), W3C Technical Report
W3C's Voice Browser Working Group has released the second Last Call Working Draft for "Pronunciation Lexicon Specification (PLS) Version 1.0." The specification defines the syntax for specifying pronunciation lexicons to be used by Automatic Speech Recognition and Speech Synthesis engines in voice browser applications. The accurate specification of pronunciation is critical to the success of speech applications. Most Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines internally provide extensive, high-quality lexicons with pronunciation information for many words or phrases. To ensure maximum coverage of the words or phrases used by an application, application-specific pronunciations may be required. For example, these may be needed for proper nouns such as surnames or business names. The Pronunciation Lexicon Specification (PLS) is designed to enable interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications. The language is intended to be easy for developers to use while supporting the accurate specification of pronunciation information for international use. The language allows one or more pronunciations for a word or phrase to be specified using a standard pronunciation alphabet or, if necessary, using vendor-specific alphabets. Pronunciations are grouped together into a PLS document which may be referenced from other markup languages, such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML). In its most general sense, a lexicon is merely a list of words or phrases, possibly containing information associated with and related to the items in the list. Pronunciation lexicons are not limited to voice browsers, because they have proven to be effective mechanisms for supporting accessibility for persons with disabilities as well as greater usability for all users (for instance in screen readers and other user agents, such as multimodal interfaces).
See also: the W3C Voice Browser Activity
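
As a rough illustration of the idea of grouping one or more pronunciations per word into a PLS document, the sketch below parses a tiny lexicon with Python's standard library. The element names (lexicon, lexeme, grapheme, phoneme) and the namespace URI are taken from the W3C specification as best understood here and should be checked against the current draft; the sample entry and the load_lexicon helper are illustrative assumptions, not part of the specification.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the W3C PLS specification; verify against the draft in use.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon: one word with two pronunciations in the IPA alphabet.
SAMPLE_PLS = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmeɪtoʊ</phoneme>
    <phoneme>təˈmɑːtəʊ</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexicon(pls_text):
    """Return a dict mapping each written form to its list of pronunciations."""
    root = ET.fromstring(pls_text)
    ns = {"pls": PLS_NS}
    lexicon = {}
    for lexeme in root.findall("pls:lexeme", ns):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        # A lexeme may list several written forms that share the pronunciations.
        for grapheme in lexeme.findall("pls:grapheme", ns):
            lexicon.setdefault(grapheme.text, []).extend(phonemes)
    return lexicon


if __name__ == "__main__":
    for word, pronunciations in load_lexicon(SAMPLE_PLS).items():
        print(word, pronunciations)
```

A speech engine would consult such a mapping before falling back to its built-in lexicon; how a real ASR or TTS engine does this is engine-specific and not shown here.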

CellML Media Type Published as IETF Informational RFC
Andrew Miller (ed), IETF Approved RFC
The IETF RFC Editor announced that a new "Request for Comments" document is now available in online RFC libraries. "CellML Media Type", Request for Comments 4708, defines a method for exchanging mathematical models represented in a CellML Umbrella 1.0 compliant markup language. The CellML Umbrella format is a standardised markup meta-language for the interchange of mathematical models. The CellML Umbrella format provides a common base that is supported by a number of specific formats used in the interchange of mathematical models. The CellML Umbrella format provides enough information to determine which specific language is used to express the model. The syntax and semantics of the CellML Umbrella format are defined by "CellML Umbrella Specification 1.0". The CellML Umbrella format is an actual media format. Although CellML Umbrella documents contain elements in namespaces defined by other specifications such as RDF and MathML, the elements in these namespaces do not contain sufficient information to define a mathematical model, and so CellML provides the information required to interconnect the different CellML components, as well as the information required to link CellML components to their metadata. As such, CellML Umbrella documents are more than just a collection of entities defined elsewhere, and so a new media type is required to identify CellML. As all well-formed CellML Umbrella documents are also well-formed XML documents, the convention described in Section 7 of RFC 3023 has been observed by use of the '+xml' suffix. The information in CellML Umbrella documents cannot be interpreted without understanding the semantics of the XML elements used to mark up the model structure. Therefore, the application top-level type is used instead of the text top-level type.
See also: the IETF RFC Editor web site
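
To make the naming convention concrete, the sketch below splits a media type into its top-level type, subtype, and structured suffix, and checks for the '+xml' suffix described in RFC 3023. The media type name application/cellml+xml is the one registered by RFC 4708 as recalled here; the helper names parse_media_type and is_xml_based are hypothetical and exist only for this example.

```python
def parse_media_type(media_type):
    """Split 'type/subtype+suffix; params' into (top_level, subtype, suffix)."""
    # Drop any parameters such as '; charset=utf-8' and normalise case.
    value = media_type.split(";", 1)[0].strip().lower()
    top_level, slash, subtype = value.partition("/")
    if not slash or not top_level or not subtype:
        raise ValueError("not a media type: %r" % media_type)
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No structured suffix present, e.g. 'text/plain'.
        base, suffix = subtype, ""
    return top_level, base, suffix


def is_xml_based(media_type):
    """True when the media type carries the '+xml' suffix convention."""
    return parse_media_type(media_type)[2] == "xml"


if __name__ == "__main__":
    print(parse_media_type("application/cellml+xml"))
    print(is_xml_based("application/cellml+xml"))
    print(is_xml_based("text/plain"))
```

A generic XML tool can use a check like this to decide that a document is safe to parse as XML even when it does not understand the CellML semantics.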

Sun CEO Sets Open Source Java Time Frame
Paul Krill, InfoWorld
Demonstrating a perhaps more aggressive path than anticipated, Sun Microsystems is set to announce the open-sourcing of the core Java platform within 30 to 60 days, Sun President and CEO Jonathan Schwartz said at the Oracle OpenWorld conference on Wednesday morning [2006-10-25]. The core platform encompasses the Standard Edition of Java, and it will be offered via an open source format under an OSI (Open Source Initiative)-approved license, likely the same one used for Sun's open source Solaris OS. Sun officials, including Rich Green, Sun executive vice president for software, have talked about Java being offered via open source in stages later this year and into 2007. Parts of it, such as the Java Enterprise Edition, already are available via open source, with the GlassFish application server constituting the open source enterprise variant. Schwartz also offered perspectives on a variety of technology trends. He noted that every day, new devices besides PCs and servers are being networked, and he cited an amusement park's usage of RFID tags in dolls given to children. The dolls are then used to track children and the formation of lines at the park. Oil rigs also are being outfitted, he said. Sun focuses on customers who see technology as offering a competitive advantage rather than just as a cost center. Through its Sun Fire servers, the company has become the industry's fastest-growing provider of x64 servers, he said: "Special-purpose systems are dying out; at this point, you can take general-purpose infrastructure and replace almost all custom infrastructure."

Ellison Says Oracle Knows What's Best for Linux
Renee Boucher Ferguson, eWEEK
Asserting that to win acceptance in big companies Linux requires enterprise-grade support, Oracle CEO Larry Ellison said his company would provide full support for Red Hat Linux. While Oracle will remove the Red Hat trademarks from the Linux it distributes, Ellison denied that this would in any way "fragment" the Linux market. Oracle needs to provide enhanced support for Linux, he contends, because enterprise customers are holding back on implementing Linux with Oracle's Grid computing system because of serious support issues. "The most serious issue: true enterprise support," said Ellison to a packed-to-the-rafters audience. "If a customer has an issue with the Linux kernel and a vendor fixes the bug, quite often it's not fixed in the version the customer is running. It's fixed in the future version that's about to come out. You have to upgrade to get the fix. That really is not acceptable to our large customers." According to Ellison, Oracle's support for Red Hat, now under the aegis of Oracle's Unbreakable Linux program, is not supposed to be a death knell for Red Hat. For premier support (a level of service that Red Hat doesn't even offer, again according to Ellison), Oracle is charging $1,200 per system per year for two processors, and $2,000 for larger systems. For that package users get two key features: back-porting and indemnification. Oracle's offer to back-port bug fixes means it will fix bugs in the version users are on, regardless of whether it's the latest version. The indemnification clause means Oracle takes on any legal claims users may be subject to from companies like the SCO Group, in whose wake indemnification seems critical.

RELAX NG, the XML Schema Alternative
Ed Tittel, SearchWebServices.com
Those who've been knocking around the XML or Web development communities for any length of time have come across the work of James Clark, if not evidence of the man himself. His is a pretty fascinating story, which you can read more about on his bio page. For the purposes of our discussion, let's just say he's been around the SGML and XML communities since the early 1990s and has contributed a substantial and extremely useful body of work. His highlight reel includes an open source SGML parser he wrote in C, acting as technical lead during the development of the XML 1.0 Recommendation, enhancing SGML to make XML a formal subset of SGML, development of expat, "the world's fastest XML parser," co-authoring the XSL submission, editing the XSLT and XPath Recommendations, and last and most relevant, developing TREX, a schema language for XML that pre-dated (and many believe outclasses) XML Schema. In fact, TREX plus another alternative XML schema language named RELAX gave rise to RELAX NG (where NG stands for Next Generation), which is an OASIS development project and is now also enshrined as ISO/IEC standard 19757-2. Why bother with RELAX NG when XML Schema, a W3C Recommendation, is also available? Three short answers explain why this markup language is worth digging into: (1) The language is designed to be simple and easy to learn, which many would observe is not the case for XML Schema. (2) The language includes both an XML syntax and a compact non-XML syntax. It also supports XML namespaces and does not change the information set of any XML document it touches. (3) It works with XML Schema Datatypes (just as does XML Schema itself) and can draw on the expressive power of that markup language.
See also: RELAX NG as DSDL part 2
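
To show what "both an XML syntax and a compact non-XML syntax" looks like in practice, the sketch below holds the same small address-book pattern in both forms and uses Python's standard library to list the elements declared by the XML form. The pattern follows the style of the RELAX NG tutorial; the variable names and the declared_elements helper are assumptions made for this example, and the code only inspects the schema rather than validating documents against it.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# XML syntax: an addressBook holding zero or more cards.
SCHEMA_XML = """
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name"><text/></element>
      <element name="email"><text/></element>
    </element>
  </zeroOrMore>
</element>
"""

# Compact syntax: the same pattern, written without XML markup.
SCHEMA_COMPACT = """
element addressBook {
  element card {
    element name { text },
    element email { text }
  }*
}
"""


def declared_elements(schema_xml):
    """List element names declared in an XML-syntax schema, in document order."""
    root = ET.fromstring(schema_xml)
    return [e.get("name") for e in root.iter("{%s}element" % RNG_NS)]


if __name__ == "__main__":
    print(declared_elements(SCHEMA_XML))
    print(SCHEMA_COMPACT.strip())
```

Actual validation would need a RELAX NG processor, which is outside the standard library and is not shown here.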

On Web Standards, Libertarian Candidates Win
Declan McCullagh and Anne Broache, CNET News.com
The Libertarian Party hasn't had much success in [US] national elections: It garnered just 353,265 votes in the 2004 presidential race and boasts precisely zero elected representatives in the U.S. Congress. But a survey of political sites by CNET News.com shows that Libertarian candidates are ahead in the race to ensure their pages comply with a widely accepted litmus test for good Web design, which can aid mobile device users and people with visual disabilities. Of approximately 1,000 campaign Web sites surveyed two weeks before the November 7, 2006 election, only 35 passed the validation tests created by the World Wide Web Consortium, or W3C. Seven of those were created by Libertarian candidates, some of whom have degrees in computer or electrical engineering or count themselves as free-software aficionados. Republicans came in a close second. Call the Libertarians the political party of geeks, for geeks. "I'll be the first to admit that we do have a lot of geeks in the party, and I'm one of them," Shane Cory, executive director of the national Libertarian Party, said Wednesday. To compile a list of campaign Web sites to review, News.com used a database of U.S. House of Representatives and U.S. Senate candidates created by Voter Information Services, a nonprofit and nonpartisan group. Then we wrote a computer program to test each campaign Web site against a "validator" maintained by the World Wide Web Consortium, or W3C, and record and then sort the results...
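
The survey procedure described above (check each site, record the result, then sort) could be sketched roughly as follows. This is not News.com's program: the site list, party labels, and the is_valid stand-in are placeholders invented for illustration, and a real run would replace is_valid with a call to the W3C validator.

```python
from collections import Counter


def tally_passes(sites, is_valid):
    """Count passing sites per party, sorted by count (descending), then name."""
    passes = Counter(party for url, party in sites if is_valid(url))
    return sorted(passes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder data only; these are not real campaign sites or results.
    sites = [
        ("http://a.example.org/", "Party A"),
        ("http://b.example.org/", "Party B"),
        ("http://c.example.org/", "Party A"),
    ]
    pretend_valid = {"http://a.example.org/", "http://c.example.org/"}
    print(tally_passes(sites, lambda url: url in pretend_valid))
```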


XML.org is an OASIS Information Channel sponsored by BEA Systems, Inc., IBM Corporation, Innodata Isogen, SAP AG and Sun Microsystems, Inc.

Use http://www.oasis-open.org/mlmanage to unsubscribe or change an email address. See http://xml.org/xml/news_market.shtml for the list archives.

