JEX - Various links

Organizations

The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web. XML in 10 points explains XML briefly. The base specifications are XML 1.0, W3C Recommendation Feb '98, and Namespaces, Jan '99.

xml.com

Our mission at XML.com is to help you discover XML and learn how this Internet technology can solve real-world problems in information management and electronic commerce. XML.com features a rich mix of information and services for the XML community. The site is designed to serve both people who are already working with XML and those HTML users who want to "graduate" to XML's power and complexity. A core feature of the site is the Annotated XML Specification, created by Tim Bray, co-editor of XML 1.0 and a contributing editor for XML.com.

Microsoft XML Page

Welcome to the XML Developer Center

Robin Cover's XML Web Page

The XML Cover Pages - Extensible Markup Language (XML) By: Robin Cover

OASIS

OASIS, the Organization for the Advancement of Structured Information Standards, is a non-profit, international consortium that creates interoperable industry specifications based on public standards such as XML and SGML. OASIS members include organizations and individuals who provide, use and specialize in implementing the technologies that make these standards work in practice.

Cafe au Lait

Cafe con Leche XML News and Resources

W3C Specs

XML 1.0

The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

Annotated XML 1.0 specification

If you want to understand XML, you have to read the specification. However, to really get inside the specification and understand why it says what it does, you need an expert guide. Tim Bray, co-editor of the XML 1.0 specification, shares his knowledge and insights about XML, SGML and the working group behind the specification in this annotated version of the document.

XML Namespaces

XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references.
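As a rough illustration of how namespace qualification works in practice, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The document and both namespace URIs are hypothetical, made up for this example; the point is only that two elements with the same local name stay distinct once each is associated with its namespace URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: both namespace URIs are invented for illustration.
DOC = """<order xmlns="http://example.org/orders"
       xmlns:html="http://example.org/markup">
  <title>Parts</title>
  <html:title>Display title</html:title>
</order>"""

root = ET.fromstring(DOC)

# ElementTree reports each qualified name as "{namespace-URI}local-name",
# so the two "title" elements are distinguishable by their namespace.
qualified_names = [child.tag for child in root]
print(qualified_names)
```

The prefix itself (here "html") is not significant; only the URI it is bound to identifies the namespace.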

CSS Level 1

This document specifies level 1 of the Cascading Style Sheet mechanism (CSS1). CSS1 is a simple style sheet mechanism that allows authors and readers to attach style (e.g. fonts, colors and spacing) to HTML documents. The CSS1 language is human readable and writable, and expresses style in common desktop publishing terminology. One of the fundamental features of CSS is that style sheets cascade; authors can attach a preferred style sheet, while the reader may have a personal style sheet to adjust for human or technological handicaps. The rules for resolving conflicts between different style sheets are defined in this specification.

CSS Level 2

This specification defines Cascading Style Sheets, level 2 (CSS2). CSS2 is a style sheet language that allows authors and users to attach style (e.g., fonts, spacing, and aural cues) to structured documents (e.g., HTML documents and XML applications). By separating the presentation style of documents from the content of documents, CSS2 simplifies Web authoring and site maintenance. CSS2 builds on CSS1 (see [CSS1]) and, with very few exceptions, all valid CSS1 style sheets are valid CSS2 style sheets. CSS2 supports media-specific style sheets so that authors may tailor the presentation of their documents to visual browsers, aural devices, printers, braille devices, handheld devices, etc. This specification also supports content positioning, downloadable fonts, table layout, features for internationalization, automatic counters and numbering, and some properties related to user interface.

HTML 4.0

This specification defines the HyperText Markup Language (HTML), the publishing language of the World Wide Web. This specification defines HTML 4.01, which is a subversion of HTML 4. In addition to the text, multimedia, and hyperlink features of the previous versions of HTML (HTML 3.2 [HTML32] and HTML 2.0 [RFC1866]), HTML 4 supports more multimedia options, scripting languages, style sheets, better printing facilities, and documents that are more accessible to users with disabilities. HTML 4 also takes great strides towards the internationalization of documents, with the goal of making the Web truly World Wide.

XSL

XSL is a language for expressing stylesheets. It consists of two parts: a language for transforming XML documents, and an XML vocabulary for specifying formatting semantics. An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.

XLinks

This specification defines the XML Linking Language (XLink), which allows elements to be inserted into XML documents in order to create and describe links between resources. It uses XML syntax to create structures that can describe links similar to the simple unidirectional hyperlinks of today's HTML, as well as more sophisticated links.

XPointers

This specification defines the XML Pointer Language (XPointer), the language to be used as the basis for a fragment identifier for any URI reference that locates a resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity. XPointer, which is based on the XML Path Language (XPath), supports addressing into the internal structures of XML documents. It allows for examination of a hierarchical document structure and choice of its internal parts based on various properties, such as element types, attribute values, character content, and relative position.
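The kinds of selection described above (by element type, attribute value, character content, and relative position) can be sketched with the limited XPath subset in Python's standard xml.etree.ElementTree module. This is not an XPointer implementation, only an approximation of the XPath-style addressing it builds on, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to demonstrate addressing.
doc = ET.fromstring(
    "<spec>"
    "<section id='intro'><title>Introduction</title></section>"
    "<section id='syntax'><title>Syntax</title></section>"
    "</spec>"
)

# Select by attribute value.
by_attribute = doc.find(".//section[@id='syntax']/title").text

# Select by relative position (positions are 1-based).
by_position = doc.find("./section[1]/title").text

# Select by the character content of a child element.
by_content = doc.find(".//section[title='Syntax']").get("id")

print(by_attribute, by_position, by_content)
```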

XLink Design Principles

This document explicates the design principles behind the XLink language and its related XPointer language.

XML Data

Schemas define the characteristics of classes of objects. This paper describes an XML vocabulary for schemas, that is, for defining and documenting object classes. It can be used for classes which are strictly syntactic (for example, XML) or those which indicate concepts and relations among concepts (as used in relational databases, KR graphs and RDF). The former are called "syntactic schemas;" the latter "conceptual schemas."

XML-QL Query Language

The availability of large amounts of data on the Web raises several issues that the XML standard does not address. In particular, what techniques and tools should exist for extracting data from large XML documents, for translating XML data between different ontologies (DTD's), for integrating XML data from multiple XML sources, and for transporting large amounts of XML data to clients or for sending queries to XML sources. We propose a query language for XML, called XML-QL, as one possible answer to these questions. The language has a SELECT-WHERE construct, like SQL, and borrows features of query languages recently developed by the database research community for semistructured data.

DOM

The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. This is an overview of DOM-related materials here at W3C and around the web.
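A minimal sketch of what "access and update" means, using xml.dom.minidom from Python's standard library as one available DOM implementation (an assumption for this example; the entry itself is language-neutral). The sample document is hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical document to read and then modify.
doc = parseString("<links><link>XML 1.0</link></links>")

# Access: read existing content through the tree.
first = doc.getElementsByTagName("link")[0]
original_text = first.firstChild.data

# Update: create a new element and attach it to the document element.
new_link = doc.createElement("link")
new_link.appendChild(doc.createTextNode("XML Namespaces"))
doc.documentElement.appendChild(new_link)

# Serialize the modified tree back to markup.
updated_xml = doc.documentElement.toxml()
print(updated_xml)
```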

SAX

SAX, the Simple API for XML, is a standard interface for event-based XML parsing, developed collaboratively by the members of the XML-DEV mailing list, currently hosted by OASIS. SAX 2.0 was released on Friday 5 May 2000, and is free for both commercial and non-commercial use.
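Event-based parsing means the parser calls back into application code as it encounters markup, rather than building a tree. The sketch below uses Python's standard xml.sax module, which follows the SAX model; the handler class name and sample input are hypothetical.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts how often each element name starts (hypothetical handler)."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every start tag it reads.
        self.counts[name] = self.counts.get(name, 0) + 1


handler = ElementCounter()
xml.sax.parseString(b"<list><item/><item/><note/></list>", handler)
print(handler.counts)
```

Because nothing is retained except what the handler chooses to keep, this style suits documents too large to hold in memory as a tree.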

Document Content Description (DCD) for XML

This document proposes a structural schema facility, Document Content Description (DCD), for specifying rules covering the structure and content of XML documents. The DCD proposal incorporates a subset of the XML-Data Submission [XML-Data] and expresses it in a way which is consistent with the ongoing W3C RDF (Resource Description Framework) [RDF] effort; in particular, DCD is an RDF vocabulary. DCD is intended to define document constraints in an XML syntax; these constraints may be used in the same fashion as traditional XML DTDs. DCD also provides additional properties, such as basic datatypes.

URLs

This document specifies a Uniform Resource Locator (URL), the syntax and semantics of formalized information for location and access of resources via the Internet.

URIs

A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource. This document defines the generic syntax of URI, including both absolute and relative forms, and guidelines for their use; it revises and replaces the generic definitions in RFC 1738 and RFC 1808. This document defines a grammar that is a superset of all valid URI, such that an implementation can parse the common components of a URI reference without knowing the scheme-specific requirements of every possible identifier type. This document does not define a generative grammar for URI; that task will be performed by the individual specifications of each URI scheme.
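The generic syntax can be seen by splitting a URI reference into its common components and by resolving a relative reference against an absolute base. The sketch below uses Python's standard urllib.parse module, which follows a later revision of the generic syntax than the document described here; the example.org addresses are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

# Split a reference into generic components without scheme-specific knowledge.
parts = urlsplit("http://example.org/specs/xml.html?rev=2#sec-intro")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Resolve a relative reference against an absolute base URI.
resolved = urljoin("http://example.org/specs/xml.html", "../notes/dcd.html")
print(resolved)
```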

Dublin Core

The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include consensus-driven working groups, global workshops, conferences, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

Validating Parsers

Xerces-J (Java)

The rich generating and validating capabilities allow the Xerces-J Parser to be used for: building XML-savvy Web servers; building the next generation of vertical applications, which will use XML as their data format; on-the-fly validation for creating XML editors; ensuring the integrity of e-business data expressed in XML; and building truly internationalized XML applications.

Xerces-C (C++)

Xerces has rich generating and validating capabilities. The parser is used for: building XML-savvy Web servers; building the next generation of vertical applications that use XML as their data format; on-the-fly validation for creating XML editors; ensuring the integrity of e-business data expressed in XML; and building truly internationalized XML applications.

Xerces-P (Perl)

XML4P delivers the benefits of the XML4C DOM Parser in Perl5. XML4P includes a collection of Perl5 wrapper objects that internally use their XML4C counterparts for high-performance, scalable and localizable XML DOM parsing.

Larval (Java)

Lark is a non-validating XML processor implemented in the Java language; it attempts to achieve good trade-offs among compactness, completeness, and performance. Larval is a validating XML processor built on the same code base as Lark. This report gives an overview of the motivations for, facilities offered by, and usage of, the Lark processor.

SXP (Java)

XSilfide is a client/server based environment for distributing language resources. The whole eXtended Silfide architecture is based on (1) the XML recommendation for encoding textual resources and transient messages (server/server, client/server) and (2) the Java language for the implementation of both the server side tools and the client workspace.

fxp (ML)

fxp is a validating XML parser written completely in the functional programming language SML. It has a programming interface allowing for production of XML applications based on fxp.

XML for C++

IBM's XML for C++ parser (XML4C) is based on Apache's Xerces-C XML parser, which is a validating XML parser written in a portable subset of C++. XML4C integrates the Xerces-C parser with IBM's International Components for Unicode (ICU) and extends the number of encodings supported to over 150. It consists of three shared libraries (2 code and 1 data) which provide classes for parsing, generating, manipulating, and validating XML documents. XML4C is faithful to the XML 1.0 Recommendation and associated standards (DOM 1.0, SAX 1.0, DOM 2.0, SAX 2.0, etc.). Source code, samples and API documentation are provided with the parser.

LTXML (C)

LT XML is an integrated set of XML tools and a developers' tool-kit, including a C-based API. The release now available will run on UNIX and WIN32. The LT XML tool-kit includes stand-alone tools for a wide range of processing of well-formed XML documents, including searching and extracting, down-translation (e.g. report generation, formatting), tokenising and sorting.

Non-validating Parsers

Lark (Java)

XP (Java)

XP is an XML 1.0 parser written in Java. It is fully conforming: it detects all non well-formed documents. It is currently not a validating XML processor. However it can parse all external entities: external DTD subsets, external parameter entities and external general entities.

Expat (C)

Expat is an XML 1.0 parser written in C. It aims to be fully conforming. It is currently not a validating XML processor. The current production version of expat 1.X can be downloaded from ftp://ftp.jclark.com/pub/xml/expat.zip.
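The difference between checking well-formedness and validating can be sketched with Python's standard xml.parsers.expat module, which is a binding to the Expat library. The helper name is_well_formed and the sample documents are hypothetical.

```python
from xml.parsers import expat


def is_well_formed(text):
    """Return True if Expat parses the text without a syntax error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk
    except expat.ExpatError:
        return False
    return True


# Properly nested tags: well-formed.
print(is_well_formed("<a><b/></a>"))

# Mismatched end tag: not well-formed, so the parser rejects it.
print(is_well_formed("<a><b></a>"))

# Well-formed but invalid against its own DTD (a is declared EMPTY).
# A non-validating parser does not check this constraint.
print(is_well_formed("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>"))
```

The third document is the kind of case that separates this section from the validating parsers listed earlier: only a validating processor would report it.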

XParse (JavaScript)

Xparse is a fully compliant* well-formed XML parser written in less than 5k of JavaScript. Please feel free to use and adapt it to your needs, but more importantly, learn XML, as it's going to be very important to Information Technology in the near future.

Syntax Checkers

Frontier

XML Syntax Checker - Type in a URL below, click on Submit and a script on betty.userland.com will get the XML text, run it through a parser, and report if it's well-formed or not.

STG XML Validation Form (validating)

This interface offers full XML 1.0 validation facilities. Its only notable deviation from the 1.0 spec comes in its handling of whitespace, which it ignores inside of markup where syntactically irrelevant. Note, though, that this deviation from the spec has nothing to do with the hotly debated issue of whitespace in actual character data (in which respect this validator follows the spec).

Formatting Processors

Xalan (XSLT)

Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. Xalan-Java version 1.2.2 is a complete and robust implementation of the W3C Recommendations for XSL Transformations (XSLT) and the XML Path Language (XPath). Xalan can be used from the command line, in an applet or a servlet, or as a module in another program. By default, it uses the Xerces XML parser, but it can interface to any XML parser that conforms to the DOM level 2 or SAX level 1 specification.

LotusXSL (XSLT)

XSL provides a mechanism for formatting and transforming XML, either at the browser or on the server. It allows the developer to take the abstract data semantics of an XML instance and transform it into a presentation language such as HTML or into another XML document type. LotusXSL implements an XSLT processor in Java that can be used from the command line, in an applet or a servlet, or as a module in another program. By default, it uses the XML4J (Xerces) XML parser, but it can interface to any XML parser that implements the Java API for XML Processing (JAXP) interface.

Jade (DSSSL)

Jade is an implementation of the DSSSL style language. The current version is 1.2.1.

Koala XSL Engine (XSLT)

This is an XSL processor written in Java, using the Simple API for XML (SAX 1.0) and the Document Object Model (DOM 1.0) API.

FOP (XSL-FO)

FOP is the world's first print formatter driven by XSL formatting objects. It is a Java application that reads a formatting object tree and then turns it into a PDF document. The formatting object tree can be in the form of an XML document (output by an XSLT engine like XT or Xalan) or can be passed in memory as a DOM Document or (in the case of XT) SAX events.

TeXML (TeX)

The path to print begins with your XML document. You write an XSL transform which accepts your document type and outputs a new XML document which conforms to the TeXML document type. The Java program TeXMLatte transforms any document conforming to the TeXML document type into TeX.

XT (XSLT)

XT is an implementation in Java of XSL Transformations. This version of XT implements the PR-xslt-19991008 version of XSLT. Stylesheets written for earlier versions of the XSLT WD must be converted before they can be used with this version of XT.

Saxon (XSLT)

The SAXON package is a collection of tools for processing XML documents. The main components are: (1) an XSLT processor, which implements the Version 1.0 XSLT and XPath Recommendations from the World Wide Web Consortium, found at http://www.w3.org/TR/1999/REC-xslt-19991116 and http://www.w3.org/TR/1999/REC-xpath-19991116, with a number of powerful extensions (this version of Saxon also includes some features defined in XSLT 1.1); (2) a Java library, which supports a similar processing model to XSL but allows full programming capability, which you need if you want to perform complex processing of the data or to access external services such as a relational database; and (3) a slightly improved version of the AElfred parser from Microstar (though you can use SAXON with any SAX-compliant XML parser if you prefer).

PassiveTeX

PassiveTeX is a library of TeX macros which can be used to process an XML document which results from an XSL transformation to formatting objects. PassiveTeX provides a rapid development environment for experimenting with XSL FO, using a reliable pre-existing formatter. Running PassiveTeX with the pdfTeX variant of TeX generates high-quality PDF files in a single operation. PassiveTeX shows how TeX can remain the formatter of choice for XML, while hiding the details of its operation from the user.

Editors

XML Spy

XML Spy is centered around a professional validating XML editor that provides five advanced views on your documents: an Enhanced Grid View for structured editing, a Database/Table view that shows repeated elements in a tabular fashion, a Text View with syntax-coloring for low-level work, a graphical XML Schema design view, and an integrated Browser View that supports both CSS and XSL style-sheets.

XML Pro

XML Pro v2.0 features the IBM XML4J parser, offering solid support for the W3C Document Object Model and the SAX interface. Using the IBM parser, XML Pro integrates well with customized XML solutions for the enterprise. "Tools such as XML Pro are exactly what we had in mind when we made our XML technologies available -- to see partners and developers using our technology to harness the power of XML is great," said Michael Weiner, XML Marketing Manager, IBM. "The use of the XML for Java Parser in Vervet Logic's XML Pro editor illustrates how powerful tools can simplify data exchange with XML."

Visual XML

Visual XML is a tool that enables you to create and modify DTD and XML documents. For more information on XML, visit www.xml.com. Feel free to download a copy of this software. Take note that this is a beta version. You should not use it on production documents. This application is written in Java. The look & feel comes from the new pluggable look & feel of the Java Foundation Class (JFC Swing 1.1.1Beta2). For those of you who are familiar with XML, it can be of some interest to know that this software itself is built in part with an XML Application that takes care of the user interface. This piece of software is named Proto. For international users, read how easy it is to change this software and let it speak in your own language. A French version is included in the download. If you have any comments or suggestions, feel free to contact me. Pierre Morel

XML Languages

Mathematical Markup Language

MathML 2.0, a W3C Recommendation, was released on 21 Feb 2001. A product of the W3C Math working group, MathML is a low-level specification for describing mathematics as a basis for machine to machine communication. It provides a much needed foundation for the inclusion of mathematical expressions in Web pages.

ICE

This document describes the Information and Content Exchange protocol for use by content syndicators and their subscribers. The ICE protocol defines the roles and responsibilities of syndicators and subscribers, defines the format and method of content exchange, and provides support for management and control of syndication relationships. We expect ICE to be useful in automating content exchange and reuse, both in traditional publishing contexts and in business-to-business relationships.

Precision Graphics Markup Language

This document is the specification for the Precision Graphics Markup Language (PGML). PGML is a 2D scalable graphics language designed to meet both the simple vector graphics needs of casual users and the precision needs of graphics artists. PGML uses the imaging model common to the PostScript language and Portable Document Format (PDF); it also contains additional features to satisfy the needs of Web applications.

Resource Description Framework

Resource Description Framework (RDF) is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web. RDF emphasizes facilities to enable automated processing of Web resources. RDF can be used in a variety of application areas; for example: in resource discovery to provide better search engine capabilities, in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library, by intelligent software agents to facilitate knowledge sharing and exchange, in content rating, in describing collections of pages that represent a single logical "document", for describing intellectual property rights of Web pages, and for expressing the privacy preferences of a user as well as the privacy policies of a Web site. RDF with digital signatures will be key to building the "Web of Trust" for electronic commerce, collaboration, and other applications. This document introduces a model for representing RDF metadata as well as a syntax for encoding and transporting this metadata in a manner that maximizes the interoperability of independently developed Web servers and clients. The syntax presented here uses the Extensible Markup Language [XML]: one of the goals of RDF is to make it possible to specify semantics for data based on XML in a standardized, interoperable manner. RDF and XML are complementary: RDF is a model of metadata and only addresses by reference many of the encoding issues that transportation and file storage require (such as internationalization, character sets, etc.). For these issues, RDF relies on the support of XML. It is also important to understand that this XML syntax is only one possible syntax for RDF and that alternate ways to represent the same RDF data model may emerge.

Vector Markup Language

This document defines the Vector Markup Language (VML). VML is an application of Extensible Markup Language (XML) 1.0 which defines a format for the encoding of vector information together with additional markup to describe how that information may be displayed and edited. The first part of this document is an introduction, which gives an overview of the way VML is organized and how it interacts with both XML and HTML as defined by the HTML 4.0 Specification. This is followed by a detailed technical definition of the behavior of every VML element and the permitted and recommended behaviors for all applications.

Weather Observation Markup Format (OMF)

The intent of OMF therefore is annotation of weather reports. The reports should be distributed as they are without any mangling. On the other hand, we would like to mark them up with station information (location, name), decoded data, and with a few derived parameters. This document is to define the format of this markup. OMF is an application of XML and, by virtue of that, an application of SGML. SGML is used extensively within DoD for documenting various types of information (military standards, procurement materials, service manuals). OMF brings weather observations into the same fold.

Extensible Logfile Format (XLF)

Docuverse DOM SDK is a full implementation of the W3C DOM (Document Object Model) API in Java, available for commercial and non-commercial use without licensing fee.

FlixML

flixml.org is John E. Simpson's page of information about two seemingly unrelated subjects: The Extensible Markup Language (XML), and B movies. If you want to dive into information on XML, check here first. If it's B movies you're after, start here.

XML MIME Transformation Protocol (XMTP)

The XML MIME Transformation Protocol (XMTP) is a mapping of MIME/SMTP to XML. MIME is the lingua franca of the Web. Both the HTTP and SMTP protocols are MIME based. As XML gains in popularity it is useful to be able to represent MIME messages as XML documents. This mapping is straightforward and demonstrates handling of binary data in XML documents as base64 encodings. Using XMTP, SMTP messages can be transformed via XSLT into HTML pages for viewing. XMTP has been used to implement a telemedicine consultation system using SMTP e-mail and HTML.
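The general idea of mapping a MIME message to XML, with the body carried as base64, can be sketched as follows using Python's standard email, base64, and xml.etree.ElementTree modules. The element and attribute names (message, headers, header, body, encoding) are hypothetical and are not taken from the XMTP vocabulary; the sample message is also invented.

```python
import base64
import xml.etree.ElementTree as ET
from email import message_from_string

# Hypothetical single-part message.
RAW = (
    "From: a@example.org\n"
    "Subject: Test\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
)


def mime_to_xml(raw):
    """Map headers to elements and the body to base64 text (illustrative only)."""
    msg = message_from_string(raw)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "header", name=name)
        header.text = value
    body = ET.SubElement(root, "body", encoding="base64")
    payload = msg.get_payload(decode=True)  # body as raw bytes
    body.text = base64.b64encode(payload).decode("ascii")
    return root


root = mime_to_xml(RAW)
print(ET.tostring(root, encoding="unicode"))
```

Encoding the body as base64 keeps arbitrary bytes safe inside character data, at the cost of making the content unreadable until decoded.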

Personalized Information Description Language (PIDL)

This document describes an XML syntax for the Personalized Information Description Language (PIDL). The purpose of PIDL is to facilitate personalization of online information by providing enhanced interoperability between personalization applications. PIDL provides a common framework for applications to progressively process original contents and append personalized versions in a compact format. PIDL supports the personalization of different media (e.g. plain text, structured text, graphics, etc.), multiple personalization methods (such as filtering, sorting, replacing, etc.) and different delivery methods (for example SMTP, HTTP, IP-multicasting, etc.).

XHTML

This specification defines XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML user agents is possible by following a small set of guidelines.

Channel Definition Format (CDF)

These documents contain reference information for Channel Definition Format (CDF) elements used with Active Channels, Active Desktop items, and Software Update Channels.

JEX - Java Extension