What's New in XSLT 2.0

The XSLT version 1.0 language definition has been an official recommendation of the W3C since 1999. Its use has expanded dramatically in the past 18 months, for processing XML and XML/SOAP security policies and for generating HTML Web pages.

Of course, nearly as soon as the language became official, people began proposing to change it. (Indeed, the original document has a page of suggested improvements for future versions.) These efforts began as a version 1.1 proposal, which was abandoned in favor of the current Working Draft (WD). We should see XSLT 2.0 become an official W3C Recommendation sometime this year.

Data Types
XSLT 1.0 dealt with four basic types of data: strings, numbers, Booleans, and nodesets (plus result tree fragments, which behaved like restricted nodesets). XSLT 2.0 has 48 atomic (built-in) data types, plus lists and unions constructed from them. There are now 16 numeric types; 9 variations of date, time, and duration; plus hexBinary and base64Binary, among others. Users may also derive their own types from the built-in ones to suit their needs.

Let's look at numbers. In XSLT 1.0, there was only a single variety of number, represented internally as a floating point double, and sometimes used explicitly as an integer. Now we'll have doubles, floats, various signed and unsigned integer precisions, and decimals. Decimals may be a new concept for some of you, unless you've been programming in COBOL recently. These are intended to provide exact representations of decimal fractions (e.g., dollars and cents) without the approximation caused by using floating point. So, we now have three different kinds of numeric constants we can create, instead of one:

  1. Integer: 1234
  2. Decimal: 12.34
  3. Double: 1234e-2
Notice that 12.34 used to be a double; now it's a decimal. And what about all the rest of the data types? Any type, including user-defined (derived) types, can be created using a constructor with the same name as the data type. For example:
  • xs:decimal(1234) creates a decimal value
  • xs:date("2003-03-17") creates a date

    Having all these new types will lead to greater flexibility, and, occasionally, things to watch out for. XSLT is now a strongly typed language, so it's possible that a parameter to a function you call will need to be an integer - if you pass a double instead, an error will result, unless you cast it properly.
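
As a rough sketch of the literal, constructor, and casting rules described above, a stylesheet like the following could be run against any input document. It assumes an XSLT 2.0 processor (Saxon, for instance) and the usual binding of the xs prefix to the XML Schema namespace; the particular values are only illustrative.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The lexical form of a numeric literal decides its type:
         integer, decimal, and double, in that order. All three tests are true. -->
    <xsl:value-of select="1234 instance of xs:integer,
                          12.34 instance of xs:decimal,
                          1234e-2 instance of xs:double"
                  separator=" "/>
    <xsl:text>&#10;</xsl:text>

    <!-- Decimal arithmetic is exact, so this comparison is true... -->
    <xsl:value-of select="0.1 + 0.2 eq 0.3"/>
    <xsl:text>&#10;</xsl:text>

    <!-- ...while the same sum in doubles suffers from rounding and is false. -->
    <xsl:value-of select="0.1e0 + 0.2e0 eq 0.3e0"/>
    <xsl:text>&#10;</xsl:text>

    <!-- A constructor function has the same name as the type it creates. -->
    <xsl:value-of select="xs:date('2003-03-17')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- An explicit conversion from a double to an integer. -->
    <xsl:value-of select="xs:integer(1.2e1)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The second and third tests are the dollars-and-cents argument in miniature: the decimal sum compares equal to 0.3, while the double sum does not.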

    The new date and time data types are a great blessing to anyone who had to suffer through the ugly string manipulations required before. We now have duration, dateTime, date, time, gYearMonth, gYear, gMonthDay, gDay, and gMonth. What more could anyone want? There will also be a format-date() function, although its details are not yet specified in the latest WD.
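
To give a feel for what typed dates buy you, here is a minimal sketch that extracts a component, compares two dates, and subtracts one from another. It assumes an XSLT 2.0 processor and uses the function and operator names from the Functions and Operators document; exact names have shifted between drafts, so treat them as assumptions to check against your processor. The dates themselves are arbitrary.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="start" select="xs:date('2003-03-17')"/>
    <xsl:variable name="end" select="xs:date('2003-12-25')"/>

    <!-- Pull the year out of a date without any substring juggling. -->
    <xsl:value-of select="year-from-date($start)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Dates compare as dates, not as strings. -->
    <xsl:value-of select="$start lt $end"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Subtracting two dates yields a duration value. -->
    <xsl:value-of select="$end - $start"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```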

    Finally, we still have strings and Booleans. But what about nodesets? Technically, they have disappeared, replaced by sequences (lists) of nodes. In general, sequences are not necessarily in document order and may contain duplicates. However, all functions from XSLT 1.0 that returned nodesets in the past now return ordered sequences with duplicates removed. Old stylesheets will still work fine.

    Path Expressions
    XPath 2.0 has generalized its path expressions. Now, a primary expression (literal, function call, variable reference, or parenthesized expression) can appear at any step in the path, rather than just the first step. Basically, this allows any expression that returns a nodeset (sequence of nodes) to appear on either side of a "/". This is especially useful for expressions like:

    Book/(Chapter | Appendix)/Paragraph

    or

    document("a.xml")/id("ID01")

    There are two things to notice here. First, this applies to expressions, but not to patterns. So, you can use these in the select attributes of xsl:for-each or xsl:value-of instructions, but not in the match patterns for templates. Match patterns have not changed, except that predicates in patterns can be XPath 2.0 expressions. The second thing to notice is that some things are legal that aren't especially useful:

    document("a.xml")/document("b.xml")

    or

    anything-on-the-left/$x

    Both of these ignore anything to the left of the last "/" and are equivalent to the nodeset that is the rightmost step. Hopefully, XSLT processors will produce a warning for these.
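
For a concrete picture of the first, useful form, the sketch below walks every Paragraph under either a Chapter or an Appendix in a single path. The Book, Chapter, Appendix, and Paragraph element names come from the expression above; the shape of the input document is otherwise an assumption.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A parenthesized union is used as the middle step of the path.
         The result is still in document order with no duplicates. -->
    <xsl:for-each select="Book/(Chapter | Appendix)/Paragraph">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT 1.0 the same selection would have to be spelled out as Book/Chapter/Paragraph | Book/Appendix/Paragraph, repeating the leading and trailing steps in each branch.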

    Conditionals and Looping
    Expressions can now include bits of flow control within them, using if ... then ... else:

    <xsl:value-of select="
    if (item/price &lt; 100)
    then 'cheap'
    else 'expensive' " />

    This is entirely within XPath expressions, and is separate from xsl:if. (Old-timers may be reminded of Algol syntax here.) You can now generate sequences with for loops. Consider the following:

    for $i in (0 to 9), $j in (1 to 10)
    return 10*$i + $j

    This returns the sequence of numbers from 1 to 100. Of course, in this case, a simpler equivalent would use a range expression, (1 to 100), to generate the same sequence.
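
The two constructs combine naturally. The following sketch, which assumes an XSLT 2.0 processor and uses made-up price values, classifies each number in a sequence and then sums a computed sequence. The lt operator is used instead of < simply to avoid escaping inside the attribute.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- An if expression nested inside a for expression:
         yields one string per input number. -->
    <xsl:value-of select="for $p in (50, 100, 150)
                          return if ($p lt 100) then 'cheap' else 'expensive'"
                  separator=", "/>
    <xsl:text>&#10;</xsl:text>

    <!-- A for expression feeding an aggregate function:
         the sum of the squares of 1, 2, and 3. -->
    <xsl:value-of select="sum(for $i in 1 to 3 return $i * $i)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```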

    Grouping
    Processing groups of related elements is a problem that comes up repeatedly on XSLT mailing lists. The required techniques are well known among experienced users, but they have to be explained anew to every HTML programmer who comes over to XSLT. Consider the sample XML shown below:

    <cars>
    <car make="Dodge" model="caravan"
    color="red" price="28000" />
    <car make="Ford" model="probe"
    color="blue" price="14000" />
    <car make="Dodge" model="caravan"
    color="red" price="28000" />
    <car make="Ford" model="thunderbird"
    color="silver" price="45000" />
    <car make="Ferrari" color="red"
    price="280000" />
    <car make="Dodge" model="caravan"
    color="green" price="28000" />
    </cars>

    Suppose you wanted to list the cars by make, or by color. The new xsl:for-each-group instruction makes life much easier:

    <xsl:for-each-group select="cars/car"
    group-by="@make">
    Makes: <xsl:value-of select=
    "current-group()/@make"
    separator=","/>
    </xsl:for-each-group>

    You have a lot of flexibility with the xsl:for-each-group element. In addition to the required select attribute shown above, the following grouping attributes are available:

  • group-by: groups together all items that share the same value of the grouping key, wherever they appear in the input
  • group-adjacent: groups together consecutive items that share the same key value
  • group-starting-with: each node matching the pattern starts a new group
  • group-ending-with: similarly, a matching node ends a group

    Exactly one of these four attributes must appear. Within the xsl:for-each-group element, the new current-group() function gives access to the group members, and will make grouping much simpler than it was with XSLT 1.0.
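
Putting the pieces together, a complete stylesheet for the cars document might look something like the sketch below, which prints each make once, followed by its distinct models. It assumes an XSLT 2.0 processor and relies on the current-grouping-key() function, which may not be present in every draft-level implementation.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One iteration per distinct value of the make attribute. -->
    <xsl:for-each-group select="cars/car" group-by="@make">
      <!-- The key value shared by every car in this group. -->
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <!-- The distinct models among the members of this group;
           a car without a model attribute contributes nothing. -->
      <xsl:value-of select="distinct-values(current-group()/@model)"
                    separator=", "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Grouping by color instead is a one-attribute change: group-by="@color".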

    Regular Expressions
    String handling in XSLT has always been a tedious process. Both XSLT and XPath have added new facilities for dealing with regular expressions. The most complex of these is xsl:analyze-string, which takes an input string and a regular expression and partitions the input into a sequence of substrings that alternately match and fail to match the expression. These are processed by the xsl:matching-substring and xsl:non-matching-substring child instructions, which can construct any content required for each kind of substring.

    For simpler regular expression processing, XPath 2.0 has added three functions: fn:matches(), fn:replace(), and fn:tokenize(). The first returns Boolean true or false, depending upon whether a string matches a given regular expression. The second replaces all substrings in the input string that match the regular expression with a replacement string, and returns the resulting string. The third uses a regular expression as a separator, and returns a sequence of substrings created by breaking the input at that separator. In addition, of course, XPath retains all its other string manipulation functions. This should go a long way toward simplifying string handling in stylesheets, and making it more powerful.
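
Here is a small sketch exercising all four facilities on one string. It assumes an XSLT 2.0 processor and the function names matches(), replace(), and tokenize(); the sample string and the patterns are invented for illustration. One detail worth noting: the regex attribute of xsl:analyze-string is an attribute value template, so literal curly braces must be doubled there, whereas in a select attribute they are written normally.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="s" select="'2003-03-17, 2003-12-25'"/>

    <!-- Does the string contain something shaped like a date? -->
    <xsl:value-of select="matches($s, '\d{4}-\d{2}-\d{2}')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Reorder each date; $1, $2, $3 refer to the parenthesized groups. -->
    <xsl:value-of select="replace($s, '(\d{4})-(\d{2})-(\d{2})', '$2/$3/$1')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Split on a comma followed by optional whitespace. -->
    <xsl:value-of select="tokenize($s, ',\s*')" separator=" | "/>
    <xsl:text>&#10;</xsl:text>

    <!-- Bracket every four-digit run and pass everything else through. -->
    <xsl:analyze-string select="$s" regex="\d{{4}}">
      <xsl:matching-substring>
        <xsl:text>[</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>]</xsl:text>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```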

    User-Defined Functions
    User-defined stylesheet functions allow creation of functions that can be called from within XPath expressions. Declaration is simple. For example:

    <xsl:function name="my:square" >
    <xsl:param name="x" />
    <xsl:result select="$x * $x" />
    </xsl:function>

    returns the square of its input parameter. (The function name must be in a namespace; the my prefix here stands for one you declare yourself.) Children of xsl:function declarations may only be xsl:param, xsl:variable, xsl:message, or xsl:result. This might seem to limit what functions are capable of doing. However, there is no limit to what can appear inside the xsl:variable element, including xsl:call-template and xsl:apply-templates, so you really have a great deal of flexibility.
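
For completeness, here is a sketch of a whole stylesheet that declares and calls such a function. Two things in it are assumptions rather than part of the draft described above: the namespace URI bound to the my prefix is a placeholder, and the function body uses xsl:sequence, which later revisions of the specification adopt in place of xsl:result. A processor tracking the current draft may need xsl:result instead.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions">

  <xsl:output method="text"/>

  <!-- A stylesheet function: typed parameter, typed result. -->
  <xsl:function name="my:square" as="xs:integer">
    <xsl:param name="x" as="xs:integer"/>
    <!-- The parameter is referenced as a variable, with a leading $. -->
    <xsl:sequence select="$x * $x"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- The function is called from inside an XPath expression. -->
    <xsl:value-of select="my:square(7)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Declaring the parameter as xs:integer also shows the strong typing at work: calling my:square(7.5e0) with a double should be reported as a type error rather than silently converted.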

    Schemas
    One issue that has proven somewhat divisive in version 2.0 is the inclusion of schemas. Some, including Microsoft, believe that schemas are necessary for what people are doing with the language. Schemas provide both validation of data types in input documents, and clues to the XSLT processor about which data structures to expect. Others think schema support is too complicated, and that schema validation should be a separate process. There are reasonable arguments on both sides. In the end, it was decided to define a conformance level for which schema support was optional.

    With schema support, there is a new xsl:import-schema declaration. Every data type name that is not a built-in name must be defined in an imported schema. XPath expressions will be able to validate values against in-scope schemas, and will be able to use constructors and casts to imported data types.

    Inputs and Outputs
    XSLT 1.0 allowed for a primary input document, auxiliary input via the document() function, and a single output. Version 2.0 provides for multiple inputs in several ways. There will be an input() function that provides access to a sequence of input nodes, and a collection() function that allows specification of a URI that defines a node sequence. Both of these provide access to multiple documents or document fragments. In addition, the unparsed-text() function will read arbitrary external resources (e.g., files) and return each one as a string.

    For output, the new xsl:result-document instruction provides for named and unnamed output trees. Combined with xsl:output declarations, this allows for multiple output documents. It is a feature that has been widely requested.
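
As an illustration of multiple outputs, the sketch below writes one result document per make from a cars input document (a cars element containing car elements with make attributes). It assumes an XSLT 2.0 processor that supports xsl:result-document with an href attribute; the output file names are derived from the data and are purely an example.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:for-each-group select="cars/car" group-by="@make">
      <!-- href is an attribute value template, so each group
           is written to its own file, such as Dodge.xml. -->
      <xsl:result-document href="{current-grouping-key()}.xml">
        <cars make="{current-grouping-key()}">
          <!-- Copy every car belonging to this make. -->
          <xsl:copy-of select="current-group()"/>
        </cars>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```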

    When Can I Try XSLT 2.0?
    There are still issues to be resolved before the Working Drafts turn into official W3C Recommendations. Committee members have said that they hope the process will be complete by late this summer or early fall. Meanwhile, Michael Kay, who is both the editor of the XSLT 2.0 WD and the creator of Saxon, has made a version of Saxon available that supports most of the new proposal. And, of course, most suppliers of XSLT processors are working to support version 2.0 of the language as soon as it becomes official. The committees are doing a great job improving XSLT, and I expect it to be enthusiastically adopted by the XML community.

    References

  • XSLT 1.0: www.w3.org/TR/1999/REC-xslt-19991116
  • XPath 1.0: www.w3.org/TR/1999/REC-xpath-19991116
  • XSLT 2.0: www.w3.org/TR/xslt20
  • XPath 2.0: www.w3.org/TR/xpath20
  • XML Schema Part 1: Structures: www.w3.org/TR/xmlschema-1
  • XML Schema Part 2: Datatypes: www.w3.org/TR/xmlschema-2
  • Functions and Operators: www.w3.org/TR/xquery-operators

    Jeff Kenton is chairman of the OASIS XSLT/XPath Conformance Committee and the senior developer at DataPower Technology, Inc. (www.datapower.com), for their XSLT 2.0 JIT compiler. Prior to joining DataPower, Jeff spent 25 years consulting in the Boston area. He was also a cofounder and the director of operating systems at Xyvision in 1982. His degrees are from MIT and Carnegie Mellon.
