Implementation Defined Behaviour

The specification of the XEXPR language is laid out in a W3C note of
21 November 2000. However, the specification leaves quite a bit of
information to be deduced from the examples and leaves other parts of
the language loosely specified. This chapter documents the way that
libxexpr implements the language and gives the rationale for each
decision taken.

Numbers

Numbers are defined in pseudo-BNF as:

    number        : whitespace sign simple-number whitespace
                  ;

    whitespace    : [ \t\n]*
                  ;

    sign          : [+-]?
                  ;

    simple-number : 0x[0-9A-Fa-f]+
                  | [0-9]+
                  | [0-9]+\.[0-9]+
                  | [0-9]+\.[0-9]+[eE][+-][0-9]+
                  ;

that is: they have an optional leading sign and may be surrounded with
whitespace.

Rationale

While negative numbers can be created using <subtract> without
the need for any signs, this seems overly cumbersome.

The examples in the specification make it clear that where two numbers
are separated by a space, this should be parsed as just two numbers
and not as two numbers plus an intervening string, which a strict
reading of the specification would imply.

Bindings

Bindings are parsed as integers, floats or strings, in that order (i.e.,
the first type that matches will be used).
Thus the following pairs of expressions are equivalent:

    <func x="+01 "/>

    <func>
      <define name="x"><integer>1</integer></define>
    </func>

    <func x=" 14.0e-1"/>

    <func>
      <define name="x"><float>1.4</float></define>
    </func>

    <func x="Hello "/>

    <func>
      <define name="x"><string>Hello </string></define>
    </func>

Rationale

This seems to satisfy the doctrine of least surprise.

Parsing of PCDATA

When numbers and strings are mixed in PCDATA, any whitespace surrounding
the numbers is taken to be part of the numbers rather than the strings.
Thus the following two expressions are equivalent:

    <foo>This is the 0xdeadbeef constant.</foo>

    <foo>
      <string>This is the</string>
      <integer>0xdeadbeef</integer>
      <string>constant.</string>
    </foo>

Rationale

This seems more consistent with spaces between numbers not being
parsed as strings than the alternative.

The <define> Function

The <define> function creates a new function in the environment in
which it is invoked. This is different from the <set> function,
which will modify the definition of an existing function if one exists.
Only if no such function is defined in any of the active environments will
<set> create a new function (and then in the outermost, or global,
environment).

Rationale

We know from the examples in the specification (e.g., in section 45)
that <subtract> changes the definition of its first argument
in at least the grandfather environment. It makes sense that <set>
should do the same.
When we come to <define>, however, we know from section 3 that it
is equivalent to an attribute on the parent element and so it makes
sense that it should create a variable in the parent environment.

The <get> Function

The following two expressions are equivalent:

    <get name="x"/>
    <get>x</get>

The expression <x/> has the same effect except in the case of
<add> and <subtract>, where these two expressions are
different:

    <add><x/>1</add>
    <add><get>x</get>1</add>

The first changes the definition of <x>, the second does not.

Note that IDs are allowed to start with the dot (.) and hyphen (-)
characters, which are not valid as the first character in XML tags.
Thus <get> must be used in the following:

    <expr>
      <define name=".net">4.5.50709</define>
      <print><get>.net</get></print>
    </expr>

Since <get> returns a function definition (just like
<define>), it is possible to define functions of this type that
take arguments and even invoke them in a somewhat circuitous manner:

    <expr>
      <define name=".product" args="a b c d">
        <add>
          <multiply>
            <a/>
            <b/>
          </multiply>
          <multiply>
            <c/>
            <d/>
          </multiply>
        </add>
      </define>

      <expr>
        <define name="closure"/>
        <set name="closure">
          <get>.product</get>
        </set>
        <closure>1 2 3 4</closure>
      </expr>
    </expr>

Rationale

Section 14 tells us that <get>x</get> and <x/> have the same
effect in most cases (and thus presumably not all cases) and it
would seem surprising if <get> were not to insulate a
function in this manner.
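
The way <define>, <set> and <get> interact with nested environments, as
in the closure example above, can be pictured with a small model. The
following Python sketch is illustrative only: the Env class and its
method names are hypothetical and are not part of libxexpr's API. It
assumes environments form a simple parent chain ending at the global
environment.

```python
class Env:
    """One scope in a chain of environments (hypothetical model)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def lookup_env(self, name):
        # Walk outwards and return the nearest environment binding `name`.
        env = self
        while env is not None:
            if name in env.names:
                return env
            env = env.parent
        return None

    def define(self, name, value):
        # <define> always binds in the environment in which it is invoked.
        self.names[name] = value

    def set(self, name, value):
        # <set> rebinds the nearest existing definition; failing that,
        # it creates one in the outermost (global) environment.
        env = self.lookup_env(name)
        if env is None:
            env = self
            while env.parent is not None:
                env = env.parent
        env.names[name] = value

    def get(self, name):
        # <get> returns the nearest definition without modifying anything.
        env = self.lookup_env(name)
        if env is None:
            raise KeyError(name)
        return env.names[name]


global_env = Env()
inner = Env(parent=global_env)

global_env.define("x", 1)
inner.set("x", 2)       # modifies the existing outer definition
inner.set("y", 3)       # no definition anywhere: created globally
inner.define("x", 10)   # new definition local to `inner`
```

After this runs, the global environment holds x = 2 and y = 3, while
the inner environment sees its own x = 10.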

Arithmetic Operators

The empty arithmetic operators (<add/>, <subtract/>,
<multiply/> and <divide/>) all evaluate to <nil/>.

The <add> and <subtract> operators change their first
argument in some circumstances, as in this example from the specification:

    <while>
      <gt><x/> 0</gt>
      <expr>
        <print newline="true"><x/></print>
        <subtract><x/> 1</subtract>
      </expr>
    </while>

In general, the first argument will be modified if it is a function
invocation that has no bindings and no arguments. Thus the following
will print 9:

    <define name="x"><multiply>2 3</multiply></define>
    <add><x/>3</add>
    <print><x/></print>

whereas this will print 6:

    <define name="x"><multiply>2 3</multiply></define>
    <add><x unused=""/>3</add>
    <print><x/></print>

Where arguments are modified, this occurs as the arguments are being
evaluated. Thus this expression:

    <add><x/><x/><x/></add>

will multiply <x> by 4 rather than by 3.

Rationale

The examples in the specification imply that <add> and
<subtract> modify their first argument when it is a
variable. The iterative example in section 8
wouldn't work if <multiply> worked the same way (and the
definition of <2pi> in section 7 would
not be expected to modify the definition of <pi> each time
it is called). Note that this example is erroneous: IDs can't contain
numbers.

It seems undesirable to modify functions that take arguments.
Making the decision based on the invocation rather than the
function definition makes expressions much easier to read.
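
The "multiply by 4" result above follows from updating the first
argument while the remaining arguments are still being evaluated. The
Python sketch below models only that behaviour. The encoding of
arguments as ("var", name) for a bare invocation such as <x/> and
("val", number) for anything else is an assumption made for
illustration, as is the use of a plain dictionary for the environment;
neither reflects libxexpr's internals.

```python
def add(env, args):
    """Model of <add> with in-place update of a bare first argument."""
    if not args:
        return None  # the empty <add/> evaluates to <nil/>

    kind, first = args[0]
    # Only a bare invocation (no bindings, no arguments) is modified.
    target = first if kind == "var" else None
    total = env[first] if kind == "var" else first

    for kind, item in args[1:]:
        # Each later argument sees the updates made so far.
        total += env[item] if kind == "var" else item
        if target is not None:
            env[target] = total
    return total


env = {"x": 5}
result = add(env, [("var", "x"), ("var", "x"), ("var", "x")])
```

Starting from x = 5, the running total is 5, then 5 + 5 = 10 (stored
back into x), then 10 + 10 = 20, so x ends up four times its original
value.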

Comparison Functions

The empty comparison functions (<eq/>, <neq/>, <leq/>,
<geq/>, <lt/> and <gt/>) and comparison functions with
exactly one argument all evaluate to <true/>.

The ordered comparison functions (<leq>, <geq>, <lt>
and <gt>) act as if the equivalent mathematical operator were
inserted between their arguments. Thus:

    <lt>
      1 2 3
    </lt>

is equivalent to the mathematical expression:

    1 < 2 < 3

and:

    <leq>
      1 2 3
    </leq>

is equivalent to the mathematical expression:

    1 ≤ 2 ≤ 3

When comparing objects of different types:

- Numbers will be implicitly cast between <float> and
  <integer> where that involves no loss of precision
- Strings will be compared byte-by-byte as UTF-8 encoded strings
- Functions will always be completely evaluated
- Invocations of the constant functions are ordered as <false/>
  < <nil/> < <true/>

Rationale

The empty comparison functions evaluate to <true> by analogy
with the comparison functions.

The ordering of the comparison functions is confused in the
specification, with the examples for <lt> and <gt> agreeing
with libxexpr's behaviour and the examples for <leq> and
<geq> doing the opposite. The choice was arbitrary.

Redefining Builtin Functions

Attempting to redefine a builtin function results in an error.

Rationale

While this could be implemented, there is no mention of it in the
specification and it would complicate the implementation with no
obvious benefit.

Namespaces

libxexpr considers an element's namespace to be part of its name
and thus elements in a namespace other than the XEXPR namespace
are always distinct from functions defined by XEXPR. In addition,
functions defined using <define> are defined in the
XEXPR namespace.

libxexpr provides hooks for extending the XEXPR language by allowing
handlers to be installed for other namespaces.

Elements which are in no namespace are treated as if they were in the
XEXPR namespace.

Rationale

Being able to extend the XEXPR language is vital for it to be useful
and namespaces are the obvious way to do this.
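
The namespace rules can be illustrated with a small dispatch table
keyed by namespace. This is a Python sketch of the idea only, not of
libxexpr's actual hook interface: install_handler, dispatch and
qualified_name are hypothetical names, and both namespace URIs are
placeholders, since the real XEXPR namespace name is not given in this
chapter.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs, assumed for illustration only.
XEXPR_NS = "urn:example:xexpr"
EXT_NS = "urn:example:ext"

handlers = {}


def qualified_name(element):
    """Return (namespace, local name); no namespace maps to XEXPR."""
    tag = element.tag
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return XEXPR_NS, tag


def install_handler(namespace, handler):
    """Register a callable that evaluates elements in `namespace`."""
    handlers[namespace] = handler


def dispatch(element):
    """Route an element to the handler for its namespace."""
    namespace, local = qualified_name(element)
    handler = handlers.get(namespace)
    if handler is None:
        raise LookupError(f"no handler installed for {namespace!r}")
    return handler(local, element)


install_handler(XEXPR_NS, lambda local, el: ("xexpr", local))
install_handler(EXT_NS, lambda local, el: ("ext", local))

doc = ET.fromstring(
    '<expr xmlns:x="urn:example:xexpr" xmlns:e="urn:example:ext">'
    "<add/><x:add/><e:add/></expr>"
)
results = [dispatch(child) for child in doc]
```

Here the unqualified <add/> and the explicitly qualified <x:add/> reach
the same handler, while <e:add/> is a distinct function handled by the
extension.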