1.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000
1.2 +++ b/docs/reference/implementation_defined_behaviour.xml Wed Oct 10 22:58:21 2012 +0100
1.3 @@ -0,0 +1,415 @@
1.4 +<?xml version="1.0"?>
1.5 +<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
1.6 + "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
1.7 +[
1.8 +]>
1.9 +<chapter id="implementation-defined-behaviour">
1.10 + <title>Implementation Defined Behaviour</title>
1.11 +
1.12 + <para>
1.13 + The specification of the XEXPR language is laid out in a W3C note of
1.14 + <ulink url="http://www.w3.org/TR/2000/NOTE-xexpr-20001121">21 November
1.15 + 2000</ulink>. However, the specification leaves quite a bit of
1.16 + information to be deduced from the examples and leaves other parts of
1.17 + the language loosely specified. This chapter documents the way that
1.18 + libxexpr implements the language and gives the rationale for each
1.19 + decision taken.
1.20 + </para>
1.21 +
1.22 + <sect1 id="idb-numbers">
1.23 + <title>Numbers</title>
1.24 +
1.25 + <para>
1.26 + Numbers are defined in pseudo-BNF as:
1.27 + <programlisting>
1.28 +number : whitespace sign simple-number whitespace
1.29 + ;
1.30 +
1.31 +whitespace : [ \t\n]*
1.32 + ;
1.33 +
1.34 +sign : [+-]?
1.35 + ;
1.36 +
1.37 +simple-number : 0x[0-9A-Fa-f]+
1.38 + | [0-9]+
1.39 + | [0-9]+\.[0-9]+
1.40 + | [0-9]+\.[0-9]+[eE][+-][0-9]+
1.41 + ;<!--
1.42 + --></programlisting>
1.43 + that is: they have an optional leading sign and may be surrounded with
1.44 + whitespace.
1.45 + </para>
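<para>
The grammar above can be checked mechanically. The following is a minimal sketch in Python rather than libxexpr's own parser; the names NUMBER and is_number are hypothetical and the pattern is a direct transcription of the pseudo-BNF.
</para>

```python
import re

# Hypothetical transcription of the pseudo-BNF; not libxexpr's actual parser.
NUMBER = re.compile(
    r"[ \t\n]*"                              # leading whitespace
    r"[+-]?"                                 # optional sign
    r"(?:0x[0-9A-Fa-f]+"                     # hexadecimal integer
    r"|[0-9]+\.[0-9]+(?:[eE][+-][0-9]+)?"    # float with optional exponent
    r"|[0-9]+)"                              # decimal integer
    r"[ \t\n]*"                              # trailing whitespace
)


def is_number(text):
    """Return True if the whole of text matches the number grammar."""
    return NUMBER.fullmatch(text) is not None
```

<para>
Under this transcription the exponent sign is mandatory and a fractional part is required before any exponent, so strings such as 14.0e1 and 1e5 are not numbers.
</para>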
1.46 +
1.47 + <simplesect id="idb-numbers-rationale">
1.48 + <title>Rationale</title>
1.49 +
1.50 + <para>
1.51 + While negative numbers can be created using <subtract> without
1.52 + the need for any signs, this seems overly cumbersome.
1.53 + </para>
1.54 +
1.55 + <para>
1.56 + The examples in the specification make it clear that where two numbers
1.57 + are separated by a space, this should be parsed as just two numbers
1.58 + and not two numbers plus an intervening string which a strict reading
1.59 + of the specification would imply.
1.60 + </para>
1.61 + </simplesect>
1.62 + </sect1>
1.63 +
1.64 + <sect1 id="idb-bindings">
1.65 + <title>Bindings</title>
1.66 +
1.67 + <para>
1.68 + Bindings are parsed as integers, floats or strings in that order (i.e.,
1.69 + the first type that matches will be used). Thus the following pairs
1.70 + of expressions are equivalent:
1.71 + <programlisting>
1.72 +<func x="+01 "/>
1.73 +
1.74 +<func>
1.75 + <define name="x"><integer>1</integer></define>
1.76 +</func>
1.77 +
1.78 +<func x=" 14.0e-1"/>
1.79 +
1.80 +<func>
1.81 + <define name="x"><float>1.4</float></define>
1.82 +</func>
1.83 +
1.84 +<func x="Hello "/>
1.85 +
1.86 +<func>
1.87 + <define name="x"><string>Hello </string></define>
1.88 +</func><!--
1.89 + --></programlisting>
1.90 + </para>
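<para>
The "first type that matches" rule can be sketched as follows. This is an illustration in Python, not libxexpr code: parse_binding and the two patterns are hypothetical names, and the patterns assume that the number grammar from the Numbers section applies to attribute values.
</para>

```python
import re

WS = r"[ \t\n]*"
# Hypothetical patterns, assumed to follow the number grammar given earlier.
INTEGER = re.compile(WS + r"([+-]?)(0x[0-9A-Fa-f]+|[0-9]+)" + WS)
FLOAT = re.compile(WS + r"([+-]?[0-9]+\.[0-9]+(?:[eE][+-][0-9]+)?)" + WS)


def parse_binding(value):
    """Try integer, then float, then fall back to the raw string."""
    match = INTEGER.fullmatch(value)
    if match:
        sign, digits = match.groups()
        number = int(digits, 16 if digits.startswith("0x") else 10)
        return ("integer", -number if sign == "-" else number)
    match = FLOAT.fullmatch(value)
    if match:
        return ("float", float(match.group(1)))
    # Strings keep their whitespace, as in the "Hello " example above.
    return ("string", value)
```

<para>
Applied to the three attribute values above, this sketch yields an integer 1, a float 1.4 and the string "Hello " with its trailing space intact.
</para>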
1.91 + <simplesect id="idb-bindings-rationale">
1.92 + <title>Rationale</title>
1.93 +
1.94 + <para>
1.95 + This seems to satisfy the principle of least surprise.
1.96 + </para>
1.97 + </simplesect>
1.98 + </sect1>
1.99 +
1.100 + <sect1 id="idb-pcdata">
1.101 + <title>Parsing of PCDATA</title>
1.102 +
1.103 + <para>
1.104 + When numbers and strings are mixed in PCDATA, any whitespace surrounding
1.105 + the numbers is taken to be part of the numbers rather than the strings.
1.106 + Thus the following two expressions are equivalent:
1.107 + <programlisting>
1.108 +<foo>This is the 0xdeadbeef constant.</foo>
1.109 +
1.110 +<foo>
1.111 + <string>This is the</string>
1.112 + <integer>0xdeadbeef</integer>
1.113 + <string>constant.</string>
1.114 +</foo><!--
1.115 + --></programlisting>
1.116 + </para>
1.117 +
1.118 + <simplesect id="idb-pcdata-rationale">
1.119 + <title>Rationale</title>
1.120 +
1.121 + <para>
1.122 + This seems more consistent with spaces between numbers not being
1.123 + parsed as strings than the alternative.
1.124 + </para>
1.125 + </simplesect>
1.126 + </sect1>
1.127 +
1.128 + <sect1 id="idb-define">
1.129 + <title>The <define> Function</title>
1.130 +
1.131 + <para>
1.132 + The <define> function creates a new function in the environment in
1.133 + which it is invoked. This is different from the <set> function
1.134 + which will modify the definition of an existing function if such exists.
1.135 + Only if no such function is defined in any of the active environments will
1.136 + <set> create a new function (and then in the outermost, or global,
1.137 + environment).
1.138 + </para>
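<para>
The difference between the two functions can be modelled with a chain of environments. The sketch below is a Python illustration only; the Env class is hypothetical, and it assumes that when several active environments hold a definition, the innermost one is the one modified.
</para>

```python
class Env:
    """Hypothetical model of nested environments; not libxexpr's data structure."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def define(self, name, value):
        # define always creates (or shadows) in the current environment.
        self.names[name] = value

    def set(self, name, value):
        # set updates the nearest existing definition (assumed: innermost first).
        env = self
        while True:
            if name in env.names:
                env.names[name] = value
                return
            if env.parent is None:
                # No definition anywhere: create it in the outermost environment.
                env.names[name] = value
                return
            env = env.parent
```

<para>
For example, with an inner environment nested in a global one, setting an undefined name from the inner environment creates it globally, whereas defining it there leaves the global environment untouched.
</para>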
1.139 +
1.140 + <simplesect id="idb-define-rationale">
1.141 + <title>Rationale</title>
1.142 +
1.143 + <para>
1.144 + We know from the examples in the specification (e.g., in
1.145 + <ulink url="http://www.w3.org/TR/xexpr/#id-0045">section 45</ulink>)
1.146 + that <subtract> changes the definition of its first argument
1.147 + in at least the grandfather environment. It makes sense that <set>
1.148 + should do the same. When we come to <define>, however, we
1.149 + know from <ulink url="http://www.w3.org/TR/xexpr/#id-0003">section
1.150 + 3</ulink> that it is equivalent to an attribute on the parent element
1.151 + and so it makes sense that it should create a variable in the parent
1.152 + environment.
1.153 + </para>
1.154 + </simplesect>
1.155 + </sect1>
1.156 +
1.157 + <sect1 id="idb-get">
1.158 + <title>The <get> Function</title>
1.159 +
1.160 + <para>
1.161 + The following two expressions are equivalent:
1.162 + <programlisting>
1.163 +<get name="x"/>
1.164 +<get>x</get><!--
1.165 + --></programlisting>
1.166 + The expression <x/> has the same effect except in the case of
1.167 + <add> and <subtract> where these two expressions are
1.168 + different:
1.169 + <programlisting>
1.170 +<add><x/>1</add>
1.171 +<add><get>x</get>1</add><!--
1.172 + --></programlisting>
1.173 + The first changes the definition of <x>, the second does not.
1.174 + </para>
1.175 +
1.176 + <para>
1.177 + Note that IDs are allowed to start with the dot (.) and hyphen (-)
1.178 + characters which are not valid as the first character in XML tags.
1.179 + Thus <get> must be used in the following:
1.180 + <programlisting>
1.181 +<expr>
1.182 + <define name=".net">4.5.50709</define>
1.183 + <print><get>.net</get></print>
1.184 +</expr><!--
1.185 + --></programlisting>
1.186 + </para>
1.187 +
1.188 + <para>
1.189 + Since <get> returns a function definition (just like
1.190 + <define>), it is possible to define functions of this type that
1.191 + take arguments and even invoke them in a somewhat circuitous manner:
1.192 + <programlisting>
1.193 +<expr>
1.194 + <define name=".product" args="a b c d">
1.195 + <add>
1.196 + <multiply>
1.197 + <a/>
1.198 + <b/>
1.199 + </multiply>
1.200 + <multiply>
1.201 + <c/>
1.202 + <d/>
1.203 + </multiply>
1.204 + </add>
1.205 + </define>
1.206 +
1.207 + <expr>
1.208 + <define name="closure"/>
1.209 + <set name="closure">
1.210 + <get>.product</get>
1.211 + </set>
1.212 + <closure>1 2 3 4</closure>
1.213 + </expr>
1.214 +</expr><!--
1.215 + --></programlisting>
1.216 + </para>
1.217 +
1.218 + <simplesect id="idb-get-rationale">
1.219 + <title>Rationale</title>
1.220 +
1.221 + <para>
1.222 + <ulink url="http://www.w3.org/TR/xexpr/#id-0014">Section 14</ulink>
1.223 + tells us that <get>x</get> and <x/> have the same
1.224 + effect in most cases (and thus presumably not all cases) and it
1.225 + would seem surprising if <get> were not to insulate a
1.226 + function in this manner.
1.227 + </para>
1.228 + </simplesect>
1.229 + </sect1>
1.230 +
1.231 + <sect1 id="idb-arithmetic">
1.232 + <title>Arithmetic Operators</title>
1.233 +
1.234 + <para>
1.235 + The empty arithmetic operators (<add/>, <subtract/>,
1.236 + <multiply/> and <divide/>) all evaluate to <nil/>.
1.237 + </para>
1.238 +
1.239 + <para>
1.240 + The <add> and <subtract> operators change their first
1.241 + argument in some circumstances as in this example from the specification:
1.242 + <programlisting>
1.243 +<while>
1.244 + <gt><x/> 0</gt>
1.245 + <expr>
1.246 + <print newline="true"><x/></print>
1.247 + <subtract><x/> 1</subtract>
1.248 + </expr>
1.249 +</while><!--
1.250 + --></programlisting>
1.251 + </para>
1.252 +
1.253 + <para>
1.254 + In general, the first argument will be modified if it is a function
1.255 + invocation that has no bindings and no arguments. Thus the following
1.256 + will print 9:
1.257 + <programlisting>
1.258 +<define name="x"><multiply>2 3</multiply></define>
1.259 +<add><x/>3</add>
1.260 +<print><x/></print><!--
1.261 + --></programlisting>
1.262 + whereas this will print 6:
1.263 + <programlisting>
1.264 +<define name="x"><multiply>2 3</multiply></define>
1.265 +<add><x unused=""/>3</add>
1.266 +<print><x/></print><!--
1.267 + --></programlisting>
1.268 + </para>
1.269 +
1.270 + <para>
1.271 + Where arguments are modified, this occurs as the arguments are being
1.272 + evaluated. Thus this expression:
1.273 + <programlisting>
1.274 +<add><x/><x/><x/></add><!--
1.275 + --></programlisting>
1.276 + will multiply <x> by 4 rather than by 3.
1.277 + </para>
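<para>
The factor of 4 follows from writing the running total back while later arguments are still to be evaluated. The following Python sketch models this; evaluate_add is a hypothetical name, bare invocations such as <x/> are represented by strings, literals by numbers, and the assumption is that the write-back happens after each argument is added.
</para>

```python
def evaluate_add(env, args):
    """Sum args; a str is a bare invocation looked up in env, a number is a literal.

    Assumption: if the first argument is a bare invocation, the running total
    is stored back into it after every argument has been added.
    """
    target = args[0] if args and isinstance(args[0], str) else None
    total = 0
    for arg in args:
        value = env[arg] if isinstance(arg, str) else arg
        total += value
        if target is not None:
            env[target] = total  # later lookups of the target see this value
    return total
```

<para>
Starting from x = 5, the three lookups see 5, 5 and 10, so the result and the new value of x are both 20. The same sketch gives 9 for the earlier example where x is 6 and 3 is added.
</para>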
1.278 +
1.279 + <simplesect id="idb-arithmetic-rationale">
1.280 + <title>Rationale</title>
1.281 +
1.282 + <para>
1.283 + The examples in the specification imply that <add> and
1.284 + <subtract> modify their first argument when it is a
1.285 + variable. The iterative example in
1.286 + <ulink url="http://www.w3.org/TR/xexpr/#id-0008">section 8</ulink>
1.287 + wouldn't work if <multiply> worked the same way (and the
1.288 + definition of <2pi> in <ulink
1.289 + url="http://www.w3.org/TR/xexpr/#id-0007">section 7</ulink> would
1.290 + not be expected to modify the definition of <pi> each time
1.291 + it is called). Note that the <2pi> example is itself erroneous: IDs
1.292 + can't contain numbers.
1.293 + </para>
1.294 +
1.295 + <para>
1.296 + It seems undesirable to modify functions that take arguments.
1.297 + Making the decision based on the invocation rather than the
1.298 + function definition makes expressions much easier to read.
1.299 + </para>
1.300 + </simplesect>
1.301 + </sect1>
1.302 +
1.303 + <sect1 id="idb-comparison">
1.304 + <title>Comparison Functions</title>
1.305 +
1.306 + <para>
1.307 + The empty comparison functions (<eq/>, <neq/>, <leq/>,
1.308 + <geq/>, <lt/> and <gt/>) and comparison functions with
1.309 + exactly one argument all evaluate to <true/>.
1.310 + </para>
1.311 +
1.312 + <para>
1.313 + The ordered comparison functions (<leq>, <geq>, <lt>
1.314 + and <gt>) act as if the equivalent mathematical operator was
1.315 + inserted between each pair of adjacent arguments. Thus:
1.316 + <programlisting>
1.317 +<lt>
1.318 + 1 2 3
1.319 +</lt><!--
1.320 + --></programlisting>
1.321 + is equivalent to the mathematical expression:
1.322 + <screen>1 < 2 < 3</screen>
1.323 + and:
1.324 + <programlisting>
1.325 +<leq>
1.326 + 1 2 3
1.327 +</leq><!--
1.328 + --></programlisting>
1.329 + is equivalent to the mathematical expression:
1.330 + <screen>1 ≤ 2 ≤ 3</screen>
1.331 + </para>
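<para>
One way to read this rule is as a test on every adjacent pair of arguments. The sketch below is a Python illustration with a hypothetical helper named chained; it is not libxexpr's implementation.
</para>

```python
import operator


def chained(op, args):
    """Return True if op holds between every adjacent pair of args."""
    return all(op(left, right) for left, right in zip(args, args[1:]))
```

<para>
With this reading, chained(operator.lt, [1, 2, 3]) holds and chained(operator.lt, [1, 3, 2]) does not. An empty or single-element argument list has no pairs to test, so the result is true, which matches the behaviour described at the start of this section.
</para>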
1.332 +
1.333 + <para>
1.334 + When comparing objects of different types:
1.335 + <itemizedlist>
1.336 + <listitem>
1.337 + Numbers will be implicitly cast between <float> and
1.338 + <integer> where that involves no loss of precision
1.339 + </listitem>
1.340 + <listitem>
1.341 + Strings will be compared byte-by-byte as UTF-8 encoded strings
1.342 + </listitem>
1.343 + <listitem>
1.344 + Functions will always be completely evaluated
1.345 + </listitem>
1.346 + <listitem>
1.347 + Invocations of the constant functions are ordered as <false/>
1.348 + < <nil/> < <true/>
1.349 + </listitem>
1.350 + </itemizedlist>
1.351 + </para>
1.352 +
1.353 + <simplesect id="idb-comparison-rationale">
1.354 + <title>Rationale</title>
1.355 +
1.356 + <para>
1.357 + The empty comparison functions evaluate to <true/> by analogy
1.358 + with the comparison functions.
1.359 + </para>
1.360 +
1.361 + <para>
1.362 + The ordering of the comparison functions is confused in the
1.363 + specification with the examples for <lt> and <gt> agreeing
1.364 + with libxexpr's behaviour and the examples for <leq> and
1.365 + <geq> doing the opposite. The choice was arbitrary.
1.366 + </para>
1.367 + </simplesect>
1.368 + </sect1>
1.369 +
1.370 + <sect1 id="idb-redefining-builtins">
1.371 + <title>Redefining Builtin Functions</title>
1.372 +
1.373 + <para>
1.374 + Attempting to redefine a builtin function results in an error.
1.375 + </para>
1.376 +
1.377 + <simplesect id="idb-redefining-builtins-rationale">
1.378 + <title>Rationale</title>
1.379 +
1.380 + <para>
1.381 + While this could be implemented, there is no mention of it in the
1.382 + specification and it would complicate the implementation with no
1.383 + obvious benefit.
1.384 + </para>
1.385 + </simplesect>
1.386 + </sect1>
1.387 +
1.388 + <sect1 id="idb-namespaces">
1.389 + <title>Namespaces</title>
1.390 +
1.391 + <para>
1.392 + libxexpr considers an element's namespace to be part of its name
1.393 + and thus elements in a namespace other than the XEXPR namespace are
1.394 + always distinct from functions defined by XEXPR. In addition,
1.395 + functions defined using <define> are defined in the
1.396 + XEXPR namespace.
1.397 + </para>
1.398 +
1.399 + <para>
1.400 + libxexpr provides hooks for extending the XEXPR language by allowing
1.401 + handlers to be installed for other namespaces.
1.402 + </para>
1.403 +
1.404 + <para>
1.405 + Elements which are in no namespace are treated as if they were in the
1.406 + XEXPR namespace.
1.407 + </para>
1.408 +
1.409 + <simplesect id="idb-namespaces-rationale">
1.410 + <title>Rationale</title>
1.411 +
1.412 + <para>
1.413 + Being able to extend the XEXPR language is vital for it to be useful
1.414 + and namespaces are the obvious way to do this.
1.415 + </para>
1.416 + </simplesect>
1.417 + </sect1>
1.418 +</chapter>