docs/reference/implementation_defined_behaviour.xml
changeset 0 bc8c9a11cbfc
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/docs/reference/implementation_defined_behaviour.xml	Wed Oct 10 22:58:21 2012 +0100
     1.3 @@ -0,0 +1,415 @@
     1.4 +<?xml version="1.0"?>
     1.5 +<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
     1.6 +               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
     1.7 +[
     1.8 +]>
     1.9 +<chapter id="implementation-defined-behaviour">
    1.10 +  <title>Implementation Defined Behaviour</title>
    1.11 +
    1.12 +  <para>
    1.13 +    The specification of the XEXPR language is laid out in a W3C note of
    1.14 +    <ulink url="http://www.w3.org/TR/2000/NOTE-xexpr-20001121">21 November
    1.15 +    2000</ulink>. However, the specification leaves quite a bit of
    1.16 +    information to be deduced from the examples and leaves other parts of
    1.17 +    the language loosely specified. This chapter documents the way that
    1.18 +    libxexpr implements the language and gives the rationale for each
    1.19 +    decision taken.
    1.20 +  </para>
    1.21 +
    1.22 +  <sect1 id="idb-numbers">
    1.23 +    <title>Numbers</title>
    1.24 +
    1.25 +    <para>
    1.26 +      Numbers are defined in pseudo-BNF as:
    1.27 +      <programlisting>
    1.28 +number         : whitespace sign simple-number whitespace
    1.29 +	       ;
    1.30 +
    1.31 +whitespace     : [ \t\n]*
    1.32 +	       ;
    1.33 +
    1.34 +sign           : [+-]?
    1.35 +	       ;
    1.36 +
    1.37 +simple-number  : 0x[0-9A-Fa-f]+
    1.38 +	       | [0-9]+
    1.39 +	       | [0-9]+\.[0-9]+
    1.40 +	       | [0-9]+\.[0-9]+[eE][+-][0-9]+
    1.41 +	       ;<!--
    1.42 +   --></programlisting>
    1.43 +      that is: they have an optional leading sign and may be surrounded with
    1.44 +      whitespace.
    1.45 +    </para>
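         +
         +    <para>
         +      As an illustration, reading the grammar above literally (this is
         +      derived from the rules as written, not from testing libxexpr), the
         +      following strings would and would not match
         +      <literal>number</literal> as a whole:
         +      <programlisting>
         +"0x1F"      matches: hexadecimal integer
         +" -42 "     matches: sign, decimal integer, surrounding whitespace
         +"+3.25"     matches: sign, simple float
         +"14.0e-1"   matches: float with signed exponent
         +".5"        no match: digits are required before the point
         +"5."        no match: digits are required after the point
         +"1.5e3"     no match: the exponent sign is not optional
         +"1e5"       no match: an exponent requires a fractional part<!--
         +   --></programlisting>
         +    </para>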
    1.46 +
    1.47 +    <simplesect id="idb-numbers-rationale">
    1.48 +      <title>Rationale</title>
    1.49 +
    1.50 +      <para>
    1.51 +        While negative numbers can be created using &lt;subtract&gt; without
    1.52 +	the need for any signs, this seems overly cumbersome.
    1.53 +      </para>
    1.54 +
    1.55 +      <para>
    1.56 +	The examples in the specification make it clear that where two numbers
    1.57 +	are separated by a space, this should be parsed as just two numbers
    1.58 +	and not two numbers plus an intervening string, which a strict reading
    1.59 +	of the specification would imply.
    1.60 +      </para>
    1.61 +    </simplesect>
    1.62 +  </sect1>
    1.63 +
    1.64 +  <sect1 id="idb-bindings">
    1.65 +    <title>Bindings</title>
    1.66 +
    1.67 +    <para>
    1.68 +      Bindings are parsed as integers, floats or strings in that order (i.e.,
    1.69 +      the first type that matches will be used). Thus the following pairs
    1.70 +      of expressions are equivalent:
    1.71 +      <programlisting>
    1.72 +&lt;func x="+01 "/&gt;
    1.73 +
    1.74 +&lt;func&gt;
    1.75 +  &lt;define name="x"&gt;&lt;integer&gt;1&lt;/integer&gt;&lt;/define&gt;
    1.76 +&lt;/func&gt;
    1.77 +
    1.78 +&lt;func x=" 14.0e-1"/&gt;
    1.79 +
    1.80 +&lt;func&gt;
    1.81 +  &lt;define name="x"&gt;&lt;float&gt;1.4&lt;/float&gt;&lt;/define&gt;
    1.82 +&lt;/func&gt;
    1.83 +
    1.84 +&lt;func x="Hello "/&gt;
    1.85 +
    1.86 +&lt;func&gt;
    1.87 +  &lt;define name="x"&gt;&lt;string&gt;Hello &lt;/string&gt;&lt;/define&gt;
    1.88 +&lt;/func&gt;<!--
    1.89 +   --></programlisting>
    1.90 +    </para>
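         +
         +    <para>
         +      Combining this rule with the number grammar, a hexadecimal binding
         +      would be expected to parse as an integer, and a value that matches
         +      neither number form would fall through to a string. The following
         +      pairs are a sketch of what the rules above imply, not verified
         +      libxexpr output:
         +      <programlisting>
         +&lt;func x="-0x10"/&gt;
         +
         +&lt;func&gt;
         +  &lt;define name="x"&gt;&lt;integer&gt;-16&lt;/integer&gt;&lt;/define&gt;
         +&lt;/func&gt;
         +
         +&lt;func x="v1.0"/&gt;
         +
         +&lt;func&gt;
         +  &lt;define name="x"&gt;&lt;string&gt;v1.0&lt;/string&gt;&lt;/define&gt;
         +&lt;/func&gt;<!--
         +   --></programlisting>
         +    </para>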
    1.91 +    <simplesect id="idb-bindings-rationale">
    1.92 +      <title>Rationale</title>
    1.93 +
    1.94 +      <para>
    1.95 +	This seems to satisfy the principle of least surprise.
    1.96 +      </para>
    1.97 +    </simplesect>
    1.98 +  </sect1>
    1.99 +
   1.100 +  <sect1 id="idb-pcdata">
   1.101 +    <title>Parsing of PCDATA</title>
   1.102 +
   1.103 +    <para>
   1.104 +      When numbers and strings are mixed in PCDATA, any whitespace surrounding
   1.105 +      the numbers is taken to be part of the numbers rather than the strings.
   1.106 +      Thus the following two expressions are equivalent:
   1.107 +      <programlisting>
   1.108 +&lt;foo&gt;This is the 0xdeadbeef constant.&lt;/foo&gt;
   1.109 +
   1.110 +&lt;foo&gt;
   1.111 +  &lt;string&gt;This is the&lt;/string&gt;
   1.112 +  &lt;integer&gt;0xdeadbeef&lt;/integer&gt;
   1.113 +  &lt;string&gt;constant.&lt;/string&gt;
   1.114 +&lt;/foo&gt;<!--
   1.115 +   --></programlisting>
   1.116 +    </para>
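         +
         +    <para>
         +      The same rule applies when several numbers precede a string. As a
         +      sketch of what the rule above implies (not verified libxexpr
         +      output), these two expressions would be expected to be equivalent,
         +      with no whitespace left over in the string:
         +      <programlisting>
         +&lt;foo&gt;1 2 three&lt;/foo&gt;
         +
         +&lt;foo&gt;
         +  &lt;integer&gt;1&lt;/integer&gt;
         +  &lt;integer&gt;2&lt;/integer&gt;
         +  &lt;string&gt;three&lt;/string&gt;
         +&lt;/foo&gt;<!--
         +   --></programlisting>
         +    </para>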
   1.117 +
   1.118 +    <simplesect id="idb-pcdata-rationale">
   1.119 +      <title>Rationale</title>
   1.120 +
   1.121 +      <para>
   1.122 +	This seems more consistent with spaces between numbers not being
   1.123 +	parsed as strings than the alternative.
   1.124 +      </para>
   1.125 +    </simplesect>
   1.126 +  </sect1>
   1.127 +
   1.128 +  <sect1 id="idb-define">
   1.129 +    <title>The &lt;define&gt; Function</title>
   1.130 +
   1.131 +    <para>
   1.132 +      The &lt;define&gt; function creates a new function in the environment in
   1.133 +      which it is invoked. This is different from the &lt;set&gt; function,
   1.134 +      which will modify the definition of an existing function if one exists.
   1.135 +      Only if no such function is defined in any of the active environments will
   1.136 +      &lt;set&gt; create a new function (and then in the outermost, or global,
   1.137 +      environment).
   1.138 +    </para>
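         +
         +    <para>
         +      The difference can be sketched as follows, assuming (this is an
         +      assumption, not something stated above) that each nested
         +      &lt;expr&gt; introduces its own environment:
         +      <programlisting>
         +&lt;expr&gt;
         +  &lt;define name="x"&gt;1&lt;/define&gt;
         +  &lt;expr&gt;
         +    &lt;set name="x"&gt;2&lt;/set&gt;
         +    &lt;define name="x"&gt;3&lt;/define&gt;
         +    &lt;set name="y"&gt;4&lt;/set&gt;
         +  &lt;/expr&gt;
         +&lt;/expr&gt;<!--
         +   --></programlisting>
         +      Under the rules above, the first &lt;set&gt; finds the existing
         +      outer &lt;x&gt; and changes it to 2. The inner &lt;define&gt; then
         +      creates a new &lt;x&gt; in the inner environment and leaves the
         +      outer one at 2. The final &lt;set&gt; finds no &lt;y&gt; in any
         +      active environment and so creates it in the global environment.
         +    </para>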
   1.139 +
   1.140 +    <simplesect id="idb-define-rationale">
   1.141 +      <title>Rationale</title>
   1.142 +
   1.143 +      <para>
   1.144 +	We know from the examples in the specification (e.g., in
   1.145 +	<ulink url="http://www.w3.org/TR/xexpr/#id-0045">section 45</ulink>)
   1.146 +	that &lt;subtract&gt; changes the definition of its first argument
   1.147 +	in at least the grandparent environment. It makes sense that &lt;set&gt;
   1.148 +	should do the same. When we come to &lt;define&gt;, however, we
   1.149 +	know from <ulink url="http://www.w3.org/TR/xexpr/#id-0003">section
   1.150 +	3</ulink> that it is equivalent to an attribute on the parent element
   1.151 +	and so it makes sense that it should create a variable in the parent
   1.152 +	environment.
   1.153 +      </para>
   1.154 +    </simplesect>
   1.155 +  </sect1>
   1.156 +
   1.157 +  <sect1 id="idb-get">
   1.158 +    <title>The &lt;get&gt; Function</title>
   1.159 +
   1.160 +    <para>
   1.161 +      The following two expressions are equivalent:
   1.162 +      <programlisting>
   1.163 +&lt;get name="x"/&gt;
   1.164 +&lt;get&gt;x&lt;/get&gt;<!--
   1.165 +   --></programlisting>
   1.166 +      The expression &lt;x/&gt; has the same effect except in the case of
   1.167 +      &lt;add&gt; and &lt;subtract&gt; where these two expressions are
   1.168 +      different:
   1.169 +      <programlisting>
   1.170 +&lt;add&gt;&lt;x/&gt;1&lt;/add&gt;
   1.171 +&lt;add&gt;&lt;get&gt;x&lt;/get&gt;1&lt;/add&gt;<!--
   1.172 +   --></programlisting>
   1.173 +      The first changes the definition of &lt;x&gt;, the second does not.
   1.174 +    </para>
   1.175 +
   1.176 +    <para>
   1.177 +      Note that IDs are allowed to start with the dot (.) and hyphen (-)
   1.178 +      characters, which are not valid as the first character of an XML
   1.179 +      element name. Thus &lt;get&gt; must be used in the following:
   1.180 +      <programlisting>
   1.181 +&lt;expr&gt;
   1.182 +  &lt;define name=".net"&gt;4.5.50709&lt;/define&gt;
   1.183 +  &lt;print&gt;&lt;get&gt;.net&lt;/get&gt;&lt;/print&gt;
   1.184 +&lt;/expr&gt;<!--
   1.185 +   --></programlisting>
   1.186 +    </para>
   1.187 +
   1.188 +    <para>
   1.189 +      Since &lt;get&gt; returns a function definition (just like
   1.190 +      &lt;define&gt;), it is possible to define functions of this type that
   1.191 +      take arguments and even invoke them in a somewhat circuitous manner:
   1.192 +      <programlisting>
   1.193 +&lt;expr&gt;
   1.194 +  &lt;define name=".product" args="a b c d"&gt;
   1.195 +    &lt;add&gt;
   1.196 +      &lt;multiply&gt;
   1.197 +        &lt;a/&gt;
   1.198 +        &lt;b/&gt;
   1.199 +      &lt;/multiply&gt;
   1.200 +      &lt;multiply&gt;
   1.201 +        &lt;c/&gt;
   1.202 +        &lt;d/&gt;
   1.203 +      &lt;/multiply&gt;
   1.204 +    &lt;/add&gt;
   1.205 +  &lt;/define&gt;
   1.206 +
   1.207 +  &lt;expr&gt;
   1.208 +    &lt;define name="closure"/&gt;
   1.209 +    &lt;set name="closure"&gt;
   1.210 +      &lt;get&gt;.product&lt;/get&gt;
   1.211 +    &lt;/set&gt;
   1.212 +    &lt;closure&gt;1 2 3 4&lt;/closure&gt;
   1.213 +  &lt;/expr&gt;
   1.214 +&lt;/expr&gt;<!--
   1.215 +   --></programlisting>
   1.216 +    </para>
   1.217 +
   1.218 +    <simplesect id="idb-get-rationale">
   1.219 +      <title>Rationale</title>
   1.220 +
   1.221 +      <para>
   1.222 +	<ulink url="http://www.w3.org/TR/xexpr/#id-0014">Section 14</ulink>
   1.223 +	tells us that &lt;get&gt;x&lt;/get&gt; and &lt;x/&gt; have the same
   1.224 +	effect in most cases (and thus presumably not all cases) and it
   1.225 +	would seem surprising if &lt;get&gt; were not to insulate a
   1.226 +	function in this manner.
   1.227 +      </para>
   1.228 +    </simplesect>
   1.229 +  </sect1>
   1.230 +
   1.231 +  <sect1 id="idb-arithmetic">
   1.232 +    <title>Arithmetic Operators</title>
   1.233 +
   1.234 +    <para>
   1.235 +      The empty arithmetic operators (&lt;add/&gt;, &lt;subtract/&gt;,
   1.236 +      &lt;multiply/&gt; and &lt;divide/&gt;) all evaluate to &lt;nil/&gt;.
   1.237 +    </para>
   1.238 +
   1.239 +    <para>
   1.240 +      The &lt;add&gt; and &lt;subtract&gt; operators change their first
   1.241 +      argument in some circumstances as in this example from the specification:
   1.242 +      <programlisting>
   1.243 +&lt;while&gt;
   1.244 +  &lt;gt&gt;&lt;x/&gt; 0&lt;/gt&gt;
   1.245 +  &lt;expr&gt;
   1.246 +    &lt;print newline="true"&gt;&lt;x/&gt;&lt;/print&gt;
   1.247 +    &lt;subtract&gt;&lt;x/&gt; 1&lt;/subtract&gt;
   1.248 +  &lt;/expr&gt;
   1.249 +&lt;/while&gt;<!--
   1.250 +   --></programlisting>
   1.251 +    </para>
   1.252 +
   1.253 +    <para>
   1.254 +      In general, the first argument will be modified if it is a function
   1.255 +      invocation that has no bindings and no arguments. Thus the following
   1.256 +      will print 9:
   1.257 +      <programlisting>
   1.258 +&lt;define name="x"&gt;&lt;multiply&gt;2 3&lt;/multiply&gt;&lt;/define&gt;
   1.259 +&lt;add&gt;&lt;x/&gt;3&lt;/add&gt;
   1.260 +&lt;print&gt;&lt;x/&gt;&lt;/print&gt;<!--
   1.261 +   --></programlisting>
   1.262 +      whereas this will print 6:
   1.263 +      <programlisting>
   1.264 +&lt;define name="x"&gt;&lt;multiply&gt;2 3&lt;/multiply&gt;&lt;/define&gt;
   1.265 +&lt;add&gt;&lt;x unused=""/&gt;3&lt;/add&gt;
   1.266 +&lt;print&gt;&lt;x/&gt;&lt;/print&gt;<!--
   1.267 +   --></programlisting>
   1.268 +    </para>
   1.269 +
   1.270 +    <para>
   1.271 +      Where arguments are modified, this occurs as the arguments are being
   1.272 +      evaluated. Thus this expression:
   1.273 +      <programlisting>
   1.274 +&lt;add&gt;&lt;x/&gt;&lt;x/&gt;&lt;x/&gt;&lt;/add&gt;<!--
   1.275 +   --></programlisting>
   1.276 +      will multiply &lt;x&gt; by 4 rather than by 3.
   1.277 +    </para>
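         +
         +    <para>
         +      As a worked trace of one reading of this rule, assume that
         +      &lt;x&gt; starts at 5 and that the running total is written back
         +      to &lt;x&gt; after each addition (both are assumptions made for
         +      illustration):
         +      <screen>start:          x = 5
         +first  &lt;x/&gt;:   total = 5
         +second &lt;x/&gt;:   total = 5 + 5 = 10,    x becomes 10
         +third  &lt;x/&gt;:   total = 10 + 10 = 20,  x becomes 20</screen>
         +      so &lt;x&gt; ends at 20, four times its original value. Had the
         +      arguments all been evaluated before any modification, the result
         +      would have been 5 + 5 + 5 = 15.
         +    </para>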
   1.278 +
   1.279 +    <simplesect id="idb-arithmetic-rationale">
   1.280 +      <title>Rationale</title>
   1.281 +
   1.282 +      <para>
   1.283 +	The examples in the specification imply that &lt;add&gt; and
   1.284 +	&lt;subtract&gt; modify their first argument when it is a
   1.285 +	variable. The iterative example in
   1.286 +	<ulink url="http://www.w3.org/TR/xexpr/#id-0008">section 8</ulink>
   1.287 +	wouldn't work if &lt;multiply&gt; worked the same way (and the
   1.288 +	definition of &lt;2pi&gt; in <ulink
   1.289 +	url="http://www.w3.org/TR/xexpr/#id-0007">section 7</ulink> would
   1.290 +	not be expected to modify the definition of &lt;pi&gt; each time
   1.291 +	it is called). Note that the &lt;2pi&gt; example is itself erroneous:
   1.292 +	IDs can't contain numbers.
   1.293 +      </para>
   1.294 +
   1.295 +      <para>
   1.296 +	It seems undesirable to modify functions that take arguments.
   1.297 +	Making the decision based on the invocation rather than the
   1.298 +	function definition makes expressions much easier to read.
   1.299 +      </para>
   1.300 +    </simplesect>
   1.301 +  </sect1>
   1.302 +
   1.303 +  <sect1 id="idb-comparison">
   1.304 +    <title>Comparison Functions</title>
   1.305 +
   1.306 +    <para>
   1.307 +      The empty comparison functions (&lt;eq/&gt;, &lt;neq/&gt;, &lt;leq/&gt;,
   1.308 +      &lt;geq/&gt;, &lt;lt/&gt; and &lt;gt/&gt;) and comparison functions with
   1.309 +      exactly one argument all evaluate to &lt;true/&gt;.
   1.310 +    </para>
   1.311 +
   1.312 +    <para>
   1.313 +      The ordered comparison functions (&lt;leq&gt;, &lt;geq&gt;, &lt;lt&gt;
   1.314 +      and &lt;gt&gt;) act as if the equivalent mathematical operator were
   1.315 +      inserted between their arguments. Thus:
   1.316 +      <programlisting>
   1.317 +&lt;lt&gt;
   1.318 +  1 2 3
   1.319 +&lt;/lt&gt;<!--
   1.320 +   --></programlisting>
   1.321 +      is equivalent to the mathematical expression:
   1.322 +      <screen>1 &lt; 2 &lt; 3</screen>
   1.323 +      and:
   1.324 +      <programlisting>
   1.325 +&lt;leq&gt;
   1.326 +  1 2 3
   1.327 +&lt;/leq&gt;<!--
   1.328 +   --></programlisting>
   1.329 +      is equivalent to the mathematical expression:
   1.330 +      <screen>1 ≤ 2 ≤ 3</screen>
   1.331 +    </para>
   1.332 +
   1.333 +    <para>
   1.334 +      When comparing objects of different types:
   1.335 +      <itemizedlist>
   1.336 +	<listitem><para>
   1.337 +	  Numbers will be implicitly cast between &lt;float&gt; and
   1.338 +	  &lt;integer&gt; where that involves no loss of precision.
   1.339 +	</para></listitem>
   1.340 +	<listitem><para>
   1.341 +	  Strings will be compared byte-by-byte as UTF-8 encoded strings.
   1.342 +	</para></listitem>
   1.343 +	<listitem><para>
   1.344 +	  Functions will always be completely evaluated.
   1.345 +	</para></listitem>
   1.346 +	<listitem><para>
   1.347 +	  Invocations of the constant functions are ordered as &lt;false/&gt;
   1.348 +	  &lt; &lt;nil/&gt; &lt; &lt;true/&gt;.
   1.349 +	</para></listitem>
   1.350 +      </itemizedlist>
   1.351 +    </para>
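         +
         +    <para>
         +      A few further cases, given as a sketch of what the rules above
         +      imply rather than as verified libxexpr output:
         +      <programlisting>
         +&lt;lt&gt;1 3 2&lt;/lt&gt;
         +&lt;leq&gt;1 1.0 2&lt;/leq&gt;
         +&lt;lt&gt;&lt;false/&gt;&lt;nil/&gt;&lt;true/&gt;&lt;/lt&gt;<!--
         +   --></programlisting>
         +      The first corresponds to 1 &lt; 3 &lt; 2 and so would be expected
         +      to evaluate to &lt;false/&gt;. The second corresponds to
         +      1 ≤ 1.0 ≤ 2, where the integer and the float can be compared
         +      without loss of precision, and would be expected to evaluate to
         +      &lt;true/&gt;. The third follows the stated ordering of the
         +      constant functions and would also be expected to evaluate to
         +      &lt;true/&gt;.
         +    </para>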
   1.352 +
   1.353 +    <simplesect id="idb-comparison-rationale">
   1.354 +      <title>Rationale</title>
   1.355 +
   1.356 +      <para>
   1.357 +	The empty comparison functions evaluate to &lt;true/&gt; by analogy
   1.358 +	with the comparison functions.
   1.359 +      </para>
   1.360 +
   1.361 +      <para>
   1.362 +	The ordering of the comparison functions is confused in the
   1.363 +	specification, with the examples for &lt;lt&gt; and &lt;gt&gt; agreeing
   1.364 +	with libxexpr's behaviour and the examples for &lt;leq&gt; and
   1.365 +	&lt;geq&gt; doing the opposite. The choice was arbitrary.
   1.366 +      </para>
   1.367 +    </simplesect>
   1.368 +  </sect1>
   1.369 +
   1.370 +  <sect1 id="idb-redefining-builtins">
   1.371 +    <title>Redefining Builtin Functions</title>
   1.372 +
   1.373 +    <para>
   1.374 +      Attempting to redefine a builtin function results in an error.
   1.375 +    </para>
   1.376 +
   1.377 +    <simplesect id="idb-redefining-builtins-rationale">
   1.378 +      <title>Rationale</title>
   1.379 +
   1.380 +      <para>
   1.381 +	While this could be implemented, there is no mention of it in the
   1.382 +	specification and it would complicate the implementation with no
   1.383 +	obvious benefit.
   1.384 +      </para>
   1.385 +    </simplesect>
   1.386 +  </sect1>
   1.387 +
   1.388 +  <sect1 id="idb-namespaces">
   1.389 +    <title>Namespaces</title>
   1.390 +
   1.391 +    <para>
   1.392 +      libxexpr considers an element's namespace to be part of its name
   1.393 +      and thus elements in a namespace other than the XEXPR namespace
   1.394 +      are always distinct from functions defined by XEXPR. In addition,
   1.395 +      functions defined using &lt;define&gt; are defined in the
   1.396 +      XEXPR namespace.
   1.397 +    </para>
   1.398 +
   1.399 +    <para>
   1.400 +      libxexpr provides hooks for extending the XEXPR language by allowing
   1.401 +      handlers to be installed for other namespaces.
   1.402 +    </para>
   1.403 +
   1.404 +    <para>
   1.405 +      Elements which are in no namespace are treated as if they were in the
   1.406 +      XEXPR namespace.
   1.407 +    </para>
   1.408 +
   1.409 +    <simplesect id="idb-namespaces-rationale">
   1.410 +      <title>Rationale</title>
   1.411 +
   1.412 +      <para>
   1.413 +	Being able to extend the XEXPR language is vital for it to be useful
   1.414 +	and namespaces are the obvious way to do this.
   1.415 +      </para>
   1.416 +    </simplesect>
   1.417 +  </sect1>
   1.418 +</chapter>