docs/package-set.xml
changeset 311 cd292c2de0b6
child 312 068a7429db5d
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/docs/package-set.xml	Wed Jul 02 14:37:38 2008 -0400
     1.3 @@ -0,0 +1,239 @@
     1.4 +<?xml version="1.0" encoding="utf-8"?>
     1.5 +<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
     1.6 +
     1.7 +<chapter id="file-format">
     1.8 +  <title>Package Set File Format</title>
     1.9 +  
    1.10 +  <sect2 id="file-header">
    1.11 +    <title>File header</title>
    1.12 +    
    1.13 +    <para>
    1.14 +      The repo file starts with a header that contains an array of
    1.15 +      section descriptors, terminated by a descriptor with type 0:
    1.16 +    </para>
    1.17 +
    1.18 +    <programlisting><![CDATA[
    1.19 +struct razor_set_header {
    1.20 +	uint32_t magic;
    1.21 +	uint32_t version;
    1.22 +	struct razor_set_section sections[0];
    1.23 +};
    1.24 +
    1.25 +struct razor_set_section {
    1.26 +	uint32_t type;
    1.27 +	uint32_t offset;
    1.28 +	uint32_t size;
    1.29 +};
    1.30 +]]></programlisting>
    1.31 +
    1.32 +    <para>
    1.33 +      razor_set_open() mmaps the repo file, and creates a struct razor_set:
    1.34 +    </para>
    1.35 +
    1.36 +    <programlisting><![CDATA[
    1.37 +struct razor_set {
    1.38 +	struct array string_pool;
    1.39 +	struct array packages;
    1.40 +	struct array properties;
    1.41 +	struct array files;
    1.42 +	struct array package_pool;
    1.43 +	struct array property_pool;
    1.44 +	struct array file_pool;
    1.45 +	struct razor_set_header *header;
    1.46 +};
    1.47 +]]></programlisting>
    1.48 +
    1.49 +    <para>
    1.50 +      It does this by finding the section whose type corresponds to
    1.51 +      each array and pointing that struct array at the right place in
    1.52 +      the mmapped data. (This is the only processing needed when
    1.53 +      reading in the file; everything else is used exactly as-is.)
    1.54 +    </para>
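
    <para>
      As a rough illustration of that lookup, the following sketch walks
      the section table until it reaches the type-0 terminator. The
      helper name find_section() is hypothetical and not part of razor,
      and the structs are repeated (with a C99 flexible array member in
      place of sections[0]) only so that the block compiles on its own.
    </para>

    <programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

struct razor_set_section {
	uint32_t type;
	uint32_t offset;
	uint32_t size;
};

struct razor_set_header {
	uint32_t magic;
	uint32_t version;
	struct razor_set_section sections[];
};

/* Hypothetical helper: return the descriptor with the given non-zero
 * type, or NULL if the type-0 terminator is reached first. */
static const struct razor_set_section *
find_section(const struct razor_set_header *header, uint32_t type)
{
	const struct razor_set_section *s;

	for (s = header->sections; s->type != 0; s++)
		if (s->type == type)
			return s;

	return NULL;
}
```
]]></programlisting>

    <para>
      A reader built along these lines would call find_section() once
      per array in struct razor_set and leave the mapped bytes untouched.
    </para>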
    1.55 +
    1.56 +  </sect2>
    1.57 +
    1.58 +  <sect2 id="sections">
    1.59 +    <title>The sections</title>
    1.60 +
    1.61 +    <itemizedlist>
    1.62 +      <listitem>
    1.63 +        <para>
    1.64 +          <emphasis>RAZOR_STRING_POOL</emphasis> Stores one copy of
    1.65 +	  each string that appears in the repo. At the moment, these
    1.66 +	  are package names, package versions, property names,
    1.67 +	  property versions, and the basenames of filenames. The
    1.68 +	  strings are arbitrarily sized, NUL-terminated, and in no
    1.69 +	  particular order (although the empty string always ends up
    1.70 +	  at offset 0).
    1.71 +	</para>
    1.72 +      </listitem>
    1.73 +
    1.74 +      <listitem>
    1.75 +        <para>
    1.76 +          <emphasis>RAZOR_PACKAGES</emphasis> Array of struct
    1.77 +	  razor_package; one for each package in the set, sorted by
    1.78 +	  name.
    1.79 +	</para>
    1.80 +      </listitem>
    1.81 +
    1.82 +      <listitem>
    1.83 +        <para>
    1.84 +          <emphasis>RAZOR_PROPERTIES</emphasis> Array of struct
    1.85 +	  razor_property; one for each unique property in the set,
    1.86 +	  sorted by type, then name, then relation type (e.g., "&lt;" or
    1.87 +	  "&gt;="), then version. Properties with no version have
    1.88 +	  relation type RAZOR_VERSION_EQUAL and version "".
    1.89 +	</para>
    1.90 +      </listitem>
    1.91 +	    
    1.92 +      <listitem>
    1.93 +        <para>
    1.94 +          <emphasis>RAZOR_FILES</emphasis> Array of struct
    1.95 +	  razor_entry; one for each file owned by any package in the
    1.96 +	  set. The current sort order (which is subject to change)
    1.97 +	  is breadth-first, with siblings sorted by basename. For example: /,
    1.98 +	  /bin, /dev, /etc, /bin/false, /bin/true, /dev/null, /etc/passwd.
    1.99 +	</para>
   1.100 +      </listitem>
   1.101 +
   1.102 +      <listitem>
   1.103 +        <para>
   1.104 +          <emphasis>RAZOR_PACKAGE_POOL</emphasis> Array of struct
   1.105 +	  list, with each list item containing the index of a struct
   1.106 +	  razor_package in the packages section. See the discussion
   1.107 +	  of lists below.
   1.108 +	</para>
   1.109 +      </listitem>
   1.110 +
   1.111 +      <listitem>
   1.112 +        <para>
   1.113 +          <emphasis>RAZOR_PROPERTY_POOL</emphasis> Array of struct
   1.114 +	  list, with each list item containing the index of a struct
   1.115 +	  razor_property in the properties section. See the
   1.116 +	  discussion of lists below.
   1.117 +	</para>
   1.118 +      </listitem>
   1.119 +
   1.120 +      <listitem>
   1.121 +        <para>
   1.122 +          <emphasis>RAZOR_FILE_POOL</emphasis> Array of struct list,
   1.123 +	  with each list item containing the index of a struct
   1.124 +	  razor_entry in the files section. See the discussion of
   1.125 +	  lists below.
   1.126 +	</para>
   1.127 +      </listitem>
   1.128 +    </itemizedlist>
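
    <para>
      Since the strings in RAZOR_STRING_POOL are stored back to back, a
      reference to a string can be resolved with plain pointer
      arithmetic. The sketch below assumes that such references are
      byte offsets from the start of the pool, which the "offset 0"
      remark above suggests but does not spell out. The helper names
      pool_string() and pool_find() are hypothetical, and the linear
      scan is only for illustration.
    </para>

    <programlisting><![CDATA[
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: resolve a string-pool offset to a C string. */
static const char *
pool_string(const char *pool, uint32_t offset)
{
	return pool + offset;
}

/* Hypothetical helper: scan the pool for str and return its offset,
 * or UINT32_MAX if the pool does not contain it. */
static uint32_t
pool_find(const char *pool, uint32_t size, const char *str)
{
	uint32_t offset = 0;

	while (offset < size) {
		const char *s = pool + offset;

		if (strcmp(s, str) == 0)
			return offset;
		/* Skip this string and its terminating NUL. */
		offset += (uint32_t) strlen(s) + 1;
	}

	return UINT32_MAX;
}
```
]]></programlisting>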
   1.129 +  </sect2>
   1.130 +
   1.131 +  <sect2 id="data-types">
   1.132 +    <title>Data types</title>
   1.133 +
   1.134 +    <para>
   1.135 +      The exact bit layout involves some historical accidents,
   1.136 +      particularly the fact that the "name" field in most structs
   1.137 +      loses its high 8 bits to a flags field.
   1.138 +    </para>
   1.139 +
   1.140 +    <programlisting><![CDATA[
   1.141 +struct list_head
   1.142 +	uint list_ptr : 24;
   1.143 +	uint flags    : 8;
   1.144 +
   1.145 +struct list
   1.146 +	uint data  : 24;
   1.147 +	uint flags : 8;
   1.148 +]]></programlisting>
   1.149 +
   1.150 +    <para>
   1.151 +      Used to store lists of package, property, or file IDs. "struct
   1.152 +      list_head" stores the head of the list, which points to one or
   1.153 +      more "struct list"s in the appropriate "pool" section.  ("struct
   1.154 +      list" should probably be called "struct list_item".)
   1.155 +    </para>
   1.156 +
   1.157 +    <para>
   1.158 +      "list_first(&amp;head, &amp;pool)" returns a "struct list *"
   1.159 +      pointing to the first element of the list (or NULL for an empty
   1.160 +      list), and "list_next(list)" will return successive elements,
   1.161 +      until NULL is returned. Each "list->data" contains the index of
   1.162 +      a package, property, or file in the corresponding section of the
   1.163 +      set.
   1.164 +    </para>
   1.165 +
   1.166 +    <para>
   1.167 +      Peeking underneath the abstraction, a list_head's "flags" is
   1.168 +      0xff if the list is empty, 0x80 if it contains a single element,
   1.169 +      or 0x00 if it contains more than one element. In the
   1.170 +      single-element case, that element is actually stored in the
   1.171 +      list_head directly rather than being stored in a pool (and so
   1.172 +      list_first() just casts the list_head* to a list* and returns
   1.173 +      it). For multi-element lists, list_ptr is the index in the pool
   1.174 +      of the first element of this list; the list continues through
   1.175 +      successive elements of the pool until one with non-zero flags is
   1.176 +      reached, indicating the end of the list.
   1.177 +    </para>
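
    <para>
      The traversal described above can be sketched as follows. This is
      not razor's implementation: the sketch_ names and the two flag
      macros are hypothetical, the pool is passed as a plain array
      rather than a struct array, and the code assumes that the element
      carrying non-zero flags is itself the last member of the list.
    </para>

    <programlisting><![CDATA[
```c
#include <stddef.h>

#define LIST_EMPTY     0xff	/* head flags: no elements */
#define LIST_IMMEDIATE 0x80	/* head flags: one element, stored in the head */

struct list_head {
	unsigned int list_ptr : 24;
	unsigned int flags    : 8;
};

struct list {
	unsigned int data  : 24;
	unsigned int flags : 8;
};

static struct list *
sketch_list_first(struct list_head *head, struct list *pool)
{
	if (head->flags == LIST_EMPTY)
		return NULL;
	if (head->flags == LIST_IMMEDIATE)
		return (struct list *) head;	/* element lives in the head */
	return pool + head->list_ptr;
}

static struct list *
sketch_list_next(struct list *list)
{
	/* Assumption: non-zero flags mark the last element of a list. */
	return list->flags ? NULL : list + 1;
}

/* Count the elements of a list by walking it to the end. */
static size_t
sketch_list_length(struct list_head *head, struct list *pool)
{
	size_t n = 0;
	struct list *l;

	for (l = sketch_list_first(head, pool); l != NULL;
	     l = sketch_list_next(l))
		n++;

	return n;
}
```
]]></programlisting>

    <para>
      Because a single-element head has flags 0x80, the same
      sketch_list_next() test ends both the immediate and the pooled
      case without a separate branch.
    </para>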
   1.178 +
   1.179 +    <programlisting><![CDATA[
   1.180 +struct razor_package
   1.181 +	uint name    : 24;
   1.182 +	uint flags   : 8;
   1.183 +	uint version : 32;
   1.184 +	struct list_head properties;
   1.185 +	struct list_head files;
   1.186 +]]></programlisting>
   1.187 +
   1.188 +    <para>
   1.189 +      name and version are indexes into string_pool. properties is a
   1.190 +      list of all of the package's properties, and files is a list of
   1.191 +      its files. flags is currently only used during razor_set
   1.192 +      merging, to keep track of which set a package came from.
   1.193 +    </para>
   1.194 +
   1.195 +    <programlisting><![CDATA[
   1.196 +struct razor_property
   1.197 +	uint name     : 24;
   1.198 +	uint flags    : 6;
   1.199 +	uint type     : 2;
   1.200 +	uint relation : 32;
   1.201 +	uint version  : 32;
   1.202 +	struct list_head packages;
   1.203 +]]></programlisting>
   1.204 +
   1.205 +    <para>
   1.206 +      name and version are indexes into string_pool. type is an enum
   1.207 +      razor_property_type (e.g., RAZOR_PROPERTY_REQUIRES), and relation
   1.208 +      is an enum razor_version_relation (e.g.,
   1.209 +      RAZOR_VERSION_GREATER_OR_EQUAL). packages is a list of the
   1.210 +      packages that have this property. flags is currently unused.
   1.211 +    </para>
   1.212 +
   1.213 +    <programlisting><![CDATA[
   1.214 +struct razor_entry
   1.215 +	uint name  : 24;
   1.216 +	uint flags : 8;
   1.217 +	uint start : 32;
   1.218 +	struct list_head packages;
   1.219 +]]></programlisting>
   1.220 +
   1.221 +    <para>
   1.222 +      name is an index into string_pool, giving the basename of the
   1.223 +      file. start is either 0, or an index pointing to another
   1.224 +      razor_entry that is the first child of this entry (for a
   1.225 +      non-empty directory). (Entry 0 is always the root of the tree,
   1.226 +      so no entry could have entry 0 as a child.) flags is 0x80
   1.227 +      (RAZOR_ENTRY_LAST) if an entry is the last entry in its
   1.228 +      directory. Otherwise it is 0.
   1.229 +    </para>
   1.230 +
   1.231 +    <para>
   1.232 +      Note that given a pointer to a struct razor_entry (e.g., from a
   1.233 +      package's "files" list), there is no way to reconstruct its full
   1.234 +      name without walking the entire files array up to that
   1.235 +      point. Because of this and other problems (fix_file_map()), it
   1.236 +      seems like razor_entry should be modified to include a pointer
   1.237 +      to its parent. (Storing full paths instead of just basenames
   1.238 +      would also fix this problem, but that would use much more
   1.239 +      memory.)
   1.240 +    </para>
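
    <para>
      To make the cost of that walk concrete, here is a minimal sketch
      that recovers a full path by scanning the earlier entries for the
      directory whose run of children contains the target. The struct
      entry type, the helper names entry_parent() and entry_path(), and
      the sample data are hypothetical. The sketch also assumes that
      the root's basename is the empty string and that name is a byte
      offset into the string pool.
    </para>

    <programlisting><![CDATA[
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RAZOR_ENTRY_LAST 0x80

/* Simplified stand-in for struct razor_entry: plain integers replace
 * the bit-fields and the packages list_head is omitted. */
struct entry {
	uint32_t name;	/* offset of the basename in the string pool */
	uint32_t flags;	/* RAZOR_ENTRY_LAST on the last entry of a directory */
	uint32_t start;	/* index of the first child, or 0 if there is none */
};

/* The example tree from the RAZOR_FILES description. */
static const char sample_pool[] =
	"\0" "bin\0" "dev\0" "etc\0" "false\0" "true\0" "null\0" "passwd";

static const struct entry sample_files[] = {
	{  0, RAZOR_ENTRY_LAST, 1 },	/* 0: /           */
	{  1, 0,                4 },	/* 1: /bin        */
	{  5, 0,                6 },	/* 2: /dev        */
	{  9, RAZOR_ENTRY_LAST, 7 },	/* 3: /etc        */
	{ 13, 0,                0 },	/* 4: /bin/false  */
	{ 19, RAZOR_ENTRY_LAST, 0 },	/* 5: /bin/true   */
	{ 24, RAZOR_ENTRY_LAST, 0 },	/* 6: /dev/null   */
	{ 29, RAZOR_ENTRY_LAST, 0 },	/* 7: /etc/passwd */
};

/* Return the index of the directory that contains entry `index`.
 * The root (entry 0) has no parent; 0 is returned for it as well. */
static uint32_t
entry_parent(const struct entry *entries, uint32_t index)
{
	uint32_t dir, child;

	for (dir = 0; dir < index; dir++) {
		if (entries[dir].start == 0)
			continue;
		/* Children are contiguous and end at RAZOR_ENTRY_LAST. */
		for (child = entries[dir].start; ; child++) {
			if (child == index)
				return dir;
			if (entries[child].flags & RAZOR_ENTRY_LAST)
				break;
		}
	}

	return 0;
}

/* Write the full path of entry `index` into buf and return its
 * untruncated length. The root yields the empty string. */
static size_t
entry_path(const struct entry *entries, const char *pool,
	   uint32_t index, char *buf, size_t size)
{
	const char *name = pool + entries[index].name;
	size_t len;

	if (index == 0) {
		if (size > 0)
			buf[0] = '\0';
		return 0;
	}

	/* Build the parent's path first, then append "/basename". */
	len = entry_path(entries, pool, entry_parent(entries, index),
			 buf, size);
	if (len < size)
		snprintf(buf + len, size - len, "/%s", name);

	return len + 1 + strlen(name);
}
```
]]></programlisting>

    <para>
      Each path component costs one scan over the preceding entries,
      which is the overhead that a stored parent index would remove.
    </para>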
   1.241 +  </sect2>
   1.242 +</chapter>