Move DEPSOLVE.txt and REPO.txt into docbook.
A little messy, but it's a first step towards pulling all the docs into
a nice self-contained document.
1.1 --- a/docs/DEPSOLVE.txt Wed Jul 02 18:46:47 2008 +0100
1.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000
1.3 @@ -1,364 +0,0 @@
1.4 -YUM vs RAZOR
1.5 -------------
1.6 -
1.7 -At a very high level, yum's depsolver does something roughly
1.8 -equivalent to:
1.9 -
1.10 - - For each package being installed or removed
1.11 -
1.12 - - For each relevant property (provides, requires, conflicts,
1.13 - obsoletes):
1.14 -
1.15 - - Figure out what additional packages need to be added to
1.16 - or removed from the system to satisfy this property
1.17 -
1.18 -which ends up being roughly O(N^2 * M) where N is the total number of
1.19 -properties and M is the number of packages being acted on.
1.20 -
1.21 -(I just figured that out off the top of my head, and I'm not totally
1.22 -familiar with the yum code, so it may be wrong.)
1.23 -
1.24 -Razor's depsolver is something like:
1.25 -
1.26 - - do {
1.27 -
1.28 - - For each property to be added to or removed from the system:
1.29 -
1.30 - - Figure out what packages need to be added to or removed
1.31 - from the system to satisfy this property
1.32 -
1.33 - - } until we stop adding/remove more packages
1.34 -
1.35 -with the key being that it's very easy to find the PROVIDES
1.36 -corresponding to a REQUIRES and vice versa, because the property
1.37 -arrays are sorted, and so all properties with the same "name" will be
1.38 -adjacent to one another in the array, allowing many dependencies to be
1.39 -satisified in essentially constant time. (Actually... we've been
1.40 -calling it constant, but it's really O(log N) for heavily-depended-on
1.41 -packages, because the more packages you have, the more variations on
1.42 -"requires foo", "requires foo = 1.1", "requires foo > 1.0", etc you're
1.43 -going to have to scan through.)
1.44 -
1.45 -Ideally though, each iteration of the inner loop body happens in
1.46 -constant time, and thus the inner loop as a whole is O(N), and thus
1.47 -the depsolver as a whole is O(N * M) (or at least, less than
1.48 -O(N * M * log N).
1.49 -
1.50 -
1.51 -FILE DEPENDENCIES
1.52 ------------------
1.53 -
1.54 -Whenever we add a package with a file REQUIRES to a razor_set, we also
1.55 -add a PROVIDES for that file to the package or packages which provide
1.56 -that file. This means that if we later add another package that
1.57 -requires the same file (eg, /bin/sh or /usr/bin/perl), we can resolve
1.58 -its file requirement exactly like we would resolve a property
1.59 -requirement, in nearly constant time.
1.60 -
1.61 -When adding a *new* file requirement (ie, a requirement on a file that
1.62 -no existing package depends on), we still have to scan through the
1.63 -file tree, which is O(log N) in the number of files.
1.64 -
1.65 -(AFAICT, there's no reason yum couldn't do the same optimization.
1.66 -Also, AFAICT, yum currently sticks property dependencies and file
1.67 -dependencies into the same hash table, so that if any package in the
1.68 -transaction has a file dependency, it causes *property* dependencies
1.69 -to become slower to resolve as well...)
1.70 -
1.71 -
1.72 -THE RULES
1.73 ----------
1.74 -
1.75 -This is what we have figured out for transaction-solving rules;
1.76 -neither yum nor rpm's algorithm seems to be explained in full
1.77 -anywhere...
1.78 -
1.79 - 1. Every requested install in the initial package set must be
1.80 - satisfied as either a new install or an update:
1.81 -
1.82 - - if the requested package name is the name of an upstream
1.83 - package:
1.84 -
1.85 - - if there is not a corresponding already-installed
1.86 - package, then install the upstream package
1.87 -
1.88 - - else if the upstream package is newer than the
1.89 - already-installed package, then update the package
1.90 -
1.91 - - else it's an error (UP_TO_DATE)
1.92 -
1.93 - - else if the requested package name is the name of an
1.94 - already-installed package:
1.95 -
1.96 - - if there is an upstream package that obsoletes the
1.97 - already-installed package, then behave as though the
1.98 - user had requested that that package be installed
1.99 - instead.
1.100 -
1.101 - - else it's an error (UP_TO_DATE or INSTALL_UNAVAILABLE?)
1.102 -
1.103 - - else it's an error (INSTALL_UNAVAILABLE)
1.104 -
1.105 - 2. Every requested removal in the initial package set must be
1.106 - satisfied as a removal. If any requested package name is not
1.107 - the name of an installed package, it's an error
1.108 - (REMOVE_NOT_INSTALLED)
1.109 -
1.110 - REQUIRES processing:
1.111 -
1.112 - 3. If a package being installed or updated-to REQUIRES a property
1.113 - that is not provided by any installed or to-be-installed
1.114 - package, we need to find an installable package that provides
1.115 - that property. If we find one, install/update it. If not, it's
1.116 - an error (UNSATISFIABLE). (If we find an upstream package
1.117 - providing the property that corresponds to a system package
1.118 - that's being removed, then it's a CONTRADICTION.)
1.119 -
1.120 - 4. If an already-installed package REQUIRES a property which is
1.121 - only provided by a package that is being removed, then that
1.122 - package needs to be removed as well.
1.123 -
1.124 - 5. If an already-installed package REQUIRES a property which is
1.125 - only provided by a package that is being upgraded or obsoleted
1.126 - (to a new package which does not provide that property), then:
1.127 -
1.128 - - if there is an update for the installed package, then update
1.129 - the installed package
1.130 -
1.131 - - else if there is another installable package that provides
1.132 - the required property, then install that.
1.133 -
1.134 - - else it's an error (UNSATISFIABLE?)
1.135 -
1.136 - CONFLICTS processing
1.137 -
1.138 - 6. If a package being installed or updated-to CONFLICTS with a
1.139 - property provided by an installed package:
1.140 -
1.141 - - if there is an update for the installed package, which the
1.142 - new package does not conflict with, then update the
1.143 - installed package.
1.144 -
1.145 - - else it's an error (NEW_CONFLICT)
1.146 -
1.147 - 7. If an already-installed package CONFLICTS with a property
1.148 - provided by a to-be-installed package:
1.149 -
1.150 - - if there is an update for the installed package, which does
1.151 - not conflict with the new package, then update the installed
1.152 - package.
1.153 -
1.154 - - else it's an error (NEW_CONFLICT)
1.155 -
1.156 - 8. If a package being installed or updated-to CONFLICTS with a
1.157 - property provided by a to-be-installed package, then it's an
1.158 - error (CONTRADICTION).
1.159 -
1.160 - OBSOLETES processing. NOTE: OBSOLETES are only matched against
1.161 - package names, not against arbitrary provided properties
1.162 -
1.163 - 9. If a package being installed or updated-to OBSOLETES an
1.164 - installed package, then obsolete that package. (ie, remove it,
1.165 - but treat it as updated for purposes of dangling REQUIRES).
1.166 -
1.167 - 10. If an already-installed package OBSOLETES a to-be-installed
1.168 - package, then it's an error. (ALREADY_OBSOLETE)
1.169 -
1.170 - 11. If a package being installed or updated-to OBSOLETES another
1.171 - package being installed or updated-to, then it's an error
1.172 - (CONTRADICTION).
1.173 -
1.174 -
1.175 -
1.176 -THE DEPSOLVER
1.177 --------------
1.178 -
1.179 -We start with two razor_sets, system and upstream, and a list of
1.180 -requested installations and removals.
1.181 -
1.182 - FIXME: what about multiple upstream repos? Having to deal with
1.183 - arbitrary numbers of razor_sets is possible, but will probably be
1.184 - messy... It might be easier to either store all upstream repo data
1.185 - in a single .rzdb file, or else merge all upstream .rzdb files
1.186 - together into a single razor_set at startup. (Or some combination
1.187 - of those.)
1.188 -
1.189 -We create a bit array of the packages in each set, indicating which
1.190 -ones are installed; the system bitarray starts out all 1s, and the
1.191 -upstream bitarray all 0s. Each bit is only allowed to change state
1.192 -once during the transaction; an installed package can be removed, or
1.193 -an uninstalled package installed, but trying to reinstall a removed
1.194 -package, or uninstall a newly-installed package is an error. This
1.195 -means the packages break down into four categories:
1.196 -
1.197 - - installed (1 bit in the system bit array)
1.198 - - to-be-removed (0 bit in the system bit array)
1.199 - - to-be-installed (1 bit in the upstream bit array)
1.200 - - installable (0 bit in the upstream bit array)
1.201 -
1.202 -
1.203 -Depsolver algorithm:
1.204 -
1.205 - - Create new razor_transaction_packages ("rtp"s) for each
1.206 - requested install or remove. These will be "unresolved", because
1.207 - we haven't yet found the razor_packages that correspond to them.
1.208 -
1.209 - - while there are new rtps:
1.210 -
1.211 - - sort the new rtps
1.212 -
1.213 - - Walk the system property list, upstream property list, and
1.214 - new rtp list in parallel, and:
1.215 -
1.216 - - For each uninstalled PROVIDES:
1.217 -
1.218 - - If the property is a valid package name (that is,
1.219 - either it's a package providing its own name, or it
1.220 - has a matching OBSOLETES), and it matches the name
1.221 - of a new rtp of type INSTALL or FORCED_UPDATE with
1.222 - an unresolved new_package:
1.223 -
1.224 - - If the upstream package has the same version as
1.225 - the system package, we have an UP_TO_DATE error
1.226 - (FIXME: not quite right. This doesn't deal with
1.227 - the case where we try to update an application
1.228 - because of a library update, and it turns out
1.229 - there's no new version of the application, but
1.230 - there IS a compat package containing the old
1.231 - version of the library.)
1.232 -
1.233 - - Otherwise, set the rtp's new_package to point to
1.234 - the package providing this property and set the
1.235 - appropriate bit in the upstream bit array.
1.236 -
1.237 - - For each to-be-installed non-file REQUIRES:
1.238 -
1.239 - - See if there's an installed or to-be-installed
1.240 - package that PROVIDES that property.
1.241 -
1.242 - - If not, see if there's an installable package that
1.243 - PROVIDES that property, and create a new INSTALL rtp
1.244 - for it if so.
1.245 -
1.246 - - If not, see if there's a to-be-removed package that
1.247 - PROVIDES that property. (If we find such a package,
1.248 - we have a CONTRADICTION error.)
1.249 -
1.250 - - If none of the above, then we have an UNSATISFIABLE
1.251 - error
1.252 -
1.253 - - For each to-be-installed file REQUIRES:
1.254 -
1.255 - - (We create fake file PROVIDES to match file REQUIRES
1.256 - when importing/merging razor sets, so if there is
1.257 - already another installed package that REQUIRES this
1.258 - file, there will be a PROVIDES listed for it as well.)
1.259 -
1.260 - - See if there's an installed package that PROVIDES
1.261 - that file.
1.262 -
1.263 - - If not, do a binary search of the system file tree
1.264 - looking to see if some installed package provides
1.265 - that file but does not have a PROVIDES for it.
1.266 -
1.267 - - If not, see if there's an installable package that
1.268 - PROVIDES that property, and create a new INSTALL rtp
1.269 - for it if so.
1.270 -
1.271 - - (If we actually work with multiple upstream
1.272 - razor_sets, then we will need to search the upstream
1.273 - file trees at this point, because it's possible that
1.274 - a package in one upstream repo would require a file
1.275 - in another upstream repo. But if we merge the
1.276 - multiple upstream repos into a single razor_set at
1.277 - some point, then we would not need to do that,
1.278 - because it would be guaranteed that we would have
1.279 - already created a fake PROVIDES if any package
1.280 - provides the file.)
1.281 -
1.282 - - If no installed or installable package provides the
1.283 - file, see if there's a to-be-removed package that
1.284 - provides the file. (If we find such a package, we
1.285 - have a CONTRADICTION error.)
1.286 -
1.287 - - If none of the above, then we have an UNSATISFIABLE
1.288 - error
1.289 -
1.290 - - For each to-be-installed PROVIDES:
1.291 -
1.292 - - Check if the new PROVIDES conflicts with an
1.293 - installed CONFLICTS. If so, create a new
1.294 - FORCED_UPDATE rtp for the installed package, so we
1.295 - can try to upgrade it to a non-conflicting version.
1.296 - (If we can't, we'll have an OLD_CONFLICT error.)
1.297 -
1.298 - - Check if the new PROVIDES conflicts with an
1.299 - installed OBSOLETES *and* the PROVIDES property
1.300 - corresponds to the name of its package. (That is,
1.301 - OBSOLETES are only matched against package names,
1.302 - not arbitrary provided properties.) If so, we have
1.303 - an ALREADY_OBSOLETE error.
1.304 -
1.305 - - Check if the new PROVIDES conflicts with a
1.306 - to-be-installed CONFLICTS. If so, we have a
1.307 - CONTRADICTION error.
1.308 -
1.309 - - For each to-be-installed CONFLICTS:
1.310 -
1.311 - - Basically the reverse of the previous case: check if
1.312 - the new CONFLICTS conflicts with an installed
1.313 - PROVIDES. If so, create a new FORCED_UPDATE rtp for
1.314 - the installed package, so we can try to upgrade it
1.315 - to a non-conflicting version. (If we can't, we'll
1.316 - have an NEW_CONFLICT error.)
1.317 -
1.318 - - Check if the new CONFLICTS conflicts with a
1.319 - to-be-installed PROVIDES. If so, we have a
1.320 - CONTRADICTION error.
1.321 -
1.322 - - For each to-be-installed OBSOLETES:
1.323 -
1.324 - - Check if there's an installed package that PROVIDES
1.325 - that property. If so, create an OBSOLETED rtp for
1.326 - the installed package.
1.327 -
1.328 - - If not, check if there's a to-be-installed package
1.329 - that PROVIDES that property. If so, we have a
1.330 - CONTRADICTION error.
1.331 -
1.332 -
1.333 - - For each installed PROVIDES:
1.334 -
1.335 - - If the property is a valid package name (that is,
1.336 - it's a package providing its own name), and it
1.337 - matches the name of a new rtp with an unresolved
1.338 - old_package, then set the rtp's old_package to point
1.339 - to the package providing this property and clear the
1.340 - appropriate bit in the system bit array.
1.341 -
1.342 - - For each to-be-removed PROVIDES:
1.343 -
1.344 - - If there's also an identical to-be-installed
1.345 - PROVIDES, we're ok and can skip this
1.346 -
1.347 - - Otherwise, for each installed REQUIRES of this
1.348 - property:
1.349 -
1.350 - - Look for some other installed or to-be-installed
1.351 - property that satisfies the REQUIRES.
1.352 -
1.353 - - If there isn't one, then for each installed
1.354 - package in this REQUIRES's package list:
1.355 -
1.356 - - If the PROVIDES was lost because the old
1.357 - package was REMOVEd (not FORCED_UPDATE or
1.358 - OBSOLETED), then create a new REMOVE rtp for
1.359 - this package.
1.360 -
1.361 - - Otherwise, create a new FORCED_UPDATE rtp
1.362 - for this package.
1.363 -
1.364 - - (We don't need to look at to-be-installed REQUIRES
1.365 - of this property, because if there are any, they
1.366 - will cause a CONTRADICTION error when we try to
1.367 - re-satisfy them the next time through.)
2.1 --- a/docs/Makefile.am Wed Jul 02 18:46:47 2008 +0100
2.2 +++ b/docs/Makefile.am Wed Jul 02 14:37:38 2008 -0400
2.3 @@ -30,7 +30,8 @@
2.4 EXTRA_DIST += version.xml.in
2.5
2.6 docsdir = $(datadir)/doc/razor
2.7 -dist_docs_DATA = \
2.8 - DEPSOLVE.txt \
2.9 - REPO.txt
2.10
2.11 +content_files = \
2.12 + package-set.xml \
2.13 + solver.xml
2.14 +
3.1 --- a/docs/REPO.txt Wed Jul 02 18:46:47 2008 +0100
3.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000
3.3 @@ -1,172 +0,0 @@
3.4 -The repo file format / razor_set data structure
3.5 ------------------------------------------------
3.6 -
3.7 -The repo starts with a header, containing some number of sections,
3.8 -terminated by a section with type 0:
3.9 -
3.10 - struct razor_set_header {
3.11 - uint32_t magic;
3.12 - uint32_t version;
3.13 - struct razor_set_section sections[0];
3.14 - };
3.15 -
3.16 - struct razor_set_section {
3.17 - uint32_t type;
3.18 - uint32_t offset;
3.19 - uint32_t size;
3.20 - };
3.21 -
3.22 -razor_set_open() mmaps the repo file, and creates a struct razor_set:
3.23 -
3.24 - struct razor_set {
3.25 - struct array string_pool;
3.26 - struct array packages;
3.27 - struct array properties;
3.28 - struct array files;
3.29 - struct array package_pool;
3.30 - struct array property_pool;
3.31 - struct array file_pool;
3.32 - struct razor_set_header *header;
3.33 - };
3.34 -
3.35 -by finding the sections with those IDs and creating "struct array"s
3.36 -pointing to the right places in the mmapped data. (This is the only
3.37 -processing needed when reading in the file; everything else is used
3.38 -exactly as-is.)
3.39 -
3.40 -
3.41 -The sections
3.42 -------------
3.43 -
3.44 -RAZOR_STRING_POOL
3.45 -
3.46 - Stores one copy of each string that appears in the repo. (At
3.47 - the moment, this is: package names, package versions, property
3.48 - names, property versions, and (basenames of) filenames.) The
3.49 - strings are arbitrarily-sized, 0-terminated, and not in any
3.50 - particular order (although the empty string always ends up
3.51 - being at offset 0).
3.52 -
3.53 -RAZOR_PACKAGES
3.54 -
3.55 - Array of struct razor_package; one for each package in the
3.56 - set, sorted by name.
3.57 -
3.58 -RAZOR_PROPERTIES
3.59 -
3.60 - Array of struct razor_property; one for each unique property
3.61 - in the set, sorted by type, then name, then relation type (eg,
3.62 - "<" or ">="), then version. (Properties with no version have
3.63 - relation type RAZOR_VERSION_EQUAL, and version "".)
3.64 -
3.65 -RAZOR_FILES
3.66 -
3.67 - Array of struct razor_entry; one for each file owned by any
3.68 - package in the set. The current sort order (which is subject
3.69 - to change) is breadth-first, sorted by basename. So eg: /, /bin,
3.70 - /dev, /etc, /bin/false, /bin/true, /dev/null, /etc/passwd.
3.71 -
3.72 -RAZOR_PACKAGE_POOL
3.73 -
3.74 - Array of struct list, with each list item containing the index
3.75 - of a struct razor_package in the packages section. See the
3.76 - discussion of lists below.
3.77 -
3.78 -RAZOR_PROPERTY_POOL
3.79 -
3.80 - Array of struct list, with each list item containing the index
3.81 - of a struct razor_property in the properties section. See the
3.82 - discussion of lists below.
3.83 -
3.84 -RAZOR_FILE_POOL
3.85 -
3.86 - Array of struct list, with each list item containing the index
3.87 - of a struct razor_entry in the files section. See the
3.88 - discussion of lists below.
3.89 -
3.90 -
3.91 -Data types
3.92 -----------
3.93 -Note that the exact layout of bits involves some historical accidents.
3.94 -(Particularly the fact that the "name" field in most structs loses its
3.95 -high bits to a flags field.)
3.96 -
3.97 -struct list_head
3.98 - uint list_ptr : 24;
3.99 - uint flags : 8;
3.100 -
3.101 -struct list
3.102 - uint data : 24;
3.103 - uint flags : 8;
3.104 -
3.105 - Used to store lists of package, property, or file IDs. "struct
3.106 - list_head" stores the head of the list, which points to one or
3.107 - more "struct list"s in the appropriate "pool" section.
3.108 - ("struct list" should probably be called "struct list_item".)
3.109 -
3.110 - "list_first(&head, &pool)" returns a "struct list *" pointing
3.111 - to the first element of the list (or NULL for an empty list),
3.112 - and "list_next(list)" will return successive elements, until
3.113 - NULL is returned. Each "list->data" contains the index of a
3.114 - package, property, or file in the corresponding section of the
3.115 - set.
3.116 -
3.117 - Peeking underneath the abstraction, a list_head's "flags" is
3.118 - 0xff if the list is empty, 0x80 if it contains a single
3.119 - element, or 0x00 if it contains more than one element. In the
3.120 - single-element case, that element is actually stored in the
3.121 - list_head directly rather than being stored in a pool (and so
3.122 - list_first() just casts the list_head* to a list* and returns
3.123 - it). For multi-element lists, list_ptr is the index in the
3.124 - pool of the first element of this list; the list continues
3.125 - through successive elements of the pool until one with
3.126 - non-zero flags is reached, indicating the end of the list.
3.127 -
3.128 -struct razor_package
3.129 - uint name : 24;
3.130 - uint flags : 8;
3.131 - uint version : 32;
3.132 - struct list_head properties;
3.133 - struct list_head files;
3.134 -
3.135 - name and version are indexes into string_pool. properties is a
3.136 - list of all of the package's properties, and files is a list
3.137 - of its files. flags is currently only used during razor_set
3.138 - merging, to keep track of which set a package came from.
3.139 -
3.140 -struct razor_property
3.141 - uint name : 24;
3.142 - uint flags : 6;
3.143 - uint type : 2;
3.144 - uint relation : 32;
3.145 - uint version : 32;
3.146 - struct list_head packages;
3.147 -
3.148 - name and version are indexes into string_pool. type is an enum
3.149 - razor_property_type (eg, RAZOR_PROPERTY_REQUIRES), and
3.150 - relation is an enum razor_version_relation (eg,
3.151 - RAZOR_VERSION_GREATER_OR_EQUAL). packages is a list of the
3.152 - packages that have this property. flags is currently unused.
3.153 -
3.154 -struct razor_entry
3.155 - uint name : 24;
3.156 - uint flags : 8;
3.157 - uint start : 32;
3.158 - struct list_head packages;
3.159 -
3.160 - name is an index into string_pool, giving the basename of the
3.161 - file. start is either 0, or an index pointing to another
3.162 - razor_entry that is the first child of this entry (for a
3.163 - non-empty directory). (Entry 0 is always the root of the tree,
3.164 - so no entry could have entry 0 as a child.) flags is 0x80
3.165 - (RAZOR_ENTRY_LAST) if an entry is the last entry in its
3.166 - directory. Otherwise it is 0.
3.167 -
3.168 - Note that given a pointer to a struct_razor_entry (eg, from a
3.169 - package's "files" list), there is no way to reconstruct its
3.170 - full name without walking the entire files array up to that
3.171 - point. Because of this and other problems (fix_file_map()), it
3.172 - seems like razor_entry should be modified to include a pointer
3.173 - to its parent. (Storing full paths instead of just basenames
3.174 - would also fix this problem, but that would use much more
3.175 - memory.)
4.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000
4.2 +++ b/docs/package-set.xml Wed Jul 02 14:37:38 2008 -0400
4.3 @@ -0,0 +1,239 @@
4.4 +<?xml version="1.0" encoding="utf-8"?>
4.5 +<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
4.6 +
4.7 +<chapter id="file-format">
4.8 + <title>Package Set File Format</title>
4.9 +
4.10 + <sect2 id="file-header">
4.11 + <title>File header</title>
4.12 +
4.13 + <para>
4.14 + The repo starts with a header, containing some number of
4.15 + sections, terminated by a section with type 0:
4.16 + </para>
4.17 +
4.18 + <programlisting><![CDATA[
4.19 +struct razor_set_header {
4.20 + uint32_t magic;
4.21 + uint32_t version;
4.22 + struct razor_set_section sections[0];
4.23 +};
4.24 +
4.25 +struct razor_set_section {
4.26 + uint32_t type;
4.27 + uint32_t offset;
4.28 + uint32_t size;
4.29 +};
4.30 +]]></programlisting>
4.31 +
4.32 + <para>
4.33 + razor_set_open() mmaps the repo file, and creates a struct razor_set:
4.34 + </para>
4.35 +
4.36 + <programlisting><![CDATA[
4.37 +struct razor_set {
4.38 + struct array string_pool;
4.39 + struct array packages;
4.40 + struct array properties;
4.41 + struct array files;
4.42 + struct array package_pool;
4.43 + struct array property_pool;
4.44 + struct array file_pool;
4.45 + struct razor_set_header *header;
4.46 +};
4.47 +]]></programlisting>
4.48 +
4.49 + <para>
4.50 + by finding the sections with those IDs and creating struct
4.51 + array's pointing to the right places in the mmapped data. (This
4.52 + is the only processing needed when reading in the file;
4.53 + everything else is used exactly as-is.)
4.54 + </para>
4.55 +
4.56 + </sect2>
4.57 +
4.58 + <sect2 id="sections">
4.59 + <title>The sections</title>
4.60 +
4.61 + <itemizedlist>
4.62 + <listitem>
4.63 + <para>
4.64 + <emphasis>RAZOR_STRING_POOL</emphasis> Stores one copy of
4.65 + each string that appears in the repo. (At the moment, this
4.66 + is: package names, package versions, property names,
4.67 + property versions, and (basenames of) filenames.) The
4.68 + strings are arbitrarily-sized, 0-terminated, and not in any
4.69 + particular order (although the empty string always ends up
4.70 + being at offset 0).
4.71 + </para>
4.72 + </listitem>
4.73 +
4.74 + <listitem>
4.75 + <para>
4.76 + <emphasis>RAZOR_PACKAGES</emphasis> Array of struct
4.77 + razor_package; one for each package in the set, sorted by
4.78 + name.
4.79 + </para>
4.80 + </listitem>
4.81 +
4.82 + <listitem>
4.83 + <para>
4.84 + <emphasis>RAZOR_PROPERTIES</emphasis> Array of struct
4.85 + razor_property; one for each unique property in the set,
4.86 + sorted by type, then name, then relation type (eg, "<" or
4.87 + ">="), then version. (Properties with no version have
4.88 + relation type RAZOR_VERSION_EQUAL, and version "".)
4.89 + </para>
4.90 + </listitem>
4.91 +
4.92 + <listitem>
4.93 + <para>
4.94 + <emphasis>RAZOR_FILES</emphasis> Array of struct
4.95 + razor_entry; one for each file owned by any package in the
4.96 + set. The current sort order (which is subject to change)
4.97 + is breadth-first, sorted by basename. So eg: /, /bin,
4.98 + /dev, /etc, /bin/false, /bin/true, /dev/null, /etc/passwd.
4.99 + </para>
4.100 + </listitem>
4.101 +
4.102 + <listitem>
4.103 + <para>
4.104 + <emphasis>RAZOR_PACKAGE_POOL</emphasis> Array of struct
4.105 + list, with each list item containing the index of a struct
4.106 + razor_package in the packages section. See the discussion
4.107 + of lists below.
4.108 + </para>
4.109 + </listitem>
4.110 +
4.111 + <listitem>
4.112 + <para>
4.113 + <emphasis>RAZOR_PROPERTY_POOL</emphasis> Array of struct
4.114 + list, with each list item containing the index of a struct
4.115 + razor_property in the properties section. See the
4.116 + discussion of lists below.
4.117 + </para>
4.118 + </listitem>
4.119 +
4.120 + <listitem>
4.121 + <para>
4.122 + <emphasis>RAZOR_FILE_POOL</emphasis> Array of struct list,
4.123 + with each list item containing the index of a struct
4.124 + razor_entry in the files section. See the discussion of
4.125 + lists below.
4.126 + </para>
4.127 + </listitem>
4.128 + </itemizedlist>
4.129 + </sect2>
4.130 +
4.131 + <sect2 id="data-types">
4.132 + <title>Data types</title>
4.133 +
4.134 + <para>
4.135 + Note that the exact layout of bits involves some historical
4.136 + accidents. (Particularly the fact that the "name" field in most
4.137 + structs loses its high bits to a flags field.)
4.138 + </para>
4.139 +
4.140 + <programlisting><![CDATA[
4.141 +struct list_head
4.142 + uint list_ptr : 24;
4.143 + uint flags : 8;
4.144 +
4.145 +struct list
4.146 + uint data : 24;
4.147 + uint flags : 8;
4.148 +]]></programlisting>
4.149 +
4.150 + <para>
4.151 + Used to store lists of package, property, or file IDs. "struct
4.152 + list_head" stores the head of the list, which points to one or
4.153 + more "struct list"s in the appropriate "pool" section. ("struct
4.154 + list" should probably be called "struct list_item".)
4.155 + </para>
4.156 +
4.157 + <para>
4.158 + "list_first(&head, &pool)" returns a "struct list *"
4.159 + pointing to the first element of the list (or NULL for an empty
4.160 + list), and "list_next(list)" will return successive elements,
4.161 + until NULL is returned. Each "list->data" contains the index of
4.162 + a package, property, or file in the corresponding section of the
4.163 + set.
4.164 + </para>
4.165 +
4.166 + <para>
4.167 + Peeking underneath the abstraction, a list_head's "flags" is
4.168 + 0xff if the list is empty, 0x80 if it contains a single element,
4.169 + or 0x00 if it contains more than one element. In the
4.170 + single-element case, that element is actually stored in the
4.171 + list_head directly rather than being stored in a pool (and so
4.172 + list_first() just casts the list_head* to a list* and returns
4.173 + it). For multi-element lists, list_ptr is the index in the pool
4.174 + of the first element of this list; the list continues through
4.175 + successive elements of the pool until one with non-zero flags is
4.176 + reached, indicating the end of the list.
4.177 + </para>
4.178 +
4.179 + <programlisting><![CDATA[
4.180 +struct razor_package
4.181 + uint name : 24;
4.182 + uint flags : 8;
4.183 + uint version : 32;
4.184 + struct list_head properties;
4.185 + struct list_head files;
4.186 +]]></programlisting>
4.187 +
4.188 + <para>
4.189 + name and version are indexes into string_pool. properties is a
4.190 + list of all of the package's properties, and files is a list of
4.191 + its files. flags is currently only used during razor_set
4.192 + merging, to keep track of which set a package came from.
4.193 + </para>
4.194 +
4.195 + <programlisting><![CDATA[
4.196 +struct razor_property
4.197 + uint name : 24;
4.198 + uint flags : 6;
4.199 + uint type : 2;
4.200 + uint relation : 32;
4.201 + uint version : 32;
4.202 + struct list_head packages;
4.203 +]]></programlisting>
4.204 +
4.205 + <para>
4.206 + name and version are indexes into string_pool. type is an enum
4.207 + razor_property_type (eg, RAZOR_PROPERTY_REQUIRES), and relation
4.208 + is an enum razor_version_relation (eg,
4.209 + RAZOR_VERSION_GREATER_OR_EQUAL). packages is a list of the
4.210 + packages that have this property. flags is currently unused.
4.211 + </para>
4.212 +
4.213 + <programlisting><![CDATA[
4.214 +struct razor_entry
4.215 + uint name : 24;
4.216 + uint flags : 8;
4.217 + uint start : 32;
4.218 + struct list_head packages;
4.219 +]]></programlisting>
4.220 +
4.221 + <para>
4.222 + name is an index into string_pool, giving the basename of the
4.223 + file. start is either 0, or an index pointing to another
4.224 + razor_entry that is the first child of this entry (for a
4.225 + non-empty directory). (Entry 0 is always the root of the tree,
4.226 + so no entry could have entry 0 as a child.) flags is 0x80
4.227 + (RAZOR_ENTRY_LAST) if an entry is the last entry in its
4.228 + directory. Otherwise it is 0.
4.229 + </para>
4.230 +
4.231 + <para>
4.232 + Note that given a pointer to a struct_razor_entry (eg, from a
4.233 + package's "files" list), there is no way to reconstruct its full
4.234 + name without walking the entire files array up to that
4.235 + point. Because of this and other problems (fix_file_map()), it
4.236 + seems like razor_entry should be modified to include a pointer
4.237 + to its parent. (Storing full paths instead of just basenames
4.238 + would also fix this problem, but that would use much more
4.239 + memory.)
4.240 + </para>
4.241 + </sect2>
4.242 +</chapter>
5.1 --- a/docs/razor-docs.xml Wed Jul 02 18:46:47 2008 +0100
5.2 +++ b/docs/razor-docs.xml Wed Jul 02 14:37:38 2008 -0400
5.3 @@ -59,6 +59,8 @@
5.4 This part presents the design documentation for razor.
5.5 </para>
5.6 </partintro>
5.7 + <xi:include href="package-set.xml" />
5.8 + <xi:include href="solver.xml" />
5.9 </reference>
5.10
5.11 <reference id="ref-core">
6.1 --- /dev/null Thu Jan 01 00:00:00 1970 +0000
6.2 +++ b/docs/solver.xml Wed Jul 02 14:37:38 2008 -0400
6.3 @@ -0,0 +1,370 @@
6.4 +<?xml version="1.0" encoding="utf-8"?>
6.5 +<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
6.6 +
6.7 +<chapter id="solver">
6.8 + <title>Dependency Solver</title>
6.9 +
6.10 + <para>
6.11 + At a very high level, yum's depsolver does something roughly
6.12 + equivalent to:
6.13 +
6.14 + - For each package being installed or removed
6.15 +
6.16 + - For each relevant property (provides, requires, conflicts,
6.17 + obsoletes):
6.18 +
6.19 + - Figure out what additional packages need to be added to
6.20 + or removed from the system to satisfy this property
6.21 +
6.22 + which ends up being roughly O(N^2 * M) where N is the total number of
6.23 + properties and M is the number of packages being acted on.
6.24 +
6.25 +(I just figured that out off the top of my head, and I'm not totally
6.26 +familiar with the yum code, so it may be wrong.)
6.27 +
6.28 +Razor's depsolver is something like:
6.29 +
6.30 + - do {
6.31 +
6.32 + - For each property to be added to or removed from the system:
6.33 +
6.34 + - Figure out what packages need to be added to or removed
6.35 + from the system to satisfy this property
6.36 +
6.37 + - } until we stop adding/remove more packages
6.38 +
6.39 +with the key being that it's very easy to find the PROVIDES
6.40 +corresponding to a REQUIRES and vice versa, because the property
6.41 +arrays are sorted, and so all properties with the same "name" will be
6.42 +adjacent to one another in the array, allowing many dependencies to be
6.43 +satisified in essentially constant time. (Actually... we've been
6.44 +calling it constant, but it's really O(log N) for heavily-depended-on
6.45 +packages, because the more packages you have, the more variations on
6.46 +"requires foo", "requires foo = 1.1", "requires foo > 1.0", etc you're
6.47 +going to have to scan through.)
6.48 +
6.49 +Ideally though, each iteration of the inner loop body happens in
6.50 +constant time, and thus the inner loop as a whole is O(N), and thus
6.51 +the depsolver as a whole is O(N * M) (or at least, less than
6.52 +O(N * M * log N).
6.53 +
6.54 +
6.55 +FILE DEPENDENCIES
6.56 +-----------------
6.57 +
6.58 +Whenever we add a package with a file REQUIRES to a razor_set, we also
6.59 +add a PROVIDES for that file to the package or packages which provide
6.60 +that file. This means that if we later add another package that
6.61 +requires the same file (eg, /bin/sh or /usr/bin/perl), we can resolve
6.62 +its file requirement exactly like we would resolve a property
6.63 +requirement, in nearly constant time.
6.64 +
6.65 +When adding a *new* file requirement (ie, a requirement on a file that
6.66 +no existing package depends on), we still have to scan through the
6.67 +file tree, which is O(log N) in the number of files.
6.68 +
6.69 +(AFAICT, there's no reason yum couldn't do the same optimization.
6.70 +Also, AFAICT, yum currently sticks property dependencies and file
6.71 +dependencies into the same hash table, so that if any package in the
6.72 +transaction has a file dependency, it causes *property* dependencies
6.73 +to become slower to resolve as well...)
6.74 +
6.75 +
6.76 +THE RULES
6.77 +---------
6.78 +
6.79 +This is what we have figured out for transaction-solving rules;
6.80 +neither yum nor rpm's algorithm seems to be explained in full
6.81 +anywhere...
6.82 +
6.83 + 1. Every requested install in the initial package set must be
6.84 + satisfied as either a new install or an update:
6.85 +
6.86 + - if the requested package name is the name of an upstream
6.87 + package:
6.88 +
6.89 + - if there is not a corresponding already-installed
6.90 + package, then install the upstream package
6.91 +
6.92 + - else if the upstream package is newer than the
6.93 + already-installed package, then update the package
6.94 +
6.95 + - else it's an error (UP_TO_DATE)
6.96 +
6.97 + - else if the requested package name is the name of an
6.98 + already-installed package:
6.99 +
6.100 + - if there is an upstream package that obsoletes the
6.101 + already-installed package, then behave as though the
6.102 + user had requested that that package be installed
6.103 + instead.
6.104 +
6.105 + - else it's an error (UP_TO_DATE or INSTALL_UNAVAILABLE?)
6.106 +
6.107 + - else it's an error (INSTALL_UNAVAILABLE)
6.108 +
6.109 + 2. Every requested removal in the initial package set must be
6.110 + satisfied as a removal. If any requested package name is not
6.111 + the name of an installed package, it's an error
6.112 + (REMOVE_NOT_INSTALLED)
6.113 +
6.114 + REQUIRES processing:
6.115 +
6.116 + 3. If a package being installed or updated-to REQUIRES a property
6.117 + that is not provided by any installed or to-be-installed
6.118 + package, we need to find an installable package that provides
6.119 + that property. If we find one, install/update it. If not, it's
6.120 + an error (UNSATISFIABLE). (If we find an upstream package
6.121 + providing the property that corresponds to a system package
6.122 + that's being removed, then it's a CONTRADICTION.)
6.123 +
6.124 + 4. If an already-installed package REQUIRES a property which is
6.125 + only provided by a package that is being removed, then that
6.126 + package needs to be removed as well.
6.127 +
6.128 + 5. If an already-installed package REQUIRES a property which is
6.129 + only provided by a package that is being upgraded or obsoleted
6.130 + (to a new package which does not provide that property), then:
6.131 +
6.132 + - if there is an update for the installed package, then update
6.133 + the installed package
6.134 +
6.135 + - else if there is another installable package that provides
6.136 + the required property, then install that.
6.137 +
6.138 + - else it's an error (UNSATISFIABLE?)
6.139 +
6.140 + CONFLICTS processing
6.141 +
6.142 + 6. If a package being installed or updated-to CONFLICTS with a
6.143 + property provided by an installed package:
6.144 +
6.145 + - if there is an update for the installed package, which the
6.146 + new package does not conflict with, then update the
6.147 + installed package.
6.148 +
6.149 + - else it's an error (NEW_CONFLICT)
6.150 +
6.151 + 7. If an already-installed package CONFLICTS with a property
6.152 + provided by a to-be-installed package:
6.153 +
6.154 + - if there is an update for the installed package, which does
6.155 + not conflict with the new package, then update the installed
6.156 + package.
6.157 +
6.158 + - else it's an error (NEW_CONFLICT)
6.159 +
6.160 + 8. If a package being installed or updated-to CONFLICTS with a
6.161 + property provided by a to-be-installed package, then it's an
6.162 + error (CONTRADICTION).
6.163 +
6.164 + OBSOLETES processing. NOTE: OBSOLETES are only matched against
6.165 + package names, not against arbitrary provided properties
6.166 +
6.167 + 9. If a package being installed or updated-to OBSOLETES an
6.168 + installed package, then obsolete that package. (ie, remove it,
6.169 + but treat it as updated for purposes of dangling REQUIRES).
6.170 +
6.171 + 10. If an already-installed package OBSOLETES a to-be-installed
6.172 + package, then it's an error. (ALREADY_OBSOLETE)
6.173 +
6.174 + 11. If a package being installed or updated-to OBSOLETES another
6.175 + package being installed or updated-to, then it's an error
6.176 + (CONTRADICTION).
6.177 +
6.178 +
6.179 +
6.180 +THE DEPSOLVER
6.181 +-------------
6.182 +
6.183 +We start with two razor_sets, system and upstream, and a list of
6.184 +requested installations and removals.
6.185 +
6.186 + FIXME: what about multiple upstream repos? Having to deal with
6.187 + arbitrary numbers of razor_sets is possible, but will probably be
6.188 + messy... It might be easier to either store all upstream repo data
6.189 + in a single .rzdb file, or else merge all upstream .rzdb files
6.190 + together into a single razor_set at startup. (Or some combination
6.191 + of those.)
6.192 +
6.193 +We create a bit array of the packages in each set, indicating which
6.194 +ones are installed; the system bitarray starts out all 1s, and the
6.195 +upstream bitarray all 0s. Each bit is only allowed to change state
6.196 +once during the transaction; an installed package can be removed, or
6.197 +an uninstalled package installed, but trying to reinstall a removed
6.198 +package, or uninstall a newly-installed package is an error. This
6.199 +means the packages break down into four categories:
6.200 +
6.201 + - installed (1 bit in the system bit array)
6.202 + - to-be-removed (0 bit in the system bit array)
6.203 + - to-be-installed (1 bit in the upstream bit array)
6.204 + - installable (0 bit in the upstream bit array)
6.205 +
6.206 +
6.207 +Depsolver algorithm:
6.208 +
6.209 + - Create new razor_transaction_packages ("rtp"s) for each
6.210 + requested install or remove. These will be "unresolved", because
6.211 + we haven't yet found the razor_packages that correspond to them.
6.212 +
6.213 + - while there are new rtps:
6.214 +
6.215 + - sort the new rtps
6.216 +
6.217 + - Walk the system property list, upstream property list, and
6.218 + new rtp list in parallel, and:
6.219 +
6.220 + - For each uninstalled PROVIDES:
6.221 +
6.222 + - If the property is a valid package name (that is,
6.223 + either it's a package providing its own name, or it
6.224 + has a matching OBSOLETES), and it matches the name
6.225 + of a new rtp of type INSTALL or FORCED_UPDATE with
6.226 + an unresolved new_package:
6.227 +
6.228 + - If the upstream package has the same version as
6.229 + the system package, we have an UP_TO_DATE error
6.230 + (FIXME: not quite right. This doesn't deal with
6.231 + the case where we try to update an application
6.232 + because of a library update, and it turns out
6.233 + there's no new version of the application, but
6.234 + there IS a compat package containing the old
6.235 + version of the library.)
6.236 +
6.237 + - Otherwise, set the rtp's new_package to point to
6.238 + the package providing this property and set the
6.239 + appropriate bit in the upstream bit array.
6.240 +
6.241 + - For each to-be-installed non-file REQUIRES:
6.242 +
6.243 + - See if there's an installed or to-be-installed
6.244 + package that PROVIDES that property.
6.245 +
6.246 + - If not, see if there's an installable package that
6.247 + PROVIDES that property, and create a new INSTALL rtp
6.248 + for it if so.
6.249 +
6.250 + - If not, see if there's a to-be-removed package that
6.251 + PROVIDES that property. (If we find such a package,
6.252 + we have a CONTRADICTION error.)
6.253 +
6.254 + - If none of the above, then we have an UNSATISFIABLE
6.255 + error
6.256 +
6.257 + - For each to-be-installed file REQUIRES:
6.258 +
6.259 + - (We create fake file PROVIDES to match file REQUIRES
6.260 + when importing/merging razor sets, so if there is
6.261 + already another installed package that REQUIRES this
6.262 + file, there will be a PROVIDES listed for it as well.)
6.263 +
6.264 + - See if there's an installed package that PROVIDES
6.265 + that file.
6.266 +
6.267 + - If not, do a binary search of the system file tree
6.268 + looking to see if some installed package provides
6.269 + that file but does not have a PROVIDES for it.
6.270 +
6.271 + - If not, see if there's an installable package that
6.272 + PROVIDES that property, and create a new INSTALL rtp
6.273 + for it if so.
6.274 +
6.275 + - (If we actually work with multiple upstream
6.276 + razor_sets, then we will need to search the upstream
6.277 + file trees at this point, because it's possible that
6.278 + a package in one upstream repo would require a file
6.279 + in another upstream repo. But if we merge the
6.280 + multiple upstream repos into a single razor_set at
6.281 + some point, then we would not need to do that,
6.282 + because it would be guaranteed that we would have
6.283 + already created a fake PROVIDES if any package
6.284 + provides the file.)
6.285 +
6.286 + - If no installed or installable package provides the
6.287 + file, see if there's a to-be-removed package that
6.288 + provides the file. (If we find such a package, we
6.289 + have a CONTRADICTION error.)
6.290 +
6.291 + - If none of the above, then we have an UNSATISFIABLE
6.292 + error
6.293 +
6.294 + - For each to-be-installed PROVIDES:
6.295 +
6.296 + - Check if the new PROVIDES conflicts with an
6.297 + installed CONFLICTS. If so, create a new
6.298 + FORCED_UPDATE rtp for the installed package, so we
6.299 + can try to upgrade it to a non-conflicting version.
6.300 + (If we can't, we'll have an OLD_CONFLICT error.)
6.301 +
6.302 + - Check if the new PROVIDES conflicts with an
6.303 + installed OBSOLETES *and* the PROVIDES property
6.304 + corresponds to the name of its package. (That is,
6.305 + OBSOLETES are only matched against package names,
6.306 + not arbitrary provided properties.) If so, we have
6.307 + an ALREADY_OBSOLETE error.
6.308 +
6.309 + - Check if the new PROVIDES conflicts with a
6.310 + to-be-installed CONFLICTS. If so, we have a
6.311 + CONTRADICTION error.
6.312 +
6.313 + - For each to-be-installed CONFLICTS:
6.314 +
6.315 + - Basically the reverse of the previous case: check if
6.316 + the new CONFLICTS conflicts with an installed
6.317 + PROVIDES. If so, create a new FORCED_UPDATE rtp for
6.318 + the installed package, so we can try to upgrade it
6.319 + to a non-conflicting version. (If we can't, we'll
6.320 + have an NEW_CONFLICT error.)
6.321 +
6.322 + - Check if the new CONFLICTS conflicts with a
6.323 + to-be-installed PROVIDES. If so, we have a
6.324 + CONTRADICTION error.
6.325 +
6.326 + - For each to-be-installed OBSOLETES:
6.327 +
6.328 + - Check if there's an installed package that PROVIDES
6.329 + that property. If so, create an OBSOLETED rtp for
6.330 + the installed package.
6.331 +
6.332 + - If not, check if there's a to-be-installed package
6.333 + that PROVIDES that property. If so, we have a
6.334 + CONTRADICTION error.
6.335 +
6.336 +
6.337 + - For each installed PROVIDES:
6.338 +
6.339 + - If the property is a valid package name (that is,
6.340 + it's a package providing its own name), and it
6.341 + matches the name of a new rtp with an unresolved
6.342 + old_package, then set the rtp's old_package to point
6.343 + to the package providing this property and clear the
6.344 + appropriate bit in the system bit array.
6.345 +
6.346 + - For each to-be-removed PROVIDES:
6.347 +
6.348 + - If there's also an identical to-be-installed
6.349 + PROVIDES, we're ok and can skip this
6.350 +
6.351 + - Otherwise, for each installed REQUIRES of this
6.352 + property:
6.353 +
6.354 + - Look for some other installed or to-be-installed
6.355 + property that satisfies the REQUIRES.
6.356 +
6.357 + - If there isn't one, then for each installed
6.358 + package in this REQUIRES's package list:
6.359 +
6.360 + - If the PROVIDES was lost because the old
6.361 + package was REMOVEd (not FORCED_UPDATE or
6.362 + OBSOLETED), then create a new REMOVE rtp for
6.363 + this package.
6.364 +
6.365 + - Otherwise, create a new FORCED_UPDATE rtp
6.366 + for this package.
6.367 +
6.368 + - (We don't need to look at to-be-installed REQUIRES
6.369 + of this property, because if there are any, they
6.370 + will cause a CONTRADICTION error when we try to
6.371 + re-satisfy them the next time through.)
6.372 + </para>
6.373 +</chapter>