TODO
author Kristian H?gsberg <krh@redhat.com>
Mon Jun 09 12:47:37 2008 -0400 (2008-06-09)
changeset 230 c1e2aed8dd07
parent 213 8f8b782b7a0e
child 299 d4f7f167b8bb
permissions -rw-r--r--
Rewrite depsolver to use a series of passes over all packages.

The big change is that we follow one step of the depedency chain for
each package to resolve in each iteration, and repeat until there are
no more possible moves. In contrast the old depsolver would try to
follow the dependency chain completely for one package at a time.

This new approach is simpler and faster, and at the same time more
roboust. Instead of knowing how one newly installed package may
affect other packages (obsoleting, pulling in new packages etc), the
new algorithm just looks at the total list of requires, provides,
obsoletes and conflicts after installing new packages.
krh@65
     1
Towards replacing rpm + yum (0.1):
krh@65
     2
krh@214
     3
- drop the filelists from the main package set file, split out to a
krh@214
     4
  secondary file.  for package sets that depend on other package sets,
krh@214
     5
  we need to be able to generate properties with owning packages that
krh@214
     6
  are in another set.  this way, a package that requires a file, will
krh@214
     7
  look up the provides in the set and find the package that owns it
krh@214
     8
  and then try mark that for update.
krh@214
     9
krh@82
    10
- installer part:
krh@65
    11
krh@82
    12
   - pre install check; check that dirs can be created (no files where
krh@82
    13
     want to create dirs), move config files according to file
krh@82
    14
     flags. (.rpmnew etc)
krh@65
    15
krh@82
    16
   - store rpm headers for installed packages.
krh@65
    17
krh@214
    18
   - implement rpm uninstall and update.
krh@214
    19
krh@214
    20
   - triggers? just say no?
krh@214
    21
krh@189
    22
- rpm seems to consider glibc > 2.6.90 to mean greater than
krh@189
    23
  2.6.90-anything.  That is, a comparison that doesn't mention the
krh@189
    24
  release field, shouldn't regard the release field of pkgs it
krh@189
    25
  compares against.  glibc-common-2.6.90 has
krh@189
    26
krh@189
    27
	conflicts: glibc < 2.6.90, glibc > 2.6.90
krh@189
    28
krh@189
    29
  since rpm doesn't let you do glibc != 2.6.90, and
krh@189
    30
krh@189
    31
	requires: glibc = 2.6.90
krh@189
    32
krh@189
    33
  will always pull in glibc.  But even with a != relation, would
krh@189
    34
  glibc-2.6.90-16 be equal to 2.6.90?  glibc 2.7.90-8 dropped it in
krh@189
    35
  favor of requires = 2.7.90-8 (#225806).
krh@189
    36
krh@65
    37
- signed packages
krh@65
    38
krh@70
    39
- space calculation before transaction, but ideally, do a number of
krh@70
    40
  smaller transactions.
krh@70
    41
krh@70
    42
- pre-link changing binaries and libs on disk screwing up checksum?
krh@70
    43
krh@82
    44
- pipelined download and install; topo-sort packages in update set,
krh@82
    45
  pick one with all deps in the current set, add it to the current set
krh@82
    46
  and satisfy deps against update set => result: minimal update
krh@82
    47
  transaction.  Queue download and install/update transaction for the
krh@82
    48
  packages in the minimal set, start over.  This also makes the
krh@82
    49
  installation phase much more interruptible, basically just stop
krh@82
    50
  after a sub-transaction finishes.  As we keep the update set around
krh@82
    51
  as a target, we can restart if needed.  Probably don't need to, can
krh@82
    52
  just do a new update.  During a sub-transaction we should keep the
krh@82
    53
  target set (i.e. the current set to be) around as a lock file
krh@82
    54
  (system.repo.lock or so, see git) so that razor updates are
krh@82
    55
  prevented if the systems crashes during an update.
krh@74
    56
krh@214
    57
- implement depsolving between multiple package sets by creating an
krh@214
    58
  iterator that has a sorted list of all installed pkgs from all sets,
krh@214
    59
  all installed requires from all sets, all installed provides from
krh@214
    60
  all sets etc.  could be a list of tuples (pkgs index, set index).
krh@214
    61
  should simplify even the two-set depsolving a bit since we can
krh@214
    62
  pretend there's just one set.  this should also be useful for the
krh@214
    63
  'overlay set' idea where the system set is actually made up of a
krh@214
    64
  number of sets, but typically a read-only set from a read-only fs
krh@214
    65
  and a read-write set from a r/w fs.
krh@214
    66
krh@214
    67
- locking: we use advisory file locking on the system set
krh@214
    68
  (/var/lib/razor/system.repo) to indicate a transaction is in
krh@214
    69
  progress.  The locking algorithm is as follows:
krh@214
    70
krh@214
    71
    1. obtain advisory lock on system set.  if this is already taken,
krh@214
    72
       we know that a process is actively modifying the system set and
krh@214
    73
       we have to wait.  there's a fcntl that lets you block for the
krh@214
    74
       lock to go away.
krh@214
    75
krh@214
    76
    2. if a system-next.repo file already exists an earlier razor
krh@214
    77
       process was interrupted or crashed and we may want to clean
krh@214
    78
       that up.  the system-next.repo file will record what the
krh@214
    79
       previous instance was trying to do and we can just replay that
krh@214
    80
       to clean up.
krh@214
    81
krh@214
    82
    3. create the new package set whichever way and write it to
krh@214
    83
       system-next.repo, then start installing/removing rpms.
krh@214
    84
krh@214
    85
    4. When the update is complete, rename system-next.repo to
krh@214
    86
       system.repo and remove the advisory lock.
krh@214
    87
krh@214
    88
  we should probably introduce a new object that encapsulates this
krh@214
    89
  sequence, the filename conventions, rpm cache, e.g. struct
krh@214
    90
  razor_image, with operations such as
krh@214
    91
krh@214
    92
	#define RAZOR_IMAGE_READ	0x01
krh@214
    93
	#define RAZOR_IMAGE_WRITE	0x02
krh@214
    94
krh@214
    95
	struct razor_image *
krh@214
    96
	razor_image_open(const char *root, unsigned int flags);
krh@214
    97
krh@214
    98
	int
krh@214
    99
	razor_image_begin_transaction(struct razor_image *image,
krh@214
   100
				      struct razor_set *target);
krh@214
   101
krh@214
   102
	int
krh@214
   103
	razor_image_finish_transaction(struct razor_image *image);
krh@214
   104
krh@214
   105
  the transaction pipelineing described above sits on top of this,
krh@214
   106
  since each step there needs to complete a full transaction that
krh@214
   107
  writes out a new package set.
krh@214
   108
krh@214
   109
  for overlay package sets we could do something like
krh@214
   110
krh@214
   111
	struct razor_image *
krh@214
   112
	razor_image_open_with_base(const char *root, unsigned int flags,
krh@214
   113
				   struct razor_image *base);
krh@214
   114
krh@214
   115
  where base specifies the r/o package set it's layered on.  this
krh@214
   116
  allows for stacking several layers of images.
krh@214
   117
krh@213
   118
krh@213
   119
Package set file format items:
krh@213
   120
krh@213
   121
- drop the 4k section alignment
krh@213
   122
krh@213
   123
- just use strings for header identifiers, make the string pool
krh@213
   124
  section have a fixed string (maybe make "strings" always the first
krh@213
   125
  string so its index is 0), or maybe just require that it's the first
krh@213
   126
  section in the file.
krh@213
   127
krh@213
   128
- nail down byte-order of repo file.
krh@213
   129
krh@213
   130
- version the sections in the file, put the element size in the header
krh@213
   131
  so we can add stuff to elements in a backwards compatible way.
krh@213
   132
  maybe not necessary, we can just add sections that augment the
krh@213
   133
  sections we want to add to (similar to how rpm has add versioned
krh@213
   134
  deps).
krh@213
   135
krh@213
   136
krh@65
   137
Misc ideas:
krh@65
   138
krh@1
   139
- keep history of installed packages/journal of package transaction,
krh@1
   140
  so we can roll back to yesterday, or see what got installed in the
krh@1
   141
  latest yum update.
krh@1
   142
krh@8
   143
- transactions, proper recovery, make sure we don't poop our package
krh@8
   144
  database (no more rm /var/lib/rpm/__cache*).
krh@8
   145
krh@18
   146
- use hash table for package and property lists so we only store
krh@18
   147
  unique lists (like for string pool).
krh@40
   148
krh@40
   149
- use existing, running system as repo; eg
krh@40
   150
krh@40
   151
	razor update razor://other-box.local evince
krh@40
   152
krh@40
   153
  to pull eg the latest evince and dependencies from another box.  We
krh@40
   154
  should be able to regenerate a rzr pkg from the system so we can
krh@40
   155
  reuse the signature from the originating repo.
krh@41
   156
krh@41
   157
- Ok, maybe the fastest package set merge method in the end is to use
krh@41
   158
  the razor_importer, but use a hash table for the properties.  This
krh@41
   159
  way we can assign them unique IDs immediately (like tokenizing
krh@41
   160
  strings).
krh@47
   161
krh@47
   162
- test suite should be easy, just keep .repo files around and test
krh@47
   163
  different type of upgrades that way (obsoletes, conflicts, file
krh@47
   164
  conflicts, file/dir problems etc).  Or maybe just keep a simple file
krh@47
   165
  format ad use a custom importer to create the .repo files.
krh@47
   166
krh@56
   167
- try to clean up the
krh@56
   168
krh@56
   169
	do { ... } while (((e++)->name & RAZOR_ENTRY_LAST) == 0);
krh@56
   170
krh@56
   171
  idiom for iteration of directories.
krh@63
   172
krh@63
   173
- overlay package sets?  mount a read-only /usr over nfs or from the
krh@63
   174
  virt-host and have a local package set overlaid over the read-only
krh@63
   175
  one.  shouldn't need new features in the core package set data
krh@63
   176
  structure, but should be just conventions on top.  we have the base
krh@63
   177
  package set from the r/o system, the overlay set from the local
krh@63
   178
  system and we can have an effective package set which is the merge
krh@63
   179
  of everything from the overlay into the base set.  the effective set
krh@63
   180
  is easy to compute and we could do it on the fly or cache it.  or
krh@63
   181
  maybe the effective set is the on-disk representation and the
krh@63
   182
  overlay can be computed when needed, we just keep a link back to the
krh@63
   183
  base.
krh@71
   184
krh@71
   185
- incremental rawhide repo updates? instead of downloading 10MB zipped
krh@82
   186
  repo every time, download a diff repo?  Should be pretty small,
krh@82
   187
  especially if we don't have file checksums in metadata.  Filenames
krh@82
   188
  and properties are for the most part already present, typically just
krh@82
   189
  a version bump plus maybe tweaking a couple requires.  The upstream
krh@82
   190
  repo can store multiple incremental updates in one big file and
krh@82
   191
  provide an index file that maps updates for a given date (we should
krh@82
   192
  use repo-file checksums though) to a range in the file: Download the
krh@82
   193
  index file, search for a match for your latest rawhide.repo file,
krh@82
   194
  download range of updates that brings it up to date.
krh@74
   195
krh@74
   196
- use hash tables for dirs when importing files to avoid qsorting all
krh@74
   197
  files in rawhide.
krh@75
   198
krh@82
   199
Bugs:
krh@82
   200
krh@82
   201
- eliminate duplicate entries in package property lists.