TODO
author Kristian Høgsberg <krh@redhat.com>
Fri Sep 07 14:00:19 2007 -0400 (2007-09-07)
changeset 16 78383b7bc4fa
parent 5 4bdfd6031b3d
child 18 b2bf852ca8d1
permissions -rw-r--r--
Use bsearch instead of hash table when looking up packages.
- keep a history of installed packages/journal of package transactions,
  so we can roll back to yesterday, or see what got installed in the
  latest yum update.

- we build a cache of the currently installed set to service
  dependency inquiries fast:

	map from property to pkg (as hash) providing it
	map from property to pkgs requiring it
	map from pkg name to manifest
	map from string to string pool index

	no implicit provides? not even pkgname?

- properties are strings, stored in a string table

- on-disk maps are binary files of (string table index, hash) pairs
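
  A minimal sketch of what one such pair could look like on disk. The
  names (struct map_entry, map_entry_encode, map_entry_decode), the
  32-bit width of both fields and the little-endian byte order are all
  assumptions made for illustration; the notes above fix none of them.

```c
#include <stdint.h>

/* Hypothetical record: one (string table index, package hash) pair. */
struct map_entry {
	uint32_t string_index;	/* index of the property in the string table */
	uint32_t pkg_hash;	/* hash identifying the package (width assumed) */
};

enum { MAP_ENTRY_SIZE = 8 };	/* two 32-bit words per record */

/* Store a 32-bit value in little-endian order (assumed byte order). */
static void
put_u32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char) (v & 0xff);
	p[1] = (unsigned char) ((v >> 8) & 0xff);
	p[2] = (unsigned char) ((v >> 16) & 0xff);
	p[3] = (unsigned char) ((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value. */
static uint32_t
get_u32(const unsigned char *p)
{
	return (uint32_t) p[0] | (uint32_t) p[1] << 8 |
		(uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Serialize one entry into its 8-byte on-disk form. */
void
map_entry_encode(const struct map_entry *e, unsigned char out[MAP_ENTRY_SIZE])
{
	put_u32(out, e->string_index);
	put_u32(out + 4, e->pkg_hash);
}

/* Read one entry back from its 8-byte on-disk form. */
void
map_entry_decode(const unsigned char in[MAP_ENTRY_SIZE], struct map_entry *e)
{
	e->string_index = get_u32(in);
	e->pkg_hash = get_u32(in + 4);
}
```

  A fixed byte order keeps the file portable between machines; using
  the native struct layout instead would let the mmap'ed file be read
  in place without a decode step.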

- at run time, we mmap the map, and keep changes in memory in a splay
  tree or similar.  if searching the splay tree fails we punt to the
  mmap.  once the transaction is done, we merge the map and the splay
  tree and write it back out.
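
  The lookup-then-merge flow above could be sketched roughly as
  follows.  To stay short this uses a small sorted array where the
  notes suggest a splay tree, and takes the read-only base as a plain
  pointer where the real thing would be the mmap'ed file.  All names
  (overlay_map, overlay_set, overlay_lookup, overlay_merge) and the
  32-bit key/value fields are hypothetical, and removing entries is
  left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct map_entry {
	uint32_t key;
	uint32_t value;
};

struct overlay_map {
	const struct map_entry *base;	/* read-only, sorted by key (the mmap) */
	size_t base_count;
	struct map_entry *changes;	/* in-memory changes, sorted by key */
	size_t change_count, change_cap;
};

static int
compare_key(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *) key;
	const struct map_entry *e = elem;

	return (k > e->key) - (k < e->key);
}

static const struct map_entry *
find(const struct map_entry *entries, size_t count, uint32_t key)
{
	if (count == 0)
		return NULL;
	return bsearch(&key, entries, count, sizeof *entries, compare_key);
}

/* Record a change in memory; returns 0 on success, -1 if out of memory. */
int
overlay_set(struct overlay_map *m, uint32_t key, uint32_t value)
{
	size_t i = 0;

	while (i < m->change_count && m->changes[i].key < key)
		i++;
	if (i < m->change_count && m->changes[i].key == key) {
		m->changes[i].value = value;
		return 0;
	}
	if (m->change_count == m->change_cap) {
		size_t cap = m->change_cap ? 2 * m->change_cap : 8;
		struct map_entry *p = realloc(m->changes, cap * sizeof *p);

		if (p == NULL)
			return -1;
		m->changes = p;
		m->change_cap = cap;
	}
	memmove(&m->changes[i + 1], &m->changes[i],
		(m->change_count - i) * sizeof *m->changes);
	m->changes[i].key = key;
	m->changes[i].value = value;
	m->change_count++;
	return 0;
}

/* Search the in-memory changes first, then fall back to the base map. */
int
overlay_lookup(const struct overlay_map *m, uint32_t key, uint32_t *value)
{
	const struct map_entry *e = find(m->changes, m->change_count, key);

	if (e == NULL)
		e = find(m->base, m->base_count, key);
	if (e == NULL)
		return 0;
	*value = e->value;
	return 1;
}

/* Merge base and changes into a new sorted array; changes win on ties. */
struct map_entry *
overlay_merge(const struct overlay_map *m, size_t *out_count)
{
	size_t i = 0, j = 0, n = 0;
	struct map_entry *out =
		malloc((m->base_count + m->change_count + 1) * sizeof *out);

	if (out == NULL)
		return NULL;
	while (i < m->base_count || j < m->change_count) {
		if (j == m->change_count ||
		    (i < m->base_count && m->base[i].key < m->changes[j].key)) {
			out[n++] = m->base[i++];
		} else {
			if (i < m->base_count && m->base[i].key == m->changes[j].key)
				i++;	/* base entry is overridden */
			out[n++] = m->changes[j++];
		}
	}
	*out_count = n;
	return out;
}
```

  The merged array is what would be written back out once the
  transaction is done; the caller frees it along with the changes
  array.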

- the on-disk string pool is sorted and we keep a list of indices into
  the string pool in sorted order so we can bsearch the list with a
  string to get its string pool index.  maybe a hash table is better:
  less I/O, as we expect to find the string within the block we
  look up with the hash function.
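
  A small sketch of the bsearch variant, assuming the pool is a buffer
  of NUL-terminated strings and that a "string pool index" is the byte
  offset of a string in that buffer.  struct pool_key and
  string_pool_lookup are hypothetical names; only bsearch() and
  strcmp() are real library calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* bsearch() passes no context pointer, so the key carries the pool. */
struct pool_key {
	const char *pool;
	const char *string;
};

static int
compare_pool_entry(const void *key, const void *elem)
{
	const struct pool_key *k = key;
	uint32_t offset = *(const uint32_t *) elem;

	return strcmp(k->string, k->pool + offset);
}

/*
 * sorted[] holds pool offsets ordered by the strings they point to.
 * Returns the offset of the string in the pool, or -1 if not found.
 */
long
string_pool_lookup(const char *pool, const uint32_t *sorted, size_t count,
		   const char *string)
{
	struct pool_key key = { pool, string };
	const uint32_t *hit;

	if (count == 0)
		return -1;
	hit = bsearch(&key, sorted, count, sizeof *sorted, compare_pool_entry);
	return hit ? (long) *hit : -1;
}
```

  Each probe of the binary search touches a different part of the
  pool, which is the I/O cost the hash table idea above is meant to
  avoid.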

- represent all files as a breadth first traversal of the tree of all
  files.  each entry has its name (string pool index), the number of
  immediate children, total number of children, and owning package.
  for files both these numbers are zero.  a file is identified by its
  index in this flattened tree.

  to get the file name from an index, we walk the list from the start.
  by summing up the number of children, we know when to skip a directory
  and when to descend into one.  as we go we accumulate the path
  elements.

  hmm, dropping the number of immediate children and using a sentinel
  drops a word from every entry.
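
  A rough sketch of the index-to-path walk.  It assumes that every
  directory is followed directly by all of its descendants, so that
  the total count is enough to skip a whole subtree; that ordering is
  an assumption of this sketch, not something the notes above spell
  out.  Names are plain C strings here instead of string pool indices,
  the immediate-children count and owning package are left out, and
  struct file_entry / file_index_to_path are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct file_entry {
	const char *name;	/* stands in for a string pool index */
	uint32_t descendants;	/* total entries below this one; 0 for files */
};

/*
 * Build the path of entries[index] in path[].  entries[0] is the root
 * directory.  Returns 0 on success, -1 on a bad index, inconsistent
 * counts or a buffer that is too small.
 */
int
file_index_to_path(const struct file_entry *entries, size_t count,
		   size_t index, char *path, size_t size)
{
	size_t pos = 0, used = 0;

	if (index >= count || size < 2)
		return -1;
	if (index == 0) {
		strcpy(path, "/");
		return 0;
	}

	while (pos != index) {
		size_t child = pos + 1, len;

		/* Skip sibling subtrees that end before the target. */
		while (child < count &&
		       index > child + entries[child].descendants)
			child += entries[child].descendants + 1;
		if (child >= count || child > index)
			return -1;

		/* The target is this child or below it: descend. */
		len = strlen(entries[child].name);
		if (used + 1 + len + 1 > size)
			return -1;
		path[used++] = '/';
		memcpy(path + used, entries[child].name, len + 1);
		used += len;
		pos = child;
	}

	return 0;
}
```

  With the entries (root, 6), (etc, 1), (passwd, 0), (usr, 3),
  (bin, 1), (ls, 0), (README, 0), looking up index 5 skips the etc
  subtree, descends into usr and bin, and ends at ls.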

- signed pkgs

- gzip repository of look-aside pkg xml files somehow?

- transactions, proper recovery, make sure we don't poop our package
  database (no more rm /var/lib/rpm/__cache*).

- no external dependencies, forget about bdb, sqlite.  It's *simple*
  and we need to control the on-disk format for these tools.

- 20740 requires, 2246 unique... hmm.