docs/REPO.txt
author Kristian H?gsberg <krh@redhat.com>
Fri Jun 20 23:13:09 2008 -0400 (2008-06-20)
changeset 257 0c3db660514d
permissions -rw-r--r--
When uniquifying properties, also sort them on the owning package.

This ensures that whenever two packages provide or (or require, obsolete
or conflict) the same property, they appear in the same order in the
propertys list of packages.
rhughes@241
     1
The repo file format / razor_set data structure
rhughes@241
     2
-----------------------------------------------
rhughes@241
     3
rhughes@241
     4
The repo starts with a header, containing some number of sections,
rhughes@241
     5
terminated by a section with type 0:
rhughes@241
     6
rhughes@241
     7
	struct razor_set_header {
rhughes@241
     8
		uint32_t magic;
rhughes@241
     9
		uint32_t version;
rhughes@241
    10
		struct razor_set_section sections[0];
rhughes@241
    11
	};
rhughes@241
    12
rhughes@241
    13
	struct razor_set_section {
rhughes@241
    14
		uint32_t type;
rhughes@241
    15
		uint32_t offset;
rhughes@241
    16
		uint32_t size;
rhughes@241
    17
	};
rhughes@241
    18
rhughes@241
    19
razor_set_open() mmaps the repo file, and creates a struct razor_set:
rhughes@241
    20
rhughes@241
    21
	struct razor_set {
rhughes@241
    22
		struct array string_pool;
rhughes@241
    23
	 	struct array packages;
rhughes@241
    24
	 	struct array properties;
rhughes@241
    25
	 	struct array files;
rhughes@241
    26
		struct array package_pool;
rhughes@241
    27
	 	struct array property_pool;
rhughes@241
    28
	 	struct array file_pool;
rhughes@241
    29
		struct razor_set_header *header;
rhughes@241
    30
	};
rhughes@241
    31
rhughes@241
    32
by finding the sections with those IDs and creating "struct array"s
rhughes@241
    33
pointing to the right places in the mmapped data. (This is the only
rhughes@241
    34
processing needed when reading in the file; everything else is used
rhughes@241
    35
exactly as-is.)
rhughes@241
    36
rhughes@241
    37
rhughes@241
    38
The sections
rhughes@241
    39
------------
rhughes@241
    40
rhughes@241
    41
RAZOR_STRING_POOL
rhughes@241
    42
rhughes@241
    43
	Stores one copy of each string that appears in the repo. (At
rhughes@241
    44
	the moment, this is: package names, package versions, property
rhughes@241
    45
	names, property versions, and (basenames of) filenames.) The
rhughes@241
    46
	strings are arbitrarily-sized, 0-terminated, and not in any
rhughes@241
    47
	particular order (although the empty string always ends up
rhughes@241
    48
	being at offset 0).
rhughes@241
    49
rhughes@241
    50
RAZOR_PACKAGES
rhughes@241
    51
rhughes@241
    52
	Array of struct razor_package; one for each package in the
rhughes@241
    53
	set, sorted by name.
rhughes@241
    54
rhughes@241
    55
RAZOR_PROPERTIES
rhughes@241
    56
rhughes@241
    57
	Array of struct razor_property; one for each unique property
rhughes@241
    58
	in the set, sorted by type, then name, then relation type (eg,
rhughes@241
    59
	"<" or ">="), then version. (Properties with no version have
rhughes@241
    60
	relation type RAZOR_VERSION_EQUAL, and version "".)
rhughes@241
    61
rhughes@241
    62
RAZOR_FILES
rhughes@241
    63
rhughes@241
    64
	Array of struct razor_entry; one for each file owned by any
rhughes@241
    65
	package in the set. The current sort order (which is subject
rhughes@241
    66
	to change) is breadth-first, sorted by basename. So eg: /, /bin,
rhughes@241
    67
	/dev, /etc, /bin/false, /bin/true, /dev/null, /etc/passwd.
rhughes@241
    68
rhughes@241
    69
RAZOR_PACKAGE_POOL
rhughes@241
    70
rhughes@241
    71
	Array of struct list, with each list item containing the index
rhughes@241
    72
	of a struct razor_package in the packages section. See the
rhughes@241
    73
	discussion of lists below.
rhughes@241
    74
rhughes@241
    75
RAZOR_PROPERTY_POOL
rhughes@241
    76
rhughes@241
    77
	Array of struct list, with each list item containing the index
rhughes@241
    78
	of a struct razor_property in the properties section. See the
rhughes@241
    79
	discussion of lists below.
rhughes@241
    80
rhughes@241
    81
RAZOR_FILE_POOL
rhughes@241
    82
rhughes@241
    83
	Array of struct list, with each list item containing the index
rhughes@241
    84
	of a struct razor_entry in the files section. See the
rhughes@241
    85
	discussion of lists below.
rhughes@241
    86
rhughes@241
    87
rhughes@241
    88
Data types
rhughes@241
    89
----------
rhughes@241
    90
Note that the exact layout of bits involves some historical accidents.
rhughes@241
    91
(Particularly the fact that the "name" field in most structs loses its
rhughes@241
    92
high bits to a flags field.)
rhughes@241
    93
rhughes@241
    94
struct list_head
rhughes@241
    95
	uint list_ptr : 24;
rhughes@241
    96
	uint flags    : 8;
rhughes@241
    97
rhughes@241
    98
struct list
rhughes@241
    99
	uint data  : 24;
rhughes@241
   100
	uint flags : 8;
rhughes@241
   101
rhughes@241
   102
	Used to store lists of package, property, or file IDs. "struct
rhughes@241
   103
	list_head" stores the head of the list, which points to one or
rhughes@241
   104
	more "struct list"s in the appropriate "pool" section.
rhughes@241
   105
	("struct list" should probably be called "struct list_item".)
rhughes@241
   106
rhughes@241
   107
	"list_first(&head, &pool)" returns a "struct list *" pointing
rhughes@241
   108
	to the first element of the list (or NULL for an empty list),
rhughes@241
   109
	and "list_next(list)" will return successive elements, until
rhughes@241
   110
	NULL is returned. Each "list->data" contains the index of a
rhughes@241
   111
	package, property, or file in the corresponding section of the
rhughes@241
   112
	set.
rhughes@241
   113
rhughes@241
   114
	Peeking underneath the abstraction, a list_head's "flags" is
rhughes@241
   115
	0xff if the list is empty, 0x80 if it contains a single
rhughes@241
   116
	element, or 0x00 if it contains more than one element. In the
rhughes@241
   117
	single-element case, that element is actually stored in the
rhughes@241
   118
	list_head directly rather than being stored in a pool (and so
rhughes@241
   119
	list_first() just casts the list_head* to a list* and returns
rhughes@241
   120
	it). For multi-element lists, list_ptr is the index in the
rhughes@241
   121
	pool of the first element of this list; the list continues
rhughes@241
   122
	through successive elements of the pool until one with
rhughes@241
   123
	non-zero flags is reached, indicating the end of the list.
rhughes@241
   124
rhughes@241
   125
struct razor_package
rhughes@241
   126
	uint name    : 24;
rhughes@241
   127
	uint flags   : 8;
rhughes@241
   128
	uint version : 32;
rhughes@241
   129
	struct list_head properties;
rhughes@241
   130
	struct list_head files;
rhughes@241
   131
rhughes@241
   132
	name and version are indexes into string_pool. properties is a
rhughes@241
   133
	list of all of the package's properties, and files is a list
rhughes@241
   134
	of its files. flags is currently only used during razor_set
rhughes@241
   135
	merging, to keep track of which set a package came from.
rhughes@241
   136
rhughes@241
   137
struct razor_property
rhughes@241
   138
	uint name     : 24;
rhughes@241
   139
	uint flags    : 6;
rhughes@241
   140
	uint type     : 2;
rhughes@241
   141
	uint relation : 32;
rhughes@241
   142
	uint version  : 32;
rhughes@241
   143
	struct list_head packages;
rhughes@241
   144
rhughes@241
   145
	name and version are indexes into string_pool. type is an enum
rhughes@241
   146
	razor_property_type (eg, RAZOR_PROPERTY_REQUIRES), and
rhughes@241
   147
	relation is an enum razor_version_relation (eg,
rhughes@241
   148
	RAZOR_VERSION_GREATER_OR_EQUAL). packages is a list of the
rhughes@241
   149
	packages that have this property. flags is currently unused.
rhughes@241
   150
rhughes@241
   151
struct razor_entry
rhughes@241
   152
	uint name  : 24;
rhughes@241
   153
	uint flags : 8;
rhughes@241
   154
	uint start : 32;
rhughes@241
   155
	struct list_head packages;
rhughes@241
   156
rhughes@241
   157
	name is an index into string_pool, giving the basename of the
rhughes@241
   158
	file. start is either 0, or an index pointing to another
rhughes@241
   159
	razor_entry that is the first child of this entry (for a
rhughes@241
   160
	non-empty directory). (Entry 0 is always the root of the tree,
rhughes@241
   161
	so no entry could have entry 0 as a child.) flags is 0x80
rhughes@241
   162
	(RAZOR_ENTRY_LAST) if an entry is the last entry in its
rhughes@241
   163
	directory. Otherwise it is 0.
rhughes@241
   164
rhughes@241
   165
	Note that given a pointer to a struct_razor_entry (eg, from a
rhughes@241
   166
	package's "files" list), there is no way to reconstruct its
rhughes@241
   167
	full name without walking the entire files array up to that
rhughes@241
   168
	point. Because of this and other problems (fix_file_map()), it
rhughes@241
   169
	seems like razor_entry should be modified to include a pointer
rhughes@241
   170
	to its parent. (Storing full paths instead of just basenames
rhughes@241
   171
	would also fix this problem, but that would use much more
rhughes@241
   172
	memory.)