docs/REPO.txt
author Richard Hughes <richard@hughsie.com>
Mon Jun 30 09:47:32 2008 +0100 (2008-06-30)
changeset 303 2d450078e46e
permissions -rw-r--r--
trivial: razor_property_iterator_create() can have package NULL and be valid
rhughes@241
     1
The repo file format / razor_set data structure
rhughes@241
     2
-----------------------------------------------
rhughes@241
     3
rhughes@241
     4
The repo starts with a header, containing some number of sections,
rhughes@241
     5
terminated by a section with type 0:
rhughes@241
     6
rhughes@241
     7
	struct razor_set_header {
rhughes@241
     8
		uint32_t magic;
rhughes@241
     9
		uint32_t version;
rhughes@241
    10
		struct razor_set_section sections[0];
rhughes@241
    11
	};
rhughes@241
    12
rhughes@241
    13
	struct razor_set_section {
rhughes@241
    14
		uint32_t type;
rhughes@241
    15
		uint32_t offset;
rhughes@241
    16
		uint32_t size;
rhughes@241
    17
	};
rhughes@241
    18
rhughes@241
    19
razor_set_open() mmaps the repo file, and creates a struct razor_set:
rhughes@241
    20
rhughes@241
    21
	struct razor_set {
rhughes@241
    22
		struct array string_pool;
rhughes@241
    23
	 	struct array packages;
rhughes@241
    24
	 	struct array properties;
rhughes@241
    25
	 	struct array files;
rhughes@241
    26
		struct array package_pool;
rhughes@241
    27
	 	struct array property_pool;
rhughes@241
    28
	 	struct array file_pool;
rhughes@241
    29
		struct razor_set_header *header;
rhughes@241
    30
	};
rhughes@241
    31
rhughes@241
    32
by finding the sections with those IDs and creating "struct array"s
rhughes@241
    33
pointing to the right places in the mmapped data. (This is the only
rhughes@241
    34
processing needed when reading in the file; everything else is used
rhughes@241
    35
exactly as-is.)
rhughes@241
    36
rhughes@241
    37
rhughes@241
    38
The sections
rhughes@241
    39
------------
rhughes@241
    40
rhughes@241
    41
RAZOR_STRING_POOL
rhughes@241
    42
rhughes@241
    43
	Stores one copy of each string that appears in the repo. (At
rhughes@241
    44
	the moment, this is: package names, package versions, property
rhughes@241
    45
	names, property versions, and (basenames of) filenames.) The
rhughes@241
    46
	strings are arbitrarily-sized, 0-terminated, and not in any
rhughes@241
    47
	particular order (although the empty string always ends up
rhughes@241
    48
	being at offset 0).
rhughes@241
    49
rhughes@241
    50
RAZOR_PACKAGES
rhughes@241
    51
rhughes@241
    52
	Array of struct razor_package; one for each package in the
rhughes@241
    53
	set, sorted by name.
rhughes@241
    54
rhughes@241
    55
RAZOR_PROPERTIES
rhughes@241
    56
rhughes@241
    57
	Array of struct razor_property; one for each unique property
rhughes@241
    58
	in the set, sorted by type, then name, then relation type (eg,
rhughes@241
    59
	"<" or ">="), then version. (Properties with no version have
rhughes@241
    60
	relation type RAZOR_VERSION_EQUAL, and version "".)
rhughes@241
    61
rhughes@241
    62
RAZOR_FILES
rhughes@241
    63
rhughes@241
    64
	Array of struct razor_entry; one for each file owned by any
rhughes@241
    65
	package in the set. The current sort order (which is subject
rhughes@241
    66
	to change) is breadth-first, sorted by basename. So eg: /, /bin,
rhughes@241
    67
	/dev, /etc, /bin/false, /bin/true, /dev/null, /etc/passwd.
rhughes@241
    68
rhughes@241
    69
RAZOR_PACKAGE_POOL
rhughes@241
    70
rhughes@241
    71
	Array of struct list, with each list item containing the index
rhughes@241
    72
	of a struct razor_package in the packages section. See the
rhughes@241
    73
	discussion of lists below.
rhughes@241
    74
rhughes@241
    75
RAZOR_PROPERTY_POOL
rhughes@241
    76
rhughes@241
    77
	Array of struct list, with each list item containing the index
rhughes@241
    78
	of a struct razor_property in the properties section. See the
rhughes@241
    79
	discussion of lists below.
rhughes@241
    80
rhughes@241
    81
RAZOR_FILE_POOL
rhughes@241
    82
rhughes@241
    83
	Array of struct list, with each list item containing the index
rhughes@241
    84
	of a struct razor_entry in the files section. See the
rhughes@241
    85
	discussion of lists below.
rhughes@241
    86
rhughes@241
    87
rhughes@241
    88
Data types
rhughes@241
    89
----------
rhughes@241
    90
Note that the exact layout of bits involves some historical accidents.
rhughes@241
    91
(Particularly the fact that the "name" field in most structs loses its
rhughes@241
    92
high bits to a flags field.)
rhughes@241
    93
rhughes@241
    94
struct list_head
rhughes@241
    95
	uint list_ptr : 24;
rhughes@241
    96
	uint flags    : 8;
rhughes@241
    97
rhughes@241
    98
struct list
rhughes@241
    99
	uint data  : 24;
rhughes@241
   100
	uint flags : 8;
rhughes@241
   101
rhughes@241
   102
	Used to store lists of package, property, or file IDs. "struct
rhughes@241
   103
	list_head" stores the head of the list, which points to one or
rhughes@241
   104
	more "struct list"s in the appropriate "pool" section.
rhughes@241
   105
	("struct list" should probably be called "struct list_item".)
rhughes@241
   106
rhughes@241
   107
	"list_first(&head, &pool)" returns a "struct list *" pointing
rhughes@241
   108
	to the first element of the list (or NULL for an empty list),
rhughes@241
   109
	and "list_next(list)" will return successive elements, until
rhughes@241
   110
	NULL is returned. Each "list->data" contains the index of a
rhughes@241
   111
	package, property, or file in the corresponding section of the
rhughes@241
   112
	set.
rhughes@241
   113
rhughes@241
   114
	Peeking underneath the abstraction, a list_head's "flags" is
rhughes@241
   115
	0xff if the list is empty, 0x80 if it contains a single
rhughes@241
   116
	element, or 0x00 if it contains more than one element. In the
rhughes@241
   117
	single-element case, that element is actually stored in the
rhughes@241
   118
	list_head directly rather than being stored in a pool (and so
rhughes@241
   119
	list_first() just casts the list_head* to a list* and returns
rhughes@241
   120
	it). For multi-element lists, list_ptr is the index in the
rhughes@241
   121
	pool of the first element of this list; the list continues
rhughes@241
   122
	through successive elements of the pool until one with
rhughes@241
   123
	non-zero flags is reached, indicating the end of the list.
rhughes@241
   124
rhughes@241
   125
struct razor_package
rhughes@241
   126
	uint name    : 24;
rhughes@241
   127
	uint flags   : 8;
rhughes@241
   128
	uint version : 32;
rhughes@241
   129
	struct list_head properties;
rhughes@241
   130
	struct list_head files;
rhughes@241
   131
rhughes@241
   132
	name and version are indexes into string_pool. properties is a
rhughes@241
   133
	list of all of the package's properties, and files is a list
rhughes@241
   134
	of its files. flags is currently only used during razor_set
rhughes@241
   135
	merging, to keep track of which set a package came from.
rhughes@241
   136
rhughes@241
   137
struct razor_property
rhughes@241
   138
	uint name     : 24;
rhughes@241
   139
	uint flags    : 6;
rhughes@241
   140
	uint type     : 2;
rhughes@241
   141
	uint relation : 32;
rhughes@241
   142
	uint version  : 32;
rhughes@241
   143
	struct list_head packages;
rhughes@241
   144
rhughes@241
   145
	name and version are indexes into string_pool. type is an enum
rhughes@241
   146
	razor_property_type (eg, RAZOR_PROPERTY_REQUIRES), and
rhughes@241
   147
	relation is an enum razor_version_relation (eg,
rhughes@241
   148
	RAZOR_VERSION_GREATER_OR_EQUAL). packages is a list of the
rhughes@241
   149
	packages that have this property. flags is currently unused.
rhughes@241
   150
rhughes@241
   151
struct razor_entry
rhughes@241
   152
	uint name  : 24;
rhughes@241
   153
	uint flags : 8;
rhughes@241
   154
	uint start : 32;
rhughes@241
   155
	struct list_head packages;
rhughes@241
   156
rhughes@241
   157
	name is an index into string_pool, giving the basename of the
rhughes@241
   158
	file. start is either 0, or an index pointing to another
rhughes@241
   159
	razor_entry that is the first child of this entry (for a
rhughes@241
   160
	non-empty directory). (Entry 0 is always the root of the tree,
rhughes@241
   161
	so no entry could have entry 0 as a child.) flags is 0x80
rhughes@241
   162
	(RAZOR_ENTRY_LAST) if an entry is the last entry in its
rhughes@241
   163
	directory. Otherwise it is 0.
rhughes@241
   164
rhughes@241
   165
	Note that given a pointer to a struct_razor_entry (eg, from a
rhughes@241
   166
	package's "files" list), there is no way to reconstruct its
rhughes@241
   167
	full name without walking the entire files array up to that
rhughes@241
   168
	point. Because of this and other problems (fix_file_map()), it
rhughes@241
   169
	seems like razor_entry should be modified to include a pointer
rhughes@241
   170
	to its parent. (Storing full paths instead of just basenames
rhughes@241
   171
	would also fix this problem, but that would use much more
rhughes@241
   172
	memory.)