Package management with tools such as rpm
or apt
is such a staple of life on Linux that we have mostly forgotten what an enormous step forward these tools are. In a nutshell, these tools package software together with metadata that makes certain types of sanity checks possible; most important in this metadata is dependency information that expresses things like ‘package P needs package Q’.
Strangely, very little attention has been paid to the computational aspects of package management and dependency resolution. I’ve recently come across two papers that do just that: Dan Burrows’ paper shows that determining a set of packages to install based on initial conditions is NP-complete. The paper then goes on to describe certain heuristics that reduce the complexity of the problem and the algorithm used in aptitude
for dependency resolution. The paper
by the EDOS Project gives a similar proof of the NP-completeness of package installation, together with an overview of the most popular package management systems and thoughts on improving packaging metadata.
If package installation is NP-complete, how come thousands of people succeed in installing packages every day in small amounts of time ? At its heart, package installation is hard because package dependencies can express alternatives, forcing the resolution algorithm to make choices that might turn out to be bad and causing the algorithm to backtrack. An example of such an alternative are the various RPM packages for mailer daemons that all provide an smtpdaemon
. Packages that require smtpdaemon
force the resolution algorithm to choose one mail daemon like sendmail or postfix. Luckily, packages that provide such alternatives are rare. A second source of alternatives are dependencies that require a certain version of a package or newer versions, written in RPM in the form P >= 2.0
. Package managers get around any choices here by chosing the installed version of P
or, if that is not new enough, the latest available.
It would be interesting to see what the complexity of the actual algorithms implemented by package managers is.
Watzmann.Blog by David Lutterkort is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Generated with Jekyll