Friday, July 27, 2007
Metadata as a Service
Problem statement
Software distribution in the internet age is moving away from large piles of disks, CDs or DVDs and towards online distribution servers that provide software from a package repository. The next version of openSUSE, 10.3, will be distributed as a 1-CD installation with online access to more packages.
Accessing a specific package means the client needs to know what's available and whether a package depends on other packages. This information is kept in a table of contents of the repository, usually referred to as metadata.
First-time access to a repository requires the client to download all of the metadata. If the repository changes, e.g. packages get version upgrades, large portions of the metadata have to be downloaded again (refreshed).
The EDOS project proposes peer-to-peer networks for distributing repository data.
But how much of this metadata is actually needed? How much bandwidth is wasted by downloading metadata that is outdated before its first use?
And technology moves on. Network speeds rise, available bandwidth explodes, and internet access is becoming as common as TV and telephone in more and more households. In a couple of years, flat rates and always-on connections will be as normal as electrical power coming from the wall socket. At the same time, CPUs get more powerful and memory prices keep falling.
But the client systems can't keep up since customers don't buy a new computer every year. The improvements in computing power, memory, and bandwidth are mostly on the server side.
And this brings me to Metadata as a Service.
Instead of wasting bandwidth on downloading the metadata and client computing power on processing it, the repository server can provide a web service that handles most of the load. Clients only download what they actually need and cache as they see fit.
Client tools for software management are then just frontends for the web service. Searching and browsing are handled on the server, where load balancing and scaling are well understood and easily handled.
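To make the idea a bit more concrete, here is a minimal Ruby sketch of such a frontend. Everything in it is an assumption for illustration: the class name MetadataClient, the per-package query, and the lambda that stands in for the web service are all made up, and no real repository protocol is implied. It only shows the client asking for single packages on demand and caching the answers for a while.

```ruby
# Hypothetical sketch: a client that asks a metadata service for single
# packages on demand and caches the answers, instead of downloading the
# whole repository index. Nothing here is an existing openSUSE API.
class MetadataClient
  attr_reader :requests

  # service: any object responding to call(name) and returning a Hash
  #          (or nil for unknown packages); it stands in for the web service.
  # max_age: seconds a cached answer stays valid.
  # clock:   returns the current time in seconds (replaceable for testing).
  def initialize(service, max_age: 3600, clock: -> { Time.now.to_f })
    @service = service
    @max_age = max_age
    @clock = clock
    @cache = {}
    @requests = 0
  end

  # Return metadata for one package, contacting the service only when
  # the cached copy is missing or too old.
  def package(name)
    entry = @cache[name]
    now = @clock.call
    if entry.nil? || now - entry[:fetched_at] > @max_age
      @requests += 1
      entry = { data: @service.call(name), fetched_at: now }
      @cache[name] = entry
    end
    entry[:data]
  end
end

# Stand-in for the server side: a tiny in-memory "repository" (made-up data).
repo = {
  "vim"      => { version: "7.1", requires: ["vim-base"] },
  "vim-base" => { version: "7.1", requires: [] }
}

client = MetadataClient.new(->(name) { repo[name] })
client.package("vim")
client.package("vim")    # second lookup is served from the cache
puts client.requests     # only one request reached the service
```

The cache lifetime is the knob a client would turn to trade freshness against traffic; a real service would more likely let the server signal how long an answer stays valid.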
This could even be driven further by doing all the repository management server-side. Clients always talk to the same server which knows the repositories the client wants to access and also tracks software installed on the client. Then upgrade requests can be handled purely by the server, making client profile uploads obsolete. Certainly the way to go for mobile and embedded devices.
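A rough sketch of what "handled purely by the server" could look like, under some loud assumptions: the RepoServer class and its methods are invented for this post, the package versions are made up, and Gem::Version is only a convenient stand-in for real RPM version comparison, which follows different rules.

```ruby
# Hypothetical sketch: the server keeps each client's installed package
# list and answers "what should I upgrade?" without any profile upload.
# Gem::Version is only a stand-in for real RPM version comparison.
class RepoServer
  def initialize(available)
    @available = available                        # name => newest version
    @installed = Hash.new { |h, k| h[k] = {} }    # client id => { name => version }
  end

  # Called whenever a client installs or upgrades a package.
  def record_install(client_id, name, version)
    @installed[client_id][name] = version
  end

  # Packages for which the repository has a newer version than the client.
  def upgrades_for(client_id)
    @installed[client_id].each_with_object({}) do |(name, have), result|
      newest = @available[name]
      next if newest.nil?   # package is not in this repository
      result[name] = newest if Gem::Version.new(newest) > Gem::Version.new(have)
    end
  end
end

# Made-up version numbers, for illustration only.
available = { "zypper" => "0.8.7", "yast2-core" => "2.15.9" }
server = RepoServer.new(available)
server.record_install("laptop-1", "zypper", "0.8.1")
server.record_install("laptop-1", "yast2-core", "2.15.9")
p server.upgrades_for("laptop-1")   # lists zypper only
```

The client never sends its full package list with a request; it only reports individual installs, and the server answers upgrade queries from its own records.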
Google might offer such a service - knowing all the software installed on a client is certainly valuable data for them.
Just a thought ...
Wednesday, July 18, 2007
Hackweek aftermath
Earlier this year, I already added XML output to yast2-core which came in very handy for this project. Using the REXML stream listener to code the translator was the fun part of a couple of late night hacks.
The result is a complete syntax translator for all YaST client and module code. The generated Ruby code is nicely indented and passes the Ruby syntax checker.
Combined with Duncan's Ruby-YCP bindings, translating YCP to Ruby should be quite useful as we try to provide support for more widespread scripting languages.
The translator is available at svn.opensuse.org and requires a recent version of yast2-core, which supports XML output and the '-x' parameter of ycpc.
Then run
    ycpc -c -x file.ycp -o file.xml
to convert YCP code to XML.
Now use the xml-ruby translator as
    cd yxmlconv
    ruby src/converter.rb file.xml > file.rb
Translating e.g. /usr/share/YaST2/modules/Arch.ycp
    {
      module "Arch";

      // local variables
      string _architecture = nil;
      string _board_compatible = nil;
      string _checkgeneration = "";
      boolean _has_pcmcia = nil;
      boolean _is_laptop = nil;
      boolean _is_uml = nil;
      boolean _has_smp = nil;
      // Xen domain (dom0 or domU)
      boolean _is_xen = nil;
      // Xen dom0
      boolean _is_xen0 = nil;

      /* ************************************************************ */
      /* system architecture */

      /**
       * General architecture type
       */
      global string architecture () {
        if (_architecture == nil)
          _architecture = (string)SCR::Read(.probe.architecture);
        return _architecture;
      }
      ...
outputs the following Ruby code
    module Arch
      require 'ycp/SCR'
      _architecture = nil
      _board_compatible = nil
      _checkgeneration = ""
      _has_pcmcia = nil
      _is_laptop = nil
      _is_uml = nil
      _has_smp = nil
      _is_xen = nil
      _is_xen0 = nil
      def architecture( )
        if ( _architecture == nil ) then
          _architecture = Ycp::Builtin::Read( ".probe.architecture" )
        end
        return _architecture
      end
      ...
Preserving the comments from the YCP code would be nice -- for next Hackweek.
Btw, it's fairly straightforward to change the translator to output e.g. Python or Java or C# or ...
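For the curious, here is roughly how a stream-listener-based translator hangs together. This is a minimal sketch, not the actual converter: the XML element names (module, function, assign, return) are invented for illustration and are NOT the real format produced by ycpc -x. Only the REXML calls (REXML::StreamListener, REXML::Document.parse_stream) are the real library API.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sketch of a stream-based translator. The XML element names
# below are invented for illustration; they are NOT the real ycpc -x format.
class RubyEmitter
  include REXML::StreamListener

  attr_reader :lines

  def initialize
    @lines = []
    @depth = 0
  end

  # Opening tags emit the start of a Ruby construct; block constructs
  # also increase the indentation for everything nested inside them.
  def tag_start(name, attrs)
    case name
    when "module"
      emit("module #{attrs['name']}")
      @depth += 1
    when "function"
      emit("def #{attrs['name']}")
      @depth += 1
    when "assign"
      emit("#{attrs['var']} = #{attrs['value']}")
    when "return"
      emit("return #{attrs['value']}")
    end
  end

  # Closing tags of block constructs emit the matching "end".
  def tag_end(name)
    return unless %w[module function].include?(name)
    @depth -= 1
    emit("end")
  end

  private

  # Append one line of output at the current indentation level.
  def emit(text)
    @lines << ("  " * @depth) + text
  end
end

xml = <<~XML
  <module name="Arch">
    <assign var="_is_uml" value="nil"/>
    <function name="architecture">
      <return value="_architecture"/>
    </function>
  </module>
XML

emitter = RubyEmitter.new
REXML::Document.parse_stream(xml, emitter)
puts emitter.lines
```

All target-language knowledge sits in the strings passed to emit, which is why switching the output to another language mostly means swapping those strings and the block-closing rule.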
Tuesday, July 17, 2007
Smolt - Gathering hardware information
They currently have data from approx. 80000 systems, mostly x86, which hopefully will grow in the future. The device and system statistics are quite interesting to browse. Besides hardware, smolt also tracks the system language, kernel version, swap size etc. It also tries to make an educated guess on desktop vs. server vs. laptop - typically a blurred area for Linux systems.
Once they offer an online API for direct access to the smolt server database, this really will be quite useful.
Monday, July 16, 2007
EDOS Project
Michael Schröder's hackweek project is based on using a well-known mathematical model for describing and solving package dependencies: Satisfiability (SAT).
Apparently, some research on this topic was done before. The oldest mention of SAT for package dependencies I found is a paper by Daniel Burrows dating from around mid-2005. Daniel is the author of the aptitude package manager and certainly knows the topic of dependency hell inside out.
However, the most interesting link Google revealed was the one to the EDOS project.
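To illustrate the general idea (this is not Michael's solver, just a toy): each package becomes a boolean variable meaning "installed", "A requires B" becomes the clause (not A or B), "A conflicts with B" becomes (not A or not B), and an install request is the single-literal clause (A). The sketch below simply tries all assignments by brute force, which real SAT solvers avoid; the package names and helper functions are made up.

```ruby
# Hypothetical sketch: package dependencies as boolean clauses, solved by
# brute force. Real solvers use far better algorithms; package names are
# made up. A literal is [package, wanted_state]; a clause is satisfied
# when at least one of its literals matches the assignment.
def requires(a, b)
  [[a, false], [b, true]]       # not a OR b
end

def conflicts(a, b)
  [[a, false], [b, false]]      # not a OR not b
end

def install(a)
  [[a, true]]                   # a
end

# Try every true/false assignment, smallest install sets first, and
# return the first one that satisfies all clauses (nil if none does).
def solve(packages, clauses)
  candidates = [true, false].repeated_permutation(packages.size).map do |values|
    packages.zip(values).to_h
  end
  candidates.sort_by { |assignment| assignment.values.count(true) }.find do |assignment|
    clauses.all? { |clause| clause.any? { |pkg, state| assignment[pkg] == state } }
  end
end

packages = %w[editor libgui libtext oldgui]
clauses = [
  install("editor"),
  requires("editor", "libgui"),
  requires("libgui", "libtext"),
  conflicts("libgui", "oldgui")
]

solution = solve(packages, clauses)
puts solution.select { |_, installed| installed }.keys.join(", ")
# prints "editor, libgui, libtext"
```

Adding install("oldgui") to the clauses makes the problem unsatisfiable, and solve returns nil: the SAT view of an unresolvable conflict.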
EDOS is short for Environment for the development and Distribution of Open Source software and is funded by the European Commission with 2.2 million euros.
The project aims to study and solve problems associated with the production, management and distribution of open source software packages.
Its four main topics of research are:
- Dependencies: With a formal approach to the management of software dependencies, it should be possible to manage the complexity of large free and open source package-based software distributions. The project has already produced a couple of publications and tools, but I couldn't find links to source code yet.
- Downloading: The problem of huge and frequently changing software repositories might be solvable with P2P distribution of code and binaries.
- Quality assurance: All software projects face the dilemma between "release early, release often" and system quality. One can either
  - reduce system quality,
  - or reduce the number of packages,
  - or accept long delays before the final release of a high-quality system.
- Metrics and evaluation: The decision between "old, fewer features, more stable" and "new, more features, more bugs" should be better reasoned by defining parameters that characterize distributions, distribution editions and distribution customizations.
Monday, July 02, 2007
openwsman-yast now returns proper datatypes
    require 'rwsman'
    require 'yast'

    client = WsMan::Client.new( 'http', 'client.internet.org', 8889, '/wsman', 'user', 'password')
    options = WsMan::ClientOption.new
    schema = YaST::SCHEMA
    uri = schema + "/YCP"
    options.property_add( "ycp", "{ return SCR::Read( .proc.modules ); }" )
    result = client.invoke( uri, "eval", options )
    modhash = YaST.decode_result( result ) # hash of { modulename => { size=>1234, used=>3 } }
Supported are void, bool, integer, float, string, symbol, path, term, list, and map -- should be sufficient for most of YaST. The YaST class is here. You need at least version 1.1.0 of openwsman and openwsman-yast, both available on the openSUSE build service. And, btw, source code for openwsman-yast is now hosted on svn.opensuse.org.