Friday, July 27, 2007

Metadata as a Service

OpenSUSE bug 276018 got me thinking about software repositories and data transfer again.

Problem statement

Software distribution in the Internet age is moving away from large piles of disks, CDs, or DVDs towards online distribution servers that provide software from a package repository. The next version of OpenSUSE, 10.3, will be distributed as a 1-CD installation with online access to more packages.
To access a specific package, the client needs to know what's available and whether the package depends on other packages. This information is kept in a table of contents of the repository, usually referred to as metadata.
The first access to a repository requires the client to download all metadata. If the repository changes, e.g. when packages get version upgrades, large portions of the metadata have to be downloaded again (refreshed).

The EDOS project proposes peer-to-peer networks for distributing repository data.

But how much of this metadata is actually needed? How much bandwidth is wasted on downloading metadata that becomes outdated before its first use?

And technology moves on. Network speeds rise, available bandwidth explodes, and Internet access is as common as TV and telephone in more and more households. In a couple of years, flat-rate, always-on Internet access will be as normal as electrical power coming from the wall socket. At the same time, CPUs get more powerful and memory prices keep falling.

But the client systems can't keep up since customers don't buy a new computer every year. The improvements in computing power, memory, and bandwidth are mostly on the server side.

And this brings me to Metadata as a Service.

Instead of wasting bandwidth on downloading, and client computing power on processing, the metadata, the repository server can provide a web service that handles most of the load. Clients download only what they actually need and cache it as they see fit.

Client tools for software management become mere frontends for the web service. Searching and browsing are handled on the server, where load balancing and scaling are well understood and easily handled.
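To make the division of labour concrete, here is a minimal Ruby sketch under stated assumptions: the class names `MetadataService` and `CachingClient`, the package data, and the in-process method calls (standing in for real web-service requests) are all hypothetical. It only shows the principle: the full metadata stays on the server, while the client fetches single entries on demand and caches them.

```ruby
# Hypothetical sketch: a local object stands in for the remote web service.
class MetadataService
  attr_reader :queries # number of requests the server had to answer

  def initialize(packages)
    @packages = packages # full repository metadata, kept server-side
    @queries = 0
  end

  # Server-side search: the client never downloads the whole index.
  def search(pattern)
    @queries += 1
    @packages.keys.grep(pattern).sort
  end

  # Metadata for a single package (version and dependencies).
  def info(name)
    @queries += 1
    @packages.fetch(name)
  end
end

# Thin client frontend: caches only the entries it actually asked for.
class CachingClient
  def initialize(service)
    @service = service
    @cache = {}
  end

  def info(name)
    @cache[name] ||= @service.info(name)
  end

  def cached
    @cache.keys
  end
end

# Made-up example data, not real openSUSE metadata.
repo = {
  'gcc'      => { version: '4.2', requires: ['binutils'] },
  'binutils' => { version: '2.17', requires: [] },
  'gimp'     => { version: '2.2', requires: ['gtk2'] }
}

service = MetadataService.new(repo)
client = CachingClient.new(service)
client.info('gcc')
client.info('gcc') # second lookup is served from the client cache
```

The cache policy here (keep everything ever fetched) is the simplest possible choice; as the post suggests, each client is free to cache as it sees fit.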

This could be taken even further by doing all repository management on the server side. Clients always talk to the same server, which knows the repositories the client wants to access and also tracks the software installed on the client. Upgrade requests can then be handled purely by the server, making client profile uploads obsolete. This is certainly the way to go for mobile and embedded devices.
Google might offer such a service; knowing all the software installed on a client is certainly valuable data for them.
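The server-side variant can be sketched in the same spirit. In this minimal Ruby sketch (class and method names hypothetical, data made up), the server records what each client has installed and computes upgrades itself; version strings are compared with RubyGems' `Gem::Version`, which is only a stand-in for real RPM version comparison.

```ruby
require 'rubygems' # provides Gem::Version (normally already loaded)

# Hypothetical sketch: the server tracks installed software per client.
class UpgradeServer
  def initialize(repository)
    @repository = repository                   # name => latest version
    @installed = Hash.new { |h, k| h[k] = {} } # client id => { name => version }
  end

  def record_install(client_id, name, version)
    @installed[client_id][name] = version
  end

  # Upgrades are computed purely on the server; no client profile upload.
  # Gem::Version is a simplification of RPM version semantics.
  def upgrades_for(client_id)
    @installed[client_id].each_with_object({}) do |(name, version), result|
      latest = @repository[name]
      next unless latest && Gem::Version.new(latest) > Gem::Version.new(version)
      result[name] = latest
    end
  end
end

server = UpgradeServer.new('zypper' => '1.0.1', 'yast2' => '2.15.3')
server.record_install('laptop', 'zypper', '0.8.9')
server.record_install('laptop', 'yast2', '2.15.3')
upgrades = server.upgrades_for('laptop') # only zypper is outdated
```

The client side shrinks to a single request ("what should I upgrade?"), which is what makes the approach attractive for small devices.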

Just a thought ...

Wednesday, July 18, 2007

Hackweek aftermath

Novell Hackweek left me with one last itch to scratch: Cornelius' proposal of a YCP-to-Ruby translator.

Earlier this year, I had already added XML output to yast2-core, which came in very handy for this project. Writing the translator with the REXML stream listener was the fun part of a couple of late-night hacks.

The result is a complete syntax translator for all YaST client and module code. The generated Ruby code is nicely indented and passes the Ruby syntax checker.
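For readers unfamiliar with REXML's stream API, here is a minimal sketch of the technique, not the actual translator: a listener class that includes `REXML::StreamListener` receives `tag_start`/`tag_end` callbacks while the XML is parsed and emits indented Ruby as it goes. The element names (`module`, `variable`, `function`, `return`) are a made-up toy schema, not the format produced by `ycpc -x`.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Toy emitter; the XML schema it understands is hypothetical.
class RubyEmitter
  include REXML::StreamListener # no-op defaults for text, comments, etc.

  attr_reader :lines

  def initialize
    @lines = []
    @depth = 0
  end

  def tag_start(name, attrs)
    a = attrs.to_h # attribute name => value
    case name
    when 'module'
      emit("module #{a['name']}")
      @depth += 1
    when 'function'
      emit("def #{a['name']}")
      @depth += 1
    when 'variable'
      emit("#{a['name']} = #{a['value']}")
    when 'return'
      emit("return #{a['value']}")
    end
  end

  # Only block-opening elements need a closing `end`.
  def tag_end(name)
    return unless %w[module function].include?(name)
    @depth -= 1
    emit('end')
  end

  private

  def emit(code)
    @lines << ('  ' * @depth) + code
  end
end

def translate(xml)
  emitter = RubyEmitter.new
  REXML::Document.parse_stream(xml, emitter)
  emitter.lines.join("\n")
end

xml = <<~XML
  <module name="Arch">
    <variable name="_architecture" value="nil"/>
    <function name="architecture">
      <return value="_architecture"/>
    </function>
  </module>
XML

ruby_code = translate(xml)
```

Because the listener never builds a tree, indentation is just a depth counter bumped on block-opening tags, which keeps the generated code nicely indented for free.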

Combined with Duncan's Ruby-YCP bindings, translating YCP to Ruby should be quite useful as we try to support more widespread scripting languages.

The translator is available at svn.opensuse.org and requires a recent version of yast2-core, which supports XML output and the '-x' parameter of ycpc.
Then run
  ycpc -c -x file.ycp -o file.xml

to convert YCP code to XML.
Now run the XML-to-Ruby translator:
  cd yxmlconv
  ruby src/converter.rb file.xml > file.rb


Translating, e.g., /usr/share/YaST2/modules/Arch.ycp

{
module "Arch";
// local variables
string _architecture = nil;
string _board_compatible = nil;
string _checkgeneration = "";
boolean _has_pcmcia = nil;
boolean _is_laptop = nil;
boolean _is_uml = nil;
boolean _has_smp = nil;
// Xen domain (dom0 or domU)
boolean _is_xen = nil;
// Xen dom0
boolean _is_xen0 = nil;
/* ************************************************************ */
/* system architecture                                          */
/**
 * General architecture type
 */
global string architecture () {
    if (_architecture == nil)
        _architecture = (string)SCR::Read(.probe.architecture);
    return _architecture;
}

...
outputs the following Ruby code:
module Arch
  require 'ycp/SCR'
  _architecture = nil
  _board_compatible = nil
  _checkgeneration = ""
  _has_pcmcia = nil
  _is_laptop = nil
  _is_uml = nil
  _has_smp = nil
  _is_xen = nil
  _is_xen0 = nil

  def architecture(  )
    if ( _architecture == nil ) then
      _architecture = Ycp::Builtin::Read( ".probe.architecture" )
    end
    return _architecture
  end
...
Preserving the comments from the YCP code would be nice; that's one for the next Hackweek.
By the way, it's fairly straightforward to change the translator to output, e.g., Python, Java, C#, or ...
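One caveat if the generated code is run rather than just syntax-checked: in Ruby, `def` opens a new local-variable scope, so `_architecture` assigned in the module body is not visible inside `architecture`, and calling it would raise a NameError. Below is a minimal sketch of one possible mapping, not what the translator currently emits: YCP module variables become module instance variables and global functions become singleton methods (`def self.`). The hard-coded probe result and the `probe_count` counter are hypothetical stand-ins for `SCR::Read(.probe.architecture)`.

```ruby
# Sketch: YCP module-level variables mapped to module instance variables,
# YCP global functions mapped to singleton methods.
module Arch
  @architecture = nil
  @probe_count = 0

  class << self
    attr_reader :probe_count # how often the (fake) probe ran
  end

  # Hypothetical stand-in for SCR::Read(.probe.architecture).
  def self.probe_architecture
    @probe_count += 1
    'x86_64'
  end

  # Same lazy initialisation as the YCP original.
  def self.architecture
    @architecture = probe_architecture if @architecture.nil?
    @architecture
  end
end

Arch.architecture
Arch.architecture # second call returns the cached value
```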

Tuesday, July 17, 2007

Smolt - Gathering hardware information

LWN pointed me to this mail from the Fedora Project inviting other distributions to participate in the Smolt project. Smolt gathers hardware data from Linux systems and makes it available for browsing.
They currently have data from approximately 80,000 systems, mostly x86, and that number will hopefully grow. The device and system statistics are quite interesting to browse. Besides hardware, Smolt also tracks the system language, kernel version, swap size, etc. It also tries to make an educated guess about desktop vs. server vs. laptop, typically a blurry area for Linux systems.

Once they offer an online API for direct access to the Smolt server database, this will be really useful.

Monday, July 16, 2007

EDOS Project

Michael Schröder's Hackweek project is based on using well-known mathematical models for describing and solving package dependencies: Boolean satisfiability (SAT).
Apparently, some research on this topic was done before. The earliest mention of SAT for package dependencies I found is a paper by Daniel Burrows dating from around mid-2005. Daniel is the author of the aptitude package manager and certainly knows dependency hell inside out.
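To illustrate the idea (this is not Michael's solver), package installation can be phrased as a SAT problem: each package is a Boolean variable, "A requires B or C" becomes the clause ¬A ∨ B ∨ C, and "B conflicts with C" becomes ¬B ∨ ¬C. The minimal Ruby sketch below uses a tiny DPLL-style backtracking search that prefers unit clauses; clauses are integer lists in DIMACS style (positive literal means install, negative means don't install), and the package names and numbering are hypothetical.

```ruby
# clauses: arrays of non-zero integers; assignment: variable => true/false
def solve(clauses, assignment = {})
  pending = []
  clauses.each do |clause|
    # Clause already satisfied by the current assignment: drop it.
    next if clause.any? { |lit| assignment[lit.abs] == (lit > 0) }
    remaining = clause.reject { |lit| assignment.key?(lit.abs) }
    return nil if remaining.empty? # every literal is false: conflict
    pending << remaining
  end
  return assignment if pending.empty? # all clauses satisfied

  # A unit clause forces its literal; otherwise branch on the first one.
  unit = pending.find { |c| c.size == 1 }
  lit = (unit || pending.first).first
  values = unit ? [lit > 0] : [lit > 0, lit < 0]
  values.each do |value|
    result = solve(pending, assignment.merge(lit.abs => value))
    return result if result
  end
  nil # no assignment works: unsatisfiable
end

names = { 1 => 'A', 2 => 'B', 3 => 'C' }
clauses = [
  [1],        # user request: install A
  [-1, 2, 3], # A requires B or C
  [-2, -3]    # B conflicts with C
]
solution = solve(clauses)
install = solution.select { |_, v| v }.keys.map { |v| names[v] }
```

Real solvers add clause learning and smarter branching heuristics, but the encoding step is the same: once dependencies are clauses, "can this be installed?" becomes "is this formula satisfiable?".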

However, the most interesting link Google turned up was the one to the EDOS project.
EDOS is short for Environment for the Development and Distribution of Open Source Software and is funded by the European Commission with 2.2 million euros. The project aims to study and solve problems associated with the production, management, and distribution of open source software packages.
Its four main topics of research are:

  • Dependencies: With a formal approach to managing software dependencies, it should be possible to handle the complexity of large free and open source package-based software distributions. The project has already produced a couple of publications and tools, but I couldn't find links to source code yet.
  • Downloading: The problem of huge and frequently changing software repositories might be solvable with P2P distribution of code and binaries.
  • Quality assurance: All software projects face the dilemma between "release early, release often" and system quality. One can either
    • reduce system quality
    • or reduce the number of packages
    • or accept long delays before final release of high quality system
    EDOS wants to develop a testing framework and quality assurance portal to make distribution quality better and measurable.
  • Metrics and evaluation: The choice between old (fewer features, more stable) and new (more features, more bugs) should be better reasoned by defining parameters that characterize distributions, distribution editions, and distribution customizations.

Interesting stuff for a lot of distributions out there ...

Monday, July 02, 2007

openwsman-yast now returns proper datatypes

After five days of hacking last week, one final itch was left that needed scratching. The YaST openwsman plugin only passed strings back and forth, losing all the type information present in the YCP result value. So I added some code to convert basic YCP types to XML (in the plugin) and from XML to Ruby (on the client side). Now the result of a web service call to YaST can be processed directly in Ruby. Here's a code example that reads the contents of /proc/modules on a remote machine:
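require 'rwsman'
require 'yast'

# Connect to the openwsman server (scheme, host, port, path, user, password)
client = WsMan::Client.new( 'http', 'client.internet.org', 8889, '/wsman', 'user', 'password')
options = WsMan::ClientOption.new

# Resource URI of the YaST YCP plugin
schema = YaST::SCHEMA
uri = schema + "/YCP"

# YCP code to be evaluated on the remote machine
options.property_add( "ycp", "{ return SCR::Read( .proc.modules ); }" )
result = client.invoke( uri, "eval", options )

# Convert the typed XML result into native Ruby values
modhash = YaST.decode_result( result ) # hash of { modulename => { size=>1234, used=>3 } }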
Supported types are void, boolean, integer, float, string, symbol, path, term, list, and map, which should be sufficient for most of YaST. The YaST class is here. You need at least version 1.1.0 of openwsman and openwsman-yast, both available on the openSUSE Build Service. And by the way, the source code for openwsman-yast is now hosted on svn.opensuse.org.
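The type-preserving conversion can be illustrated with a small round trip. This is a minimal sketch under assumptions, not the openwsman-yast code: the element names and the `encode`/`decode`/`leaf` helpers are hypothetical, and `path` and `term` are left out for brevity. Each Ruby value becomes a typed XML element, and decoding walks the elements back into native Ruby values.

```ruby
require 'rexml/document'

# Hypothetical wire format: one element per value, named after its type.
def leaf(name, value)
  REXML::Element.new(name).tap { |el| el.text = value.to_s }
end

def encode(value)
  case value
  when nil         then REXML::Element.new('void')
  when true, false then leaf('boolean', value)
  when Integer     then leaf('integer', value)
  when Float       then leaf('float', value)
  when Symbol      then leaf('symbol', value)
  when String      then leaf('string', value)
  when Array
    REXML::Element.new('list').tap { |el| value.each { |v| el.add_element(encode(v)) } }
  when Hash
    REXML::Element.new('map').tap do |el|
      value.each do |k, v|
        entry = el.add_element('entry') # each entry holds a key and a value
        entry.add_element(encode(k))
        entry.add_element(encode(v))
      end
    end
  else
    raise ArgumentError, "unsupported type #{value.class}"
  end
end

def decode(el)
  case el.name
  when 'void'    then nil
  when 'boolean' then el.text == 'true'
  when 'integer' then Integer(el.text)
  when 'float'   then Float(el.text)
  when 'symbol'  then el.text.to_sym
  when 'string'  then el.text.to_s # empty element decodes to ""
  when 'list'    then el.elements.map { |child| decode(child) }
  when 'map'
    el.elements.each_with_object({}) do |entry, h|
      key, val = entry.elements.to_a
      h[decode(key)] = decode(val)
    end
  else
    raise ArgumentError, "unknown element #{el.name}"
  end
end

# Made-up value shaped like the /proc/modules result above.
value = { 'ext2' => { 'size' => 1234, 'used' => 3 }, 'misc' => [true, false, nil, 1.5, :sym, ''] }
xml = encode(value).to_s
back = decode(REXML::Document.new(xml).root)
```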