Difference between revisions of "MirrorBrain"

From Apache OpenOffice Wiki

Latest revision as of 10:51, 3 January 2011

MirrorBrain

Host http://doozer.poeml.de or http://openoffice.mirrorbrain.org or http://download.services.openoffice.org
mirrorlist http://mirrordb.opensuse.org/index2.html

MirrorBrain Documentation
MirrorBrain FAQ
OOo MirrorBrain Error-Log
MirrorBrain Source

The tool is called "mb" and has several subcommands like "list", "show", "edit". Use "mb help <cmd>" on any subcommand to get usage info.

Here are some quick examples of things to do on the command line:

  • mb help
  • mb list -c jp
  • mb list -a
  • mb list -d
  • mb list --disabled
  • mb help list
  • mb help
    • (see "mb help list")
  • mb show ftp5
  • mb edit ftp5
    • opens data in vim.
    • statusBaseurl should not be edited; it is changed by monitoring.
  • shortcuts to editing
    • mb disable ftp5
    • mb enable ftp5
    • mb score ftp5 200
    • mb score ftp5 100
  • scanning
    • mb scan ftp5
    • mb scan -e
  • new mirror
    • mb new (see mb help new)
  • check whether a file exists on the servers (bug: always shows 200 (OK) for FTP servers)
    • mb probefile localized/ja/3.0.0/OOo_3.0.0_MacOSXIntel_install_ja.dmg
  • database lookup for a file:
    • mb file ls localized/ja/3.0.0/OOo_3.0.0_MacOSXIntel_install_ja.dmg


MirrorGuide

MirrorGuide is another GeoIP-based bouncer for the OpenOffice.org mirror network, which I have just developed as a mock-up.

Main entry
 * http://mirrorguide.tora-japan.com/mirror/
   This is the main entry of this system.
   It also holds all files synchronized with 'rsync' from the reference site.
   All files, however, have a size of zero.
   Accessing a directory shows its structure and the possible mirror sites.
   Requests for files are silently redirected to one of the mirror sites.
Simulating accesses from a given country
 * U.S.A   http://mirrorguide.tora-japan.com/locations/United%20States/
 * Germany http://mirrorguide.tora-japan.com/locations/Germany/
   Clicking 'extended/' shows how this system lists mirror
   sites, based on the results of crawling them.
 * Japan   http://mirrorguide.tora-japan.com/locations/Japan/
   http://mirrorguide.tora-japan.com/countries/Japan.xml
   These pages demonstrate how the fallback system should work.
   Note that the preference values have not been implemented yet;
   a simple round-robin algorithm is currently used.
Crawling mirror sites
 * File availability: http://mirrorguide.tora-japan.com/crawler/http/
   All files, as above, have a size of zero.
 * Log: http://mirrorguide.tora-japan.com/crawler/log/
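
The round-robin fallback mentioned above can be illustrated with a few lines of shell. This is only a sketch, not the actual Perl implementation: the state file and the mirror names are hypothetical, and there is no locking, so concurrent requests could pick the same mirror.

```shell
#!/bin/sh
# Round-robin choice among mirrors, with the position kept in a plain file.
# The state file path and the mirror names are hypothetical.

# next_mirror STATE_FILE MIRROR...
# Prints the mirror whose turn it is and advances the counter in STATE_FILE.
next_mirror() {
    state_file=$1
    shift
    [ "$#" -gt 0 ] || return 1              # no mirrors configured

    count=
    [ -r "$state_file" ] && count=$(cat "$state_file")
    count=${count:-0}                       # missing or empty file: start at 0

    index=$(( $count % $# ))                # 0-based position in the list
    echo $(( $count + 1 )) > "$state_file"

    shift "$index"                          # drop the mirrors before our turn
    echo "$1"
}
```

Calling `next_mirror /tmp/rr.state mirror-a mirror-b mirror-c` repeatedly walks through the three names in order and then starts again with the first.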

Highlights of this system are:

 * No RDBMS, such as MySQL, is needed, so the system is easy to relocate.
 * UNIX file system based: database lookups are implemented as file system lookups.
 * No actual target files are needed; the system currently consumes 60MB of disk space.
 * Apache tuned: this system just adds a header part to Apache's directory index.
 * Apache's Rewrite rules are well utilized.
 * GeoIP-based bouncer, reinforced with a fallback configuration.
 * Written in Perl, no PHP, no Java, no ..., in total less than 1K lines.
 * No daily maintenance would be necessary; everything would be done automatically with 'cron'.
 * Configuration changes are easy to track with a version control system such as Subversion or CVS.
 * XML-based configuration files, which make it fairly easy to implement a friendly user interface.
 * This bouncer sends the user to a mirror site that certainly holds the file
   the user is looking for, by referring to the results of crawling the sites.
 * Releasing files of OpenOffice.org could be easier than ever. All you need is to
   start a command to synchronize the contents of the main entry of this system with
   those of the reference mirror site. It would take a few seconds to finish, since it
   just obtains a list of files from the site with the 'rsync' command.
 * And more, ...
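
To illustrate the idea of replacing database lookups with file system lookups, here is a minimal shell sketch. The directory layout is an assumption made for this example (one directory per mirror under a crawl root, containing an empty placeholder for every file the crawler found); the real system is written in Perl and may be organized differently.

```shell
#!/bin/sh
# mirrors_with_file CRAWL_ROOT RELATIVE_PATH
# Assumed (hypothetical) layout: CRAWL_ROOT/<mirror>/<path> exists as an
# empty file when the crawler has seen <path> on that mirror.
# Prints the name of every mirror that holds the file, one per line.
mirrors_with_file() {
    crawl_root=$1
    rel_path=$2
    for mirror_dir in "$crawl_root"/*/; do
        # The "lookup" is nothing more than an existence test.
        [ -e "$mirror_dir$rel_path" ] || continue
        basename "$mirror_dir"
    done
}
```

Because the placeholders are empty, such a tree costs almost no disk space, and a crawler can record a result simply by creating or deleting a file.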

The motivation for developing this system comes from:

 * The need to track changes in configurations: who, when, what, and so on. It would be
   hard to do that with an RDBMS-based system without incorporating a heavy mechanism.
   In this system I have been using Subversion to track configuration changes.
 * A UNIX file system based implementation gives us a much more robust system.
   Several crawlers would look for files in the mirror network and
   write their results to the file system, while Apache serves user requests
   by referring to the file system. Operators would update configuration files on demand.
   They all can work simultaneously.
 * By utilizing the Solaris ZFS file system, we can take snapshots of the system instantly.


Examples of how to use it

Show all details of a mirror:

 mb show averse

List number of files per mirror:

 mb list -H --region --country --number-of-files

List mirrors sorted by priority:

 mb list -c de --prio | sort -k2 -nr

List downloads by language:

 echo 'SELECT country, lang, sum(count) FROM stats_counter GROUP BY country, lang ORDER BY country, sum DESC' | mb db shell

Or, as a better alternative:

 for country in $( mb list --country | awk '{print $2}' | sort -u ); do mb list --country -c $country | awk '{print $2": "$1}'; 
 echo "SELECT lang, sum(count) FROM stats_counter WHERE country = '$country' AND date > '2010-10-06' AND count > 50 GROUP BY lang ORDER BY sum DESC" | mb db shell; done > redirects_per_country_mirror_and_lang-geo-gt50.txt

Create mirror:

 mb new ftp.byfly.by -H http://ftp.byfly.by/pub/openoffice.org/ -F ftp://ftp.byfly.by/pub/openoffice.org/  -R rsync://ftp.byfly.by/openoffice/

Edit mirror entry:

 mb edit byfly

Scan mirror and activate it:

 mb scan -e byfly

Read access-log:

 bzgrep ' /files/' /var/log/apache2/download.services.openoffice.org/2009/09/download.services.openoffice.org-20090930-access_log.bz2|
 awk '{print $7}' | grep '\.\(exe\|dmg\|gz\)$' |cut -d/ -f3- | sort | uniq -c | sort -nr|more
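
The same pipeline can be wrapped in a small function that reads log lines from standard input, which makes it easy to try on a few sample lines. This is a sketch: the function name is made up, it assumes the request path is the seventh field of the log line (as in the command above), and it uses `grep -E` instead of the `\|` alternation.

```shell
#!/bin/sh
# count_downloads: read access-log lines on stdin and print how often each
# .exe/.dmg/.gz file below /files/ was requested, most requested first.
count_downloads() {
    grep ' /files/' |                   # only requests below /files/
        awk '{print $7}' |              # field 7 holds the request path
        grep -E '\.(exe|dmg|gz)$' |     # keep installer/archive downloads
        cut -d/ -f3- |                  # strip the leading "/files/"
        sort | uniq -c | sort -nr       # count and sort by frequency
}
```

For a compressed log, something like `bzcat access_log.bz2 | count_downloads | more` gives the same kind of listing as the command above.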


Example of how to add a new mirror

Create the new mirror with basic data:

 mb new example.com -H http://www.example.com -F ftp://ftp.example.com  -R rsync://rsync.example.com -a "Admin Name" -e admin@example.com

Add more details and comments:

 mb edit example.com

Scan for new files and finally activate it:

 mb scan -e example.com
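
The three steps above could be collected in a small helper. The following is only a sketch: the function names and the DRY_RUN variable are invented for this example, and only the mb subcommands and options shown above are used. By default it just prints the commands, so it can be tried without touching the mirror database.

```shell
#!/bin/sh
# mb_step: print the mb command when DRY_RUN is 1 (the default), run it otherwise.
# The printed form is for reading only; quoting of arguments is not preserved.
mb_step() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "mb $*"
    else
        mb "$@"
    fi
}

# add_mirror ID HTTP_URL FTP_URL RSYNC_URL ADMIN_NAME ADMIN_EMAIL
# Create the mirror, open it for editing, then scan and enable it.
add_mirror() {
    if [ "$#" -ne 6 ]; then
        echo "usage: add_mirror ID HTTP_URL FTP_URL RSYNC_URL ADMIN_NAME ADMIN_EMAIL" >&2
        return 2
    fi
    mb_step new "$1" -H "$2" -F "$3" -R "$4" -a "$5" -e "$6" &&
        mb_step edit "$1" &&
        mb_step scan -e "$1"
}
```

Setting `DRY_RUN=0` before calling `add_mirror` would run the commands for real; each step only starts if the previous one succeeded.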