Rough ideas for the OpenOffice.org Open Mirror Network System version 3.0

From Apache OpenOffice Wiki
== Status of this document ==
 
* Rough ideas for brainstorming
 
 
== Concept ==
 
{| border=1 cellpadding=3
 
! Concept
 
! Descriptions
 
|-
 
| Distributed computer network
 
|
 
* Deployment on a one-subsystem-per-server basis

* Each subsystem concentrates solely on its own duty

* Servers can be deployed as virtual and/or physical machines
 
|-
 
| Robustness
 
|
 
* Data such as configurations, histories, and logs is hard to lose

* Data is replicated across several subsystems rather than kept on a single centralized server

* No subsystem needs to keep data that is irrelevant to its own duty
 
|-
 
| Simple, open API
 
|
 
* Platform-, programming-language-, and application-neutral API

* HTTP GET to retrieve remote data as plain text, XML, and/or ZIP-archived files
 
* Anyone can freely develop their own subsystems using data files via the API
 
|-
 
| Maintainability
 
|
 
* Easy to upgrade each subsystem separately
 
* Easy to develop and test with experimental and staging servers while production servers serve users
 
|-
 
| Scalability & High Availability
 
|
 
* Easy to add servers, even small devices, and to replace them if they break

* DNS load balancing and automatic failover
 
|-
 
| Surveillance & Alert
 
|
 
* Detecting malicious files and monitoring the status of servers

* Emergency stop in case of an incident
 
|}
 
 
== Network Diagram ==
 
[[Image:open-mirror-network-system-draft-2009-12-17-2100.png]]
 
 
== Subsystem ==
 
The OpenOffice.org Open Mirror Network System consists of several subsystems. The subsystems are loosely coupled to one another through a platform-, programming-language-, and application-independent API.
 
 
{| border=1 cellpadding=3
 
!Subsystem
 
!Descriptions
 
|-
 
| Mirror Server
 
|
 
* A mirror server is a server, run by an entity such as a university, organization, company, or individual, that delivers files to users via HTTP, FTP, and/or rsync.

* Mirror servers are spread around the world and can be classified into three types: Master server, Tier-1 server, and Regular server.

* The Master server is the origin of all files. A Tier-1 server retrieves files from the Master server; a Regular server retrieves files from a Tier-1 server.

* The release engineering team uploads files to the Master server.
 
|-
 
| OpenOffice.org Web
 
|
 
# A user visits the web site, download.openoffice.org.
 
# JavaScript in the web page uses AJAX to fetch several small lists of deliverables from the Download Concierge Servers and shows them to the user.

# The user chooses a platform, product, version, language, and other items from the lists. The JavaScript helps the user find the required files with hints such as "You will need to install one of these language installers first and then install this language pack for your language." The user clicks the displayed links to start downloading the file(s).

# One of the load-balanced Redirector Servers redirects the user's download request to a nearby Mirror Server.
 
# The Mirror Server sends back the requested file to the user.
 
|-
 
| Download Concierge Server
 
|
 
* A Download Concierge Server is a web server specially tuned for AJAX.
 
* This server fetches scan results from the Scanner Server, internally prepares several lists of items by category (such as platform, product, version, and language), and responds to each request from the JavaScript embedded in the web page with a small XML file.
 
|-
 
| Redirector Server
 
|
 
* A Redirector Server fetches scan results from the Scanner Server, redirects each user request to a nearby mirror server on which the requested file is available, and puts its log files on its web server.
 
|-
 
| Scanner Server
 
|
 
* A Scanner Server fetches repository data such as a list of URLs from the Repository, scans all mirror servers based on the URLs, and puts scan results and log files on its web server.
 
|-
 
| Repository
 
|
 
* The Repository is not a server but a set of data files, such as the list of mirror sites, which are stored on distribution.openoffice.org and maintained by the staff.

* GUI tools might be provided to help the staff maintain the repository data.
 
|-
 
| Logging Server
 
|
 
* A Logging Server collects log files from several servers, concatenates the log files produced simultaneously by load-balanced servers into a single log file, and puts the resulting log files on its web server.
 
|-
 
| Statistics Calculation Server
 
|
 
* A Statistics Calculation Server fetches log files from a Logging Server, aggregates the logs by category (such as user location, language, and OpenOffice.org version), calculates statistics, and publishes the results through its web server.

* Existing freely available software could be used for this purpose.

* Alternatively, anyone can develop their own program and run it on their own server or PC.
 
|-
 
| Surveillance Servers & User's PC
 
|
 
* A Surveillance Server fetches repository data from the Repository and scan results from the Scanner Server, and confirms the integrity of the files on each mirror server by periodically and randomly scanning all mirror servers.

* This server also continuously checks the status of other servers such as the Redirector Servers and the Scanner Server.

* Likewise, users' PCs can perform similar tasks.

* A Surveillance Server notifies the Redirector Servers and alerts the staff and administrators in case of an incident.

* A PC user can report suspicious behavior on the mailing list.

* The staff can stop the Redirector Servers directly.
 
|}
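The redirect step (step 4 of the download flow above) could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes the scan-results text format given in the API section (base URL, path, entry type), a hypothetical mapping from each mirror base URL to a country code taken from the mirror list, and it treats "nearby" simply as "in the user's country". A real redirector would more likely use GeoIP data and network proximity.

```python
import random

def parse_scan_results(text):
    """Map each file path to the set of mirror base URLs that have it.

    Assumes one entry per line: '<base URL> <path> <type>', where the type
    is 'd' (directory), 'f' (file), or 'l' (symbolic link).
    """
    available = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and '...' placeholders
        base_url, path, entry_type = fields
        if entry_type in ("f", "l"):  # only downloadable entries
            available.setdefault(path, set()).add(base_url)
    return available

def choose_redirect(path, user_country, available, mirror_country, rng=random):
    """Return a full download URL for path, or None if no mirror has it.

    Hypothetical policy: prefer mirrors in the user's country, otherwise
    fall back to any mirror that has the file.
    """
    candidates = sorted(available.get(path, ()))
    if not candidates:
        return None
    local = [m for m in candidates if mirror_country.get(m) == user_country]
    base = rng.choice(local or candidates)
    return base.rstrip("/") + path
```

Keeping the scan results as a path-to-mirrors index means each redirect is a dictionary lookup; the index would be rebuilt whenever fresh scan results are fetched from the Scanner Server.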
 
<p></p>
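The Logging Server's job of combining logs produced simultaneously by load-balanced instances could be sketched as a k-way merge. The draft only says the files are concatenated; this sketch additionally assumes that each input log is already in chronological order and that every line starts with an ISO 8601 UTC timestamp, so heapq.merge can interleave the lines into one time-ordered log. Apache's default log formats do not start with such a timestamp, so a real implementation would pass a key function that parses the actual format.

```python
import heapq

def merge_logs(*logs, key=lambda line: line.split(" ", 1)[0]):
    """Merge several chronologically sorted logs into one sorted list of lines.

    Assumes each log is an iterable of lines already sorted by key. The
    default key takes the first space-separated field, assumed to be an
    ISO 8601 UTC timestamp (such timestamps sort correctly as plain strings).
    """
    # heapq.merge streams its inputs lazily, so large files need not be
    # loaded into memory at once; list() is used here only for convenience.
    return list(heapq.merge(*logs, key=key))
```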
 
 
== API ==
 
Subsystems communicate by fetching remote files via HTTP GET. There would basically be three types of file:
 
{| border=1 cellpadding=3
 
! Type
 
! Example
 
|-
 
| A plain text file (.txt)
 
| Scan results, one entry per line: base URL, path, and entry type (d: directory, f: file, l: symbolic link)
 
<pre>
http://mirror.aarnet.edu.au/pub/openoffice /stable/3.1.1/ d
http://mirror.aarnet.edu.au/pub/openoffice /stable/3.1.1/OOo_3.1.1_Win32Intel_install_en-US.exe f
http://mirror.aarnet.edu.au/pub/openoffice /stable/3.1.1/OOo_3.1.1_Win32Intel_install_wJRE_en-US.exe f
...
http://openoffice.mirror.aussiehq.net.au /stable/3.1.1/ d
http://openoffice.mirror.aussiehq.net.au /stable/3.1.1/OOo_3.1.1_Win32Intel_install_en-US.exe f
http://openoffice.mirror.aussiehq.net.au /stable/3.1.1/OOo_3.1.1_Win32Intel_install_wJRE_en-US.exe f
...
</pre>
 
|-
 
| XML file (.xml)
 
| A list of mirror servers
 
<pre>
  &lt;site location="Australia" code="au" type="regular" name="AussieHQ"&gt;
    &lt;uri set="extended" scheme="ftp"&gt;ftp://openoffice.mirror.aussiehq.net.au/pub/openoffice/&lt;/uri&gt;
    &lt;uri set="extended" scheme="http"&gt;http://openoffice.mirror.aussiehq.net.au/&lt;/uri&gt;
    &lt;uri set="extended" scheme="rsync"&gt;rsync://openoffice.mirror.aussiehq.net.au/openoffice-extended/&lt;/uri&gt;
    &lt;uri set="main" scheme="rsync"&gt;rsync://openoffice.mirror.aussiehq.net.au/openoffice/&lt;/uri&gt;
    &lt;contact name="Foo Bar"&gt;
      &lt;email&gt;foo.bar@xxx.yyy&lt;/email&gt;
    &lt;/contact&gt;
  &lt;/site&gt;
</pre>
 
|-
 
| ZIP-archived file (.zip)
 
|
 
* A bundle of the mirror server list (.xml) and scan results (.txt)

* A bundle of an Apache access log file (.txt) and error log file (.txt)
 
|}
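A consumer of the mirror list could read the XML entries shown above with Python's standard xml.etree.ElementTree module. This is a sketch, not a defined schema: the draft shows only a single site element, so the function accepts either that element on its own or a hypothetical root element wrapping several of them, and the keys of the returned dictionaries are illustrative names.

```python
import xml.etree.ElementTree as ET

def parse_mirror_list(xml_text):
    """Return one dict per site element in the mirror list.

    root.iter("site") also matches the root itself, so a lone site element
    works as well as a (hypothetical) wrapper element containing many sites.
    """
    root = ET.fromstring(xml_text)
    mirrors = []
    for site in root.iter("site"):
        mirrors.append({
            "name": site.get("name"),
            "country": site.get("code"),
            "type": site.get("type"),
            # (set, scheme) -> URI, e.g. ("extended", "http") -> "http://..."
            "uris": {(u.get("set"), u.get("scheme")): (u.text or "").strip()
                     for u in site.findall("uri")},
            "emails": [e.text for e in site.iter("email")],
        })
    return mirrors
```

The country code extracted here is the kind of data a Redirector Server would need to map mirrors to users' locations.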
 
<p></p>
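Reading one of the ZIP-archived bundles listed above needs only the standard zipfile module once the bytes have been fetched, for example with urllib.request.urlopen(url).read(). The sketch below is illustrative: the member names inside a bundle are not specified in the draft, so the ones used with it are hypothetical, and make_bundle exists only to show how a publishing subsystem might produce such a file.

```python
import io
import zipfile

def read_bundle(data):
    """Return {member name: decoded text} for a ZIP bundle given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        return {name: bundle.read(name).decode("utf-8")
                for name in bundle.namelist()
                if not name.endswith("/")}  # skip directory entries

def make_bundle(files):
    """Build a ZIP bundle in memory from {member name: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for name, text in files.items():
            bundle.writestr(name, text)
    return buffer.getvalue()
```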
 
 
== Strength ==
 
{| border=1 cellpadding=3
 
!Strength
 
!Descriptions
 
|-
 
| Maintainability
 
|
 
* Since the subsystems are loosely coupled, we can stop, reboot, upgrade, or replace one server without interrupting the service of other servers.

* Since each subsystem is encapsulated, we can upgrade any subsystem without depending on the implementation of other subsystems.
 
* We can develop a new version of software on an experimental server and test it on a staging server without interfering with the production server which is serving real users.
 
|-
 
| Scalability and high availability
 
|
 
* We can quickly deploy many instances of a subsystem, such as the Redirector Server, in several different locations.

* Each instance of a subsystem can be given its own IP address, and all of these addresses can be published under a single FQDN (DNS load balancing).

* Because the API is based on HTTP, when several IP addresses are assigned to a single FQDN, many HTTP clients can automatically fail over to another address if one does not respond.

* Each server can be a virtual machine on an existing host or a physical machine, depending on the resource strategy.
 
|-
 
| Openness
 
|
 
* We can easily deploy existing software products such as server monitoring tools and download statistics tools because the API is application-neutral.

* Anyone can freely get and reuse any data provided by the repository and each subsystem.

* Any subsystem can be developed by anyone who is interested and experienced in the relevant field. Web developers could develop a front-end for users. Programmers who are good at mathematics could develop a statistics calculation server. Operators who know network administration could develop a surveillance server.
 
|}
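The DNS-based failover described above can be sketched as a client that tries each address published for an FQDN in turn. Python's socket.create_connection already does this for a single TCP connection; the sketch below makes the loop explicit at the HTTP level. The fetch argument is a hypothetical callable that performs the HTTP GET against one address and raises OSError on failure, which also lets the logic be exercised without network access.

```python
import socket

def resolve_all(fqdn, port=80):
    """Return every distinct IP address that DNS publishes for fqdn."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

def fetch_with_failover(addresses, fetch):
    """Call fetch(address) on each address in turn; return the first success."""
    errors = []
    for address in addresses:
        try:
            return fetch(address)
        except OSError as exc:
            errors.append((address, exc))  # remember the failure, try the next one
    raise ConnectionError(f"all {len(addresses)} addresses failed: {errors}")
```

In use, the two would be combined as fetch_with_failover(resolve_all(fqdn), fetch), so that adding or removing a server instance only requires a DNS change.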
 
<p></p>
 
 
== Weakness ==
 
{| border=1 cellpadding=3
 
! Weakness
 
! Descriptions
 
|-
 
| Complexity
 
|
 
* Each subsystem is simple, but the entire system is complex.
 
* No one is likely to know the entire system in depth from end to end, because each subsystem is encapsulated and its detailed implementation is left to its developer or development team.
 
|}
 
<p></p>
 
 
== Generation ==
 
 
{| border=1 cellpadding=3
 
!Generation
 
!System
 
!Descriptions
 
|-
 
| 1
 
| Bouncer
 
|
 
* Apache, PHP, and MySQL based application
 
* "Bouncer is a database driven mirror management app..."
 
* http://wiki.osuosl.org/display/Bouncer/Home
 
|-
 
| 2
 
| MirrorBrain
 
|
 
* Apache, Python, Perl and PostgreSQL based application
 
* "MirrorBrain is a Download Redirector and Metalink Generator"
 
* http://mirrorbrain.org/
 
|- style="background-color: #8080ff;"
 
| 3
 
| Open Mirror Network System version 3
 
|
 
* Distributed computer network system
 
|-
 
|4
 
| Open Mirror Network System version 4
 
|
 
* In addition to version 3 (just some ideas):

** communication features between redirectors and each mirror server

** logging for more precise download statistics
 
** taking timezone into account
 
|}
 
<p></p>
 
 
== The Next Generation ==
 
In version 4, redirectors and mirror servers would communicate with each other over SOAP or another protocol to improve the following:
 
* The availability of content
 
** Polling
 
*** Redirector: "What files do you currently have?"
 
*** Mirror server: "I have this, this, ... and this file."
 
** Reporting
 
*** Mirror server: "I have gotten this file and deleted that file."
 
*** Redirector: "Thanks."
 
 
* Server load

** Polling
 
*** Redirector: "How is it going?"
 
*** Mirror server: "Well, I have been busy, could you throttle the amount of download requests towards me?"
 
** Reporting
 
*** Mirror server: "I currently have room for more downloads."
 
*** Redirector: "OK, I am increasing the amount of download requests for you."
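The polling and reporting exchanges above could be modelled as a per-mirror state table kept by the redirector. The draft suggests SOAP or another protocol; this sketch deliberately leaves the transport out and treats messages as plain Python dictionaries. The message fields (files, added, deleted, capacity) and the selection rule are hypothetical: the redirector picks among the mirrors that hold a file with probability proportional to the spare capacity each mirror last reported.

```python
import random

class MirrorState:
    """What a redirector knows about one mirror (hypothetical fields)."""
    def __init__(self):
        self.files = set()
        self.capacity = 1.0  # relative share of requests the mirror will accept

class Redirector:
    def __init__(self):
        self.mirrors = {}

    def _state(self, mirror):
        return self.mirrors.setdefault(mirror, MirrorState())

    @staticmethod
    def _apply_capacity(state, message):
        if "capacity" in message:
            state.capacity = max(0.0, float(message["capacity"]))

    def on_poll_reply(self, mirror, reply):
        """Polling: the mirror answered 'What files do you have? How is it going?'"""
        state = self._state(mirror)
        if "files" in reply:
            state.files = set(reply["files"])
        self._apply_capacity(state, reply)

    def on_report(self, mirror, report):
        """Reporting: the mirror pushed a change without being asked."""
        state = self._state(mirror)
        state.files |= set(report.get("added", ()))
        state.files -= set(report.get("deleted", ()))
        self._apply_capacity(state, report)

    def pick(self, path, rng=random):
        """Choose a mirror holding path, weighted by its reported capacity."""
        candidates = [(m, s.capacity) for m, s in sorted(self.mirrors.items())
                      if path in s.files and s.capacity > 0]
        if not candidates:
            return None
        names, weights = zip(*candidates)
        return rng.choices(names, weights=weights)[0]
```

A mirror that reports a capacity of zero ("I have been busy") simply stops receiving redirects until it reports room for more downloads.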
 
 
To calculate more precise download statistics, a small program would be installed on all or most mirror servers to prepare download logs. Logging Servers would gather the download logs from each mirror server and calculate download statistics.
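Once the download logs are gathered, the counting itself is straightforward. The sketch below assumes a hypothetical log line of the form "country-code path" (the draft does not define a log format) and a file-naming pattern inferred only from the example names in this page, such as OOo_3.1.1_Win32Intel_install_en-US.exe; it tallies downloads per country, language, version, and platform with collections.Counter.

```python
import re
from collections import Counter

# Assumed naming pattern, inferred from the example file names in this draft.
FILENAME = re.compile(
    r"OOo_(?P<version>[0-9.]+)_(?P<platform>[^_]+)_.*_"
    r"(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z]+)?)\.\w+$")

def download_statistics(log_lines):
    """Count downloads per country, language, version, and platform.

    Each log line is assumed (hypothetically) to be '<country code> <path>'.
    Files whose names do not match the pattern are counted as 'other'.
    """
    stats = {key: Counter() for key in ("country", "lang", "version", "platform")}
    for line in log_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore malformed lines
        country, path = fields
        stats["country"][country] += 1
        match = FILENAME.search(path.rsplit("/", 1)[-1])
        for key in ("lang", "version", "platform"):
            stats[key][match.group(key) if match else "other"] += 1
    return stats
```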
 
 
In addition to this communication, time zones would also be taken into account. For example, during the daytime in Europe, some traffic from within Europe could be routed to America, where it is early morning.
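The time-zone idea could be sketched as preferring mirrors whose local time currently falls in an off-peak window. Everything specific here is an assumption for illustration: the fixed UTC offsets, the off-peak window (22:00 to 08:00 local time), and the mirror names used with it. A real system would also have to handle daylight saving time (for example via the zoneinfo module) and combine this with the load reports described above.

```python
from datetime import datetime, timezone

def local_hour(utc_now, utc_offset_hours):
    """Local hour of day for a mirror at a fixed UTC offset (no DST)."""
    return (utc_now.hour + utc_offset_hours) % 24

def is_off_peak(hour, start=22, end=8):
    """True if hour lies in the off-peak window, which wraps past midnight."""
    return hour >= start or hour < end

def prefer_off_peak(mirrors, utc_now=None):
    """Order mirror names so that those currently off-peak come first.

    mirrors maps a mirror name to its (hypothetical) fixed UTC offset in hours.
    sorted() is stable, so the original order is kept within each group.
    """
    utc_now = utc_now or datetime.now(timezone.utc)
    return sorted(mirrors,
                  key=lambda m: not is_off_peak(local_hour(utc_now, mirrors[m])))
```

At 12:00 UTC, for instance, a mirror at UTC+1 is at 13:00 (peak) while one at UTC-5 is at 07:00 (off-peak), so the latter would be tried first.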
 
 
== Developer Assignment ==
 
{| border=1 cellpadding=3
 
!Subsystem
 
!Developer
 
|-
 
| Mirror Server
 
| - (no need to be developed)
 
|-
 
| OpenOffice.org Web
 
| (website project, release engineer)
 
|-
 
| Download Concierge Server
 
| (website project, release engineer, mirror project)
 
|-
 
| Redirector Server
 
| (Persons who know redirector)
 
|-
 
| Scanner Server
 
| (Probably Tora)
 
|-
 
| Repository
 
| - (no need to be developed)
 
|-
 
| Logging Server
 
| (Probably Tora)
 
|-
 
| Statistics Calculation Server
 
| (Persons who know statistics, marketing project, or simply use existing software)
 
|-
 
| Surveillance Servers & User's PC
 
| (Persons who are interested in this area)
 
|}
 
<p></p>
 
 
== Schedule ==
 
(To be discussed.)
 
 
[[Category:Mirror Network]]
 

Revision as of 04:45, 29 December 2009
