The Replicator

Last updated 2004-06-28 by Roedy Green ©2003 Canadian Mind Products.

Introduction

The function of the Replicator is to efficiently replicate a set of files on many machines over the Internet from a master copy. It consists of two parts:
  1. A Java utility that prepares the master files for uploading to a website by compacting the HTML (removing unnecessary white space), collecting the files by age and bundling them in compressed Zip format.
  2. A Java Web Start application that each client runs whenever they wish to bring their copy of the files up to date. It visits the website, collects the new files, and unpacks and decompresses them. The website files may optionally be protected by a login userid and password.
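The HTML compaction in part 1 amounts to squeezing out whitespace the browser ignores anyway. Here is a minimal Java sketch of the idea, not the Replicator's code: HtmlCompactor and compact are hypothetical names, and the sketch assumes only <pre> blocks are whitespace-sensitive, ignoring <textarea>, <script> and CSS white-space rules that a real compactor would also have to respect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of HTML whitespace compaction; not the Replicator's actual code.
class HtmlCompactor {
    // Matches a <pre ...>...</pre> block, case-insensitively, spanning newlines.
    private static final Pattern PRE =
            Pattern.compile("<pre\\b.*?</pre>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Collapses every run of whitespace to one space, except inside <pre> blocks. */
    static String compact(String html) {
        StringBuilder out = new StringBuilder(html.length());
        Matcher m = PRE.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(collapse(html.substring(last, m.start())));
            out.append(m.group()); // copy the <pre> block verbatim
            last = m.end();
        }
        out.append(collapse(html.substring(last)));
        return out.toString().trim();
    }

    private static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String html = "<p>Hello,\n    world</p>\n<pre>  keep\n  this</pre>\n\n<p>end</p>";
        System.out.println(compact(html));
    }
}
```

Collapsing each run to a single space, rather than deleting it, keeps adjacent words from running together, which is why the page still renders the same in the browser.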
The Replicator can also be thought of as an efficient way to back up your crucial files to a website or CD. It automatically detects changed files, compresses them, and bundles them into archives. You don't have to back up everything, only what has changed. It also automatically repacks old archives that contain excessive deadwood.

Trying the Replicator

You can use the Replicator to download the entire compressed mindprod.com website and keep your copy up to date efficiently. You can try it out here.

Why Use The Replicator?

The alternatives are:

The Replicator lets each client keep a local copy of the complete website, kept up to date by an occasional short transmission of just the changes. Requesting a refresh takes a single button click. The Replicator also works for people with only indirect Internet access. It uses only the ordinary HTTP protocol that browsers use and requires no software to be run on the ISP's server.

rsync

The traditional tool to replicate a tree of files is rsync. It has sophisticated ways of sending only the parts of files that have changed, comparing files using rolling checksums. It can run over rsh or over a secure (encrypted) ssh channel. Most of the time, it would be faster than the Replicator. rsync is available for most Unix and Windows platforms.

The catches are:

How To Operate The Replicator

Prepare the master copy of the files in a single directory tree representing the website. You can create new files, delete files or update the files with any tools you please. You can build indexes on these files or use any other such tools you want. There is no restriction on what sorts of files these are. They may be HTML, PDF, EXE or even ZIP.

After you have made a batch of changes that you want to propagate, start the batch file PREPARE.BAT by clicking its entry in your Start menu or its desktop icon. If your machine supports it, you may instead use the faster PREPJET.BAT, which uses a natively compiled version of the program.

PREPARE.BAT bundles the files that have recently changed into compressed zip archives and uploads them to your website using a third-party program such as NetLoad. NetLoad automatically removes deleted files from the website.
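The bundling step can be sketched as follows. This is a hypothetical illustration, not the Replicator's actual packing policy: ZipBundler, Item, group and writeZip are invented names, and the sketch assumes the changed files are taken oldest-first and packed into one zip until the next file would push the uncompressed total past UNCOMPRESSED_IDEAL_ZIPSIZE. As the configuration comments describe, a file bigger than that limit is not split but gets a zip of its own.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of bundling changed files into zips; not the Replicator's code.
class ZipBundler {
    /** One file to distribute: path relative to SENDER_BASE_DIR, size in bytes, date. */
    static final class Item {
        final String relPath;
        final long size;
        final long lastModified;

        Item(String relPath, long size, long lastModified) {
            this.relPath = relPath;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    /**
     * Groups items oldest-first into bundles whose uncompressed total stays at or below
     * idealSize. A single file larger than idealSize ends up alone in its own bundle.
     */
    static List<List<Item>> group(List<Item> items, long idealSize) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingLong(i -> i.lastModified));
        List<List<Item>> bundles = new ArrayList<>();
        List<Item> current = new ArrayList<>();
        long currentSize = 0;
        for (Item item : sorted) {
            if (!current.isEmpty() && currentSize + item.size > idealSize) {
                bundles.add(current); // this bundle is full; start a new one
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(item);
            currentSize += item.size;
        }
        if (!current.isEmpty()) bundles.add(current);
        return bundles;
    }

    /** Writes one bundle as a zip at COMPRESSION_LEVEL (0 = uncompressed ... 9 = smallest). */
    static void writeZip(Path baseDir, List<Item> bundle, OutputStream out, int level)
            throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.setLevel(level);
            for (Item item : bundle) {
                // Zip entry names conventionally use / even when the master uses \.
                ZipEntry entry = new ZipEntry(item.relPath.replace('\\', '/'));
                entry.setTime(item.lastModified); // keep the master file's date
                zip.putNextEntry(entry);
                Files.copy(baseDir.resolve(item.relPath), zip);
                zip.closeEntry();
            }
        }
    }
}
```

For example, with an ideal size of 1000 bytes, files of 600, 5000, 600 and 300 bytes (oldest first) group into three bundles: the first file alone, the 5000-byte file alone, and the last two together.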

Normally you would upload your website both in compressed and expanded form, though the expanded form is optional.

At any time a client wants to get the freshest files, they simply click Replicator Update on a web page, or click Replicator on the start menu or a desktop icon. The rest is fully automatic. When it is done, the files will be sitting on their local hard disk decompressed, identical copies to the master. The client program automatically fetches only what is needed. It does not need the help of any third party software such as a browser or FTP program.
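How the client decides "only what is needed" can be illustrated by comparing the manifest (the list of current zips with their dates that the client reads from manifest.ser) with what is already in its zip staging directory. This is a hypothetical sketch: the real manifest.ser is a serialized Java object whose format is not documented on this page, so the manifest is modelled here as a plain map from zip name to date, and DownloadPlanner and the zip names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the real manifest.ser format is not documented here.
class DownloadPlanner {
    /**
     * Returns, in name order, the zips listed in the manifest that the client must fetch:
     * ones it has never received, or ones whose manifest date is newer than its copy.
     */
    static List<String> toDownload(Map<String, Long> manifest, Map<String, Long> local) {
        List<String> needed = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(manifest).entrySet()) {
            Long have = local.get(e.getKey());
            if (have == null || have < e.getValue()) {
                needed.add(e.getKey());
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        Map<String, Long> manifest = new TreeMap<>();
        manifest.put("mp001.zip", 100L);
        manifest.put("mp002.zip", 200L);
        manifest.put("mp003.zip", 300L);
        Map<String, Long> local = new TreeMap<>();
        local.put("mp001.zip", 100L); // already up to date
        local.put("mp002.zip", 150L); // changed on the server since we fetched it
        System.out.println(toDownload(manifest, local)); // [mp002.zip, mp003.zip]
    }
}
```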

Features

Configuring

The master sending station configures the whole process by filling in a configuration file called replicator.properties, like this. Lines beginning with # are comments.
# replicator.properties file.
# Replicator Configuration file for Mindprod project.

# Summary of what this file says:
# You want to distribute nearly everything in
# E:\mindprod.  The Replicator will put the zips
# in E:\replicator-staging.  You will upload those
# files to http://mindprod.com/replicator.  The clients
# will download the zips to c:\mpstaging
# then decompress them to c:\mp

# This replicator.properties file is used
# to configure a set of files for efficient distribution via a website using
# multiple archive zip files to send just the changes.
# Each parameter that follows must be specified on one line, even for multiples.
# Everything is case sensitive,
# i.e. upper/lower case has to be precisely correct.
# You specify the broad rules of what files you want to distribute,
# then refine your choice with exceptions to the previous rules.

# Directory without trailing \ where ALL the
# master files to be distributed are kept.
# What is common to all filenames to be distributed, possibly just a drive.
# Use local platform names e.g. E: or E:\mindprod for Windows.
# This name is purely for the local
# Master copy, not directory on the website or directory at the clients'.
# Absolute directory name.
# use \ not /.
SENDER_BASE_DIR=e:\mindprod


# Directory without trailing \ where all the archive zip
# files to be distributed are kept.
# Use local platform names e.g. E:\mindprod for Windows.
# This name is purely for the local
# master copy, not for the website or the clients.
# Make sure this directory is NOT included in the trees,
# dirs or files to distribute!
# It is NOT relative to the SENDER_BASE_DIR.
# Absolute directory name.
# use \ not /.
SENDER_ZIP_STAGING_DIR=E:\replicator-staging


# Which directory trees (directory and all subdirectories)
# within the base directory tree will
# be distributed.  The name is relative to the base directory, i.e.
# the name must include the directory, but not
# the base. The name must not begin or end with a \,
# though it may contain embedded \. Spaces are ok.
# Do not surround filenames in quotes.
# Separate multiple directory tree names with commas.
# * means all trees in the base directory, and all files in the base directory
# use \ not /.
TREES_TO_DISTRIBUTE=*

# Which directory trees (directory and all subdirectories)
# within the base directory tree will not
# be distributed.
# Don't specify directory tree names, unless they would
# naturally be distributed by being part of TREES_TO_DISTRIBUTE
# above.
# The directory and all its subdirectories will
# be withheld, even if they appear in the list to be distributed.
# The name is relative to the base directory, i.e.
# the name must include the directory, but not
# the base. The name must not begin or end with a \,
# though it may contain embedded \. Spaces are ok.
# Do not surround filenames in quotes.
# Separate multiple directory tree names with commas.
# use \ not /.
TREES_TO_WITHHOLD=renney,addictions


# Which individual directories (directory without subdirectory)
# within the base directory tree will
# be distributed.
# Don't specify any directories which would naturally be distributed
# by being part of TREES_TO_DISTRIBUTE above. You do have to mention
# directories that would be excluded by TREES_TO_WITHHOLD above.
# The name is relative to the base directory, i.e.
# the name must include the directory, but not
# the base. The name must not begin or end with a \,
# though it may contain embedded \. Spaces are ok.
# Do not surround filenames in quotes.
# Separate multiple directory names with commas.
# * means all files in the base directory, not including all dirs and subdirs.
# use \ not /.
DIRS_TO_DISTRIBUTE=jgloss

# Which individual directories (directory without subdirectories)
# within the base directory tree will not
# be distributed.
# Don't specify directory names, unless they would
# naturally be distributed by being part of TREES_TO_DISTRIBUTE
# above.
# The name is relative to the base directory, i.e.
# the name must include the directory, but not
# the base. The name must not begin or end with a \,
# though it may contain embedded \. Spaces are ok.
# Do not surround filenames in quotes.
# Separate multiple directory names with commas.
# * means all dirs in the base directory, and all files in the base directory,
# not including all subdirs of those dirs.
# * means all files in the base directory, not including all dirs and subdirs.
# use \ not /.
DIRS_TO_WITHHOLD=jgloss\include,include

# Which individual files (which otherwise would not be distributed)
# within the base directory tree will
# be distributed.
# Don't specify any files which would naturally be distributed
# by being part of TREES_TO_DISTRIBUTE or DIRS_TO_DISTRIBUTE above.
# You do have to mention files that would be excluded by TREES_TO_WITHHOLD
# or DIRS_TO_WITHHOLD above.
# The name is relative to the base directory, i.e.
# the name must include the directory, but not
# the base. The name must not begin or end with a \,
# though it may contain embedded \. Spaces are ok.
# Do not surround filenames in quotes.
# Separate multiple filenames with commas.
# use \ not /.
FILES_TO_DISTRIBUTE=

# Which individual files that would otherwise be distributed
# within the base directory tree will not
# be distributed.
# Don't specify file names, unless they would
# naturally be distributed by being part of TREES_TO_DISTRIBUTE
# or DIRS_TO_DISTRIBUTE above.
# The name is relative to the base directory, i.e.
# the name must include the directory, but not
# the base. The name must not begin or end with a \,
# though it may contain embedded \. Spaces are ok.
# Do not surround filenames in quotes.
# Separate multiple filenames with commas.
# use \ not /.
FILES_TO_WITHHOLD=zips\cmp1.zip,zips\cmp2.zip,zips\cmp3.zip,zips\cmp4.zip,zips\cmp5.zip

# Which file extensions should be withheld from distribution, no matter what
# directory they are in, and no matter what previous rules say about these files.
EXTENSIONS_TO_WITHHOLD=log,zip,digest,nlx

# Largest file to distribute in bytes.
# This is to prevent you accidentally distributing some monster file.
# Suggest leaving at 10 megabytes.
# The largest possible setting is 2 gigabytes.
# Large files are not spread over several zips.
# Big files will be sent as one large zip.
LARGEST_FILE_TO_DISTRIBUTE=10000000

# We optimise on the presumption that most users will check for updates
# at least once every LAG days.
# Setting this number too large
# penalizes brand new users.
# Setting it too small penalises users who update infrequently.
LAG=5

# Approximate size in bytes of individual zip archive files distributed,
# prior to compression.
# Usually 2 megabytes, i.e. 2000000, no commas.
# It might be set smaller if users have very flaky connections and can't
# reliably download files that big.
UNCOMPRESSED_IDEAL_ZIPSIZE=2000000

# Compression level 0 to 9.
# 0=uncompressed 9=as compressed as possible.
# Bigger number takes more time to prepare zips.
COMPRESSION_LEVEL=9

# 'distributed' if you want excess whitespace
# removed from any distributed *.html
# This makes the files smaller and faster to load.  They look the same
# on the browser screen, but the raw HTML source is harder to understand.
# 'original' if you want the original HTML compacted as well.
# 'none' if you want HTML files distributed as-is.
COMPACT_HTML=original

# true if you want files that have new dates, but have not really
# changed since the last distribution to be redated back to their
# original dates.  This saves redistributing files that have not
# actually changed content.  This is useful for generated index files
# which often generate to the same output as previously.
# This affects the file dates on the originals as well as the files
# distributed.
# false if you want redated, but unchanged files to be redistributed
# with the new dates.
# For more information on how untouch works see
# http://mindprod.com/untouch.html
UNTOUCH=true

# Command to upload files to the Internet.
# Needs full qualification and the .exe suffix.
# e.g. E:\Program Files\netload\netload.exe
# or C:\WINNT\system32\cmd.exe /E:1900 /C upload.bat
# Leave empty (i.e. UPLOAD=) to bypass automatic upload.
# Don't surround in quotes, even if it contains spaces.
# Use \ instead of /.
UPLOAD=E:\Program Files\netload\netload.exe


# Following properties control the default client configuration:

# Project name for title, just what it is we are distributing.
PROJECT_NAME=Mindprod.com Website

# Globally unique name, starting with website name in reverse e.g. com.mindprod.
# This name allows The Replicator to keep several
# projects on the same machine from
# interfering with one another.
UNIQUE_PROJECT_NAME=com.mindprod.replicator

# Directory without trailing \ where the client keeps
# its copy of all the distributed files.
# Use local platform names e.g. E:\mindprod for Windows.
# This is the suggested default for the client.
# use \ not /.
SUGGESTED_RECEIVER_BASE_DIR=c:\mp

# Directory without trailing \ where all the incoming archive zip files are kept.
# Use local platform names e.g. E:\mindprod for Windows.
# This is the suggested default for the client.
# use \ not /.
SUGGESTED_RECEIVER_ZIP_STAGING_DIR=c:\mpstaging

# Where on the website the archive files are stored.
# without a trailing /. Must start with http://
# e.g. http://mindprod.com/replicator
# use / not \
WEBSITE_ZIP_URL=http://mindprod.com/replicator

# Used when the client does not have direct access to the Internet.
# Where the client can get the zip files, downloaded by a relay version
# of the Replicator, from a LAN or its local machine.
# If the source of the zip files for the client
# is a directory on the current machine, you might code:
# file://localhost/X:/replicator
# If the source of the zip files is a directory
# on another machine on the LAN you might code:
# file://bigserver/Cdisk/replicator
# where Cdisk is the share name.
# use / not \.
SUGGESTED_LAN_ZIP_URL=file://server/sharedarea/mpstaging

# Do clients need a userid/passwords to access the files from the website?
# 'none' means no user id or password needed to access files on the website.
# 'basic' means use a simple undigested userid/password to access files.
AUTHENTICATION=none

# true if you want additional troubleshooting information dumped to the console.
# false if you want this debugging information suppressed.
DEBUG=true
# end
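The comments in the file above describe a cascade: broad rules first, each later rule an exception to the earlier ones, with EXTENSIONS_TO_WITHHOLD and LARGEST_FILE_TO_DISTRIBUTE applying no matter what. Here is a hypothetical Java sketch of that cascade, not the Replicator's code. It assumes "last matching rule wins" in the order the properties appear, that * in a tree rule means everything and * in a dir rule means the base directory itself, and that paths are relative to SENDER_BASE_DIR using \ as in the file. InclusionRules and shouldDistribute are invented names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the include/withhold cascade; not the Replicator's actual code.
class InclusionRules {
    private final List<String> treesIn, treesOut, dirsIn, dirsOut;
    private final Set<String> filesIn, filesOut, extsOut;
    private final long largestFile;

    InclusionRules(String treesIn, String treesOut, String dirsIn, String dirsOut,
                   String filesIn, String filesOut, String extsOut, long largestFile) {
        this.treesIn = split(treesIn);
        this.treesOut = split(treesOut);
        this.dirsIn = split(dirsIn);
        this.dirsOut = split(dirsOut);
        this.filesIn = new HashSet<>(split(filesIn));
        this.filesOut = new HashSet<>(split(filesOut));
        this.extsOut = new HashSet<>(split(extsOut));
        this.largestFile = largestFile;
    }

    /** Splits a comma-separated property value; an empty value gives an empty list. */
    static List<String> split(String value) {
        if (value == null || value.trim().isEmpty()) return Collections.emptyList();
        return Arrays.asList(value.trim().split("\\s*,\\s*"));
    }

    /** "jgloss\include\a.html" gives "jgloss\include"; "" for a file in the base directory. */
    static String dirOf(String relPath) {
        int slash = relPath.lastIndexOf('\\');
        return slash < 0 ? "" : relPath.substring(0, slash);
    }

    /** Case-sensitive extension without the dot, or "" if there is none. */
    static String extensionOf(String relPath) {
        String name = relPath.substring(relPath.lastIndexOf('\\') + 1);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    /** Tree rule: "*" matches everything; otherwise the file must lie somewhere under the tree. */
    static boolean inTree(String relPath, String tree) {
        return tree.equals("*") || relPath.startsWith(tree + "\\");
    }

    /** Dir rule: the file must sit directly in the dir, not in a subdirectory of it. */
    static boolean inDir(String relPath, String dir) {
        return dirOf(relPath).equals(dir.equals("*") ? "" : dir);
    }

    /** Applies the rules in file order; each later matching rule overrides the earlier ones. */
    boolean shouldDistribute(String relPath, long size) {
        boolean include = false;
        for (String t : treesIn) if (inTree(relPath, t)) include = true;
        for (String t : treesOut) if (inTree(relPath, t)) include = false;
        for (String d : dirsIn) if (inDir(relPath, d)) include = true;
        for (String d : dirsOut) if (inDir(relPath, d)) include = false;
        if (filesIn.contains(relPath)) include = true;
        if (filesOut.contains(relPath)) include = false;
        // EXTENSIONS_TO_WITHHOLD and the size limit apply no matter what the rules above say.
        return include && !extsOut.contains(extensionOf(relPath)) && size <= largestFile;
    }
}
```

With the sample configuration above, jgloss\include\x.html is withheld by DIRS_TO_WITHHOLD, but jgloss\include\deeper\x.html is still distributed, because directory rules do not reach into subdirectories.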
    

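The UNTOUCH option in the file above can be pictured like this: remember a digest and a date for each file, and when a file's date has moved but its digest has not, put the old date back so the file is not redistributed. This is a minimal sketch under stated assumptions, not how the Replicator actually does it: it assumes SHA-256 digests held in an in-memory map, and Untoucher and untouch are hypothetical names. See http://mindprod.com/untouch.html for how the real untouch works.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the untouch idea; digest choice and storage are assumptions.
class Untoucher {
    /** What was remembered about a file at the last distribution. */
    static final class Seen {
        final byte[] digest;
        final long lastModified;

        Seen(byte[] digest, long lastModified) {
            this.digest = digest;
            this.lastModified = lastModified;
        }
    }

    private final Map<Path, Seen> history = new HashMap<>();

    static byte[] digest(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /**
     * If the file's date changed but its content did not, restores the remembered date
     * and returns true. Otherwise records the current digest and date and returns false.
     */
    boolean untouch(Path file) throws IOException {
        byte[] now = digest(file);
        long modified = Files.getLastModifiedTime(file).toMillis();
        Seen before = history.get(file);
        if (before != null && before.lastModified != modified
                && Arrays.equals(before.digest, now)) {
            Files.setLastModifiedTime(file, FileTime.fromMillis(before.lastModified));
            return true;
        }
        history.put(file, new Seen(now, modified));
        return false;
    }
}
```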
Client Use

The client just clicks a link on a web page in a browser to install the software. Alternatively, he or she can type javaws to start Java Web Start, then feed it the URL of the jnlp file, e.g. http://mindprod.com/replicator/replicatorreceiverwebsite.jnlp, to install the program. The client then fills in a screen that looks like this:

replicator screen shot

On subsequent uses, all the client has to do is click OK without filling in any fields.

SneakerNet/LAN Use

Sometimes clients may not have a direct Internet connection and must get their files indirectly. A computer with Internet access gets the latest files using the JWS client software in the usual way, possibly using an alternative jnlp file called replicatorreceiverrelay.jnlp if its user does not want to see the expanded files. Then someone burns all the zip files onto a CD and carries it to the computer or LAN that has no Internet access. From there the client software can retrieve just the new files directly from the CD (details later), from a shared copy of the CD on a LAN server, or from a copy of the files on an internal website or file server, using yet another version of the jnlp file called replicatorreceiverlan.jnlp.

The only tricky part is figuring out the URL where the zip files are stored. Try this technique to discover it. Put a dummy temp.html file in the LAN directory that contains the zip files and manifest.ser. Then try to open it with your browser from the machine running the Replicator receiver, using File | Open or any other technique you have, e.g. drag and drop. Once it displays, you will see its URL in the address bar. Use that as a model to create the URL for the LAN directory. If that URL does not work, convert any vertical bars in the URL to colons.

The URL will have the form file://machinename/sharename/directory. Look in your Network Neighbourhood to discover the machine names and share names. You can also assign the remote directory a local drive letter so that it appears as if it were on your local machine. Then the URL becomes file://localhost/X:/somedirectory
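The URL rules above can be captured in a small helper. This hypothetical Java sketch (LanUrl, toFileUrl and fixBars are invented names) turns a UNC path such as \\bigserver\Cdisk\replicator, or a mapped-drive path such as X:\replicator, into the file:// forms described on this page, and repairs the vertical-bar form some browsers display. It only rearranges strings; it does not check that the share exists, and it does not escape spaces.

```java
// Hypothetical helper for building SUGGESTED_LAN_ZIP_URL values; string rewriting only.
class LanUrl {
    /** Converts a Windows UNC or drive-letter path into the file:// URL form. */
    static String toFileUrl(String windowsPath) {
        String p = windowsPath.replace('\\', '/');
        if (p.startsWith("//")) {
            // \\machinename\sharename\directory becomes file://machinename/sharename/directory
            return "file:" + p;
        }
        if (p.length() >= 2 && Character.isLetter(p.charAt(0)) && p.charAt(1) == ':') {
            // X:\somedirectory becomes file://localhost/X:/somedirectory
            return "file://localhost/" + p;
        }
        throw new IllegalArgumentException("not an absolute Windows path: " + windowsPath);
    }

    /** Converts the vertical bars some browsers show (e.g. X|) into colons. */
    static String fixBars(String url) {
        return url.replace('|', ':');
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("\\\\bigserver\\Cdisk\\replicator")); // file://bigserver/Cdisk/replicator
        System.out.println(toFileUrl("X:\\replicator"));                  // file://localhost/X:/replicator
        System.out.println(fixBars("file://localhost/X|/replicator"));    // file://localhost/X:/replicator
    }
}
```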

You can also test the Replicator echoing files back to the same machine, without uploading to a website, by using replicatorreceivertest.jnlp.

Dial Up Use

Getting started from scratch with a dial up internet connection would take an inordinately long time. What you want to do is send someone a CD to get them started, and from then on get their updates via dialup internet connection.

Web Server Requirements

Master Station Requirements

Client Station Requirements

Deploying

You download a customized version of the program from my website, then install it as you would any other Windows program. Then you fine-tune the replicator.properties file, though I should already have set it up correctly for you. Then you click PREPARE, which prepares the zips and uploads them to your website. Your clients need a link to the replicatorreceiverwebsite.jnlp file, which they either click in their browser or type directly into javaws.exe. When they click it, it automatically installs the program, then downloads and decompresses the files.

Creating Replicator Distribution CDs

To produce a CD distribution to get a client started quickly, just burn the contents of the SENDER_ZIP_STAGING_DIR onto the root directory of the CD. Don't create a SENDER_ZIP_STAGING_DIR directory on the CD! Put the files directly into the root.

Include everything in the SENDER_ZIP_STAGING_DIR, namely the ZIP files, a copy of the replicatorreceiver.jar, the JNLP files, freshness.ser, the setup.exe, the replicator.gif and replicator.ico files.

Do not include the files in the program directory, such as replicatorsender.exe or replicatorsender.jar. You may also want to put a copy of the offline Java JRE in the root directory of the CD.

Make copies of the CD and send them out by mail. To install the software, just insert the CD in the CD ROM drive.

When you run setup from the CD, it installs the client Java Web Start application and unpacks the distributed data files in the zips. The JRE is there in case the client does not already have a recent Java installed (which includes the Java Web Start runtime). All the client needs to do is insert the CD; the autorun feature then installs the program and data files. If that does not work, the client can jump-start the process by opening a DOS box and typing

rem make the CD the current drive, presumably R:
R:
setup.exe
Change the letter R to whatever your CD-ROM drive letter is. If even that does not work, try:
  1. Click Java Web Start to launch it.
  2. Click View | Downloaded Applications.
  3. Type file://localhost/R:/replicatorreceivercdR.jnlp
  4. Click Start.
where R is the drive letter of the CD.

If your CD-ROM drive letter is not R:, you can change it to R:. Right-click My Computer | Manage | Storage | Disk Management, or alternatively use Settings | Control Panel | Administrative Tools | Computer Management | Storage | Disk Management.

The client can then continue via website updates, LAN updates or further CD updates.

You can also create CDs at remote stations by using replicatorreceiverrelay.jnlp to download the zips from the website. Burn a CD containing all the files in the relay receiver's zip staging directory.

Files

You may be curious what all the various files are for. You don't need to understand any of this to use the Replicator.
File Purpose
*.class compiled Java classes
*.java Java source code
autorun.inf kicks off an install from CD
mindprodcert.cer public key code-signing certificate. You may optionally install this into Java Web Start so that it automatically trusts the Canadian Mind Products digital signatures.
freshness.ser Lets the client know if any of its auxiliary files have gone stale and need to be redownloaded.
justs.bat compiles everything and prepares jars. Only used if you customise the program yourself.
manifest.ser list of current zips with dates used by client to decide what to download.
prepare.bat prepare a set of zips for distribution.
prepjet.bat prepare a set of zips for distribution, using the natively Jet-compiled version.
prepjar.bat prepare a set of zips for distribution, using the Java.exe JVM with a jar file.
receiver.mft Java manifest for the replicatorreceiver jar
receiver.ser persistent state of client target. Remembers what it was doing last time.
receiverjar.list list of files for replicatorreceiver.jar
replicator.gif Replicator logo in Internet format.
replicator.ico Replicator logo in Windows format.
replicator.properties Master configuration file for sender.
replicatorreceiver.jar collected class files to run the client receiver.
replicatorreceivercd*.jnlp Java Web Start declarations for downloading from a CD. There is a different one for each possible CD drive letter A through Z.
replicatorreceiverlan.jnlp Java Web Start declarations for downloading from a LAN.
replicatorreceiverrelay.jnlp Java Web Start declarations for downloading zips for later relay to off-net clients.
replicatorreceivertest.jnlp Java Web Start declarations for downloading from the sending site (loopback test).
replicatorreceiverwebsite.jnlp Java Web Start declarations for downloading from a website.
replicatorsender.exe Jet natively compiled version of the sender for extra speed.
replicatorsender.jar collected class files to run the sender.
sender.mft Java manifest for the replicatorsender.jar
sender.ser persistent state of sender. Remembers what it was doing last time.
senderjar.list list of files for replicatorsender.jar
setup.exe kicks off an install from CD
startover.bat Used to start from scratch. Erases all zips and starts afresh generating them.

Cost

James Seaboldt of smear.biz generously funded the development of the project. It is available to the public for $100.00 USD.

What Is Included

What Is Not Included

Trouble Shooting

Java Web Start sometimes leaves obsolete icons on your desktop. To get rid of them, right-click the desktop and click Refresh. You can then fill in the holes with right-click Arrange Icons | By Name.

All files and directories the Replicator touches must not have leading or trailing spaces in their names. It is OK to have embedded spaces, even double spaces, in names. If a name has leading or trailing spaces, the Replicator will abort, telling you which file has to be renamed. It will pick up where it left off after you fix the problem and restart it.
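If you would rather find such names before running PREPARE, a short scan of the master tree will list them. This is a minimal Java sketch using java.nio.file.Files.walk; NameChecker, hasEdgeSpace and badNames are hypothetical names, and the Replicator still performs its own check when it runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pre-check for names the Replicator would reject; not part of the Replicator.
class NameChecker {
    /** True if a single file or directory name starts or ends with a space. */
    static boolean hasEdgeSpace(String name) {
        return !name.isEmpty()
                && (name.charAt(0) == ' ' || name.charAt(name.length() - 1) == ' ');
    }

    /** Lists every path under root whose own name has a leading or trailing space. */
    static List<Path> badNames(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> p.getFileName() != null
                            && hasEdgeSpace(p.getFileName().toString()))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass SENDER_BASE_DIR, e.g. e:\mindprod; defaults to the current directory.
        for (Path p : badNames(Paths.get(args.length > 0 ? args[0] : "."))) {
            System.out.println("Rename: \"" + p + "\"");
        }
    }
}
```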

Most problems getting started stem from an incorrectly configured replicator.properties file. Check the system out by using replicatorreceivertest.jnlp to download the zips and unpack them to a unique directory, and make sure the files you wanted included are included and no more. You can adjust replicator.properties to correct any errors, and the Replicator will adjust automatically without having to start from scratch.

To start from scratch, run startover.bat. If that does not work, delete the *.ser files in the Program Files\replicator directory and all the files in the SENDER_ZIP_STAGING_DIR.

Most problems seem to occur during the upload of the zip files, which is not under the direct control of the Replicator. Check that the files in the SENDER_ZIP_STAGING_DIR match the website in size and date. If you see mismatches, you can try deleting the offending files from the website manually, then retry the prepare. If you use Netload, doing local refresh and site refresh will often help get it back in sync. Try deleting any *.nlx files in the SENDER_ZIP_STAGING_DIR and any netload.chk files in the WEBSITE_ZIP_URL directory on the website.
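Checking that the staging directory matches the website can be partly automated. This hedged Java sketch sends an HTTP HEAD request for each zip under WEBSITE_ZIP_URL and compares the Content-Length and Last-Modified headers with the local file's size and date. It assumes the web server reports both headers (not all do) and that the uploaded copy is dated no earlier than the local one; StagingCheck, matches and check are hypothetical names, not part of the Replicator.

```java
import java.io.File;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical consistency check between SENDER_ZIP_STAGING_DIR and the website.
class StagingCheck {
    /**
     * The website copy matches if the sizes agree and it is not older than the local copy.
     * HTTP dates have 1-second resolution, so both dates are compared in whole seconds.
     * remoteSize is -1 (treated as a mismatch) and remoteModified 0 when not reported.
     */
    static boolean matches(long localSize, long localModified, long remoteSize, long remoteModified) {
        if (remoteSize != localSize) return false;
        if (remoteModified == 0) return true; // no Last-Modified header: size check only
        return remoteModified / 1000 >= localModified / 1000;
    }

    /** Issues a HEAD request for one staged zip and reports whether the website copy matches. */
    static boolean check(File localZip, String websiteZipUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(websiteZipUrl + "/" + localZip.getName()).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) return false;
            return matches(localZip.length(), localZip.lastModified(),
                    conn.getContentLengthLong(), conn.getLastModified());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: StagingCheck SENDER_ZIP_STAGING_DIR WEBSITE_ZIP_URL");
            return;
        }
        File[] zips = new File(args[0]).listFiles((dir, name) -> name.endsWith(".zip"));
        if (zips == null) throw new IOException("cannot list " + args[0]);
        for (File zip : zips) {
            if (!check(zip, args[1])) System.out.println("Mismatch: " + zip.getName());
        }
    }
}
```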

If problems seem to be related to Jet and its DLLs, you can revert to the HotSpot version which is slightly slower. You just use PREPJAR.BAT in preference to PREPJET.BAT. The Jet version only works in Windows.

To use the Replicator off net, use replicatorreceiverviarelay.jnlp to collect the zip files from the website and store them in a staging directory. You then copy those files to some place where they are accessible via a LAN, perhaps using a CD as an intermediary. Then fire up replicatorreceivervialan.jnlp from the same directory as the zips. If this directory is the same as the one you configured in SUGGESTED_LAN_ZIP_URL in replicator.properties, it should work fine. If it is different, you will have to edit the codebase attribute in the replicatorreceivervialan.jnlp file manually to make it match.

The other way to use the Replicator off net is to use replicatorreceiverviarelay.jnlp to download the zips from the website and burn everything in the staging directory onto the root of a CD. You can then install from the CD just by inserting it into a drive. You can use the CD to install the files initially, and subsequent CDs to keep them up to date. You can also use the web to keep the files up to date after an initial CD install. When you use replicatorreceivercdX.jnlp, it copies from the CD only the files it does not already have.

If you make a great many changes to your website, it is best to upload those individual changes first, before running the Replicator, rather than having the Replicator's upload phase upload those files and its own in the same batch. Large upload batches sometimes fail and need to be rerun several times. Netload needs to have its cached directory cleared manually after each failure.

Possible Futures

Summary

This is a sweet project because there is so little in the way of user interface to nail down. Basically just start it and it runs. The only issues I am still concerned about are:


Please send errors, omissions and suggestions
to improve this page to Roedy Green.
You can get a fresh copy of this page from http://mindprod.com/zips/java/replicator.html or possibly from your local J: drive mirror: J:\mindprod\zips\java\replicator.html