iim - how it works


remote archive

A remote archive is some file tree on a remote server ; it has directories containing files and symlinks (links).

local archive

A local archive is some file tree on the local server.


An event is an (epoch, path, type) tuple, where the epoch is a timestamp, the path names an item in the archive, and the type is new or delete.

Each event represents an update in the remote archive ; the epoch is used to order the events.

The event stream is the ordered collection of all the events. As time progresses, the event stream is extended.

RECENT files

The RECENT files (RECENTs for short) are the RECENT-*.json files.


An archive is consistent if all the events in the RECENTs in the archive have been applied to the archive. This means that a fresh RECENT (obtained from the remote archive) may only be moved into the local archive if/when all the events in the RECENT have been successfully applied to the local archive.


The epoch of a (consistent) archive is the epoch of the last event in the last RECENT. The epoch of the archive is undefined if at least one RECENT is missing.
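The epoch rule can be sketched as follows; the in-memory representation (one list of (epoch, path, type) tuples per RECENT, None for a missing RECENT) is an assumption for illustration only:

```python
def archive_epoch(recents):
    """Epoch of a (consistent) archive: the epoch of the last event in
    the last RECENT.  `recents` is the ordered list of RECENTs, each a
    list of (epoch, path, type) tuples ordered by epoch; a missing
    RECENT is represented as None, which makes the epoch undefined."""
    if any(recent is None for recent in recents):
        return None                    # at least one RECENT is missing
    last = recents[-1]                 # the last RECENT
    return last[-1][0] if last else None
```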

how it works

To keep the local archive in sync, RECENT files must be fetched ; the events therein must be applied as updates (fetches or deletes) on the local archive.


In iim, 3 data areas are relevant : area local (the local archive), area temp (a staging area) and the remote archive.

Fresh RECENTs have to be kept in the staging area (temp) ; they can only be moved to area local if/when all events in the RECENTs have been processed.

the program

The basic structure of iim is simple:

  todo  = []
  epoch <- initialize
  LOOP
    FETCH fresh recents
    epoch <- FIND new events
    todo  <- PROCESS new events, todo
    IF todo == [] MOVE recents INTO local
  ENDLOOP

Array todo is used to carry unprocessed or failed events from one loop to the next.
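The loop above can be sketched in Python; the four callables stand in for the FETCH / FIND / PROCESS / MOVE steps, and the `rounds` parameter is only there to make the sketch finite (iim itself loops indefinitely):

```python
def iim_loop(fetch_recents, find_new_events, process, move_recents,
             epoch, rounds):
    """Sketch of the iim main loop; todo carries unprocessed or failed
    events from one loop to the next."""
    todo = []
    for _ in range(rounds):            # iim itself loops indefinitely
        fetch_recents()                # FETCH fresh recents into temp
        epoch, events = find_new_events(epoch)
        todo = process(events, todo)   # returns unprocessed/failed events
        if not todo:
            move_recents()             # safe: every event was applied
    return epoch, todo
```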


The main object of initialize is to determine epoch from the epoch of the local archive. Other initializations follow from the loop-invariants we need to maintain.

If epoch is undefined or differs too much from the epoch of remote, a full rsync is done first.

fetching RECENTs

To fetch (all) RECENTs, we sync them into area temp, marking their inode number. After the sync the RECENTs with a changed inode number are fresh.

This is very efficient on the client and the server.

On startup, we must initialize temp with hardlinks to the RECENTs in local.
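A sketch of the inode-marking trick, assuming rsync's usual behaviour of writing an updated file under a temporary name and renaming it into place (so an updated file gets a new inode, while an unchanged file keeps its inode):

```python
import os

def mark_inodes(dirpath, names):
    """Record the inode numbers of the RECENTs before the sync."""
    marks = {}
    for name in names:
        try:
            marks[name] = os.stat(os.path.join(dirpath, name)).st_ino
        except FileNotFoundError:
            marks[name] = None
    return marks

def fresh_after_sync(dirpath, marks):
    """After the sync, the RECENTs with a changed inode number are fresh."""
    fresh = []
    for name, ino in marks.items():
        try:
            now = os.stat(os.path.join(dirpath, name)).st_ino
        except FileNotFoundError:
            continue
        if now != ino:
            fresh.append(name)
    return sorted(fresh)
```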

finding new events

New events (more recent than epoch) are looked up in the (ordered set of) RECENTs in area temp. If new events are found, epoch is set to the epoch of the last event.
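This lookup can be sketched as follows; since the same event can appear in more than one RECENT (they overlap), a set is used to deduplicate:

```python
def find_new_events(recents, epoch):
    """Look up events newer than `epoch` in the RECENTs and return the
    new epoch plus the new events, ordered by epoch.  Each event is an
    (epoch, path, type) tuple."""
    seen = {e for recent in recents for e in recent if e[0] > epoch}
    events = sorted(seen)              # tuples sort by epoch first
    new_epoch = events[-1][0] if events else epoch
    return new_epoch, events
```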

processing events - fetching files

When processing (a batch of) events, we must fetch (a batch of) files from remote. We want to rsync directly into local to take advantage of rsync's matched data magic ; if the source and destination files differ only a little, the transfer is very efficient.

First we determine the last event for each event-path in the batch ; if it is a delete event, we just delete the event-path from local ; if it is a new event, we add the path to the fetch-list, marking the inode number (possibly undef) of the path in local. Then we rsync the files from remote into local.
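The batch planning step can be sketched like this; `local_inodes` (a mapping from path to inode number in area local) is an illustrative stand-in for stat-ing the local tree:

```python
def plan_batch(events, local_inodes):
    """Determine the last event for each event-path in the batch; return
    the paths to delete from local, and the fetch-list with each path
    marked with its inode number in local (None if absent)."""
    last = {}
    for event in sorted(events):       # ordered by epoch
        last[event[1]] = event         # a later event overrides
    deletes, fetch_list = [], []
    for path, (_epoch, _path, typ) in sorted(last.items()):
        if typ == 'delete':
            deletes.append(path)
        else:                          # a new event: fetch from remote
            fetch_list.append((path, local_inodes.get(path)))
    return deletes, fetch_list
```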

When rsync is finished, we determine which items on the fetch-list have made it into area local. If the rsync had no errors, or the inode number of the file changed, the event is done. If the rsync was partial (rsync exit codes 23, 24), the tried counter of the event is incremented. The event is pushed on the todo list.

When an event's tried counter is over a certain limit (currently 3), the event is retried in a one-file-rsync. If the rsync succeeds, the event is considered done. If the rsync resulted in a partial transfer with number of files: 0, iim concludes that the event-path does not exist in remote, and the event is discarded as a bogus new event. Otherwise the event is put on the todo list (again).
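These triage rules can be sketched as two small decision functions (the rsync outcome flags are illustrative parameters, not iim's actual interface):

```python
RETRY_LIMIT = 3                        # "currently 3" in the text

def after_batch_rsync(tried, ok, inode_changed, partial):
    """Fate of a fetch-list event after the batch rsync: 'done', or
    'todo' with an updated tried counter."""
    if ok or inode_changed:
        return 'done', tried
    if partial:                        # rsync exit code 23 or 24
        tried += 1
    return 'todo', tried               # carried to the next loop

def after_single_rsync(ok, partial, files_transferred):
    """Fate of an over-the-limit event after its one-file rsync."""
    if ok:
        return 'done'
    if partial and files_transferred == 0:
        return 'bogus'                 # path does not exist in remote
    return 'todo'
```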

Why worry about bogus new events ? Suppose that a bogus new event would appear in the event-list (because of a mistake in iim and/or the RECENTs). Then the event would sit in the todo list forever. As a consequence, no fresh RECENTs would ever be moved to local, because that's only allowed if the todo list is empty. That's why we make sure that bogus new events are handled properly, although the need may very well never arise.

moving RECENTs into local

Once all events in the RECENTs in temp have been processed (todo is empty), the fresh RECENTs (having link count 1) in area temp may be moved to area local.

We move a fresh RECENT R (without copying) like this :

  unlink local/R
  link temp/R local/R
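In Python, the unlink/link move looks like this; afterwards temp/R and local/R share one inode, so the RECENT's link count is 2 again:

```python
import os

def move_recent(temp, local, name):
    """Move a fresh RECENT from temp into local without copying."""
    src = os.path.join(temp, name)
    dst = os.path.join(local, name)
    try:
        os.unlink(dst)                 # unlink local/R
    except FileNotFoundError:
        pass                           # first time: local/R did not exist
    os.link(src, dst)                  # link temp/R local/R
```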

Aside: my Perl (v5.8.8 built for x86_64-linux-thread-multi) has a problem with renames ; if a, b and c are all hard links to the same thing, a rename 'a', 'b' returns 1, but has not renamed a. The rename can fail (according to the doc), but it should not return 1. Program /bin/mv does the right thing. I haven't tried rename(2).


That's it, basically ; the rest is bells and whistles : items or features that are useful or decorative but not essential.

design goals

Iim should be (in order) : robust, efficient, simple. This means that we will always sacrifice efficiency and simplicity for robustness, and (within reason) simplicity for efficiency.


Before looking at improvements, let's state some assumptions.

improvements - using a cache

Suppose CPAN provides a file RECENT-cache, containing a list of filenames : RECENT-*, plus all the paths named in the events in (say) the past hour for which the path exists in the archive. Since some path can be mentioned in more than one event, RECENT-cache should only contain paths for which the last corresponding event is a new event.

Each loop begins by refreshing the cache (temp) :

  rsync --files-from MASTER::RECENT-cache MASTER::/ LOCAL/temp

Then, the client proceeds as before : it inspects the RECENTs for new events and creates a list of items to fetch. However, it can exclude items that are available in the cache. Normally (if/when the client is in sync), all required items will be in the cache, so a second rsync will not be necessary.

Refreshing the cache will be cheap, because most items will be unchanged.

Cache maintenance is easy : the cache always contains the most recent RECENT-cache ; so, after a refresh of the cache, the client can weed out obsolete entries from the cache.

improvements - using a truncated event stream

Assume for a moment that it is acceptable for a client to do a full sync if/when it finds itself out-of-sync for over a month (either on start-up or because it is suspended, or loses connection for more than a month, whatever).

Now we can drop from the server RECENT-{1Q,1Y,Z}.json, and set 'merged' to 'null' in RECENT-1M.json ; we 'forget' all events older than one month. Note that the epoch of an archive is still always defined, because the (truncated) history will never become empty, even when the master freezes for a very long time.

Truncating history doesn't matter to a client that is reasonably in sync. So, suppose that the client has slept for two months and wakes up. As usual, it syncs the remote RECENTs, and starts looking for new, unseen events. If (and only if) the client runs out of history, it does a full rsync ; this is ok (by assumption).

Only when the master becomes available again, after being completely unreachable for over a month, would all clients start requesting full rsyncs. This scenario is very unlikely.

If one takes the view that full rsyncs may never be used, the price to pay is having to maintain (and use) the (ever growing) full history forever ; and forever is a very long time.

improvements - replace the RECENTs with a database and a daemon

At the moment, the RECENTs are generated by an agent that watches the CPAN file-tree and generates (epoch,path,type)-events. Suppose the agent can output the events to some stream.

For now we assume the event-generator can send (type,path)-tuples over a socket to some daemon ; if/when the connection to the daemon fails, the event-generator logs the tuples to some file. On reconnect, it moves the file, sends the backlog to the daemon, then removes the file.

Suppose the event-stream is maintained (by some daemon) in an sqlite database with (id,type,path) tuples.

For diagnostic purposes, a time column could be added, initialised as int(epoch).

Next, we need a daemon, which does two things.

  1. the daemon maintains the database

    The daemon adds new events to the database.

    A root-daemon accepts authorised requests from the event-generator ; like

      add $path $type $sum

    where $sum is the checksum of ``$path.$type.$secret'', and $secret is shared between the event-generator and the daemon.

    On a downstream site, events are added by the client after the event is processed by the client.

  2. the daemon answers queries

    On request, the daemon answers to a limited set of queries like

      from $x
      from $x to $y
      from $x limit $n
      path $path
      stat $path
      lstat $path
      readlink $path

    where $x and $y refer to id's. The result of a query is a proper json file.

    Query path returns the event for $path, if any. Query stat returns stat /path/to/cpan/$path ; likewise lstat and readlink.
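The authorisation checksum from point 1 above can be sketched as follows; the text does not name a hash algorithm, so SHA-256 is an assumption here:

```python
import hashlib

def event_sum(path, typ, secret):
    """Checksum of ``$path.$type.$secret'' ('.' is concatenation).
    SHA-256 is an illustrative choice; the text leaves the hash open."""
    return hashlib.sha256((path + typ + secret).encode()).hexdigest()

def authorised(path, typ, secret, sum_):
    """The daemon recomputes the sum to authorise an add request."""
    return event_sum(path, typ, secret) == sum_
```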

the client side

The proposed daemon would make an ``instant mirroring'' client simpler and more efficient.



© 2011-2013 Henk P. Penning, Faculty of Science, Utrecht University
iim version 0.4.14 - Tue Oct 4 07:56:13 2016 - dev revision 108
