'Networker' from TechModern Limited

Networker package

This Java package provides the means by which any number of computers on a LAN may communicate with each other without prior knowledge of each other's existence. This is a key point since it avoids the need to install location-specific code or to update an installation when machine addresses change or new computers are added.

The package is based on the concepts of Services and Clients, as follows:

A Service is a program that performs a duty which may be of interest to another program, running either on the same computer or on any other connected to the same LAN. To do this, it announces itself to the network, giving as part of the announcement a code that identifies the type of service being offered.

A Client is a program that makes use of one or more services. Unlike a service, it does not announce itself but waits passively until a suitable service is announced, then makes contact with it.

A single networked computer can host any number of services and clients, but all must run under the overall control of a single program managing the network interface. This Java package contains all the features of such a system: the network infrastructure and support for both services and clients.

Message fundamentals

Although this package is written in Java, the messaging system is not tied to Java. Unlike Jini, the information being transferred is not binary Java objects but plain text, so it is quite possible to build nodes using other languages. Here is a description of how the system works.

When a node (a computer) powers up and wishes to engage in networking, it first starts the networker framework. This creates internal tables and queues but otherwise does nothing. Next, the node creates one or more Services and/or Clients and calls the framework to add these objects to the networking system. As each is added, internal tables are built to keep track of them, then the framework announces itself by sending its name followed by a single space character, using multicast address 224.0.0.1 and port 17348. The name in this context is any string unique to this node; you might arrange for the program to ask you for a suitable name the first time it runs or to read a name from a database. Avoid names containing spaces or unusual ASCII characters.
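The announcement step can be sketched in a few lines of Java. This is only an illustration of the wire format described above, not code from the package: the class name Announcer and its methods are hypothetical, and I have assumed that the announcement travels as a UDP datagram encoded in ASCII. The sketch also rejects names containing spaces, which is stricter than the advice given above.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of the Networker package. */
final class Announcer {
    static final String GROUP = "224.0.0.1"; // multicast address given in the text
    static final int PORT = 17348;           // port given in the text

    /** Builds the announcement: the node name followed by a single space. */
    static byte[] payload(String nodeName) {
        if (nodeName.isEmpty() || nodeName.indexOf(' ') >= 0) {
            throw new IllegalArgumentException("node name must be non-empty and contain no spaces");
        }
        // ASCII encoding is an assumption; the text only says the data is plain text.
        return (nodeName + " ").getBytes(StandardCharsets.US_ASCII);
    }

    /** Sends one announcement datagram to the multicast group (UDP is assumed). */
    static void announce(String nodeName) throws IOException {
        byte[] data = payload(nodeName);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(GROUP), PORT));
        }
    }
}
```

A node would call Announcer.announce("node1") each time a service or client is added; receiving the announcements requires a socket that has joined the multicast group, which is not shown here.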

A special kind of multicast message is the watchdog multicast, comprising the string "** ". This optional multicast may be sent by one (only) node on the network so that all the rest can detect network breaks. No particular watchdog interval is mandated but a few seconds is normally appropriate.
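One way a node might use the watchdog is to note the time each one arrives and to treat a long silence as a network break. The sketch below assumes this approach; the class WatchdogMonitor, its methods and the timeout value are all hypothetical, and the caller supplies the current time so that the logic is easy to test.

```java
/** Hypothetical watchdog tracker for illustration; not part of the Networker package. */
final class WatchdogMonitor {
    static final String WATCHDOG = "** "; // watchdog payload given in the text

    private final long timeoutMillis;
    private long lastSeenMillis;

    WatchdogMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastSeenMillis = nowMillis; // start the clock when monitoring begins
    }

    /** Returns true if the payload was a watchdog; any other multicast is left to the caller. */
    boolean onMulticast(String payload, long nowMillis) {
        if (WATCHDOG.equals(payload)) {
            lastSeenMillis = nowMillis;
            return true;
        }
        return false;
    }

    /** True when no watchdog has been seen for longer than the timeout. */
    boolean networkLooksBroken(long nowMillis) {
        return nowMillis - lastSeenMillis > timeoutMillis;
    }
}
```

With a watchdog interval of a few seconds, a timeout of two or three intervals would tolerate the occasional lost packet.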

Multicast messages will be received by all other nodes that are currently online. Each responds to the standard multicast (not the watchdog) by sending a point-to-point message to the IP address from which the multicast came, again using port 17348. The format of this message is as follows:

={name}{type}[{name}{type}...]

?{name}{type}[{name}{type}...]

The first of these forms is a list of all the services available at the sending node. Each service comprises a name and a type code; both are URL-encoded strings which may contain any ASCII characters and be of any length. Each term is enclosed in curly braces. Any number of services can be described in this way. The second form contains the same data but requests the recipient to return a similar list of services. In this case it would be appropriate to reply to the service list using the first form but not the second, to avoid a race condition.
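To make the format concrete, here is a minimal sketch of building and parsing a service list. The class ServiceList is hypothetical and is not the package's own code. I have assumed that the encoding is the one produced by java.net.URLEncoder with UTF-8, which turns a space into "+" and escapes curly braces, so a brace inside a name can never be mistaken for a delimiter.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a service list message; not part of the Networker package. */
final class ServiceList {
    final boolean replyRequested;  // true for the '?' form, false for the '=' form
    final List<String[]> services; // each entry is {name, type}

    ServiceList(boolean replyRequested, List<String[]> services) {
        this.replyRequested = replyRequested;
        this.services = services;
    }

    /** Produces ={name}{type}... or ?{name}{type}... with URL-encoded terms. */
    String encode() {
        StringBuilder sb = new StringBuilder(replyRequested ? "?" : "=");
        for (String[] s : services) {
            for (String term : s) {
                sb.append('{').append(URLEncoder.encode(term, StandardCharsets.UTF_8)).append('}');
            }
        }
        return sb.toString();
    }

    /** Parses either form; an empty list ("=" or "?" alone) is accepted. */
    static ServiceList parse(String text) {
        if (text.isEmpty() || (text.charAt(0) != '=' && text.charAt(0) != '?')) {
            throw new IllegalArgumentException("not a service list: " + text);
        }
        List<String> terms = new ArrayList<>();
        int pos = 1;
        while (pos < text.length()) {
            int close = text.indexOf('}', pos);
            if (text.charAt(pos) != '{' || close < 0) {
                throw new IllegalArgumentException("malformed term at offset " + pos);
            }
            terms.add(URLDecoder.decode(text.substring(pos + 1, close), StandardCharsets.UTF_8));
            pos = close + 1;
        }
        if (terms.size() % 2 != 0) {
            throw new IllegalArgumentException("a service name is missing its type");
        }
        List<String[]> services = new ArrayList<>();
        for (int i = 0; i < terms.size(); i += 2) {
            services.add(new String[] {terms.get(i), terms.get(i + 1)});
        }
        return new ServiceList(text.charAt(0) == '?', services);
    }
}
```

A node receiving a list whose replyRequested flag is set would answer with its own list built with the flag cleared, which is the rule given above for avoiding an endless exchange.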

When one of these messages is received, the recipient parses the list of services (which may be empty) and builds suitable tables. Because this framework is designed for cooperating systems under the control of a single authority, no provision has been made for security against attack. In the applications for which the system is designed, unauthorised messages are never sent. The multicast messages cannot propagate beyond the local network, so discovery of nodes is limited to those on the same local network segment.

Each time services are announced, the framework tells its local clients how many of these are relevant to it. This is where the type code comes in. A client having a type code "Clock" is only interested in services that announce themselves as Clock services. The framework will inform the client even if there are no Clock services available; this allows the client to detect the loss of a previous service.
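The matching rule is simple enough to show directly. In this sketch the interface Client and the method notifyClients are hypothetical names chosen for illustration, and each known service is represented as a {name, type} pair; the real framework's classes may look quite different.

```java
import java.util.List;

/** Hypothetical illustration of type matching; not part of the Networker package. */
final class TypeNotifier {
    /** Minimal view of a client: the type it wants and a callback for the count. */
    interface Client {
        String typeCode();
        void servicesAvailable(int count);
    }

    /** Tells every local client how many known services match its type code. */
    static void notifyClients(List<Client> clients, List<String[]> knownServices) {
        for (Client client : clients) {
            int count = 0;
            for (String[] service : knownServices) { // service is {name, type}
                if (service[1].equals(client.typeCode())) {
                    count++;
                }
            }
            // Called even when count is zero, so the client can detect a lost service.
            client.servicesAvailable(count);
        }
    }
}
```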

Once a client is aware of a service, messages can pass between the two. The process is initiated by the client, since the service has no knowledge of which clients are aware of it. There are two kinds of communication, called TELL and ASK. The former is a command or instruction from a client to a service or vice versa; the latter is a request that requires an answer. But although a TELL does not require an answer, with both kinds of message the sender will be informed if the message failed to arrive at its destination.

A message comprises four tokens, as follows:

{id}{sender name}{recipient name}message

where the curly braces are part of the message. The first token is a numeric message ID; this is zero for a TELL message but for an ASK allows the sender to recognize which message is being answered. The second and third tokens are the URL-encoded names of the sending and receiving services or clients, and the rest of the packet is the message itself, also URL-encoded but not enclosed in curly braces. This final part is unconstrained; you can send anything you please. In general, a particular type of service implies an agreed message format, but beyond a recommendation to stick to printable ASCII characters there's no need to be more specific. The name tokens are those used by the service announcements as part of the discovery process described above; the name of a client is only discovered by a service when a message arrives.
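The four-token layout can be sketched as follows. The class NetMessage is a hypothetical name and not the package's own; I have assumed that the ID fits in an int and that the URL encoding is the one provided by java.net.URLEncoder with UTF-8.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of a TELL or ASK message; not part of the Networker package. */
final class NetMessage {
    final int id;           // zero for a TELL, non-zero for an ASK
    final String sender;    // name of the sending service or client
    final String recipient; // name of the receiving service or client
    final String body;      // free-form message text

    NetMessage(int id, String sender, String recipient, String body) {
        this.id = id;
        this.sender = sender;
        this.recipient = recipient;
        this.body = body;
    }

    boolean isTell() {
        return id == 0;
    }

    /** Produces {id}{sender name}{recipient name}message with URL-encoded text. */
    String encode() {
        return "{" + id + "}{" + enc(sender) + "}{" + enc(recipient) + "}" + enc(body);
    }

    /** Reads three brace-enclosed tokens, then treats the remainder as the body. */
    static NetMessage parse(String text) {
        String[] head = new String[3];
        int pos = 0;
        for (int i = 0; i < 3; i++) {
            int close = text.indexOf('}', pos);
            if (pos >= text.length() || text.charAt(pos) != '{' || close < 0) {
                throw new IllegalArgumentException("malformed message: " + text);
            }
            head[i] = text.substring(pos + 1, close);
            pos = close + 1;
        }
        return new NetMessage(Integer.parseInt(head[0]), dec(head[1]), dec(head[2]),
                dec(text.substring(pos)));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    private static String dec(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

Because the encoder escapes curly braces, the first three closing braces in a packet always mark the ends of the three header tokens, whatever the names or the body contain.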

System usage

I'll use a simple example, that of a clock service providing the date and time, and a client that displays the date and time using a graphic component. A given network may have several clients all requiring to know the time to a reasonable degree of accuracy, and it's possible for there to be more than one clock service available. Timing can be derived from the system clock of one computer, from a GPS receiver or from an atomic clock published on the Internet.

The networker system makes no assumptions about locations. Communication between services and clients occurs in the same way whether the two are on the same or on different computers. The only constraint is that no two modules - services or clients - may have the same name on the same computer. It's best if every name is globally unique, which saves unexpected problems when moving services or clients from one location to another.

It makes no difference in which order services and clients are announced. There is a burst of network traffic following any node restarting, but the information will be exchanged between nodes no matter what the timing of startup. So although I shall describe the service starting first, the system will be quite happy to do things the other way round.

In this example, the first item to start is a node comprising a module that interfaces to a GPS receiver using a local RS-232 port. When the service starts up its node will issue a multicast, but being the first there are no other nodes to hear it so no responses are received.

Next, another node starts; this one contains a clock service that derives from a connection made over the Internet. Its initial multicast will be picked up by the GPS node and both will exchange service lists. Because neither has any clients, however, no notifications occur.

Now a client starts up somewhere else on the network. Its initial multicast is answered by service lists from both of the nodes already running, and the client will receive notification twice; firstly that a single clock service is available, then that two clock services are available. What use is made of this information is up to the client; it can either save the identities of both services for the user to make a choice, or it may pick one of them at random or according to some predetermined rule. This rule might indeed be as simple as to use the service called "GPS Clock". The client will also issue an empty service list, which will be received by both service nodes.

Finally, a second client starts up. It too will issue a multicast, and be answered by three nodes; its clock client will receive three notifications with some combination of zero, one or two services. The first client will also receive another notification of two services. This repetition is necessary because of the need to capture services that arrive late and to allow a client to deal with services that were previously known but are no longer available. It does point to the need for the client to take intelligent action, not just assume each notification is from a new service. The name, type and IP address of each service are available to the client.
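One form of "intelligent action" is for the client to remember which services it already knows and to compare each notification against that memory. The sketch below assumes the client can obtain the names of the currently available services at each notification; the class ServiceTracker and its nested Change type are hypothetical.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical client-side bookkeeping; not part of the Networker package. */
final class ServiceTracker {
    /** What changed between one notification and the next. */
    static final class Change {
        final Set<String> added;
        final Set<String> removed;

        Change(Set<String> added, Set<String> removed) {
            this.added = added;
            this.removed = removed;
        }
    }

    private Set<String> known = new TreeSet<>();

    /** Records the current service names and reports which are new and which have gone. */
    Change update(Collection<String> currentNames) {
        Set<String> current = new TreeSet<>(currentNames);
        Set<String> added = new TreeSet<>(current);
        added.removeAll(known);   // present now, not known before
        Set<String> removed = new TreeSet<>(known);
        removed.removeAll(current); // known before, absent now
        known = current;
        return new Change(added, removed);
    }
}
```

A repeated notification of the same two clock services then yields an empty Change and can be ignored, while the loss of the chosen service shows up in removed and prompts the client to pick another.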

Once the client has decided on a particular service (a decision it may choose to revoke later if a 'better' service is discovered) there are two ways it can get the time: by pull or by push. Pull involves sending an ASK message to the service requesting the current time; the service returns this in an appropriate (agreed) format. Push means registering with the service, which will then send regular messages until either it is unable to deliver one or the client asks it to stop. A clock service may be able to serve two or more clients; other types of service can only deal with a single client at a time. All the software for handling these logical data flows is the responsibility of the service and client programs written using the framework.
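For the pull case, the client needs some way of pairing each answer with the ASK that prompted it. A minimal sketch follows, assuming that the answer carries the same message ID as the ASK; the class PendingAsks and its methods are hypothetical, and IDs start at one because zero is reserved for TELL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical bookkeeping for outstanding ASK messages; not part of the Networker package. */
final class PendingAsks {
    private final Map<Integer, Consumer<String>> waiting = new HashMap<>();
    private int nextId = 1; // zero is reserved for TELL messages

    /** Allocates a non-zero ID for a new ASK and remembers who wants the answer. */
    synchronized int register(Consumer<String> onAnswer) {
        int id = nextId;
        nextId = (nextId == Integer.MAX_VALUE) ? 1 : nextId + 1;
        waiting.put(id, onAnswer);
        return id;
    }

    /** Delivers an answer; returns false when no ASK with this ID is outstanding. */
    boolean answer(int id, String body) {
        Consumer<String> handler;
        synchronized (this) {
            handler = waiting.remove(id);
        }
        if (handler == null) {
            return false;
        }
        handler.accept(body); // run outside the lock so the handler may issue new ASKs
        return true;
    }
}
```

The clock client would call register before sending its ASK, place the returned ID in the first token of the message, and pass the ID and body of each incoming answer to answer. Push needs no such bookkeeping on the client side, since the regular messages from the service are unsolicited.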

Potential applications

The range of applications for the networker system is very wide, but in the end comes down to the single principle of making transparent the network connecting together two or more computers. Programs are written as modules that communicate with each other using messages, rather than as monolithic structures. This allows overloaded components to be easily moved to a new location without significant program changes.

In a control environment the system enables I/O hardware to be connected to a local computer but used by another machine on the network. In effect, the serial (or other) port on one computer is being made available for use by another, with the owner of the port able to perform elementary data gathering functions before onward transmission, thereby reducing the load on the second computer and permitting its functions to be moved around at will. If one of the nodes has an Internet connection it can be set up as a 'bridge' to a remote network; software to do this is currently under development.

Graham Trott
TechModern Limited
June 21, 2001