Ruby Programming/Standard Library/DRb

Distributed Ruby (DRb) allows inter-process communication between Ruby programs by implementing remote procedure calling.

Introduction
Distributed Ruby enables remote method calls for Ruby. It is part of the standard library, so you can expect it to be installed on most systems using MRI Ruby. Since Ruby 3.4 it ships as a bundled gem, so under Bundler it has to be listed in your Gemfile. Because the underlying object serialization relies on Marshal, which is implemented in C, you can expect good performance.

Let's start with a simple example, so the use of this module becomes clear.

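The server creates a single instance of an object (in this case a Hash) and shares it on TCP port 9000.

The client, client.rb, connects to the shared Hash and increments a counter stored in it. It also prints the time of the previous access before recording the current one.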

Start the server in one shell session (or in the background), and in another session run the client a few times:

$ ruby client.rb
0
1
Last access time = 
$ ruby client.rb
1
2
Last access time = Fri Oct 22 22:23:59 BST 2004
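The server and client described above can be sketched as follows. This is a reconstruction under stated assumptions, not the original listings.

To make it runnable as a single script, the server half runs in a forked child process, so it needs a Unix-like system. It also listens on port 0 (any free port) instead of 9000, and reports its real URI to the parent over a pipe. The helper name `run_client` and the keys `:count` and `:lastaccess` are illustrative choices. In practice you would put each half in its own file, with server.rb using 'druby://localhost:9000'.

```ruby
require 'drb/drb'

# --- server.rb half (forked child; fork needs a Unix-like system) ---
# A standalone server.rb would simply do:
#   DRb.start_service('druby://localhost:9000', {})
#   DRb.thread.join
reader, writer = IO.pipe
server_pid = fork do
  reader.close
  # Share one Hash; port 0 lets the OS pick a free port for this sketch.
  DRb.start_service('druby://127.0.0.1:0', {})
  writer.puts(DRb.uri) # tell the parent which URI to connect to
  writer.close
  DRb.thread.join # serve requests until this process is killed
end
writer.close
server_uri = reader.gets.chomp
reader.close

# --- client.rb half (runs in the parent) ---
def run_client(uri)
  obj = DRbObject.new_with_uri(uri) # proxy for the remote Hash
  obj[:count] ||= 0
  puts obj[:count]
  obj[:count] += 1
  puts obj[:count]
  puts "Last access time = #{obj[:lastaccess]}"
  obj[:lastaccess] = Time.now
  obj[:count] # return the new count
end

DRb.start_service # the client starts a DRb service too (see "Uncopyable objects")
counts = Array.new(2) { run_client(server_uri) } # "run the client a few times"

# Shut the server down again.
Process.kill('TERM', server_pid)
Process.wait(server_pid)
```

Modern Ruby prints Time values as "YYYY-MM-DD HH:MM:SS +ZZZZ", so the last line will look different from the 2004 output shown above.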

The server and the client don't need to run on the same machine. If you want the server to listen on all interfaces (and therefore also accept remote connections), change 'localhost' to '0.0.0.0' in the server script. The client then needs to connect to the remote server, so replace 'localhost' with the IP address (or hostname) of the server in the client script.

Even this simple example is immensely powerful. The shared object could be used as a session data store for a web server, with each page request looking up and storing information in it. This works whether the pages are served by standalone CGI scripts, WEBrick threads, Apache mod_ruby, or fcgi/mod_fastcgi, and it even works with a cluster of web servers. Furthermore, the session data is not lost if you restart Apache, because it lives in the separate DRb server process.

Functionality
DRb is actually rather sophisticated and elegant in its design, but the fundamental principle is very straightforward:

DRb packages up a method call as an array containing the method name and the arguments, converts it into a stream of bytes using the Marshal library, and sends it to the server. The server unmarshals the request and executes the call on the front object (the object passed to DRb.start_service). The return value, or any exception raised, is put into another array, converted into a stream of bytes, and sent back to the client.
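To make that principle concrete, here is a toy model of a single round trip, with no sockets involved. It is a simplification, not DRb's real wire format, which sends several length-prefixed Marshal chunks over a connection. The method names `pack_call`, `serve_call` and `unpack_reply` are hypothetical. Exceptions are also reduced to their class name and message, whereas DRb marshals the exception object itself.

```ruby
# Toy model of one DRb round trip (not DRb's real wire format).

# Client side: package the call as [method name, arguments] and marshal it.
def pack_call(method_name, *args)
  Marshal.dump([method_name, args])
end

# Server side: unmarshal the request, run it on the front object and
# marshal the outcome. public_send refuses private methods, loosely like DRb.
def serve_call(front, request_bytes)
  method_name, args = Marshal.load(request_bytes)
  reply =
    begin
      [:ok, front.public_send(method_name, *args)]
    rescue StandardError => e
      # Simplification: DRb marshals the exception object itself.
      [:error, e.class.name, e.message]
    end
  Marshal.dump(reply)
end

# Client side: unmarshal the reply and either return the value or raise.
def unpack_reply(reply_bytes)
  status, *payload = Marshal.load(reply_bytes)
  raise RuntimeError, payload.join(': ') if status == :error

  payload.first
end

front = { greeting: 'hello' } # the shared "front" object
value = unpack_reply(serve_call(front, pack_call(:[], :greeting)))
puts value # prints "hello"
```

Calling Marshal.load on bytes from an untrusted peer is itself risky, which is one more reason for the access restrictions discussed under Security.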

Since DRb is written in Ruby, you can look at the code, which contains lots of comments and examples. You can find it on your system at a location like  or you can find a parsed version of the documentation and the examples here.
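One way to find the source on your own machine is to ask Ruby where it loaded the library from. The sketch below relies only on `$LOADED_FEATURES`, which records the absolute path of every file loaded so far.

```ruby
require 'drb/drb'

# After the require above, $LOADED_FEATURES includes DRb's main source file.
drb_source = $LOADED_FEATURES.find { |path| path.end_with?('/drb/drb.rb') }
puts drb_source
# The rest of the library (acl.rb, ssl.rb, ...) sits in the same directory.
puts File.dirname(drb_source)
```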

Security
If you are using a DRb object to store session data, make sure that only the webserver can contact your DRb object, and that it is not directly accessible from the outside, otherwise unwelcome guests could directly manipulate its contents. You can bind it to localhost (127.0.0.1) if all clients are on the same machine; otherwise you can put it on a separate private network, use firewall rules or DRb ACLs to block access from unwanted clients. It is important to do this before calling.

Example usage of :
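A minimal sketch of restricting access with the ACL class from drb/acl, allowing only the local machine. It installs the ACL with `DRb.install_acl` before `DRb.start_service` is called.

The port 0 (any free port) and the immediate `DRb.stop_service` are only there so the sketch exits on its own. A real server would use its fixed URI and then call `DRb.thread.join`.

```ruby
require 'drb/drb'
require 'drb/acl'

# Rules are "deny"/"allow" pairs. With the default order (deny, then allow),
# a matching "allow" entry overrides "deny all", so only 127.0.0.1 gets in.
acl = ACL.new(%w[deny all
                 allow 127.0.0.1])

# Install the ACL *before* starting the service: it becomes the default for
# DRb servers created afterwards, which check each incoming connection.
DRb.install_acl(acl)
DRb.start_service('druby://127.0.0.1:0', {}) # port 0: any free port (sketch only)
puts "Serving #{DRb.uri}, accepting connections from 127.0.0.1 only"
DRb.stop_service # a real server would call DRb.thread.join here instead
```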

Beware that every object contains methods which could be very dangerous if called by a hostile party. Some of these are private (e.g. exec, system) and DRb prevents these from being called, but there are other public methods which are equally dangerous (e.g. send, instance_eval, instance_variable_set). Consider for example.

So sharing an object with the whole Internet is a risky business. If you're going to do this then you should run with at least, and you should start your object from a blank slate without these dangerous methods included. You can achieve that like this:
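One way to get such a blank slate in Ruby 1.9 and later is to inherit from BasicObject, which leaves out almost all of Object's methods (send, instance_variable_set and so on). BasicObject still defines instance_eval and instance_exec, so those are removed explicitly. DRb itself refuses remote calls to `__send__`.

This is a sketch that may differ from the technique originally shown here, and the class name `BlankCounter` is hypothetical. Whether your DRb version accepts a BasicObject as the front object is worth checking. If it does not, undefining the unwanted methods on an ordinary class achieves the same goal.

```ruby
# A counter with (almost) no inherited methods a remote caller could abuse.
class BlankCounter < BasicObject
  # BasicObject still has these two; remove them as well.
  undef_method :instance_eval, :instance_exec

  # @count is set lazily instead of in initialize (see the note that follows).
  def inc
    @count = (@count || 0) + 1
  end
end

# A server would then share it in the usual way, e.g.:
#   DRb.start_service('druby://localhost:9000', BlankCounter.new)
```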

Note that this example doesn't use  for setting @count to 0. If it did this, clients would be able to reset @count by calling the  method.

Here's an alternative implementation from Evil-Ruby.

Additionally, rather than sharing your original object, you may wish to build a wrapper object and share that instead. The wrapper object can have a limited set of methods (just the ones you really want to share), validate the parameters of incoming data, and delegate to another object when the data has been sanitised.
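A sketch of such a wrapper for the session-store scenario. The class name `SessionStoreFacade` and the rule that session IDs are 32 hex digits are assumptions for illustration only.

The wrapper exposes only `get` and `set`, checks every argument, and keeps its validation helper private, because DRb refuses remote calls to private methods. Being an ordinary class, it still inherits Object's public methods, so for Internet-facing use you would combine it with the blank-slate approach described above.

```ruby
# Hypothetical wrapper exposing only get/set on a Hash-like session store.
class SessionStoreFacade
  SESSION_ID = /\A\h{32}\z/ # assumption: 32 hex-digit session IDs

  def initialize(store)
    @store = store
  end

  def get(session_id)
    check_id!(session_id)
    @store[session_id]
  end

  def set(session_id, data)
    check_id!(session_id)
    unless data.is_a?(Hash) && data.keys.all? { |k| k.is_a?(String) }
      raise ArgumentError, 'session data must be a Hash with String keys'
    end
    @store[session_id] = data
    nil # don't hand internal objects back to the caller
  end

  private

  # Private, so DRb will not let remote clients call it directly.
  def check_id!(session_id)
    return if session_id.is_a?(String) && SESSION_ID.match?(session_id)

    raise ArgumentError, 'invalid session id'
  end
end

# The server shares the wrapper, not the Hash itself, e.g.:
#   DRb.start_service('druby://localhost:9000', SessionStoreFacade.new({}))
```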

Thread-safety
Each incoming method call which hits the object you've shared by DRb is executed in a new thread. This is pretty essential if you think about it; there may be many clients, and the server can't control when the clients decide to send method calls to it. DRb does not serialise the requests, so that one client can't block out the other clients.

However, this means you have to take the same care with your DRb object as you would in any other threaded application. Consider what happens, for example, if two clients both decided to run at the same time. It might happen that both clients would retrieve obj[:counter] and see the same value (say 100), then independently add 1, and then both write back 101. That's probably not what you want, if :counter is supposed to generate unique sequence numbers.

Even the method  shown at the top of this page suffers the same problem, because two clients could decide to call   at the same time, causing two threads on the server to suffer the same race condition. The solution is to protect the increment operation with a :
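One way to do this is with a Mutex from Ruby's core library, as in the sketch below. The class name `SafeCounter` is illustrative. Plain threads stand in for the DRb request threads, which would call `inc` concurrently in the same way.

```ruby
# Shared counter whose read-modify-write is protected by a Mutex, so two
# request threads cannot both read 100 and both write back 101.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  # Atomically increment and return the new value.
  def inc
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

counter = SafeCounter.new
# Simulate many concurrent clients hitting the shared object.
threads = Array.new(10) { Thread.new { 1000.times { counter.inc } } }
threads.each(&:join)
puts counter.value # prints 10000
```

The lock only helps because the whole increment happens inside one server-side method. A client doing `obj[:counter] += 1` on a shared Hash makes two separate remote calls (`[]` and then `[]=`). Locking each of those calls individually cannot make the pair atomic, so the read-modify-write has to move into a method on the server.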

Uncopyable objects
Why does the client run ?

A very good question, which leads us on to another interesting aspect of DRb.

In normal operation, DRb will use Marshal to send the arguments to a method call; when they are unmarshalled at the server side, it will have a copy of those objects. The same applies to the result returned from the method; it will be marshalled, sent back, and the client will have a copy of that object.

In many simple cases this copying of objects is not a problem, but there are several cases where it might be:


 * If the server makes a change to the local copy it received, then the client won't see that change.
 * The argument or response objects could be extremely large, and you might not want to send them back and forth (such as an object which holds references to other objects, forming a tree)
 * Some types of objects cannot be marshalled at all: they include files, sockets, procs/blocks, objects with a singleton class, and any object which contains those objects indirectly, e.g. in an instance variable.
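Both effects can be seen with Marshal alone, without DRb. The sketch below makes a Marshal round trip, as DRb does for arguments and results by default, and shows that changes to the copy do not reach the original. It then tries to dump a proc, an IO object and an object with a singleton method. In each case Marshal raises a TypeError, which is the situation where DRb falls back to sending a proxy.

```ruby
# Copy semantics: a Marshal round trip produces an independent copy.
original = { 'visits' => 1 }
copy = Marshal.load(Marshal.dump(original))
copy['visits'] += 1
puts original['visits'] # prints 1: the change to the copy is not seen

# Objects Marshal refuses to dump.
with_singleton = Object.new
def with_singleton.hello; end

candidates = {
  'proc' => proc { 42 },
  'IO' => $stdout,
  'singleton method' => with_singleton
}

refused = candidates.select do |label, obj|
  begin
    Marshal.dump(obj)
    false
  rescue TypeError => e
    puts "#{label}: #{e.message}"
    true
  end
end
```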

In these cases, DRb can instead send over a 'proxy object' containing contact details to allow the original object to be called via DRb: that is, the hostname and port where the original object can be found. This is done automatically for any object which cannot be marshalled, or you can force it by including DRbUndumped in your object.

How can we demonstrate this? Consider three pieces:

 * a class Foo with an 'inc' method;
 * a server which accepts an object and calls 'inc' on it;
 * a client which creates two Foo objects, 'a' and 'b', and passes them to the server.

Here's what happens if we run the client:

$ ruby client2.rb

Oops. We passed across our objects 'a' and 'b', but because they were copied onto the server, only the local copies got updated by 'inc'. The objects on the client are unaffected.
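The Foo, server and client setup just described can be sketched as one script. The exact definition of Foo, the server class `IncServer` and its method `do_inc` are hypothetical stand-ins for the lost listings.

As in the first example, the server half runs in a forked child (Unix-like systems only) on a free port that it reports over a pipe. The forked child already knows Foo. A separate server process would need foo.rb loaded so that it can rebuild the copies it receives.

```ruby
require 'drb/drb'

# foo.rb (assumed contents): a plain class with an 'inc' method.
class Foo
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def inc
    @value += 1
  end
end

# Server front object: calls 'inc' on whatever it is given.
class IncServer
  def do_inc(obj)
    obj.inc
  end
end

# Server half in a forked child; it sends its URI back over a pipe.
reader, writer = IO.pipe
server_pid = fork do
  reader.close
  DRb.start_service('druby://127.0.0.1:0', IncServer.new)
  writer.puts(DRb.uri)
  writer.close
  DRb.thread.join
end
writer.close
server = DRbObject.new_with_uri(reader.gets.chomp)
reader.close

# Client half (client2.rb in the text).
DRb.start_service('druby://127.0.0.1:0')
a = Foo.new(10)
b = Foo.new(20)
p a, b
server.do_inc(a) # 'a' is marshalled, so the server increments a copy
server.do_inc(b)
p a, b           # still 10 and 20: only the server's copies changed

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```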

Now try modifying the definition of Foo so that it includes DRbUndumped. Alternatively, modify the client program so that the objects it passes are marked with DRbUndumped. Now the result is what we'd hope for:

$ ruby client2.rb

So what's happened is, instead of marshalling across an instance of Foo, we have marshalled across the information needed to build a proxy object: it contains the client's hostname, port, and object id which can be used to talk to the original object. When we pass across the proxy object for 'a' to the server, and it calls obj.inc, the 'inc' method call is made back over DRb to the client machine where object 'a' actually lives. You have effectively built a remote 'reference' to the object which can be passed around much like a normal object reference, except it can be handed from machine to machine. Method calls via this reference hit the same object.
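The same sketch with Foo including DRbUndumped shows the reference behaviour. Class and method names are the same hypothetical ones as before, and the forked-child setup is again only there to keep everything in one runnable script.

Note that the client's DRb service must be running before the call. The proxies it sends carry the client's URI, and the server uses that URI to call 'inc' back on the client. Marking individual objects with `a.extend(DRbUndumped)` in the client has the same effect as including the module in the class.

```ruby
require 'drb/drb'

class Foo
  include DRbUndumped # pass a reference (proxy) instead of a copy

  attr_reader :value

  def initialize(value)
    @value = value
  end

  def inc
    @value += 1
  end
end

# Hypothetical server front object: calls 'inc' on whatever it is given.
class IncServer
  def do_inc(obj)
    obj.inc # with a proxy, this call goes back to the client's object
  end
end

reader, writer = IO.pipe
server_pid = fork do
  reader.close
  DRb.start_service('druby://127.0.0.1:0', IncServer.new)
  writer.puts(DRb.uri)
  writer.close
  DRb.thread.join
end
writer.close
server = DRbObject.new_with_uri(reader.gets.chomp)
reader.close

# The client must serve its own objects so the server can call back.
DRb.start_service('druby://127.0.0.1:0')
a = Foo.new(10)
b = Foo.new(20)
p a, b
server.do_inc(a)
server.do_inc(b)
p a, b # now 11 and 21: the originals were incremented

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```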

Now, this is why the client program needs to run  - even though it's a "client" from our point of view, there might be method call arguments which generate these DRb proxy 'references', at which point the client also becomes a server for those objects.

We didn't specify a host or port here, so DRb chooses any spare TCP port on the system, and the host is whatever the system hostname is according to the 'gethostname' call - e.g. if the machine is called server.example.com then DRb might choose druby://server.example.com:45123

These two-way method calls can be a problem though when there is a firewall between the two machines. You can choose a fixed port on the client side in DRb.start_service instead of having one chosen dynamically; that lets you open up a hole in the firewall for DRb. However, if you are behind a NAT firewall, it almost certainly won't work at all.
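The difference between a dynamically chosen and a fixed client port shows up directly in `DRb.uri`. In the sketch below, 'druby://:9000' leaves the host empty so that DRb still uses the system hostname, but fixes the port. Port 9000 is only an example and must be free on the machine where you run this.

```ruby
require 'drb/drb'

# No URI: DRb uses the system hostname and any spare port.
DRb.start_service
dynamic_uri = DRb.uri
puts dynamic_uri # host and port vary from machine to machine
DRb.stop_service

# Fixed port (host still taken from the system hostname), so a firewall
# rule can be written for it.
DRb.start_service('druby://:9000')
fixed_uri = DRb.uri
puts fixed_uri
DRb.stop_service
```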

Running DRb over SSH
One way to solve the problem with two-way method calls through a firewall is to run DRb over SSH. Not only do you get two-way operation with just a single outbound TCP connection through the firewall; you also have your method calls securely encrypted!

Here's how to set it up.

 * 1) Choose one port for the client end (say 9000) and one for the server end (say 9001).
 * 2) Establish an ssh connection with a pair of tunnels: port 9001 at the client side is redirected to port 9001 at the server side, and port 9000 at the server side is redirected to port 9000 at the client side.
      $ ssh -L9001:127.0.0.1:9001 -R9000:127.0.0.1:9000 server.example.com
      The -L flag requests that connections to port 9001 at the local (client) side are redirected through the ssh tunnel and reconnected to 127.0.0.1:9001 at the server side. The -R flag requests that connections to port 9000 at the remote (server) side are redirected back down the ssh tunnel and connected to 127.0.0.1:9000 at the client side.
 * 3) At the server side, do DRb.start_service('druby://127.0.0.1:9001', a) as you would normally.
 * 4) At the client side, do DRb.start_service('druby://127.0.0.1:9000') instead of just DRb.start_service. This gives us a fixed port number to work from.
 * 5) At the client side, connect to the remote object through the local end of the tunnel (port 9001).

Voila, you are up and running. You can try the DRbUndumped example from above with the client behind a NAT firewall. Also notice that the ssh -L and -R options bind to 127.0.0.1 by default, so people on other machines cannot connect to the tunnel endpoints (although, of course, other people on the same machine can).
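For steps 4 and 5, the client side might look like the sketch below. It assumes the example ports 9000 and 9001 and that the tunnel from step 2 is already open.

`DRbObject.new_with_uri` does not open a connection until the first method call, so the sketch only prepares the proxy. The commented call uses a hypothetical method name; substitute whatever your shared object provides.

```ruby
require 'drb/drb'

CLIENT_URI = 'druby://127.0.0.1:9000' # reachable from the server via ssh -R
SERVER_URI = 'druby://127.0.0.1:9001' # forwarded to the server via ssh -L

# Fixed local port so the -R tunnel knows where to deliver callbacks.
DRb.start_service(CLIENT_URI)
obj = DRbObject.new_with_uri(SERVER_URI)

# The first real method call travels through the tunnel, e.g.:
#   obj.some_method # hypothetical; use your front object's methods
```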

An alternative to establishing an SSH connection from the command line is to use Net::SSH, a pure-Ruby implementation of SSH. If you haven't already, install it (it is distributed as the net-ssh gem). To create a connection, execute the following before using DRb:

Following this, you can execute DRb code in the main thread as you would in the previous SSH example. The Queue simply forces the main thread to wait for the channel to open.

NOTE: Do not use 'localhost' in place of '127.0.0.1' when using SSH and DRb, because it can cause connections to be refused.

Running DRb over SSL
SSL is another way to secure and encrypt your connections (note: SSL and SSH are *not* the same thing!)

Online tutorial: HTTP://segment7.net/projects/ruby/drb/DRbSSL/

Running DRuby through firewalls: a Ruby-only solution (HTTP://www.ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/89976)

Often the client is behind a firewall, so standard DRb cannot make callbacks to it. This makes block, IO and DRbUndumped arguments useless. To let DRb operate as normal, you can use DRbFire: HTTP://rubyforge.org/projects/drbfire and HTTP://drbfire.rubyforge.org/classes/DRbFire.html

From the DRbFire documentation:
 * 1) Start with require 'drb/drbfire'.
 * 2) Use drbfire:// instead of druby:// when specifying the server url.
 * 3) When calling DRb.start_service on the client, specify the server's uri as the uri (as opposed to the normal usage, which is to specify *no* uri).
 * 4) Specify the right configuration when calling DRb.start_service, specifically the role to use. Server: DRbFire::ROLE => DRbFire::SERVER and client: DRbFire::ROLE => DRbFire::CLIENT

Simple server:

And a simple client: