At Patch, as is generally the case with complex applications, we run quite a
few jobs in the background using cron: things like updating the current
weather, downloading new tweets, etc. Our general pattern is to have
cron invoke rake on whatever schedule makes sense to us.
This works well as far as it goes, but our app is quite beefy and it takes a,
shall we say, nontrivial amount of time to start. Since most of our scheduled
rake tasks require loading the environment, that ends up being a lot of time and
energy spent loading the environment to run a task that might only take a few
seconds. As they say, it doesn't really scale.
Enter RakeServer, a tool Cedric and
I hacked up this week. The idea here is to run Rake tasks using a client-server
architecture, with a long-running server that responds to requests from clients
to invoke tasks. The crucial feature of RakeServer is that it can also invoke
tasks greedily when the server first starts — in particular the
:environment task, which loads the Rails environment. That way, when the
client invokes Rails-dependent tasks, the server is able to respond immediately
without waiting all that time for the environment to load.
A few other details of how RakeServer works, before we look at a typical (our)
use case. First, each time a task request is made, the server forks a child
process to actually run it, so tasks can be run in parallel by different
clients if so desired. Second, the server captures the task's output via a pipe
and streams it back to the client, so the client ends up with the task's output
on stdout, much as if it were a normal invocation of Rake. (Note that right
now, there's no distinction made between stdout and stderr; it all goes to
stdout. That could change.) The client process stays alive as long as the server
task is running, and then exits when it completes.
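To make the fork-and-pipe idea concrete, here's a minimal sketch in plain Ruby. It is not RakeServer's actual code: the run_in_child helper is a made-up name, and where a real server would write each line to the client's socket, this version just collects the output in a string.

```ruby
# Hypothetical helper, not part of RakeServer: run a block in a forked child
# and return everything the child wrote to stdout or stderr.
def run_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close                 # the child only writes
    $stdout.reopen(writer)       # point stdout at the pipe
    $stderr.reopen(writer)       # stderr goes to the same pipe, so the streams are merged
    $stdout.sync = true          # write immediately so ordering is preserved
    writer.close                 # the reopened descriptors keep the pipe open
    yield
  end

  writer.close                   # the parent only reads; closing lets us see EOF
  output = String.new
  # A real server would stream each line to the client here instead of buffering.
  reader.each_line { |line| output << line }
  reader.close
  Process.wait(pid)              # reap the child once it has finished
  output
end

captured = run_in_child do
  puts "migrating..."
  warn "a warning on stderr"
end
```

Because each request gets its own child and its own pipe, two clients can run tasks at the same time without their output getting mixed together.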
OK, let's use this thing
First, install the gem:
$ sudo gem install rake_server
Now we're going to set up a pre-fork and post-fork hook. If you naively fork the
Rails environment, ActiveRecord's database connection gets all messed up, so
your best bet is to disconnect from the database before forking and re-connect
afterwards.
Let's create a file lib/tasks/rake_server.rake:
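namespace :rake_server do
  # Depends on :environment, so running this task also loads Rails.
  task :fork_hooks => :environment do
    # Drop the database connection before forking...
    RakeServer::Server.before_fork { ActiveRecord::Base.remove_connection }
    # ...and open a fresh one afterwards.
    RakeServer::Server.after_fork { ActiveRecord::Base.establish_connection }
  end
end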
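If you're wondering how hooks like these might fit around a fork, here's a rough sketch. Everything in it is an assumption for illustration: HookedForker and fork_task are invented names, a plain hash stands in for the database connection, and I'm assuming the after-fork hooks run in the child. RakeServer's own implementation may well differ.

```ruby
# Hypothetical illustration of before/after fork hooks; not RakeServer's code.
class HookedForker
  @before_fork = []
  @after_fork = []

  class << self
    # Register a block to run in the parent just before forking.
    def before_fork(&block)
      @before_fork << block
    end

    # Register a block to run just after forking (assumed here: in the child).
    def after_fork(&block)
      @after_fork << block
    end

    # Run the hooks around a fork and report whether the child exited cleanly.
    def fork_task
      @before_fork.each(&:call)
      pid = fork do
        @after_fork.each(&:call)
        yield
      end
      _, status = Process.wait2(pid)
      status.success?
    end
  end
end

# A hash stands in for the shared database connection.
connection = { open: true }
HookedForker.before_fork { connection[:open] = false }  # like remove_connection
HookedForker.after_fork  { connection[:open] = true }   # like establish_connection

# The child exits 0 only if it sees a freshly opened connection.
ok = HookedForker.fork_task { exit(connection[:open] ? 0 : 1) }
```

The point is that the child never inherits a live connection from the parent; it always opens its own.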
Now, we can start rake-server. By default, RakeServer looks for your
Rakefile in all the same places Rake does:
$ rake-server start rake_server:fork_hooks
The first argument tells RakeServer to run in the background (run would run it
in the foreground); the other arguments are rake tasks to run immediately when
the server starts. In this case, we're going to set up our fork hooks, and also
load the environment (since the :environment task is a dependency of the
:fork_hooks task).
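The dependency behaviour is plain Rake, and you can see it with Rake's own API. This sketch defines two stand-in tasks with the same names and records the order they run in; the loaded array is just for illustration, and whether RakeServer invokes its startup tasks exactly this way is my assumption.

```ruby
require "rake"

loaded = []

# Stand-ins for the real tasks: each one just records that it ran.
Rake::Task.define_task(:environment) { loaded << :environment }
Rake::Task.define_task("rake_server:fork_hooks" => :environment) { loaded << :fork_hooks }

# Invoking the hooks task runs its :environment prerequisite first.
Rake::Task["rake_server:fork_hooks"].invoke

# Rake runs each task at most once per process, so this second call is a no-op.
Rake::Task[:environment].invoke
```

That run-once behaviour is what makes preloading worthwhile: once the environment task has run in the server, later tasks that depend on it don't pay for it again.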
After a few seconds, you'll see output that rake-server is up and running:
rake-server listening on 127.0.0.1:7253
And finally, we can invoke a task!
And it's that easy, folks - you'll see your migrations start running right away.
RakeServer is still a young project and there are a few improvements I'd like to
make, but at a basic level, it's already working, so take it for a spin today.