Patch Engineering

October 14

Taming Bundler

Posted by George Ogata

Bundler is the bomb for gem dependency management, but do you ever get sick of having to bundle install each time you switch git branches?

What happens is that when you run your app, Bundler caches some information (like your app’s load path) in .bundle/environment.rb. When you switch branches, the Gemfile may change underneath it, leaving the environment file stale. Bundler then has to ask you to rebuild your environment by running bundle install.

The solution? Maintain a per-branch bundler environment. Here’s how:

Put this file somewhere useful – either your home directory, or commit it into your project for others to use:

Now add this to your repository’s .git/hooks/post-checkout:

#!/bin/sh
exec /path/to/that/script $1 $2

Make sure both files are executable!

This will stash the old .bundle directory away, and swap in the one for the new branch each time you check out. Bundler never felt so smooth.
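
The stash-and-swap step can be sketched in Ruby roughly as follows. This is a minimal illustration, not the script from this post: the helper name swap_bundle_dir, the .bundle-branches stash directory, and passing branch names in directly are all assumptions. (Git hands the post-checkout hook the previous and new HEAD as commit refs, so a real script would also need to map those to branch names.)

```ruby
require 'fileutils'

# Hypothetical helper: stash the current .bundle under the old branch's
# name, then restore the .bundle previously stashed for the new branch.
def swap_bundle_dir(root, old_branch, new_branch)
  bundle = File.join(root, '.bundle')
  stash  = File.join(root, '.bundle-branches') # assumed stash location
  FileUtils.mkdir_p(stash)

  # Flatten slashes so "feature/x" becomes one directory name. Note that
  # this would make "feature/x" and "feature-x" share a slot.
  old_slot = File.join(stash, old_branch.tr('/', '-'))
  new_slot = File.join(stash, new_branch.tr('/', '-'))

  if File.directory?(bundle)
    FileUtils.rm_rf(old_slot)      # drop any older stash for this branch
    FileUtils.mv(bundle, old_slot)
  end

  # If the new branch has never been bundled, there is nothing to restore
  # and Bundler will ask for a bundle install as usual.
  FileUtils.mv(new_slot, bundle) if File.directory?(new_slot)
end
```

With something like this in place, the first checkout of a branch still needs one bundle install; later checkouts restore the stashed directory.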

Happy bundling!

May 6

Easily switch between multiple ruby interpreters using a four-line shell script

Posted by Mat Brown

Just thought I'd share a little tip on some local environment setup I did today. I'm a big fan of installing alternate Ruby versions as optional packages (that means they live in /opt/package-name-version, not in a shared hierarchy prefixed with /opt, like MacPorts incorrectly does). Previously I'd mostly been using this to run test suites against a bunch of different versions, but now that Patch is using Enterprise Edition in production, I'd like to be able to switch my default ruby interpreter with minimum hassle.

Turns out it's really easy, and you don't need to install RVM and install ruby interpreters in your home directory (why does everyone seem to want to do that?). Here's what I did:

Create a directory ~/.ruby-opt

Write an executable shell script called ruby-opt, and put it somewhere in your PATH (I have ~/bin in my PATH for this sort of thing). The shell script looks like this:

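#!/bin/sh
# Pick the first /opt package whose name contains "ruby-" and the argument
export PACKAGE=`ls -d1 /opt/ruby-*"$1"* | head -n1`
echo "Changing ruby interpreter to $PACKAGE"
# ~/.ruby-opt is a directory, so this (re)creates the symlink ~/.ruby-opt/bin
ln -sfv "$PACKAGE/bin" ~/.ruby-opt
ruby -v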

In your .bashrc, put this line:

export PATH=~/.ruby-opt/bin:$PATH

And you're done. The one catch here is that all the interpreters you want to use have to be installed as optional packages. Now when you want to switch versions, just run something like:

$ ruby-opt 1.8.6

The script uses wildcard expansion to find something in your /opt directory that contains both "ruby" and the argument you passed, so you only need to be specific enough for that to catch the right one.
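
To make the matching rule concrete, here is a rough Ruby rendering of the same lookup. The function name find_ruby_package and the opt_dir parameter are made up for illustration; the script itself only uses ls and head.

```ruby
# Hypothetical equivalent of `ls -d1 /opt/ruby-*$1* | head -n1`:
# return the first path (in sorted order) under opt_dir whose name starts
# with "ruby-" and contains the given fragment, or nil if nothing matches.
def find_ruby_package(fragment, opt_dir = '/opt')
  # Dir.glob does not promise an order, so sort to get a stable "first" match.
  Dir.glob(File.join(opt_dir, "ruby-*#{fragment}*")).sort.first
end
```

An ambiguous fragment simply picks whichever match sorts first, which is why the argument only has to be specific enough to single out the package you want.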

April 22

Quick, local short_urls

Posted by John Crepezzi

On patch.com, we generate short_urls for our editors to use on services like Twitter.  The benefit of generating the short URL internally is three-fold.  

  1. We get to maintain the Patch branding
  2. We don't have to rely on an external shortening service
  3. We can be in control of how our links are created

The last one is really fun, because it means that we can name our short_urls cleverly enough that resolving them doesn't require an extra database lookup. So, instead of a user requesting http://patch.com/bE582, and us looking up in a table what page that references, we can create a short URL like http://patch.com/L-dbrB and serve the user Listing 281923.

It's really simple, as is the code to support it:

# John Crepezzi  April 22, 2010
class ShortId
  
  # We cut out vowels to keep shortened strings from accidentally forming words
  Alphabet = 'bcdfghjklmnpqrstvwxyz0123456789BCDFGHJKLMNPQRSTVWXYZ'
  AlphabetLength = Alphabet.length

  # Encode a numeric ID
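  def self.encode(id)
    # Zero would otherwise skip the loop and encode to an empty string
    return Alphabet[0, 1] if id == 0
    alpha = ''
    while id != 0
      alpha = Alphabet[id % AlphabetLength].chr + alpha
      id /= AlphabetLength
    end
    alpha
  end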

  # Decode an ID created with self.encode
  def self.decode(alpha)
    alpha = alpha.dup; id = 0
    0.upto(alpha.length - 1) do |i|
      id += Alphabet.index(alpha[-1]) * (AlphabetLength ** i)
      alpha.chop!
    end
    id
  end

end

Enjoy!
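
For a sense of how a request path like /L-dbrB could be resolved without touching a lookup table, here is a small standalone sketch. It repeats the alphabet so it runs on its own; the prefix table and the names PREFIXES, decode_short_id and resolve_short_path are assumptions for illustration (the post only shows L standing for Listing), not the code running on patch.com.

```ruby
# Same vowel-free alphabet as ShortId, repeated so this sketch is standalone.
ALPHABET = 'bcdfghjklmnpqrstvwxyz0123456789BCDFGHJKLMNPQRSTVWXYZ'

# Hypothetical prefix table; only 'L' => Listing appears in the post.
PREFIXES = { 'L' => 'Listing' }

# Base-52 decode: each character is one digit, most significant first.
def decode_short_id(alpha)
  alpha.each_char.inject(0) { |id, ch| id * ALPHABET.length + ALPHABET.index(ch) }
end

# Split "/L-dbrB" into a model name and a numeric ID, or return nil
# when the path does not look like one of our short URLs.
def resolve_short_path(path)
  prefix, encoded = path.sub(%r{\A/}, '').split('-', 2)
  model = PREFIXES[prefix]
  return nil if model.nil? || encoded.nil? || encoded.empty?
  return nil unless encoded.each_char.all? { |ch| ALPHABET.include?(ch) }
  [model, decode_short_id(encoded)]
end

resolve_short_path('/L-dbrB') # => ["Listing", 281923]
```

The model and ID come straight out of the URL, so the only database hit is the one that loads the record itself.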

April 2

RakeServer is a client-server Rake architecture for fast task invocation

Posted by Mat Brown

At Patch, as is generally the case with complex applications, we run quite a few jobs in the background using cron — things like updating the current weather, downloading new Twitter tweets, etc. Our general pattern is to have cron invoke rake on whatever schedule makes sense to us.

This works well as far as it goes, but our app is quite beefy and it takes a shall we say nontrivial amount of time to start. Since most of our scheduled rake tasks require loading the environment, that ends up being a lot of time and energy spent loading the environment to run a task that might only take a few seconds. As they say, it doesn't really scale.

Enter RakeServer, a tool Cedric and I hacked up this week. The idea here is to run Rake tasks using a client-server architecture, with a long-running server that responds to requests from clients to invoke tasks. The crucial feature of RakeServer is that it can also invoke tasks greedily when the server first starts — in particular the :environment task, which loads the Rails environment. That way, when the client invokes Rails-dependent tasks, the server is able to respond immediately without waiting all that time for the environment to load.

A few other details of how RakeServer works, before we look at a typical (our) use case. First, each time a task request is made, the server forks a child process to actually run it -- so tasks can be run in parallel by different clients if so desired. Second, the server captures the tasks' output via a pipe and streams it back to the client, so the client ends up with the tasks' output on stdout, a lot like if it were a normal invocation of Rake (note that right now, there's no distinction made between stdout and stderr; it all goes to stdout. That could change). The client process stays alive as long as the server task is running, and then exits when it completes.
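
The fork-and-pipe pattern described above can be sketched in a few lines of plain Ruby. This is only an illustration of the general technique, not RakeServer's actual code: the method name run_forked and the sink argument are made up, and a real server would write to a client socket rather than an in-memory object.

```ruby
# Run the given block in a forked child and append every line it prints
# (stdout and stderr combined) to `sink`, which only needs to respond to <<.
# Returns true if the child exited successfully.
def run_forked(sink)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # Point both output streams at the pipe, so they end up merged.
    $stdout.reopen(writer)
    $stderr.reopen(writer)
    $stdout.sync = $stderr.sync = true
    yield
  end
  writer.close                              # parent keeps only the read end
  reader.each_line { |line| sink << line }  # stream lines until the child exits
  reader.close
  Process.wait(pid)
  $?.success?
end
```

Because each task runs in its own child process, several calls like this can be in flight at once, and the parent never has to reload whatever was set up before the fork.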

OK, let's use this thing

First, install the gem:

$ sudo gem install rake_server

Now we're going to set up a pre-fork and post-fork hook. If you naively fork the Rails environment, ActiveRecord's database connection gets all messed up, so your best bet is to disconnect from the database before forking and re-connect afterwards.

Let's create a file lib/tasks/rake_server.rake:

namespace :rake_server do
  # Depending on :environment means the Rails environment is loaded first
  task :fork_hooks => :environment do
    RakeServer::Server.before_fork { ActiveRecord::Base.remove_connection }
    RakeServer::Server.after_fork { ActiveRecord::Base.establish_connection }
  end
end

Now, we can start rake-server. By default, RakeServer looks for your Rakefile in all the same places Rake does:

$ rake-server start rake_server:fork_hooks

The first argument tells RakeServer to run in the background (run would run it in the foreground); the other arguments are rake tasks to run immediately when the server starts. In this case, we're going to set up our fork hooks, and also load the environment (since the :environment task is a dependency of the :fork_hooks task).

After a few seconds, you'll see output that rake-server is up and running:

rake-server listening on 127.0.0.1:7253

And finally, we can invoke a task!

$ rake-client db:migrate

And it's that easy, folks - you'll see your migrations start running right away.

RakeServer is still a young project and there are a few improvements I'd like to make, but at a basic level, it's already working, so take it for a spin today.

March 4

Sunspot 1.0 released!

Posted by Mat Brown

Sunspot 1.0, long assumed to be vaporware, has finally arrived. Read about all the cool new features.

February 18

ActiveSupport Tip of the Day: Hash#slice, Hash#except

Posted by Mat Brown

Recently, while working on a side project, I had occasion to pore methodically over the ActiveSupport library. I was surprised to find that, despite my having worked in Rails for a shade under two years, there were quite a few methods in ActiveSupport that I was unaware of -- some of them pretty useful!

This brings us to the first part of our four-hundred thirty-five part series, ActiveSupport Tip of the Day.

Be picky about your hash keys

Today's tip regards two methods defined on the Hash class: slice and except. except does exactly what you'd expect - it returns a new Hash with the specified keys removed. slice does the opposite of except, which isn't necessarily what you'd expect from the name, but oh well.

So here's how they work:

hash = {:a => 1, :b => 2, :c => 3, :d => 4} 
=> {:a=>1, :b=>2, :c=>3, :d=>4} 
hash.slice(:a, :b) 
=> {:a=>1, :b=>2} 
hash.except(:a, :b) 
=> {:c=>3, :d=>4}

Not earth-shattering stuff, but these are definitely handy for filtering a hash down to an allowed set of options before passing it into a method:

Post.all(:conditions => params.slice(:name, :category_id))
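
If it helps to see the semantics spelled out, both methods can be approximated in plain Ruby. The helper names slice_hash and except_hash are invented for this sketch, and this is not ActiveSupport's implementation, just the same behavior written out by hand.

```ruby
# Keep only the listed keys; keys that are absent from the hash are ignored.
def slice_hash(hash, *keys)
  keys.each_with_object({}) do |key, kept|
    kept[key] = hash[key] if hash.key?(key)
  end
end

# Return a copy of the hash without the listed keys.
def except_hash(hash, *keys)
  hash.reject { |key, _value| keys.include?(key) }
end

hash = {:a => 1, :b => 2, :c => 3, :d => 4}
slice_hash(hash, :a, :z)   # => {:a => 1}
except_hash(hash, :a, :b)  # => {:c => 3, :d => 4}
```

The whitelist direction (slice) is usually the safer one for user-supplied params, since any unexpected key is dropped by default.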

Both methods also have bang-forms, which modify the hash in place rather than returning a new one.

That is all. Soldier on, soldiers.

February 12

Why your IP blacklisting isn't working, and how to fix it

Posted by Mat Brown

One of the major tools of spam prevention is IP blacklisting - if a given IP address belongs to a known spammer, then spam blockers like Akismet will treat anything submitted from it as spam. So, let's say we're building a little spam checker in our comments controller:

# app/controllers/comments_controller.rb
class CommentsController < ApplicationController
  def create
    viking = Viking::Akismet.new
    spam_response = viking.check_comment(
      :user_ip => request.remote_ip,
      :user_agent => request.user_agent
      # etc
    )
    if spam_response[:spam]
      render :status => 404
    else
      Comment.create(params[:comment])
      flash[:notice] = "Your comment was created!"
      redirect_to :back
    end
  end
end

So far, so good. But let's say one day you're in the situation where your Rails application container is running behind several other machines - perhaps a hardware load balancer, delegating out to an Apache instance, which goes to a haproxy instance, which finally load balances out to your thins. And let's say further that your data center isn't running behind a typical NAT system - so those intermediate machines don't have a 192.168 or 10. IP address.

As it turns out, in this case the request.remote_ip method will return the IP address of your load balancer, or whatever machine is sitting in front of you in the request chain. To figure out why, let's take a look at that method in Rails:

# lib/action_controller/request.rb
def remote_ip
  remote_addr_list = @env['REMOTE_ADDR'] && @env['REMOTE_ADDR'].scan(/[^,\s]+/)

  unless remote_addr_list.blank?
    not_trusted_addrs = remote_addr_list.reject {|addr| addr =~ TRUSTED_PROXIES}
    return not_trusted_addrs.first unless not_trusted_addrs.empty?
  end
  # some more stuff here...
end

So what we're doing here is getting a list of all of the IPs of machines that this request has gone through on the way to your Rails instance. Then we return the first IP in that list that isn't considered a "trusted proxy".

TRUSTED_PROXIES = /^127\.0\.0\.1$|^(10|172\.(1[6-9]|2[0-9]|30|31)|192\.168)\./i

OK, so it's localhost, anything that starts with 10., and the usual suspects of 192.168 etc. But like I said before, you're not behind NAT, so your load balancer's IP doesn't match this list. As far as Rails is concerned, every request is coming from a load balancer, and you're not a spammer, are you?

The solution is pretty straightforward, and much easier than figuring out the problem: just modify that TRUSTED_PROXIES regexp in an initializer. Here's how ours looks:

# config/initializers/trusted_proxies.rb
if CONFIG[:trusted_proxies]
  builtin_trusted_proxies = ActionController::Request::TRUSTED_PROXIES
  ActionController::Request.module_eval do
    remove_const(:TRUSTED_PROXIES)
    const_set(:TRUSTED_PROXIES, Regexp.union(builtin_trusted_proxies, Regexp.new(CONFIG[:trusted_proxies])))
  end
end

All we're doing here is reading a list of trusted proxies (i.e., IPs or partial IPs in our data center) out of our application config file, and then merging it into the existing TRUSTED_PROXIES regexp.

Voila! Now request.remote_ip will actually evaluate to the IP of the requestor.
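
To see the filtering step in isolation, here is a standalone sketch that mirrors only the excerpt above: scan an address list, then reject anything matching the trusted pattern. The helper name first_untrusted and all of the addresses (taken from the documentation-only ranges) are made up for illustration, and the sketch leaves out whatever else remote_ip does after that step.

```ruby
# The built-in pattern quoted earlier in this post.
TRUSTED_PROXIES = /^127\.0\.0\.1$|^(10|172\.(1[6-9]|2[0-9]|30|31)|192\.168)\./i

# Hypothetical helper: first address in a comma-separated list that is
# not matched by the trusted pattern, or nil if every address is trusted.
def first_untrusted(addr_list, trusted = TRUSTED_PROXIES)
  addr_list.to_s.scan(/[^,\s]+/).reject { |addr| addr =~ trusted }.first
end

# A balancer with a public address is not trusted, so it is what gets reported.
first_untrusted('198.51.100.20')           # => "198.51.100.20"

# Extending the pattern, as the initializer does with Regexp.union,
# makes the balancer's range trusted as well.
extended = Regexp.union(TRUSTED_PROXIES, /^198\.51\.100\./)
first_untrusted('198.51.100.20', extended) # => nil
```

Once the balancer's address is filtered out, this step no longer short-circuits with the wrong IP.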
