
Deploying Thinking Sphinx

This is how I like to deploy Thinking Sphinx. In summary:

  1. Install Sphinx on the server.
  2. Decide where you want Sphinx’s PID file and indexes in production.
  3. Ignore Sphinx’s configuration and indexes in development.
  4. Configure Capistrano to work with Thinking Sphinx.
  5. Set up cron on the server to re-index your data regularly.

Instructions

1. Install Sphinx on the server.

On server:

$ curl -O http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
$ gzip -d sphinx-0.9.9.tar.gz
$ tar xvf sphinx-0.9.9.tar
$ cd sphinx-0.9.9
$ ./configure
$ make
$ sudo make install

Or on your development machine (once you’ve hooked up Capistrano; see step 4):

$ cap thinking_sphinx:install:sphinx

And to make sure it installed correctly:

$ search

Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

Usage: search [OPTIONS] 
...

2. Decide where you want Sphinx’s PID file and indexes in production.

I like all my PID files in /var/run/<process>. We also want to preserve Sphinx’s indexes across deployments, so they live in Capistrano’s shared directory rather than inside any one release.

In your app create config/sphinx.yml:

production:
  pid_file: /var/run/sphinx/searchd.pid
  searchd_file_path: /path/to/your/app/shared/db/sphinx

On server:


$ sudo mkdir /var/run/sphinx
$ sudo chown deploy:deploy /var/run/sphinx
$ mkdir -p /path/to/your/app/shared/db/sphinx

Adjust the ownership to suit your needs.
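
Before the first deploy it can help to confirm that the directories named in config/sphinx.yml exist and are writable by the user that will run searchd. Here is a minimal Ruby sketch of such a check. It is not part of Thinking Sphinx: the method name check_sphinx_paths is made up for this example, and it only looks at the two keys shown above.

```ruby
require 'yaml'

# Hypothetical helper (not part of Thinking Sphinx): returns a list of
# problems with the directories named in a sphinx.yml-style hash.
def check_sphinx_paths(settings)
  problems = []
  # pid_file names a file, so check its parent directory;
  # searchd_file_path is itself a directory.
  dirs = {
    'pid_file'          => settings['pid_file'] && File.dirname(settings['pid_file']),
    'searchd_file_path' => settings['searchd_file_path']
  }
  dirs.each do |key, dir|
    if dir.nil?
      problems << "#{key} is not set"
    elsif !File.directory?(dir)
      problems << "#{key}: #{dir} does not exist"
    elsif !File.writable?(dir)
      problems << "#{key}: #{dir} is not writable"
    end
  end
  problems
end
```

On the server you could call it as the deploy user with check_sphinx_paths(YAML.load_file('config/sphinx.yml')['production']) and print whatever it returns. An empty list means both directories are in place.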

3. Ignore Sphinx’s configuration and indexes in development.

This isn’t really a deployment step but it needs to be done.

Add to .gitignore:

config/development.sphinx.conf
db/sphinx/*

4. Configure Capistrano to work with Thinking Sphinx.

Add this to your config/deploy.rb:

require 'thinking_sphinx/deploy/capistrano'

# Thinking Sphinx typing shortcuts
namespace :ts do
  task :conf do
    thinking_sphinx.configure
  end
  task :in do
    thinking_sphinx.index
  end
  task :start do
    thinking_sphinx.start
  end
  task :stop do
    thinking_sphinx.stop
  end
  task :restart do
    thinking_sphinx.restart
  end
  task :rebuild do
    thinking_sphinx.rebuild
  end
end

# http://github.com/jamis/capistrano/blob/master/lib/capistrano/recipes/deploy.rb
# :default -> update, restart
# :update  -> update_code, symlink
namespace :deploy do
  desc "Link up Sphinx's indexes."
  task :symlink_sphinx_indexes, :roles => [:app] do
    run "ln -nfs #{shared_path}/db/sphinx #{release_path}/db/sphinx"
  end

  task :activate_sphinx, :roles => [:app] do
    symlink_sphinx_indexes
    thinking_sphinx.configure
    thinking_sphinx.start
  end

  before 'deploy:update_code', 'thinking_sphinx:stop'
  after 'deploy:update_code', 'deploy:activate_sphinx'
end

If you use monit to supervise your Sphinx daemon, you should override Thinking Sphinx’s stop and start tasks so they use monit too.
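
One way to do that, sketched under assumptions: the service is called searchd in your monitrc (a made-up name; use whatever yours declares) and the deploy user may run monit through sudo. monit stop and monit start are monit’s own commands. The helper monit_command is hypothetical, and I have only shown the Capistrano tasks as comments because they need Capistrano 2 loaded to run.

```ruby
# Assumed service name; it must match the "check process" entry in monitrc.
SPHINX_MONIT_SERVICE = 'searchd'

# Hypothetical helper: builds the monit command line for a service.
def monit_command(action, service = SPHINX_MONIT_SERVICE)
  unless %w[start stop restart].include?(action.to_s)
    raise ArgumentError, "unsupported monit action: #{action}"
  end
  "monit #{action} #{service}"
end

# In config/deploy.rb, after the require line, the overrides could look
# something like this (Capistrano 2 syntax, untested against your setup):
#
#   namespace :thinking_sphinx do
#     task :stop, :roles => [:app] do
#       sudo monit_command(:stop)
#     end
#     task :start, :roles => [:app] do
#       sudo monit_command(:start)
#     end
#   end
```

Defining the tasks after the require means your versions replace the ones Thinking Sphinx ships, so the deploy hooks above keep working unchanged.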

5. Set up cron on the server to re-index your data regularly.

Edit your cron table (crontab -e) and add something along the lines of:

0 * * * * cd /path/to/your/app/current && /usr/local/bin/rake RAILS_ENV=production thinking_sphinx:index
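
cron runs jobs with a minimal environment, so rake or Sphinx’s binaries may not be found even though they work in your login shell. Setting PATH at the top of the crontab is one way round that. The sketch below just prints such a fragment; the helper name sphinx_cron_entry and the default PATH value are assumptions for illustration, so substitute the directories that hold rake and indexer on your server.

```ruby
# Hypothetical helper: builds a crontab fragment that re-indexes hourly.
# The default PATH is an assumption, not something cron or Sphinx requires.
def sphinx_cron_entry(app_path, rails_env = 'production',
                      path = '/usr/local/bin:/usr/bin:/bin')
  [
    "PATH=#{path}",
    "0 * * * * cd #{app_path} && rake RAILS_ENV=#{rails_env} thinking_sphinx:index"
  ].join("\n")
end

# Print the fragment so it can be pasted into `crontab -e`.
puts sphinx_cron_entry('/path/to/your/app/current')
```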

Comments

Thanks for the great walkthrough. You got me deployed in about 5 minutes.

The only issue I ran into was that, even with searchd running, rake ts:stop thought it wasn’t and would break the deploy. I told ts to index the files instead (which automatically reloads searchd upon completion, if searchd is running) and that worked perfectly.

Cheers!

galen • 21 April 2009

Hi, thanks for this. Gave me the tips I needed to override the hostname and paths on production. Am now Thinking Sphinx enabled!

Darren • 25 April 2009

First, your article here was a huge help. I had similar issues with the starting and stopping as well. I use moonshine for deployment, and thought this may help others.

config/sphinx.yml

  production:
    config_file: /path/to/your/app/shared/config/production.sphinx.conf
    searchd_file_path: /path/to/your/app/shcf/shared/db/sphinx/production
    searchd_log_file: /path/to/your/app/shcf/shared/log/searchd.log
    query_log_file: /path/to/your/app/shcf/shared/log/searchd.query.log
    pid_file: /path/to/your/app/shcf/shared/log/searchd.production.pid

and then added this to moonshine_cap.rb:

set :branch, 'master'
set :scm, :git
set :git_enable_submodules, 1
ssh_options[:paranoid] = false
ssh_options[:forward_agent] = true
default_run_options[:pty] = true
set :keep_releases, 2
set :rails_env, 'production'

after 'deploy:restart', 'deploy:cleanup'
after 'deploy:symlink', 'app:symlinks:update'

#load the moonshine configuration into
require 'yaml'
begin
  hash = YAML.load_file(File.join((ENV['RAILS_ROOT'] || Dir.pwd), 'config', 'moonshine.yml'))
  hash.each do |key, value|
    set(key.to_sym, value)
  end
rescue Exception
  puts "To use Capistrano with Moonshine, please run 'ruby script/generate moonshine',"
  puts "edit config/moonshine.yml, then re-run capistrano."
  exit(1)
end

namespace :moonshine do

  desc <<-DESC
  Bootstrap a barebones Ubuntu system with Git, Ruby, RubyGems, and Moonshine
  dependencies. Called by deploy:setup.
  DESC
  task :bootstrap do
    begin
      config = YAML.load_file(File.join(Dir.pwd, 'config', 'moonshine.yml'))
      put(YAML.dump(config),"/tmp/moonshine.yml")
    rescue
      puts "Please run 'ruby script/generate moonshine' and configure config/moonshine.yml first"
      exit(0)
    end
    put(File.read(File.join(File.dirname(__FILE__), '..', 'lib', 'moonshine_setup_manifest.rb')),"/tmp/moonshine_setup_manifest.rb")
    put(File.read(File.join(File.dirname(__FILE__), "bootstrap.#{fetch(:ruby, 'ree')}.sh")),"/tmp/bootstrap.sh")
    sudo 'chmod a+x /tmp/bootstrap.sh'
    sudo '/tmp/bootstrap.sh'
    sudo 'rm /tmp/bootstrap.sh'
    sudo "shadow_puppet /tmp/moonshine_setup_manifest.rb"
    sudo 'rm /tmp/moonshine_setup_manifest.rb'
    sudo 'rm /tmp/moonshine.yml'
  end

  desc 'Apply the Moonshine manifest for this application'
  task :apply do
    on_rollback do
      run "cd #{current_release} && RAILS_ENV=#{fetch(:rails_env, 'production')} rake --trace environment"
    end
    sudo "RAILS_ROOT=#{current_release} DEPLOY_STAGE=#{ENV['DEPLOY_STAGE']||fetch(:stage,'undefined')} RAILS_ENV=#{fetch(:rails_env, 'production')} shadow_puppet #{current_release}/app/manifests/#{fetch(:moonshine_manifest, 'application_manifest')}.rb"
  end

  desc "Update code and then run a console. Useful for debugging deployment."
  task :update_and_console do
    set :moonshine_apply, false
    deploy.update_code
    app.console
  end

  desc "Update code and then run 'rake environment'. Useful for debugging deployment."
  task :update_and_rake do
    set :moonshine_apply, false
    deploy.update_code
    run "cd #{current_release} && RAILS_ENV=#{fetch(:rails_env, 'production')} rake --trace environment"
  end

  after 'deploy:finalize_update' do
    local_config.upload
    local_config.symlink
  end

  before 'deploy:restart' do
    apply if fetch(:moonshine_apply, true) == true
  end

end

namespace :app do

  namespace :symlinks do

    desc <<-DESC
    Link public directories to shared location.
    DESC
    task :update, :roles => [:app, :web] do
      fetch(:app_symlinks, []).each { |link| run "ln -nfs #{shared_path}/public/#{link} #{current_path}/public/#{link}" }
    end

  end

  desc "remotely console"
  task :console, :roles => :app, :except => {:no_symlink => true} do
    input = ''
    run "cd #{current_path} && ./script/console #{fetch(:rails_env, "production")}" do |channel, stream, data|
      next if data.chomp == input.chomp || data.chomp == ''
      print data
      channel.send_data(input = $stdin.gets) if data =~ /^(>|\?)>/
    end
  end

  desc "Show requests per second"
  task :rps, :roles => :app, :except => {:no_symlink => true} do
    count = 0
    last = Time.now
    run "tail -f #{shared_path}/log/#{fetch(:rails_env, "production")}.log" do |ch, stream, out|
      break if stream == :err
      count += 1 if out =~ /^Completed in/
      if Time.now - last >= 1
        puts "#{ch[:host]}: %2d Requests / Second" % count
        count = 0
        last = Time.now
      end
    end
  end

  desc "tail application log file"
  task :log, :roles => :app, :except => {:no_symlink => true} do
    run "tail -f #{shared_path}/log/#{fetch(:rails_env, "production")}.log" do |channel, stream, data|
      puts "#{data}"
      break if stream == :err
    end
  end

  desc "tail vmstat"
  task :vmstat, :roles => [:web, :db] do
    run "vmstat 5" do |channel, stream, data|
      puts "[#{channel[:host]}]"
      puts data.gsub(/\s+/, "\t")
      break if stream == :err
    end
  end

end

namespace :local_config do

  desc <<-DESC
  Uploads local configuration files to the application's shared directory for
  later symlinking (if necessary). Called if local_config is set.
  DESC
  task :upload do
    fetch(:local_config,[]).each do |file|
      filename = File.split(file).last
      if File.exist?( file )
        put(File.read( file ),"#{shared_path}/config/#{filename}")
      end
    end
  end
  
  desc <<-DESC
  Symlinks uploaded local configurations into the release directory.
  DESC
  task :symlink do
    fetch(:local_config,[]).each do |file|
      filename = File.split(file).last
      run "ls #{current_release}/#{file} 2> /dev/null || ln -nfs #{shared_path}/config/#{filename} #{current_release}/#{file}"
    end
  end
  
end

# Thinking Sphinx
namespace :thinking_sphinx do
  task :configure, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:configure RAILS_ENV=#{rails_env}"
  end
  task :index, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:index RAILS_ENV=#{rails_env}"
  end
  task :start, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:start RAILS_ENV=#{rails_env}"
  end
  task :stop, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:stop RAILS_ENV=#{rails_env}"
  end
  task :restart, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:restart RAILS_ENV=#{rails_env}"
  end
end

# Thinking Sphinx typing shortcuts
namespace :ts do
  task :configure, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:configure RAILS_ENV=#{rails_env}"
  end
  task :in, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:index RAILS_ENV=#{rails_env}"
  end
  task :start, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:start RAILS_ENV=#{rails_env}"
  end
  task :stop, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:stop RAILS_ENV=#{rails_env}"
  end
  task :restart, :roles => [:app] do
    run "cd #{current_path};  rake thinking_sphinx:restart RAILS_ENV=#{rails_env}"
  end
end

namespace :deploy do
  desc "Restart the Passenger processes on the app server by touching tmp/restart.txt."
  task :restart, :roles => :app, :except => { :no_release => true } do
    run "touch #{current_path}/tmp/restart.txt"
  end

  [:start, :stop].each do |t|
    desc "#{t} task is a no-op with Passenger"
    task t, :roles => :app do ; end
  end

  desc <<-DESC
    Prepares one or more servers for deployment. Before you can use any \
    of the Capistrano deployment tasks with your project, you will need to \
    make sure all of your servers have been prepared with `cap deploy:setup'. When \
    you add a new server to your cluster, you can easily run the setup task \
    on just that server by specifying the HOSTS environment variable:

      $ cap HOSTS=new.server.com deploy:setup

    It is safe to run this task on servers that have already been set up; it \
    will not destroy any deployed revisions or data.
  DESC
  task :setup, :except => { :no_release => true } do
    moonshine.bootstrap
  end
  
  task :before_update do
    # Re-index before the update instead of stopping Thinking Sphinx (stopping was unreliable here).
    thinking_sphinx.index
  end
  
  task :after_update do
    symlink_sphinx_indexes
    thinking_sphinx.configure
    thinking_sphinx.start
  end
  
  desc "Link up Sphinx's indexes."
  task :symlink_sphinx_indexes, :roles => [:app] do
    run "ln -nfs #{shared_path}/db/sphinx #{current_path}/db/sphinx"
  end
end

It’s long, but I hope it helps. I spent days trying to get this to work. You do need to have the TS plugin installed, and I had to ssh into my slice and install Sphinx according to the directions above. Bob

Bob Hanson • 1 May 2009

I think the first line to add to .gitignore should be:

config/development.sphinx.conf

Erik Ostrom • 19 June 2009

If you find that your scheduled Thinking Sphinx task is running but your index isn’t updating, you probably need to add PATH and possibly SHELL variables to the top of your crontab file, because cron doesn’t load your user’s environment.

Andy Ferra • 13 July 2009

Thanks for the clear, concise walkthrough. Helped me solve a problem.

Keep up the good work!

Alastair Brunton • 10 November 2009

Andrew Stewart • 10 April 2009 • Rails, Deployment, Databases