Seeding Your Rails Database With Reference Data
Migrations: Not Guaranteed To Work
Migrations are an excellent way to evolve your Rails app’s database schema in step with the models. However, if you fall more than one migration behind, there is no guarantee that you will be able to execute the outstanding ones successfully.
This is because when a migration starts it instantiates the corresponding model (update: if you ask it to). Since models are not versioned as migrations are, you will have the latest model with the pre-migration schema. The migration will fail if the two are incompatible.
What Are Reference Data?
Most applications need some reference data (a.k.a. basic data). You can think of this as background knowledge for your app. It’s generally not the data your users create, though they may update it from time to time; it’s data on which their data depends.
I worked at one of Europe’s largest hedge funds for four years building heaps of software. Throughout that time, indeed to this day, the one project that never seemed to be finished was the Reference Data Project. Currencies, exchanges, financial instruments, counterparties, holidays, contract notice dates… they’re all background facts that the systems need to know in order to trade.
Loading Reference Data
Until recently, when I noticed that migrations aren’t guaranteed to run all the way through, I used my migrations to create my reference data. A migration defining a countries table would then execute a few Country.create calls. I was combining data definition with data loading.
Now that we know we should use a different mechanism for data definition, we need to separate out the data loading. Ideally I’d like to write fixtures for my reference data and load those into the database, having first loaded the schema.
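To make that concrete, here is a minimal sketch of what a reference-data fixture might look like and how its rows parse. The countries.yml file name follows the countries table mentioned earlier, but the name and iso_code columns and the parse_fixture helper are assumptions for illustration. The snippet uses only Ruby's standard YAML library, not Rails itself.

```ruby
require "yaml"

# Hypothetical contents of db/basic_data/countries.yml.
# The column names (name, iso_code) are assumed for illustration.
COUNTRIES_FIXTURE = <<~FIXTURE
  united_kingdom:
    name: United Kingdom
    iso_code: GB
  france:
    name: France
    iso_code: FR
  germany:
    name: Germany
    iso_code: DE
FIXTURE

# Hypothetical helper: parse fixture text into a hash of
# row label => column values, the shape a fixture loader works from.
def parse_fixture(text)
  YAML.safe_load(text)
end

parse_fixture(COUNTRIES_FIXTURE).each do |label, attributes|
  puts "#{label}: #{attributes['name']} (#{attributes['iso_code']})"
end
```

Each top-level key labels one row, which keeps the file easy to read and to diff when the reference data changes.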
Although Rails does provide a way to load fixtures (rake db:fixtures:load), it loads your test fixtures. This isn’t what we want: test data and reference data are different kettles of ball games (thanks Dr Brown!) and should not be conflated.
Surprisingly, Mr Google thought I was the only person in the world with this problem, so I wrote my own Rake task to load fixtures from the db/basic_data directory. Invoke it like this:
$ rake db:fixtures:basic_data
Optionally you can pass FIXTURES=x,y to specify which fixtures to load, just like Rails' own fixture-loading task.
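The selection logic behind such a task can be sketched in plain Ruby. This is not the original task, only an illustration under stated assumptions: the helper name basic_data_fixture_names is hypothetical, and it assumes fixtures are stored as .yml or .csv files directly inside the directory.

```ruby
# Hypothetical helper: decide which fixtures in a directory to load.
# `fixtures_env` mirrors the FIXTURES=x,y argument; nil or blank means
# "load every fixture file found in the directory".
def basic_data_fixture_names(dir, fixtures_env = nil)
  if fixtures_env && !fixtures_env.strip.empty?
    # Explicit list: split on commas and ignore stray whitespace.
    fixtures_env.split(",").map(&:strip).reject(&:empty?)
  else
    # Default: every .yml or .csv file, named without its extension.
    Dir.glob(File.join(dir, "*.{yml,csv}"))
       .map { |path| File.basename(path, ".*") }
       .sort
  end
end

names = basic_data_fixture_names("db/basic_data", ENV["FIXTURES"])
puts "Would load: #{names.join(', ')}"
```

Inside a Rake task, the resulting names could then be handed, along with the directory, to Rails' fixture loader: ActiveRecord::FixtureSet.create_fixtures in current Rails, or Fixtures.create_fixtures in the Rails of 2008.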
The Manage Fixtures plugin doesn’t load reference data the way I have just described but it’s jolly useful in the right circumstances. A few months ago I helped somebody move their Rails app off a shared host; the host didn’t allow one to run mysqldump to dump the database, but Manage Fixtures would have got the data out.
[12th February 2008] I just found another article proposing the same solution. Great minds think alike.
The problem with the method proposed above is the lack of validation. This alternative resolves that.