Syncing CouchDB to ActiveRecord

I have been tasked with writing a mobile app for collecting data about gas wells. My forays into the field before showed me how difficult the synchronization of complex data is, so I wanted to use CouchDB for the data store.

CouchDB stores a pool of JSON documents, each with a GUID so they can be copied with impunity. The server keeps track of a revision history, as well as a conflict history when there are conflicting writes.

Unfortunately, the client needs the capacity to run SQL queries on the data, so backing Rails with CouchRest Model isn't an option. This means running a process to synchronize the databases.

There are two ORMs being used in the system: Ember Data abstracts the mobile data in CouchDB, and ActiveRecord persists the Rails data in PostgreSQL. I would like to have access to both, but since one is in Ruby and the other in JavaScript, that's likely not possible. Scenarios I have imagined:

Javascript running in the mobile app syncs using the Ember ORM and Rails JSON API.
Same as above, but running in a Node.js app.
Ruby listens to changes to the CouchDB app and syncs changes through ActiveRecord.
Ruby loads the Ember ORM in therubyracer and syncs changes through ActiveRecord.

The last option leverages the ORM on both sides, so I am going to try it. It's unlikely to be efficient, but it should be maintainable.

ActiveRecord is apparently available in rake tasks, so I will try creating one at lib/tasks/sync.rake:

desc "Sync a couchdb instance with a rails app"

namespace :sync do
  desc "Sync ActiveRecord to CouchDB"
  task couch: :environment do
  end
end

Now I can run rake sync:couch to execute that code. The process I want is for a change adapter to trigger a JavaScript method that interfaces with ActiveRecord. The first thing I will test is simply creating a new ActiveRecord instance from JavaScript.

To begin, I add therubyracer to my Gemfile, do bundle install, and the rake task works.

I'm uncertain how Ember will react to being loaded outside a browser, so the first thing I will do is load the framework. The ember app is in a separate git repository. It would be nice if it were loaded and kept up to date with the other dependencies using bundler, but I've never created a gem before and don't want to take the time to learn. So, I will simply add it to the repo as a git submodule.

git submodule add git@github.com:synaptian/howlin-wolf-android.git lib/android clones it. Then, because my code isn't in the master branch, cd lib/android && git checkout feature/ember-mockup.

Now in my rake task I can add:

cxt.load(Rails.root.join('lib', 'android', 'www', 'js', 'libs', 'ember-1.5.1.js'))

When I run the task, I get an error: window is not defined. Looking through the code, there is generally a check for whether window is defined, but it is missing from line 202. So the proper fix is an update to Ember. I forked the project, and git submodule add git@github.com:wholcomb/ember.js.git lib/ember adds the fork to my project.

Next I build it:

bundle install
npm install
rake dist

Next I change the code in my rake task to: cxt.load(Rails.root.join('lib', 'ember', 'dist', 'ember.js')). Now the error occurs on line 247. To locate it in the source, I run git grep isChrome in the Ember directory and change the line to:

var isChrome = typeof window !== 'undefined' && !!window.chrome && !window.opera;

After rerunning rake dist and rake sync:couch, it now complains that Handlebars is missing, so I change the rake task to:

cxt.load(Rails.root.join('lib', 'android', 'www', 'js', 'libs', 'jquery-1.10.2.js'))
cxt.load(Rails.root.join('lib', 'android', 'www', 'js', 'libs', 'handlebars-1.1.2.js'))
cxt.load(Rails.root.join('lib', 'ember', 'dist', 'ember.js'))

Handlebars loads ok, but jQuery dies trying to access the window object. Unlike Ember, it is not designed to load without it, so I will have to insert a shim. In app/assets/javascripts/window.js, I put var window = {} and add it to the beginning of the rake task.

Using the same process of running the task and stubbing whatever is reported missing, the shim that allows jQuery to load eventually ends up as:

var window = {
    document: {
        documentElement: {},
        createElement: function() {
            return {
                setAttribute: function() {},
                getElementsByTagName: function() {
                    return []
                }
            }
        },
        createDocumentFragment: function() {
            return {
                appendChild: function() {
                    return {}
                }
            }
        },
        appendChild: function() {
            return {}
        },
        addEventListener: function() {}
    },
    addEventListener: function() {}
}

This allows jQuery to load, but it isn't being found by Ember. To correct this, I add another shim after jQuery loads in app/assets/javascripts/set_jquery.js which is simply: var jQuery = window.jQuery.

As Ember loads more completely, there are a couple places it complains document isn't defined. I fixed those and Ember now loads without errors.

Currently the Ember app is just backed by the default data store. To switch it to CouchDB, I included the scripts from ember-couchdb-kit's distribution folder. When I try loading the resultant page, I get a 404: http://localhost:5984/wells/_design/reading/_view/all?include_docs=true&key=%22reading%22.

It is looking for a view named all, in a design named reading, and a database named wells. To create the database, I do: curl -X PUT http://localhost:5984/wells. The design document, then, is:

curl -X PUT http://localhost:5984/wells/_design/reading -H 'Content-Type: application/json' -d '
{
  "_id": "_design/reading",
  "language": "javascript",
  "views": {
    "all": {
      "map": "function(doc) { if( doc.type == \"reading\" ) emit(null, doc) }"
    }
  }
}'

After I load this, I get an error that No 'Access-Control-Allow-Origin' header is present. To fix this I enabled CORS in /etc/couchdb/local.ini, and the page loads.

I get an error when clicking on the wells tab and the solution is an adapted version of what it took for readings.

I can add a new well and when redirected to the wells page, it shows up. When I reload the page, however, it disappears and the list is empty. To check the database, I go to http://localhost:5984/_utils/database.html?wells. I see data, but the documents have no type attribute. I had assumed that the db adapter would add that for me.

To fix it, I add the following to my well model: type: DS.attr('string', { defaultValue: 'well' } ). That does correct it in the database, but it is still not showing up when the page is refreshed.

The structure that I am using to specify the model for a page is:

App.WellsRoute = Ember.Route.extend( {
    model: function() {
        return this.store.find( 'well' )
    }
} )

In looking at the CouchDB kit example, they use all instead of find, but changing that doesn't fix it.

The data is in the system until I reload the page, so I created a well and a reading, which is associated with a well. Then I tried reloading, and there was no data in either view. Looking at the network traffic, when the page loads there is a request for http://localhost:5984/wells/_design/reading/_view/all?include_docs=true&key=%22reading%22. When I try the URL in the browser, I get:

{"total_rows":0,"offset":0,"rows":[]}

The issue is the key parameter. The all view I used didn't emit a key. I need to change it to key on the document type. With CouchDB I have to specify the previous revision when creating an update, so the request looks like:

curl -X PUT http://localhost:5984/wells/_design/reading -H 'Content-Type: application/json' -d '
{
  "_id": "_design/reading",
  "language": "javascript",
  "views": {
    "all": {
      "map": "function(doc) { emit(doc.type, doc) }"
    }
  },
  "_rev": "1-c4de9076711fc50aab2ec4a4695ac5c2"
}'
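
To see why the first version of the view returned no rows, it can help to model the lookup in plain Ruby. This is only a toy, in-memory imitation of how a view is queried by key, not CouchDB itself; run_view and the sample documents are made up for illustration.

```ruby
# Toy model of a CouchDB view: the map step emits [key, value] pairs,
# and a query with ?key=... returns only the rows whose key matches.
# run_view and the sample docs are hypothetical, for illustration only.
def run_view(docs, key:, &map)
  rows = docs.flat_map { |doc| map.call(doc) }
  rows.select { |row_key, _value| row_key == key }
end

docs = [
  { 'type' => 'reading', 'volume' => 12 },
  { 'type' => 'well', 'name' => 'Example' }
]

# First attempt: emit(null, doc) for readings, so every key is nil.
null_keyed = run_view(docs, key: 'reading') do |doc|
  doc['type'] == 'reading' ? [[nil, doc]] : []
end

# Second attempt: emit(doc.type, doc), so the key is the document type.
type_keyed = run_view(docs, key: 'reading') { |doc| [[doc['type'], doc]] }

puts null_keyed.length # no rows: nil never equals "reading"
puts type_keyed.length # one row: the reading
```

The same reasoning explains why the adapter's key=%22reading%22 query only works once the view emits doc.type as its key.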

The reading now shows up. I can click and view a well. If I try to load the wells list directly, there is no network request and no data in the UI. That's a problem, but I'm not going to worry about it now.

I need to decide on a sync direction to test first. Both have to be working before it can be deployed, so I will first push from the server to CouchDB since I can reset the database with impunity.

So the process is: get the wells from ActiveRecord, pass them to therubyracer, create Ember models from them, and save them. First, I have to load ember-data and the adapter in the rake task.

I added the appropriate load statements to the rake task. When it tries to load I get errors that EmberCouchDBKit isn't defined. This is because it is declared a global variable by assigning it to the window object. That object exists, because of the shim for jQuery, but assigning to it has no special consequences. To fix that, I insert a script in the execution chain with:

var EmberCouchDBKit = window.EmberCouchDBKit
var DS = window.DS

The script now loads. I create the wells and insert them into the context like:

cxt['wells'] = Well.all.map do |well|
  well_obj = cxt.eval 'new Object()'
  well_obj[:name] = well.name
  well_obj[:asset_id] = well.asset_id
  well_obj
end

When I try to access the store to create a new record, however, it is not defined:

var store = App.__container__.lookup('store:main')

A response to a StackOverflow question suggested that the store is only available once everything is loaded and that I should put the code in an App.ready function.

I did this and, not surprisingly, nothing happens. The document load event is almost certainly necessary for jQuery to switch to ready, and Ember likely relies on that. I use the following code in the jQuery shim to handle sending the load event:

addEventListener: function( name, listener ) {
  if( name == 'load' ) {
    window.loadListeners = window.loadListeners || []
    window.loadListeners.push( listener )
  }
},
removeEventListener: function() {},
fireLoad: function() {
  window.loadListeners.forEach( function( listener ) {
    listener.call( window, {} )
  } )
}

The App.ready listener still doesn't fire. Neither does a jQuery ready listener. Looking at the code, promises are used, so it is possible the rake task is exiting before they have a chance to complete. To give them time, I will try a sleep.

That had no effect, so I changed the setTimeout function to:

function setTimeout( fn ) { fn.call() }

Now I get a stack overflow. The source is a recursive check for document.body, so I add that to the shim. Next I get an error: Cannot read property 'childNodes' of undefined.

I realized that I am loading the entirety of my Ember app, so it will try to render the UI. To prevent this, I isolated the models in a separate file and only loaded them.

That did not fix it. Looking more closely at the stack trace, I see the error is coming from Ember's testCheckboxClick, which inserts a checkbox into the document and tests it. I added document existence checks in the couple of places they were needed in the initialization process.

And it doesn't work. This code executes:

App.ready = function() {
    var store = App.__container__.lookup('store:main')
    wells.forEach( function( well ) {
        store
            .createRecord( 'well', {
                asset_id: well.asset_id,
                name: well.name
            } )
            .save()
    } )
}

Nothing is saved to the database. Now that I think about it, none of the network infrastructure is there, so there's no way it can work. Damn.

So my options are either to figure out what network calls are being made and implement the needed functionality, or give up on using Ember. I hate to have wasted time, but the objects that are being created are very simple. I think I'll look into accessing Couch from Ruby.

With Rails, I can either use CouchRest directly or use CouchRest Model and have a generated ORM. Since neither implements the changes API, I am going to avoid the extra complexity of the persistence layer.

The code that I used to create the wells in the database is just:

@db = CouchRest.database! 'http://127.0.0.1:5984/wells'
Well.all.each do |well|
  response = @db.save_doc( {
                             name: well.name,
                             asset_id: well.asset_id,
                             type: 'well'
                           } )
end

This creates the wells and they load in the mobile app. If it is run again, however, it will create all the wells a second time. I need to check if a well exists before adding it.

Each well has a unique asset id, so I can use that to look a well up. The only query possible on a CouchDB database is on the key emitted by a map/reduce view, so I need to create a view for wells that is keyed on asset id.

curl -X PUT http://localhost:5984/wells/_design/well -H 'Content-Type: application/json' -d '
{
  "_id": "_design/well",
  "language": "javascript",
  "views": {
    "all": { "map": "function(doc) { emit(doc.type, doc) }" },
    "asset_ids": { "map": "function(doc) { emit(doc.asset_id, doc) }" }
  },
  "_rev": "4-8c76c4fac23a63ad4e8764551afb650a"
}'
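
Copying the _rev by hand for every update gets tedious. Here is a small sketch of automating it, assuming the stored design document has already been fetched and parsed into a hash. with_current_rev is a hypothetical helper, not part of CouchRest, and the revision string is a placeholder.

```ruby
require 'json'

# Hypothetical helper: copy the _rev of the stored document (if any) into
# the new version so that CouchDB accepts the PUT as an update.
def with_current_rev(new_doc, stored_doc)
  return new_doc if stored_doc.nil? || stored_doc['_rev'].nil?
  new_doc.merge('_rev' => stored_doc['_rev'])
end

new_design = {
  '_id' => '_design/well',
  'language' => 'javascript',
  'views' => {
    'all' => { 'map' => 'function(doc) { emit(doc.type, doc) }' }
  }
}

# In practice this would come from GET /wells/_design/well; the revision
# below is a placeholder, not a real value.
stored = { '_id' => '_design/well', '_rev' => '1-placeholder' }

body = JSON.generate(with_current_rev(new_design, stored))
puts body
```

When the document does not exist yet, passing nil as the stored document leaves the new one untouched, which is what a first PUT needs.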

I can now visit http://localhost:5984/wells/_design/well/_view/asset_ids?key=1010 and view a single well entry. The query looks like:

Well.all.each do |well|
  response = @db.view( 'well/asset_ids', { key: well.asset_id } )
  if response['rows'].count == 0
    ⋮

Next I would like to sync the readings. In the Rails app these are called WellVolumes. Each has an associated well id, so I have to map between the Rails ids and those in Couch.

I collect that mapping as I check for the existence of wells:

db_ids = {} # maps Rails well id => CouchDB document id

Well.all.each do |well|
  search_response = @db.view( 'well/asset_ids', { key: well.asset_id } )
  wells = search_response['rows']
  if wells.count > 1
    puts "#{wells.count} wells found for asset id: #{well.asset_id}"
  elsif wells.count == 1
    db_ids[well.id] = wells[0]['id']
  else
    puts "Creating well in Couch: #{well.name}"
    save_response = @db.save_doc( {
                                    name: well.name,
                                    asset_id: well.asset_id,
                                    type: 'well'
                                  } )
    db_ids[well.id] = save_response['id']
  end
end

It would be more efficient to load all the wells at once, but I'll leave that for later. Avoiding inserting data twice is more complex for readings because there is no inherent unique identifier. I have two options: either build a query on well and reading date, which should be unique together, or store the Rails id and use it.
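
The bulk approach mentioned above could look roughly like this: query the asset_ids view once without a key and index the rows in memory. The row shape (id, key, value) is what CouchDB returns for a view; index_rows_by_key and the sample rows are made up for illustration.

```ruby
# Hypothetical helper: turn the rows of one view response into a lookup
# table from view key (here the asset id) to the CouchDB document id.
# Keys that appear more than once are reported separately.
def index_rows_by_key(rows)
  grouped = rows.group_by { |row| row['key'] }
  duplicates = grouped.select { |_key, group| group.length > 1 }.keys
  index = grouped.reject { |key, _group| duplicates.include?(key) }
                 .transform_values { |group| group.first['id'] }
  [index, duplicates]
end

# Sample rows shaped like response['rows']; the ids are invented.
rows = [
  { 'id' => 'doc-a', 'key' => 1010, 'value' => {} },
  { 'id' => 'doc-b', 'key' => 1011, 'value' => {} },
  { 'id' => 'doc-c', 'key' => 1011, 'value' => {} }
]

index, duplicates = index_rows_by_key(rows)
p index
p duplicates
```

Each ActiveRecord well would then be a hash lookup on its asset id instead of a separate HTTP request.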

The Rails id seems to be the most foolproof method, so I will use it. Accessing it means updating the readings view:

curl -X PUT http://localhost:5984/wells/_design/reading -H 'Content-Type: application/json' -d '
{
  "_id": "_design/reading",
  "language": "javascript",
  "views": {
    "all": { "map": "function(doc) { emit(doc.type, doc) }" },
    "rails_ids": { "map": "function(doc) { if(doc.type == \"reading\") emit(doc.rails_id, doc) }" }
  },
  "_rev": "2-15e26d319dc99e2d1c10e0e9e439d3ed"
}'

The data did not load as expected and I would like to recreate the database. To make this easier, I created a rake task:

namespace :couch do
  desc "Reinitialize CouchDB"
  task reset: :environment do
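    # URI.encode was removed in Ruby 3.0, so escape the credentials individually.
    user = ERB::Util.url_encode(ENV['COUCH_USER'].to_s)
    pass = ERB::Util.url_encode(ENV['COUCH_PASS'].to_s)
    host = "http://#{user}:#{pass}@127.0.0.1:5984"
    begin
      RestClient.delete "#{host}/wells"
    rescue RestClient::ResourceNotFound
      # A 404 only means the database did not exist yet.
    end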
    RestClient.put "#{host}/wells", ''
    RestClient.put("#{host}/wells/_design/well",
                   '{
                      "_id": "_design/well",
                      "language": "javascript",
                      "views": {
                        "all": { "map": "function(doc) { emit(doc.type, doc) }" },
                        "asset_ids": { "map": "function(doc) { if(doc.type == \"well\") emit(doc.asset_id, doc) }" }
                      }
                    }',
                   content_type: 'application/json')
    RestClient.put("#{host}/wells/_design/reading",
                   '{
                      "_id": "_design/reading",
                      "language": "javascript",
                      "views": {
                        "all": { "map": "function(doc) { emit(doc.type, doc) }" },
                        "rails_ids": { "map": "function(doc) { if(doc.type == \"reading\") emit(doc.rails_id, doc) }" }
                      }
                    }',
                   content_type: 'application/json')
  end
end
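
Hand-escaping the map functions inside JSON strings is easy to get wrong. One alternative, sketched here, is to build the design documents as Ruby hashes and serialize them with the standard json library. design_doc is a hypothetical helper, not something from CouchRest or RestClient.

```ruby
require 'json'

# Hypothetical helper: build a design document from a name and a hash of
# view name => JavaScript map function source.
def design_doc(name, views)
  {
    '_id' => "_design/#{name}",
    'language' => 'javascript',
    'views' => views.transform_values { |source| { 'map' => source } }
  }
end

reading = design_doc(
  'reading',
  'all' => 'function(doc) { emit(doc.type, doc) }',
  'rails_ids' => 'function(doc) { if(doc.type == "reading") emit(doc.rails_id, doc) }'
)

# JSON.generate takes care of escaping the quotes inside the map source.
puts JSON.generate(reading)
```

The generated string could be passed as the request body in place of the literal JSON in the task above.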

After fixing the import, there is too much data. With a few thousand entries, it takes a couple of seconds for the page to load. With more than 8,000 to 9,000, I get an error: Uncaught RangeError: Maximum call stack size exceeded. I need to paginate the results to deal with this.

CouchDB has limit and skip parameters, but there is a recommendation against using skip for paging because it is inefficient: the server still has to walk past every skipped row. Instead, one extra result is requested, and its key is used as the start position (startkey) for the next query.
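
The paging scheme can be modeled in plain Ruby. The sketch below imitates it over an in-memory, already sorted list of rows: fetch page_size + 1 rows starting at a key, show the first page_size, and keep the key of the extra row as the start of the next page. fetch_page is a hypothetical stand-in for a view query with startkey and limit; a real implementation would also need startkey_docid when keys are not unique.

```ruby
# Hypothetical stand-in for a view query: rows are sorted by key, and we
# ask for page_size + 1 of them starting at start_key (nil means the start).
def fetch_page(sorted_rows, start_key, page_size)
  from = start_key.nil? ? 0 : sorted_rows.index { |row| (row['key'] <=> start_key) >= 0 }
  window = from.nil? ? [] : sorted_rows[from, page_size + 1]
  page = window.first(page_size)
  # The extra row is not shown; its key is where the next page begins.
  next_key = window.length > page_size ? window.last['key'] : nil
  [page, next_key]
end

rows = (1..5).map { |n| { 'id' => "doc-#{n}", 'key' => n } }

start_key = nil
loop do
  page, start_key = fetch_page(rows, start_key, 2)
  puts page.map { |row| row['id'] }.join(', ')
  break if start_key.nil?
end
```

Because each request starts directly at a key, the cost of a page does not grow with how far into the result set it is.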

To start with I am going to create a view that returns the readings sorted by time:

"by_time": { "map": "function(doc) { if(doc.type == \"reading\") { var d = new Date(doc.time); emit([d.getFullYear(), d.getMonth(), d.getDate(), d.getHours(), d.getMinutes()], doc) } }" }

Now I have to tell Ember to return items from this view, which looks like:
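
If the same keys ever need to be built on the Ruby side, for example to pass a startkey to this view, they have to match what the JavaScript emits. Here is a sketch, assuming the key layout above; reading_time_key is a hypothetical helper. JavaScript's getMonth() is zero-based while Ruby's Time#month starts at 1, and the JavaScript getters use the local time zone of whatever machine runs the view.

```ruby
# Hypothetical helper: build the same [year, month, day, hour, minute] key
# that the by_time view emits. JavaScript months run 0-11, so subtract 1.
def reading_time_key(time)
  [time.year, time.month - 1, time.day, time.hour, time.min]
end

key = reading_time_key(Time.new(2014, 5, 1, 13, 45))
p key
```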

return this.store.findQuery( 'reading', {
  designDoc: 'reading',
  viewName: 'by_time',
  options: {
    descending: true
  }
} )

Will Holcomb

1 May 2014