I think the most challenging part of my RailsConf 2022 talk (both to prepare and probably also to listen to) was where I discussed the connection between Active Record Associations and Relations.

I tried presenting the material in at least three different ways, and agonized over making it as clear and concise as possible. What I ended up with is still fairly complex, and the longest section of the talk. There are probably still clearer ways to present this material, but for now I’m accepting that some amount of complexity here is inevitable.

One of the complications that I largely bypass in the final version of the talk is that the Association and Relation classes both include a caching mechanism (I’m pretty sure like 90% of Active Record is just various layers of caching). The Association stores the records it loads in an instance variable called @target:

class Association
  def initialize
    @loaded = false
    @target = nil
  end
end

whereas the Relation stores them in @records:

class Relation
  def initialize
    @loaded = false
    @records = nil
  end
end

The naming is different, but these caches are essentially the same. They start out with @loaded = false, and then set that to true once the records are loaded. If @loaded is true, they return the in-memory records rather than loading them again.
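
To make that shared pattern concrete, here's a rough sketch of the load-once behavior on its own. LazyCache, its block-based loader, and load_count are names I made up for illustration; none of this is Active Record's actual code.

```ruby
# Toy version of the cache that Association and Relation both carry.
# LazyCache, the loader block, and load_count are hypothetical.
class LazyCache
  attr_reader :load_count

  def initialize(&loader)
    @loader = loader   # stands in for "run the SQL query"
    @loaded = false
    @records = nil
    @load_count = 0    # only here so we can see how often we "hit the database"
  end

  def loaded?
    @loaded
  end

  def to_a
    load_records unless loaded?
    @records
  end

  private

  def load_records
    @load_count += 1
    @records = @loader.call
    @loaded = true
  end
end

cache = LazyCache.new { [:pull_request_1, :pull_request_2] }
cache.loaded?     #=> false
cache.to_a        # runs the loader
cache.to_a        # returns the in-memory records
cache.load_count  #=> 1
```

Swap @records for @target and you have roughly the Association flavor of the same idea.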

In the talk I show a HasManyAssociation#reader method, and then later update it to return an object called a CollectionProxy. The initial HasManyAssociation#reader sets the association @target to an array of records (note the call to to_a):

class HasManyAssociation < Association
  def reader
    if loaded?
      target
    else
      self.target = klass.where(foreign_key => @owner[primary_key]).to_a
    end
  end
end

But it occurred to me that I might avoid the whole CollectionProxy class by getting rid of that call to to_a and instead setting the association target as a Relation:

class HasManyAssociation < Association
  def reader
    if loaded?
      target
    else
      self.target = klass.where(foreign_key => @owner[primary_key])
    end
  end
end

Then I’d be relying on the Relation to cache the array of records instead of the Association. Initially this seems to work:

pull_requests = repository.pull_requests
#=> #<ActiveRecord::Relation>
pull_requests.to_a
#=> SELECT * FROM pull_requests WHERE repository_id = ?
#=> []
pull_requests.to_a
#=> []

Calling to_a the first time loads and stores the records in @records. The second time the Relation is already loaded so it can return the in-memory @records without loading them again. Perfect!

But there’s a problem:

pull_requests.create!
#=> #<PullRequest>
pull_requests.to_a
#=> []

Oops! After creating a new record the Relation’s cache is stale—@records doesn’t include the newly created record.
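
Here's a minimal sketch that reproduces the problem without a database. ToyRelation and the plain array standing in for the pull_requests table are both assumptions for illustration, not how Active Record is written.

```ruby
# Hypothetical stand-in for a Relation backed by an in-memory "table".
class ToyRelation
  def initialize(table)
    @table = table
    @loaded = false
    @records = nil
  end

  def to_a
    unless @loaded
      @records = @table.dup   # the "query": a snapshot of the rows right now
      @loaded = true
    end
    @records
  end

  def create!(attributes = {})
    record = attributes.dup
    @table << record          # the row is written, but @records is left alone
    record
  end
end

table = []
pull_requests = ToyRelation.new(table)
pull_requests.to_a.size                # 0, and the snapshot is now cached
pull_requests.create!(title: "Fix typo")
pull_requests.to_a.size                #=> 0 (stale)
ToyRelation.new(table).to_a.size       #=> 1 (a fresh relation sees the row)
```

The row really is in the table. The loaded relation just never looks again, so its cache stays stale.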

This seems fine for the way relations are typically used. I don’t think it’s common to build a Relation and then use it for both reading and creating new records.

But that scenario does seem common for associations, and so a stale cache there is fairly undesirable. Active Record tries really hard to keep your associations up to date with the latest information. This saves you from having to manually reload them all the time.

Oh well! I’ll go back to storing an array of records as the association’s @target, and let the association manage that array. To get an Association that has all the features of a Relation but also maintains an up-to-date cache, we turn to the CollectionProxy.
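
As a rough illustration of what “let the association manage that array” could mean, here's a toy association that updates its own @target whenever it creates a record. ToyHasManyAssociation and the array-backed table are hypothetical, and this deliberately leaves out everything the real CollectionProxy does.

```ruby
# Hypothetical sketch: the association owns the array and keeps it in sync.
class ToyHasManyAssociation
  def initialize(table)
    @table = table
    @loaded = false
    @target = nil
  end

  def reader
    unless @loaded
      @target = @table.dup   # the "query": a snapshot of the rows right now
      @loaded = true
    end
    @target
  end

  def create!(attributes = {})
    record = attributes.dup
    @table << record
    # If the target is already loaded, add the record to the cache too.
    # If it isn't, the next call to reader will pick the record up anyway.
    @target << record if @loaded
    record
  end
end

table = []
association = ToyHasManyAssociation.new(table)
association.reader.size                 #=> 0
association.create!(title: "Fix typo")
association.reader.size                 #=> 1
```

Because the association controls both the write and the cache, it can keep the two consistent, which is exactly what the plain Relation version couldn't do.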

Check out my talk if you’d like to learn more about how this CollectionProxy makes the magic happen!