RailsConf 2022—Relating Relations
I think the most challenging part of my RailsConf 2022 talk (both to prepare and probably also to listen to) was where I discussed the connection between Active Record Associations and Relations.
I tried presenting the material in at least three different ways, and agonized over making it as clear and concise as possible. What I ended up with is still fairly complex, and the longest section of the talk. There are probably still clearer ways to present this material, but for now I’m accepting that some amount of complexity here is inevitable.
One of the complications that I largely bypass in the final version of the talk is that the Association and Relation classes both include a caching mechanism (I’m pretty sure like 90% of Active Record is just various layers of caching). The Association stores the records it loads in an instance variable called @target:
class Association
  def initialize
    @loaded = false
    @target = nil
  end
end
whereas the Relation stores them in @records.
class Relation
  def initialize
    @loaded = false
    @records = nil
  end
end
The naming is different, but these caches are essentially the same. They start out with @loaded = false, and then set that to true once the records are loaded. If @loaded is true, they return the in-memory records rather than loading them again.
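Here is a rough, self-contained sketch of that shared pattern. CachedLoader, load_records, and load_count are names I made up for illustration; this is not Active Record's code, and the block passed to the constructor just stands in for running a query.

```ruby
# Hypothetical stand-in for the caching pattern described above.
class CachedLoader
  attr_reader :load_count

  # The block stands in for whatever actually fetches the records.
  def initialize(&source)
    @source = source
    @loaded = false
    @records = nil
    @load_count = 0
  end

  def loaded?
    @loaded
  end

  # Loads on the first call, then returns the in-memory records.
  def to_a
    load_records unless loaded?
    @records
  end

  private

  def load_records
    @load_count += 1
    @records = @source.call.to_a
    @loaded = true
  end
end

loader = CachedLoader.new { [:pr_1, :pr_2] }
loader.to_a
loader.to_a
puts loader.load_count # the source block ran only once
```

The second call to to_a never reaches the source block, which is the behavior both @target and @records are after.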
In the talk I show a HasManyAssociation#reader method, and then later update it to return an object called a CollectionProxy. The initial HasManyAssociation#reader sets the association @target to an array of records (note the call to to_a):
class HasManyAssociation < Association
  def reader
    if loaded?
      target
    else
      self.target = klass.where(foreign_key => @owner[primary_key]).to_a
    end
  end
end
But it occurred to me that I might avoid the whole CollectionProxy class by getting rid of that call to to_a and instead setting the association target as a Relation:
class HasManyAssociation < Association
  def reader
    if loaded?
      target
    else
      self.target = klass.where(foreign_key => @owner[primary_key])
    end
  end
end
Then I’d be relying on the Relation to cache the array of records instead of the Association. Initially this seems to work:
pull_requests = repository.pull_requests
#=> #<ActiveRecord::Relation>
pull_requests.to_a
#=> SELECT * FROM pull_requests WHERE repository_id = ?
#=> []
pull_requests.to_a
#=> []
Calling to_a the first time loads and stores the records in @records. The second time the Relation is already loaded so it can return the in-memory @records without loading them again. Perfect!
But there’s a problem:
pull_requests.create!
#=> #<PullRequest>
pull_requests.to_a
#=> []
Oops! After creating a new record the Relation’s cache is stale: @records doesn’t include the newly created record.
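The staleness can be reproduced without a database. In this toy model, ToyRelation and the table array are hypothetical stand-ins of my own (the array plays the role of the pull_requests table); it only mimics the caching behavior and is not how Active Record is implemented.

```ruby
# Hypothetical toy model of a relation whose cache is not updated on create.
class ToyRelation
  def initialize(table)
    @table = table # stands in for the database table
    @loaded = false
    @records = nil
  end

  def to_a
    unless @loaded
      @records = @table.dup # stands in for running the SELECT
      @loaded = true
    end
    @records
  end

  # Writes to the "table" but leaves the cached @records untouched.
  def create!(**attrs)
    @table << attrs
    attrs
  end

  # Discards the cache so the next to_a loads again.
  def reload
    @loaded = false
    @records = nil
    self
  end
end

table = []
pull_requests = ToyRelation.new(table)
pull_requests.to_a                       # loads and caches an empty array
pull_requests.create!(title: "Fix typo")
stale = pull_requests.to_a               # cached copy, missing the new record
fresh = pull_requests.reload.to_a        # reloading picks it up
puts stale.length
puts fresh.length
```

The new record only shows up after the cache is thrown away and the records are loaded again.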
This seems fine for the way relations are typically used. I don’t think it’s common to build a Relation and then use it for both reading and creating new records.
But that scenario does seem common for associations, and so a stale cache there is fairly undesirable. Active Record tries really hard to keep your associations up to date with the latest information. This saves you from having to manually reload them all the time.
Oh well! I’ll go back to storing an array of records as the association’s @target, and let the association manage that array. To get an Association that has all the features of a Relation but also maintains an up-to-date cache, we turn to the CollectionProxy.
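To give a feel for what "let the association manage that array" could mean, here is one possible sketch. ToyAssociation and its methods are hypothetical names of mine, and the approach (appending to the loaded target on create) is an assumption for illustration; it is not a description of how CollectionProxy or Active Record actually do it.

```ruby
# Hypothetical sketch: an association that owns its array of records and
# keeps that array current when it creates a new record.
class ToyAssociation
  def initialize(table)
    @table = table # stands in for the database table
    @loaded = false
    @target = nil
  end

  def reader
    load_target unless @loaded
    @target
  end

  # Writes to the "table" and, if already loaded, to the in-memory target.
  def create!(**attrs)
    @table << attrs
    @target << attrs if @loaded
    attrs
  end

  private

  def load_target
    @target = @table.dup # stands in for running the SELECT
    @loaded = true
  end
end

table = []
association = ToyAssociation.new(table)
association.reader                      # loads and caches an empty array
association.create!(title: "Fix typo")
puts association.reader.length          # the cached array includes the new record
```

Because the association owns the array, it can update it at the same moment it writes the record, so no reload is needed here.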
Check out my talk if you’d like to learn more about how this CollectionProxy makes the magic happen!