**DO NOT READ THIS FILE ON GITHUB, GUIDES ARE PUBLISHED ON https://guides.rubyonrails.org.**
Multiple Databases with Active Record
=====================================
This guide covers using multiple databases with your Rails application.
After reading this guide you will know:
* How to set up your application for multiple databases.
* How automatic connection switching works.
* How to use horizontal sharding for multiple databases.
* What features are supported and what's still a work in progress.
--------------------------------------------------------------------------------
As an application grows in popularity and usage, you'll need to scale the application
to support your new users and their data. One way in which your application may need
to scale is on the database level. Rails supports using multiple databases, so you don't
have to store your data all in one place.
At this time the following features are supported:
* Multiple writer databases and a replica for each
* Automatic connection switching for the model you're working with
* Automatic swapping between the writer and replica depending on the HTTP verb and recent writes
* Rails tasks for creating, dropping, migrating, and interacting with the multiple databases
The following features are not (yet) supported:
* Load balancing replicas
## Setting up Your Application
While Rails tries to do most of the work for you, there are still some steps you'll
need to do to get your application ready for multiple databases.
Let's say we have an application with a single writer database, and we need to add a
new database for some new tables we're adding. The name of the new database will be
"animals".
`config/database.yml` looks like this:
```yaml
production:
  database: my_primary_database
  adapter: mysql2
  username: root
  password: <%= ENV['ROOT_PASSWORD'] %>
```
Let's add a second database called "animals" and replicas for both databases as
well. To do this, we need to change our `config/database.yml` from a 2-tier to a
3-tier config.
If a `primary` configuration key is provided, it will be used as the "default" configuration. If
there is no configuration named `primary`, Rails will use the first configuration as default
for each environment. The default configurations will use the default Rails filenames. For example,
primary configurations will use `db/schema.rb` for the schema file, whereas all the other entries
will use `db/[CONFIGURATION_NAMESPACE]_schema.rb` for the filename.
```yaml
production:
  primary:
    database: my_primary_database
    username: root
    password: <%= ENV['ROOT_PASSWORD'] %>
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    username: root_readonly
    password: <%= ENV['ROOT_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
  animals:
    database: my_animals_database
    username: animals_root
    password: <%= ENV['ANIMALS_ROOT_PASSWORD'] %>
    adapter: mysql2
    migrations_paths: db/animals_migrate
  animals_replica:
    database: my_animals_database
    username: animals_readonly
    password: <%= ENV['ANIMALS_READONLY_PASSWORD'] %>
    adapter: mysql2
    replica: true
```
When using multiple databases, there are a few important settings.
First, the database name for `primary` and `primary_replica` should be the same because they contain
the same data. This is also the case for `animals` and `animals_replica`.
Second, the username for the writers and replicas should be different, and the
replica user's database permissions should be set to only read and not write.
When using a replica database, you need to add a `replica: true` entry to the replica in
`config/database.yml`. This is because Rails otherwise has no way of knowing which one is a replica
and which one is the writer. Rails will not run certain tasks, such as migrations, against replicas.
Lastly, for new writer databases, you need to set the `migrations_paths` key to the directory
where you will store migrations for that database. We'll look more at `migrations_paths`
later on in this guide.
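As a rough illustration of how these settings relate to each other, the following plain-Ruby sketch parses a trimmed copy of the configuration above and separates writers from replicas. The helper names `split_by_role` and `writer_for` are hypothetical and are not part of Rails; passwords and adapters are left out for brevity.

```ruby
require "yaml"

# Trimmed copy of the three-tier config shown above (illustrative only).
CONFIG = YAML.safe_load(<<~YAML)
  production:
    primary:
      database: my_primary_database
      username: root
    primary_replica:
      database: my_primary_database
      username: root_readonly
      replica: true
    animals:
      database: my_animals_database
      username: animals_root
      migrations_paths: db/animals_migrate
    animals_replica:
      database: my_animals_database
      username: animals_readonly
      replica: true
YAML

# Hypothetical helper: split one environment's entries by the `replica` flag.
def split_by_role(env_config)
  replicas, writers = env_config.partition { |_name, settings| settings["replica"] }
  { writers: writers.map(&:first), replicas: replicas.map(&:first) }
end

# Hypothetical helper: find the writer that shares a replica's database name.
def writer_for(env_config, replica_name)
  database = env_config.fetch(replica_name).fetch("database")
  env_config.find { |_name, s| !s["replica"] && s["database"] == database }&.first
end

roles = split_by_role(CONFIG["production"])
puts roles[:writers].inspect   # ["primary", "animals"]
puts roles[:replicas].inspect  # ["primary_replica", "animals_replica"]
puts writer_for(CONFIG["production"], "animals_replica") # animals
```

Without the `replica: true` flag, nothing in the entries themselves would distinguish `animals` from `animals_replica`, which is why the flag is required.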
You can also configure the schema dump file by setting `schema_dump` to a custom schema file name
or completely skip the schema dumping by setting `schema_dump: false`.
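The file naming rules described in this section can be summarized in a small sketch. `schema_file_for` is a hypothetical helper, not a Rails API, and it assumes the Ruby schema format (`schema.rb`) rather than `structure.sql`.

```ruby
# Hypothetical helper mirroring the naming rules described above.
# config_name:    the key in config/database.yml (e.g. "animals")
# default_config: the name of the default configuration (e.g. "primary")
# schema_dump:    nil (not set), false (skip dumping), or a custom file name
def schema_file_for(config_name, default_config:, schema_dump: nil)
  return nil if schema_dump == false                  # dumping is skipped
  return File.join("db", schema_dump) if schema_dump  # custom file name

  basename = config_name == default_config ? "schema.rb" : "#{config_name}_schema.rb"
  File.join("db", basename)
end

puts schema_file_for("primary", default_config: "primary")  # db/schema.rb
puts schema_file_for("animals", default_config: "primary")  # db/animals_schema.rb
puts schema_file_for("animals", default_config: "primary", schema_dump: "zoo_schema.rb") # db/zoo_schema.rb
```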
Now that we have a new database, let's set up the connection model.
The primary database replica may be configured in `ApplicationRecord` this way:
```ruby
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end
```
If you use a differently named class for your application record you need to
set `primary_abstract_class` instead, so that Rails knows which class `ActiveRecord::Base`
should share a connection with.
```ruby
class PrimaryApplicationRecord < ActiveRecord::Base
  primary_abstract_class

  connects_to database: { writing: :primary, reading: :primary_replica }
end
```
In that case, classes that connect to `primary`/`primary_replica` can inherit
from your primary abstract class like standard Rails applications do with
`ApplicationRecord`:
```ruby
class Person < PrimaryApplicationRecord
end
```
On the other hand, we need to set up the models persisted in the "animals" database:
```ruby
class AnimalsRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: { writing: :animals, reading: :animals_replica }
end
```
Those models should inherit from that common abstract class:
```ruby
class Dog < AnimalsRecord
  # Talks automatically to the animals database.
end
```
By default, Rails expects the database roles to be `writing` and `reading` for the primary
and replica respectively. If you have a legacy system you may already have roles set up that
you don't want to change. In that case you can set a new role name in your application config.
```ruby
config.active_record.writing_role = :default
config.active_record.reading_role = :readonly
```
It's important to connect to your database in a single model and then inherit from that model
for the tables rather than connect multiple individual models to the same database. Database
clients have a limit to the number of open connections there can be. Connecting each model
separately multiplies the number of connections you have, since Rails uses the model class name
for the connection specification name.
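To get a feel for the multiplication, here is a back-of-the-envelope sketch. It assumes that every class calling `connects_to` gets its own pool per role and that each pool may open up to `pool_size` connections; `max_connections` and its default values are illustrative assumptions, not Rails APIs or measured numbers.

```ruby
# Hypothetical upper-bound estimate for a single application process.
def max_connections(connecting_classes:, roles_per_class: 2, pool_size: 5)
  connecting_classes * roles_per_class * pool_size
end

# One abstract class (such as AnimalsRecord) shared by all animal models:
puts max_connections(connecting_classes: 1)   # 10

# Ten models that each call connects_to for the same database:
puts max_connections(connecting_classes: 10)  # 100
```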
Now that we have the `config/database.yml` and the new model set up, it's time
to create the databases. Rails ships with all the commands you need to use
multiple databases.
You can run `bin/rails --help` to see all the commands you're able to run. You should see the following:
```bash
$ bin/rails --help
...
db:create # Create the database from DATABASE_URL or config/database.yml for the ...
db:create:animals # Create animals database for current environment
db:create:primary # Create primary database for current environment
db:drop # Drop the database from DATABASE_URL or config/database.yml for the cu...
db:drop:animals # Drop animals database for current environment
db:drop:primary # Drop primary database for current environment
db:migrate # Migrate the database (options: VERSION=x, VERBOSE=false, SCOPE=blog)
db:migrate:animals # Migrate animals database for current environment
db:migrate:primary # Migrate primary database for current environment
db:migrate:status # Display status of migrations
db:migrate:status:animals # Display status of migrations for animals database
db:migrate:status:primary # Display status of migrations for primary database
db:reset # Drop and recreates all databases from their schema for the current environment and loads the seeds
db:reset:animals # Drop and recreates the animals database from its schema for the current environment and loads the seeds
db:reset:primary # Drop and recreates the primary database from its schema for the current environment and loads the seeds
db:rollback # Roll the schema back to the previous version (specify steps w/ STEP=n)
db:rollback:animals # Rollback animals database for current environment (specify steps w/ STEP=n)
db:rollback:primary # Rollback primary database for current environment (specify steps w/ STEP=n)
db:schema:dump # Create a database schema file (either db/schema.rb or db/structure.sql ...
db:schema:dump:animals # Create a database schema file (either db/schema.rb or db/structure.sql ...
db:schema:dump:primary # Create a db/schema.rb file that is portable against any DB supported ...
db:schema:load # Load a database schema file (either db/schema.rb or db/structure.sql ...
db:schema:load:animals # Load a database schema file (either db/schema.rb or db/structure.sql ...
db:schema:load:primary # Load a database schema file (either db/schema.rb or db/structure.sql ...
db:setup # Create all databases, loads all schemas, and initializes with the seed data (use db:reset to also drop all databases first)
db:setup:animals # Create the animals database, loads the schema, and initializes with the seed data (use db:reset:animals to also drop the database first)
db:setup:primary # Create the primary database, loads the schema, and initializes with the seed data (use db:reset:primary to also drop the database first)
...
```
Running a command like `bin/rails db:create` will create both the primary and animals databases.
Note that there is no command for creating the database users, and you'll need to do that manually
to support the readonly users for your replicas. If you want to create just the animals
database you can run `bin/rails db:create:animals`.
## Connecting to Databases without Managing Schema and Migrations
If you would like to connect to an external database without any database
management tasks such as schema management, migrations, seeds, etc., you can set
the per database config option `database_tasks: false`. By default it is
set to true.
```yaml
production:
  primary:
    database: my_database
    adapter: mysql2
  animals:
    database: my_animals_database
    adapter: mysql2
    database_tasks: false
```
## Generators and Migrations
Migrations for multiple databases should live in their own folders prefixed with the
name of the database key in the configuration.
You also need to set `migrations_paths` in the database configurations to tell
Rails where to find the migrations.
For example the `animals` database would look for migrations in the `db/animals_migrate` directory and
`primary` would look in `db/migrate`. Rails generators now take a `--database` option
so that the file is generated in the correct directory. The command can be run like so:
```bash
$ bin/rails generate migration CreateDogs name:string --database animals
```
If you are using Rails generators, the scaffold and model generators will create the abstract
class for you. Simply pass the database key to the command line.
```bash
$ bin/rails generate scaffold Dog name:string --database animals
```
A class with the camelized database name and `Record` will be created. In this
example the database is "animals" so we end up with `AnimalsRecord`:
```ruby
class AnimalsRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: { writing: :animals }
end
```
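The naming rule can be approximated in plain Ruby as follows. Rails relies on Active Support's `String#camelize` for this, which handles more cases than the simplified `abstract_class_name` helper below; the helper itself is hypothetical.

```ruby
# Simplified stand-in for "camelized database name plus Record".
def abstract_class_name(database_key)
  database_key.to_s.split("_").map(&:capitalize).join + "Record"
end

puts abstract_class_name(:animals)            # AnimalsRecord
puts abstract_class_name(:primary_shard_one)  # PrimaryShardOneRecord
```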
The generated model will automatically inherit from `AnimalsRecord`.
```ruby
class Dog < AnimalsRecord
end
```
NOTE: Since Rails doesn't know which database is the replica for your writer, you will need to
add the `reading` connection to the abstract class yourself after it has been generated.
Rails will only generate `AnimalsRecord` once. It will not be overwritten by new
scaffolds or deleted if the scaffold is deleted.
If you already have an abstract class and its name differs from `AnimalsRecord`, you can pass
the `--parent` option to indicate you want a different abstract class:
```bash
$ bin/rails generate scaffold Dog name:string --database animals --parent Animals::Record
```
This will skip generating `AnimalsRecord` since you've indicated to Rails that you want to
use a different parent class.
## Activating Automatic Role Switching
Finally, in order to use the read-only replica in your application, you'll need to activate
the middleware for automatic switching.
Automatic switching allows the application to switch from the writer to replica or replica
to writer based on the HTTP verb and whether there was a recent write by the requesting user.
If the application receives a POST, PUT, DELETE, or PATCH request, the application will
automatically write to the writer database. If the request is not one of those methods,
but the application recently made a write, the writer database will also be used. All
other requests will use the replica database.
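The decision rules above can be sketched as a small function. This is an illustration of the described behavior rather than the middleware's actual code; `role_for`, `WRITE_VERBS`, and the two-second `delay` default are assumptions made for the example.

```ruby
WRITE_VERBS = %w[POST PUT DELETE PATCH].freeze

# Hypothetical decision function.
# last_write_at: time of the requesting user's most recent write, or nil.
# delay:         how long (in seconds) after a write reads stay on the writer.
def role_for(http_verb, last_write_at:, now: Time.now, delay: 2)
  return :writing if WRITE_VERBS.include?(http_verb.to_s.upcase)
  return :writing if last_write_at && (now - last_write_at) < delay

  :reading
end

now = Time.now
puts role_for("POST", last_write_at: nil, now: now)       # writing
puts role_for("GET",  last_write_at: now - 1, now: now)   # writing
puts role_for("GET",  last_write_at: now - 10, now: now)  # reading
```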
To activate the automatic connection switching middleware you can run the automatic swapping
generator:
```bash
$ bin/rails g active_record:multi_db
```
And then uncomment the following lines in the generated `config/initializers/multi_db.rb`:
```ruby
Rails.application.configure do
  config.active_record.database_selector = { delay: 2.seconds }
  config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
  config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
end
```
Rails guarantees "read your own write" and will send your GET or HEAD request to the
writer if it's within the `delay` window. By default the delay is set to 2 seconds. You
should change this based on your database infrastructure. Rails doesn't guarantee "read
a recent write" for other users within the delay window and will send GET and HEAD requests
to the replicas unless they wrote recently.
The automatic connection switching in Rails is relatively primitive and deliberately doesn't
do a whole lot. The goal is a system that demonstrates how to do automatic connection
switching and that is flexible enough to be customized by app developers.
The setup in Rails allows you to easily change how the switching is done and what
parameters it's based on. Let's say you want to use a cookie instead of a session to
decide when to swap connections. You can write your own class:
```ruby
class MyCookieResolver < ActiveRecord::Middleware::DatabaseSelector::Resolver
  def self.call(request)
    new(request.cookies)
  end

  def initialize(cookies)
    @cookies = cookies
  end

  attr_reader :cookies

  def last_write_timestamp
    self.class.convert_timestamp_to_time(cookies[:last_write])
  end

  def update_last_write_timestamp
    cookies[:last_write] = self.class.convert_time_to_timestamp(Time.now)
  end

  def save(response)
  end
end
```
And then pass it to the middleware:
```ruby
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = MyCookieResolver
```
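One detail worth keeping in mind with a cookie-based context is that cookie values are read back as strings. The following plain-Ruby sketch shows a millisecond timestamp surviving a round trip through a string-valued store. `time_to_timestamp` and `timestamp_to_time` are hypothetical helpers written for this example; they are not the Rails conversion methods used in the class above.

```ruby
# Hypothetical helper: encode a Time as integer milliseconds since the epoch.
def time_to_timestamp(time)
  time.to_i * 1000 + time.usec / 1000
end

# Hypothetical helper: decode a stored value (Integer, String, or nil).
# A missing value is treated as "no write has happened yet".
def timestamp_to_time(value)
  return Time.at(0) if value.nil? || value.to_s.empty?

  millis = Integer(value.to_s, 10)
  Time.at(millis / 1000, millis % 1000, :millisecond)
end

cookies = {} # stand-in for a string-valued cookie jar
written_at = Time.at(1_700_000_000, 123, :millisecond)
cookies["last_write"] = time_to_timestamp(written_at).to_s

puts cookies["last_write"]                                   # 1700000000123
puts timestamp_to_time(cookies["last_write"]) == written_at  # true
```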
## Using Manual Connection Switching
There are some cases where you may want your application to connect to a writer or a replica
and the automatic connection switching isn't adequate. For example, you may know that for a
particular request you always want to send the request to a replica, even when you are in a
POST request path.
To do this Rails provides a `connected_to` method that will switch to the connection you
need.
```ruby
ActiveRecord::Base.connected_to(role: :reading) do
  # All code in this block will be connected to the reading role.
end
```
The "role" in the `connected_to` call looks up the connections that are connected on that
connection handler (or role). The `reading` connection handler will hold all the connections
that were connected via `connects_to` with the role name of `reading`.
Note that `connected_to` with a role will look up an existing connection and switch
using the connection specification name. This means that if you pass an unknown role
like `connected_to(role: :nonexistent)` you will get an error that says
`ActiveRecord::ConnectionNotEstablished (No connection pool for 'ActiveRecord::Base' found for the 'nonexistent' role.)`
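The block-scoped behavior can be pictured with a small stand-in. `RoleSwitcher` is a hypothetical class written for illustration; it is not how Active Record is implemented, but it shows the two properties described above: the role applies only inside the block, and an unknown role raises instead of silently falling back.

```ruby
# Hypothetical stand-in for block-scoped role switching.
class RoleSwitcher
  class UnknownRole < StandardError; end

  attr_reader :current_role

  # pools: a hash of role name => connection pool (plain strings here).
  def initialize(pools, default_role: :writing)
    @pools = pools
    @current_role = default_role
  end

  # Switch to `role` for the duration of the block, then restore the old role.
  def connected_to(role:)
    raise UnknownRole, "No connection pool found for the '#{role}' role." unless @pools.key?(role)

    previous = @current_role
    @current_role = role
    yield @pools.fetch(role)
  ensure
    @current_role = previous if previous
  end
end

switcher = RoleSwitcher.new({ writing: "primary", reading: "primary_replica" })
switcher.connected_to(role: :reading) { |pool| puts pool } # primary_replica
puts switcher.current_role                                 # writing
```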
If you want Rails to ensure any queries performed are read only, pass `prevent_writes: true`.
This just prevents queries that look like writes from being sent to the database.
You should also configure your replica database to run in read-only mode.
```ruby
ActiveRecord::Base.connected_to(role: :reading, prevent_writes: true) do
  # Rails will check each query to ensure it's a read query.
end
```
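To see why a check based on what a query "looks like" is only a safety net, consider this simplified sketch. The prefix allowlist and the `looks_like_write?` helper are assumptions made for the example; the actual check in Rails is adapter specific and more thorough.

```ruby
# Hypothetical allowlist of statement prefixes treated as reads.
READ_PREFIXES = %w[select show explain describe desc with begin commit rollback savepoint release].freeze
READ_QUERY = /\A\s*(?:#{READ_PREFIXES.join("|")})\b/i

# Anything that does not start with a known read prefix is treated as a write.
def looks_like_write?(sql)
  !READ_QUERY.match?(sql)
end

puts looks_like_write?("SELECT * FROM dogs")                      # false
puts looks_like_write?("INSERT INTO dogs (name) VALUES ('Rex')")  # true
puts looks_like_write?("UPDATE dogs SET name = 'Max'")            # true
```

In this sketch a statement that begins with `WITH` is always classified as a read, even if it wraps a data-modifying statement, which is one reason to also run the replica itself in read-only mode.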
## Horizontal Sharding
Horizontal sharding is when you split up your database to reduce the number of rows on each
database server, but maintain the same schema across "shards". This is commonly called "multi-tenant"
sharding.
The API for supporting horizontal sharding in Rails is similar to the multiple database / vertical
sharding API that's existed since Rails 6.0.
Shards are declared in the three-tier config like this:
```yaml
production:
  primary:
    database: my_primary_database
    adapter: mysql2
  primary_replica:
    database: my_primary_database
    adapter: mysql2
    replica: true
  primary_shard_one:
    database: my_primary_shard_one
    adapter: mysql2
    migrations_paths: db/migrate_shards
  primary_shard_one_replica:
    database: my_primary_shard_one
    adapter: mysql2
    replica: true
  primary_shard_two:
    database: my_primary_shard_two
    adapter: mysql2
    migrations_paths: db/migrate_shards
  primary_shard_two_replica:
    database: my_primary_shard_two
    adapter: mysql2
    replica: true
```
Models are then connected with the `connects_to` API via the `shards` key:
```ruby
class ApplicationRecord < ActiveRecord::Base
  primary_abstract_class

  connects_to database: { writing: :primary, reading: :primary_replica }
end

class ShardRecord < ApplicationRecord
  self.abstract_class = true

  connects_to shards: {
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica },
    shard_two: { writing: :primary_shard_two, reading: :primary_shard_two_replica }
  }
end
```
If you're using shards, make sure both `migrations_paths` and `schema_dump` are set to the same
values for all the shards. When generating a migration you can pass the `--database` option and
use one of the shard names. Since they all set the same path, it doesn't matter which
one you choose.
```bash
$ bin/rails g scaffold Dog name:string --database primary_shard_one
```
Then models can swap shards manually via the `connected_to` API. If
using sharding, both a `role` and a `shard` must be passed:
```ruby
ActiveRecord::Base.connected_to(role: :writing, shard: :default) do
  @id = Person.create!.id # Creates a record in shard named ":default"
end

ActiveRecord::Base.connected_to(role: :writing, shard: :shard_one) do
  Person.find(@id) # Can't find record, doesn't exist because it was created
                   # in the shard named ":default".
end
```
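The reason the second lookup fails can be illustrated without a database. In the sketch below, `ShardedStore` is a hypothetical in-memory stand-in in which every shard has the same shape but holds its own rows; it is not part of Rails.

```ruby
# Hypothetical in-memory model of shards: same "schema", separate rows.
class ShardedStore
  def initialize(*shard_names)
    @shards = shard_names.to_h { |name| [name, {}] }
    @next_id = 0
  end

  # Store the attributes in exactly one shard and return the new id.
  def create(shard:, **attributes)
    id = (@next_id += 1)
    @shards.fetch(shard)[id] = attributes
    id
  end

  # Look the id up only in the given shard; other shards are never consulted.
  def find(shard:, id:)
    @shards.fetch(shard).fetch(id) { raise KeyError, "no record #{id} in #{shard}" }
  end
end

store = ShardedStore.new(:default, :shard_one)
id = store.create(shard: :default, name: "Ada")

puts store.find(shard: :default, id: id).inspect
begin
  store.find(shard: :shard_one, id: id)
rescue KeyError => e
  puts e.message # no record 1 in shard_one
end
```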
The horizontal sharding API also supports read replicas. You can swap the
role and the shard with the `connected_to` API.
```ruby
ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
  Person.first # Lookup record from read replica of shard one.
end
```
## Activating Automatic Shard Switching
Applications are able to automatically switch shards per request using the provided
middleware.
The `ShardSelector` middleware provides a framework for automatically
swapping shards. Rails provides a basic framework to determine which
shard to switch to and allows for applications to write custom strategies
for swapping if needed.
`ShardSelector` takes a set of options (currently only `lock` is supported)
that can be used by the middleware to alter behavior. `lock` is
true by default and will prohibit the request from switching shards once
inside the block. If `lock` is false, then shard swapping will be allowed.
For tenant based sharding, `lock` should always be true to prevent application
code from mistakenly switching between tenants.
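The effect of `lock` can be sketched with a small stand-in. `ShardContext` and its error class are hypothetical and only model the behavior described above: once a locked block is entered, any nested attempt to switch shards raises, and the previous shard is restored when the block exits.

```ruby
# Hypothetical stand-in for locked shard switching; not the Rails implementation.
class ShardContext
  class ShardSwapProhibited < StandardError; end

  attr_reader :current_shard

  def initialize(default_shard: :default)
    @current_shard = default_shard
    @locked = false
  end

  # Run the block on `shard`. With lock: true, nested switches raise.
  def with_shard(shard, lock: false)
    raise ShardSwapProhibited, "cannot swap to #{shard} while locked" if @locked

    previous_shard, previous_lock = @current_shard, @locked
    @current_shard, @locked = shard, lock
    yield
  ensure
    @current_shard, @locked = previous_shard, previous_lock if previous_shard
  end
end

context = ShardContext.new
context.with_shard(:shard_one, lock: true) do
  begin
    context.with_shard(:shard_two) { puts "never reached" }
  rescue ShardContext::ShardSwapProhibited => e
    puts e.message # cannot swap to shard_two while locked
  end
  puts context.current_shard # shard_one
end
puts context.current_shard # default
```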
The same generator as the database selector can be used to generate the file for
automatic shard swapping:
```bash
$ bin/rails g active_record:multi_db
```
Then in the generated `config/initializers/multi_db.rb` uncomment the following:
```ruby
Rails.application.configure do
  config.active_record.shard_selector = { lock: true }
  config.active_record.shard_resolver = ->(request) { Tenant.find_by!(host: request.host).shard }
end
```
Applications must provide the code for the resolver as it depends on application
specific models. An example resolver would look like this:
```ruby
config.active_record.shard_resolver = ->(request) {
  subdomain = request.subdomain
  tenant = Tenant.find_by_subdomain!(subdomain)

  tenant.shard
}
```
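Because the resolver is just a callable that receives the request and returns a shard name, it can be exercised outside of Rails. The sketch below uses stand-ins for the application-specific pieces: `FakeRequest`, the `TENANTS` lookup table, and the tenant names are all hypothetical.

```ruby
# Stand-in for the request object; only `subdomain` is needed here.
FakeRequest = Struct.new(:subdomain)

# Stand-in for a Tenant model: subdomain => shard name.
TENANTS = { "acme" => :shard_one, "globex" => :shard_two }.freeze

# Same shape as the resolver above: request in, shard name out.
# An unknown tenant raises instead of falling back to some default shard.
SHARD_RESOLVER = ->(request) {
  TENANTS.fetch(request.subdomain) do |subdomain|
    raise KeyError, "unknown tenant: #{subdomain}"
  end
}

puts SHARD_RESOLVER.call(FakeRequest.new("acme"))    # shard_one
puts SHARD_RESOLVER.call(FakeRequest.new("globex"))  # shard_two
```

Raising for an unknown tenant mirrors the bang methods (`find_by!`, `find_by_subdomain!`) used in the examples above.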
## Granular Database Connection Switching
Starting from Rails 6.1, it's possible to switch connections for one database
instead of all databases globally.
With granular database connection switching, any abstract connection class
will be able to switch connections without affecting other connections. This
is useful for switching your `AnimalsRecord` queries to read from the replica
while ensuring your `ApplicationRecord` queries go to the primary.
```ruby
AnimalsRecord.connected_to(role: :reading) do
  Dog.first # Reads from animals_replica.
  Person.first # Reads from primary.
end
```
It's also possible to swap connections granularly for shards.
```ruby
AnimalsRecord.connected_to(role: :reading, shard: :shard_one) do
  # Will read from shard_one_replica. If no connection exists for shard_one_replica,
  # a ConnectionNotEstablished error will be raised.
  Dog.first

  # Will read from primary writer.
  Person.first
end
```
To switch only the primary database cluster use `ApplicationRecord`:
```ruby
ApplicationRecord.connected_to(role: :reading, shard: :shard_one) do
  Person.first # Reads from primary_shard_one_replica.
  Dog.first # Reads from the animals writer.
end
```
`ActiveRecord::Base.connected_to` maintains the ability to switch
connections globally.
### Handling Associations with Joins across Databases
As of Rails 7.0, Active Record has an option for handling associations that would perform
a join across multiple databases. If you have a `has_many :through` or a `has_one :through` association
for which you want to disable joining and perform 2 or more queries instead, pass the `disable_joins: true` option.
For example:
```ruby
class Dog < AnimalsRecord
  has_many :treats, through: :humans, disable_joins: true
  has_many :humans

  has_one :home
  has_one :yard, through: :home, disable_joins: true
end

class Home
  belongs_to :dog
  has_one :yard
end

class Yard
  belongs_to :home
end
```
Previously, calling `@dog.treats` or `@dog.yard` without `disable_joins`
would raise an error because databases are unable to handle joins across clusters. With the
`disable_joins` option, Rails will generate multiple select queries
to avoid attempting to join across clusters. For the above association, `@dog.treats` would generate the
following SQL:
```sql
SELECT "humans"."id" FROM "humans" WHERE "humans"."dog_id" = ? [["dog_id", 1]]
SELECT "treats".* FROM "treats" WHERE "treats"."human_id" IN (?, ?, ?) [["human_id", 1], ["human_id", 2], ["human_id", 3]]
```
While `@dog.yard` would generate the following SQL:
```sql
SELECT "homes"."id" FROM "homes" WHERE "homes"."dog_id" = ? [["dog_id", 1]]
SELECT "yards".* FROM "yards" WHERE "yards"."home_id" = ? [["home_id", 1]]
```
There are some important things to be aware of with this option:
1. There may be performance implications since now two or more queries will be performed (depending
   on the association) rather than a join. If the select for `humans` returned a high number of IDs
   the select for `treats` may send too many IDs.
2. Since we are no longer performing joins, a query with an order or limit is now sorted in-memory since
   order from one table cannot be applied to another table.
3. This setting must be added to all associations where you want joining to be disabled.
   Rails can't guess this for you because association loading is lazy; to load `treats` in `@dog.treats`
   Rails already needs to know what SQL should be generated.
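The two-step lookup and the in-memory ordering can be illustrated with plain Ruby collections. In this sketch the `HUMANS` and `TREATS` arrays are hypothetical stand-ins for tables that live in two separate databases, and `treats_for_dog` only mimics the shape of the queries shown above.

```ruby
# Hypothetical rows standing in for tables in two different databases.
HUMANS = [
  { id: 1, dog_id: 1 },
  { id: 2, dog_id: 1 },
  { id: 3, dog_id: 2 }
].freeze

TREATS = [
  { id: 10, human_id: 1, name: "biscuit" },
  { id: 11, human_id: 2, name: "apple" },
  { id: 12, human_id: 3, name: "carrot" }
].freeze

# Step 1: collect the intermediate ids from the first "database".
# Step 2: filter the second "database" by those ids (the IN (...) query).
# The ordering is applied in memory, after both result sets are loaded.
def treats_for_dog(dog_id, order_by: :name)
  human_ids = HUMANS.select { |human| human[:dog_id] == dog_id }.map { |human| human[:id] }
  TREATS.select { |treat| human_ids.include?(treat[:human_id]) }
        .sort_by { |treat| treat[order_by] }
end

puts treats_for_dog(1).map { |treat| treat[:name] }.inspect # ["apple", "biscuit"]
```

The size of `human_ids` is what the first caveat refers to: every id found in step 1 has to be sent along with the second query.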
### Schema Caching
If you want to load a schema cache for each database you must set
`schema_cache_path` in each database configuration and set
`config.active_record.lazily_load_schema_cache = true` in your application
configuration. Note that this will lazily load the cache when the database
connections are established.
## Caveats
### Load Balancing Replicas
Rails doesn't support automatic load balancing of replicas. This is very
dependent on your infrastructure. We may implement basic, primitive load
balancing in the future, but for an application at scale this should be
something your application handles outside of Rails.