Mastering SQL Migrations: Strategies for Safe and Efficient Database Updates
When dealing with relational databases, schema and data changes are an unavoidable reality that projects must acknowledge and address efficiently. Handling them well is key to keeping applications scalable, extensible, and maintainable, and it underscores the need for developers to thoroughly discuss and plan architectural changes before tackling a migration.
When working with any relational database, there is only one disciplined way to make those changes: run a migration.
For those who are not familiar with the term migration, the Active Record documentation defines it as "a mechanism to manage the evolution of a schema used by several physical databases, a solution to the common problem of adding a field to make a new feature work in your local database but being unsure of how to push that change to other developers and to the production server. With migrations, you can describe the transformations in self-contained classes that can be checked into version control systems and executed against another database that might be one, two, or five versions behind" (Ruby on Rails, n.d.).
Among the possible changes, working with new resources is the one I consider to have the fewest cascading effects, since there is no need to reconcile existing records as often as when extending existing functionality. To accomplish this, a developer can create the new resource by following the steps below, which I will illustrate with examples from a NestJS project.
By convention in software development, a migration has at least two SQL scripts, or two methods when defined within a class. The up method contains the SQL commands that perform the desired changes to the database, e.g., ALTER TABLE some_table ADD CONSTRAINT FK_001 FOREIGN KEY (...) REFERENCES ..., while the down method implements the steps required to undo the up migration, e.g., ALTER TABLE some_table DROP FOREIGN KEY FK_001.
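As a minimal sketch of that convention (assuming TypeORM, a hypothetical orders table, and a hypothetical FK_001 constraint), such a class looks like this:
import { MigrationInterface, QueryRunner } from 'typeorm'
export class AddsOrdersUserFk1700000000000 implements MigrationInterface {
  public async up(queryRunner: QueryRunner): Promise<void> {
    // Apply the change: add the foreign key
    await queryRunner.query(
      `ALTER TABLE \`orders\` ADD CONSTRAINT \`FK_001\`
      FOREIGN KEY (\`user_id\`) REFERENCES \`user\`(\`id\`)`
    )
  }
  public async down(queryRunner: QueryRunner): Promise<void> {
    // Undo the change: drop the same foreign key
    await queryRunner.query(`ALTER TABLE \`orders\` DROP FOREIGN KEY \`FK_001\``)
  }
}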
The straightforward parts.
1. Define the entity.
// User entity
// Using Prisma we may define the entity in the schema.prisma file as
model User {
id String @id @default(dbgenerated("gen_random_uuid()")) @db.Uuid
phone_number String
username String @unique
password String
is_active Boolean @default(true)
created_at DateTime @default(now())
updated_at DateTime @default(now()) @updatedAt
}
// The equivalent definition in TypeORM would be:
import { Column, CreateDateColumn, Entity, PrimaryGeneratedColumn, UpdateDateColumn } from 'typeorm'
import { ApiProperty } from '@nestjs/swagger'
import { IsBoolean, IsNumberString, IsString } from 'class-validator'

@Entity()
export class User {
  @PrimaryGeneratedColumn('uuid')
  id: string

  @ApiProperty({ type: 'string' })
  @IsNumberString()
  @Column({ type: 'varchar', nullable: false })
  phone_number: string

  @ApiProperty({ type: 'string' })
  @IsString()
  @Column({ type: 'varchar', nullable: false, unique: true })
  username: string

  @ApiProperty({ type: 'string' })
  @IsString()
  @Column({ type: 'varchar', nullable: false })
  password: string

  @ApiProperty({ type: 'boolean' })
  @IsBoolean()
  @Column({ default: true })
  is_active: boolean

  @CreateDateColumn({ type: 'datetime' })
  created_at: Date

  @UpdateDateColumn({ type: 'datetime' })
  updated_at: Date
}
In the TypeORM example I am adding the DTO schema validations with class-validator directly on the entity, to avoid duplicating code in a separate DTO when the intent is simply to validate the resource. With Prisma, it is still recommended to define a separate entity class, which I am omitting to avoid extending this post with repeated examples.
2. Understand the migration file.
At this point we have the data model definition, which is usually enough to start developing the API layers on top of it before persisting any changes to the database. Depending on the framework or technology the project uses, we can run a CLI command to generate the migration file from this model.
In Prisma, we can create the script by running
npx prisma migrate dev --create-only --name AddsUserTable
which would be roughly equivalent in TypeORM to
typeorm migration:create ./path-to-migrations-dir/AddsUserTable
Note that migration:create scaffolds an empty class for you to fill in, while typeorm migration:generate (given a data source) produces the SQL from the entity diff, much like Prisma's --create-only does from the schema. Either way, you end up with a file following the pattern timestamp-AddsUserTable.ts that contains a class with the migration's SQL script:
import { MigrationInterface, QueryRunner } from 'typeorm'
export class AddsUserTable1633427228144 implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
await queryRunner.query(`
      CREATE TABLE \`user\` (
        id char(36) NOT NULL,
        phone_number varchar(255) NOT NULL,
        username varchar(255) NOT NULL UNIQUE,
        password varchar(255) NOT NULL,
        is_active tinyint NOT NULL DEFAULT 1,
        created_at datetime(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
        updated_at datetime(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6),
        PRIMARY KEY (id)
      ) ENGINE=InnoDB`)
}
public async down(queryRunner: QueryRunner): Promise<void> {
await queryRunner.query(`DROP TABLE \`user\``)
}
}
I always recommend making sense of the generated file, for two reasons:
- it will help you improve your ability to write plain SQL
- it will let you verify that the migration aligns with the intended changes to the business model
If the generated script does what you want to do, go ahead and apply the change to the database:
// typeorm
typeorm migration:run
// prisma
npx prisma migrate dev
While developing, it is normal to recreate the resource multiple times before shipping the code, so for new resources a useful safeguard is to check before creating the table and check before dropping it. To do this, we would replace the earlier statements with the following:
public async up(queryRunner: QueryRunner): Promise<void> {
  await queryRunner.query(`
    CREATE TABLE IF NOT EXISTS \`user\` (
      -- table definition
    ) ENGINE=InnoDB`)
}

public async down(queryRunner: QueryRunner): Promise<void> {
  await queryRunner.query(`DROP TABLE IF EXISTS \`user\``)
}
This allows you to run the up and down migrations again even after a failed run that did not mark the migration as applied; we can verify what has been recorded by inspecting the migrations table.
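For example, assuming the default table names (migrations in TypeORM, _prisma_migrations in Prisma), a quick check from any SQL client would be:
-- TypeORM's bookkeeping table: one row per applied migration
SELECT * FROM migrations ORDER BY timestamp DESC;
-- Prisma's equivalent
SELECT migration_name, finished_at FROM _prisma_migrations ORDER BY finished_at DESC;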
The not-so-straightforward parts.
Handling Updates on a Large Number of Records
When performing data migrations that involve updating a large number of records, it’s crucial to handle the process efficiently to avoid performance bottlenecks and ensure data integrity. Here’s a recommended approach for handling such updates in batches, which can be adapted to various scenarios. The following example demonstrates the best practices for this process.
Best Practices for Batch Updates
- Batch Size: Define an appropriate batch size that balances performance and load on the database. Batches that are too large may lead to locking issues, while batches that are too small result in an excessive number of round trips.
- Track Progress: Keep track of the progress by storing the ID of the last processed record. This helps in resuming the process in case of failure.
- Incremental Updates: Process records incrementally to manage large data sets effectively without overwhelming the system.
- Logging: Log the number of updates made per batch and the total updates to monitor the migration progress.
This is a possible implementation for handling hundreds of thousands of updates:
public async up(queryRunner: QueryRunner): Promise<void> {
let lastId = '0'; // Assuming IDs are strings; adjust the initial value accordingly
const batchCount = 500; // Size of each batch
let totalUpdatesMade = 0;
let batchUpdatesMade = 0;
do {
// Get the next batch of ids to process
const ids = await queryRunner.query(
`SELECT id FROM source_table
WHERE id > ?
ORDER BY id
LIMIT ?`,
[lastId, batchCount]
);
if (ids.length) {
lastId = ids[ids.length - 1].id; // Prepare the last_id for the next iteration
// Convert ids array to a simple array of id values
const idsToUpdate = ids.map((row) => row.id);
// Perform the update
await queryRunner.query(
`UPDATE target_table
SET target_column = source_value
WHERE id IN (?)`,
[idsToUpdate]
);
batchUpdatesMade = ids.length; // Set the number of updates made
console.log('Updates made in batch:', batchUpdatesMade);
totalUpdatesMade += batchUpdatesMade; // Increment total updates made
} else {
batchUpdatesMade = 0; // No more updates needed
}
} while (batchUpdatesMade > 0);
console.log('Total updates made:', totalUpdatesMade);
}
Key Takeaways
Efficiency: By updating records in batches, the load on the database is controlled, and the migration process can handle large data sets without overwhelming the system.
Resilience: Tracking the last processed ID ensures that the process can be resumed from where it left off in case of an interruption.
Monitoring: Logging updates provides visibility into the migration progress, helping to identify and address any issues promptly.
Running a Migration within a Transaction
Transactions are essential in database operations to ensure data integrity and consistency. A transaction is a sequence of operations performed as a single logical unit of work, which is either fully completed or fully failed. This ensures that all operations within the transaction are successfully executed before committing the changes to the database, or none of them are applied if any operation fails, thus preventing partial updates (Silberschatz et al., 2020).
What is a Transaction?
According to Silberschatz et al. (2020), a transaction is “a collection of operations that performs a single logical function in a database application”. Transactions provide the properties of atomicity, consistency, isolation, and durability (ACID), which are crucial for reliable database management.
Implementing Transactions in TypeORM
When performing migrations, it is a best practice to use transactions to ensure that updates are only committed if all checks and verifications are successful. Here’s how you can implement a migration within a transaction in TypeORM:
import { QueryRunner, MigrationInterface } from "typeorm";
export class ExampleMigration1620281942000 implements MigrationInterface {
public async up(queryRunner: QueryRunner): Promise<void> {
// Placeholder: the number of rows you expect the update to affect
const expectedCount = 100;
// Start a transaction
await queryRunner.startTransaction();
try {
// Perform the update
await queryRunner.query(
`UPDATE target_table
SET target_column = new_value
WHERE some_condition = true`
);
// Verification step
const result = await queryRunner.query(
`SELECT COUNT(*) as count FROM target_table
WHERE target_column = new_value AND some_condition = true`
);
// Check if the verification is successful (COUNT(*) may be returned as a string)
if (Number(result[0].count) === expectedCount) {
// Commit the transaction if verification passes
await queryRunner.commitTransaction();
console.log('Transaction committed successfully.');
} else {
// Rollback the transaction if verification fails
await queryRunner.rollbackTransaction();
console.log('Transaction rolled back due to verification failure.');
// Throw so the migration is not recorded as applied
throw new Error('Verification failed, transaction rolled back.');
}
} catch (error) {
// Rollback the transaction in case of error (skip if already rolled back)
if (queryRunner.isTransactionActive) {
await queryRunner.rollbackTransaction();
}
console.error('Transaction rolled back due to error:', error);
// Re-throw so the migration runner marks the migration as failed
throw error;
}
// Note: the queryRunner is not released here; TypeORM manages its lifecycle
}
public async down(queryRunner: QueryRunner): Promise<void> {
// Define rollback logic here if necessary
}
}
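A note on transaction handling: depending on your TypeORM version and configuration, the migration runner may already wrap each migration in its own transaction, in which case calling startTransaction inside the migration can fail because a transaction is already open. If you manage transactions manually as above, review the runner's transaction mode (the CLI exposes a --transaction option for migration:run) and disable the implicit wrapping if needed.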
Key Takeaways
Atomicity: Ensures that all updates are applied only if the entire sequence is successful.
Consistency: Verifies that the database remains in a valid state before and after the transaction.
Isolation: Maintains that operations within a transaction are isolated from other operations.
Durability: Guarantees that once a transaction is committed, it remains committed even in case of a system failure.
Safely Adding Columns, Indexes, and Foreign Keys
When modifying database schemas, it’s essential to ensure that changes such as adding columns, indexes, and foreign keys do not disrupt the existing database structure or cause errors due to already existing elements. Here’s a structured approach to safely performing these modifications by checking the existence of columns, indexes, and foreign keys before making changes.
Best Practices for Schema Modifications
- Check for Existing Columns: Before adding a new column, verify whether it already exists. This prevents errors from attempting to add a column that is already present.
- Check for Existing Indexes and Foreign Keys: Similar to columns, check for existing indexes and foreign keys before creating new ones to avoid conflicts and errors (an index check sketch follows the example below).
- Use Generic Examples: The following example demonstrates the approach using generic table and column names, ensuring clarity and adaptability.
TypeORM example:
import { MigrationInterface, QueryRunner } from 'typeorm';
export class ModifyGenericTable1612592821628 implements MigrationInterface {
name = 'ModifyGenericTable1612592821628';
public async up(queryRunner: QueryRunner): Promise<void> {
// Check and add columns
if (!(await queryRunner.hasColumn('generic_table', 'new_column1'))) {
await queryRunner.query(`ALTER TABLE \`generic_table\` ADD \`new_column1\` varchar(255) NULL`);
}
if (!(await queryRunner.hasColumn('generic_table', 'new_column2'))) {
await queryRunner.query(`ALTER TABLE \`generic_table\` ADD \`new_column2\` varchar(255) NULL`);
}
// Define the foreign key names
const foreignKey1 = 'FK_generic1';
const foreignKey2 = 'FK_generic2';
// Check for existing foreign keys
const foreignKey1Exists = await queryRunner.query(
`SELECT CONSTRAINT_NAME FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'generic_table'
AND CONSTRAINT_TYPE = 'FOREIGN KEY' AND CONSTRAINT_NAME = '${foreignKey1}'`
);
const foreignKey2Exists = await queryRunner.query(
`SELECT CONSTRAINT_NAME FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'generic_table'
AND CONSTRAINT_TYPE = 'FOREIGN KEY' AND CONSTRAINT_NAME = '${foreignKey2}'`
);
// Drop existing foreign keys if they exist
if (foreignKey1Exists.length) {
await queryRunner.query(`ALTER TABLE \`generic_table\` DROP FOREIGN KEY \`${foreignKey1}\``);
}
if (foreignKey2Exists.length) {
await queryRunner.query(`ALTER TABLE \`generic_table\` DROP FOREIGN KEY \`${foreignKey2}\``);
}
// Add new foreign keys
await queryRunner.query(
`ALTER TABLE \`generic_table\` ADD CONSTRAINT \`${foreignKey1}\`
FOREIGN KEY (\`new_column1\`) REFERENCES \`related_table1\`(\`related_column1\`)
ON DELETE CASCADE ON UPDATE NO ACTION`
);
await queryRunner.query(
`ALTER TABLE \`generic_table\` ADD CONSTRAINT \`${foreignKey2}\`
FOREIGN KEY (\`new_column2\`) REFERENCES \`related_table2\`(\`related_column2\`)
ON DELETE CASCADE ON UPDATE NO ACTION`
);
}
public async down(queryRunner: QueryRunner): Promise<void> {
// The down implementation should apply the same existence checks before reverting the changes
}
}
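The section title also mentions indexes, and the same existence-check pattern applies to them. Here is a minimal sketch, assuming MySQL and a hypothetical IDX_generic_new_column1 index on generic_table (the index and column names are illustrative), that would live inside the up method:
// Check for an existing index before creating it, using MySQL's INFORMATION_SCHEMA
const indexName = 'IDX_generic_new_column1';
const indexExists = await queryRunner.query(
  `SELECT INDEX_NAME FROM INFORMATION_SCHEMA.STATISTICS
  WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'generic_table' AND INDEX_NAME = '${indexName}'`
);
if (!indexExists.length) {
  await queryRunner.query(`CREATE INDEX \`${indexName}\` ON \`generic_table\` (\`new_column1\`)`);
}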
This approach not only helps in maintaining the database schema without errors but also ensures that the modifications are performed in a controlled and predictable manner, minimizing the risk of disrupting the database operations.
Key Takeaways
Preemptive Checks: Ensure the column or index does not already exist before attempting to add it, avoiding redundant operations and potential errors.
Safety and Rollbacks: The use of down methods provides a way to revert changes, maintaining database integrity.
Foreign Key Management: Properly handle foreign keys by checking their existence and safely dropping them before adding new ones.
References
Ruby on Rails. (n.d.). ActiveRecord::Migration. https://api.rubyonrails.org/classes/ActiveRecord/Migration.html
Silberschatz, A., Korth, H. F., & Sudarshan, S. (2020). Database System Concepts (7th ed.). McGraw-Hill Education.