When you have an international database, how do you make sure that the collation is correct for all locales? Here are some thoughts. Note that you have to be careful when mixing collations in queries or SQL will complain.
Table with virtual, generated columns
CREATE TABLE `note` (
`id` bigint NOT NULL AUTO_INCREMENT,
`text` varchar(256) DEFAULT NULL,
`text_swedish` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_swedish_ci GENERATED ALWAYS AS (`text`) VIRTUAL,
`text_danish` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_danish_ci GENERATED ALWAYS AS (`text`) VIRTUAL,
...
PRIMARY KEY (`id`),
KEY `text` (`text`),
KEY `text_swedish` (`text_swedish`),
KEY `text_danish` (`text_danish`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
There is support for generating columns from another column. I’m unsure how this affects memory usage.
Many tables per language
Create a table called note
, and then create an additional table per language, holding the localized data. Like the following tables:
CREATE TABLE `note` (
`id` bigint NOT NULL AUTO_INCREMENT,
...
`text` varchar(256) DEFAULT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CREATE TABLE `note_sv` (
`note` bigint NOT NULL AUTO_INCREMENT,
...
`text` varchar(256) DEFAULT NULL,
PRIMARY KEY (`note`),
FOREIGN KEY (note) REFERENCES note(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_swedish_ci
CREATE TABLE `note_da` (
`note` bigint NOT NULL AUTO_INCREMENT,
...
`text` varchar(256) DEFAULT NULL,
PRIMARY KEY (`note`),
FOREIGN KEY (note) REFERENCES note(id) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_danish_ci
One table per localized note
This example is when you don’t want to mix data at all. E.g. for customers, you can put swedish customers separately from danish customers.
CREATE TABLE `note_sv` (
`id` bigint NOT NULL AUTO_INCREMENT,
...
`text` varchar(256) DEFAULT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_swedish_ci
CREATE TABLE `note_da` (
`id` bigint NOT NULL AUTO_INCREMENT,
...
`text` varchar(256) DEFAULT NULL,
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_danish_ci
Building a custom collation
I haven’t dug into how this works, but it should be possible to build a custom collation which works good enough. Seems like a hack though.