Steps for converting db from PostgreSQL to MariaDB

Our Django apps are used to connect PostgreSQL db, until we want to hire more programmers. The reason is the same as we transfer version control tool from hg to git: MySQL/MariaDB, and git are familiar to more developers.

Because Django is a strong web framework, it has a data convertion tool on json format. We can dump data from PostgreSQL, and restore the json data to MariaDB by dumpdata/loaddata commands of Django. Compare to commercial solutions, we can ensure that the converted data runs better on Django.

In PostgreSQL side

「is_suspend」 is a special field in our project. Codes as below are only needed for our system.

ShopOption.all_objects.filter(is_suspend=True).update(is_suspend=False)
WorkOrder.all_objects.filter(is_suspend=True).delete()
Order.all_objects.filter(is_suspend=True).delete()

Reduce the table size before data converting:

delete from multi_sites_uniquesession where update_time < '2018-01-18 00:00:00+00:00';
delete from django_session where expire_date < '2018-01-18 00:00:00+00:00';

There are some CHARSET/COLLATION issues in Mariadb 10.2. If we want to support 「utf8mb4」 charset, then we only have utf8mb4_unicode_ci collation can be choosed. And utf8mb4_unicode_ci has a few behaviors that we don’t like:

  1. auto-convert full-width char to half-width char.

  2. ignore the space char in the begin or end of value.

  3. case insensitive

In PostgreSQL, we usually use zh_TW.UTF-8, ja_JP.UTF-8 collation, and values are well defined unique, but it will raise IntegrityError when it is stored in MariaDB.

So that we need to adjust some values in original PG db:

update trade_address set street = replace(street, '(', ' (') where street like '%(%';
update trade_address set street = replace(street, ')', ') ') where street like '%)%';
update trade_address set street = replace(street, '-', ' - ') where street like '%-%';
update trade_address set street = regexp_replace(street, ' $', '。') where street ~ ' $';
update trade_address set street = 'T e s t' where street = 'Test';
update trade_address set street = concat(street, id) where street like '%test%';
update trade_address set street = concat(street, id) where street like '%TEST%';
update trade_address set street = replace(street, '0', ' 0 ') where street like '%0%';
update trade_address set street = replace(street, '1', ' 1 ') where street like '%1%';
update trade_address set street = replace(street, '2', ' 2 ') where street like '%2%';
update trade_address set street = replace(street, '3', ' 3 ') where street like '%3%';
update trade_address set street = replace(street, '4', ' 4 ') where street like '%4%';
update trade_address set street = replace(street, '5', ' 5 ') where street like '%5%';
update trade_address set street = replace(street, '6', ' 6 ') where street like '%6%';
update trade_address set street = replace(street, '7', ' 7 ') where street like '%7%';
update trade_address set street = replace(street, '8', ' 8 ') where street like '%8%';
update trade_address set street = replace(street, '9', ' 9 ') where street like '%9%';

update trade_consumer set address = replace(address, '(', ' (') where address like '%(%';
update trade_consumer set address = replace(address, ')', ') ') where address like '%)%';
update trade_consumer set address = replace(address, '-', ' - ') where address like '%-%';
update trade_consumer set address = replace(address, 'Test', 'T e s t') where address like '%Test%';
update trade_consumer set address = replace(address, '0', ' 0 ') where address like '%0%';
update trade_consumer set address = replace(address, '1', ' 1 ') where address like '%1%';
update trade_consumer set address = replace(address, '2', ' 2 ') where address like '%2%';
update trade_consumer set address = replace(address, '3', ' 3 ') where address like '%3%';
update trade_consumer set address = replace(address, '4', ' 4 ') where address like '%4%';
update trade_consumer set address = replace(address, '5', ' 5 ') where address like '%5%';
update trade_consumer set address = replace(address, '6', ' 6 ') where address like '%6%';
update trade_consumer set address = replace(address, '7', ' 7 ') where address like '%7%';
update trade_consumer set address = replace(address, '8', ' 8 ') where address like '%8%';
update trade_consumer set address = replace(address, '9', ' 9 ') where address like '%9%';

update trade_consumer set cellphone = replace(cellphone, '0', ' 0 ') where cellphone like '%0%';
update trade_consumer set cellphone = replace(cellphone, '1', ' 1 ') where cellphone like '%1%';
update trade_consumer set cellphone = replace(cellphone, '2', ' 2 ') where cellphone like '%2%';
update trade_consumer set cellphone = replace(cellphone, '3', ' 3 ') where cellphone like '%3%';
update trade_consumer set cellphone = replace(cellphone, '4', ' 4 ') where cellphone like '%4%';
update trade_consumer set cellphone = replace(cellphone, '5', ' 5 ') where cellphone like '%5%';
update trade_consumer set cellphone = replace(cellphone, '6', ' 6 ') where cellphone like '%6%';
update trade_consumer set cellphone = replace(cellphone, '7', ' 7 ') where cellphone like '%7%';
update trade_consumer set cellphone = replace(cellphone, '8', ' 8 ') where cellphone like '%8%';
update trade_consumer set cellphone = replace(cellphone, '9', ' 9 ') where cellphone like '%9%';

update trade_consumer set phone = replace(phone, '0', ' 0 ') where phone like '%0%';
update trade_consumer set phone = replace(phone, '1', ' 1 ') where phone like '%1%';
update trade_consumer set phone = replace(phone, '2', ' 2 ') where phone like '%2%';
update trade_consumer set phone = replace(phone, '3', ' 3 ') where phone like '%3%';
update trade_consumer set phone = replace(phone, '4', ' 4 ') where phone like '%4%';
update trade_consumer set phone = replace(phone, '5', ' 5 ') where phone like '%5%';
update trade_consumer set phone = replace(phone, '6', ' 6 ') where phone like '%6%';
update trade_consumer set phone = replace(phone, '7', ' 7 ') where phone like '%7%';
update trade_consumer set phone = replace(phone, '8', ' 8 ') where phone like '%8%';
update trade_consumer set phone = replace(phone, '9', ' 9 ') where phone like '%9%';

update trade_consumer set name = regexp_replace(name, ' $', '。') where name ~ ' $';
update trade_consumer set email = regexp_replace(email, ' $', '。') where email ~ ' $';
update trade_consumer set cellphone = regexp_replace(cellphone, ' $', '。') where cellphone ~ ' $';
update trade_consumer set phone = regexp_replace(phone, ' $', '。') where phone ~ ' $';
update trade_consumer set address = regexp_replace(address, ' $', '。') where address ~ ' $';

update trade_consumer set name = regexp_replace(name, ' $', ' 。') where name ~ ' $';
update trade_consumer set email = regexp_replace(email, ' $', ' 。') where email ~ ' $';
update trade_consumer set cellphone = regexp_replace(cellphone, ' $', ' 。') where cellphone ~ ' $';
update trade_consumer set phone = regexp_replace(phone, ' $', ' 。') where phone ~ ' $';
update trade_consumer set address = regexp_replace(address, ' $', ' 。') where address ~ ' $';

update trade_consumer set cellphone = concat(substring(cellphone, 0, 5), '-', substring(cellphone, 5)) where id = 10855 and cellphone not like '%-%';
update trade_consumer set phone = concat(substring(phone, 0, 5), '-', substring(phone, 5)) where id = 15491 and phone not like '%-%';

update maillist_recipient set email = upper(email) where id = 166888 and upper(email) != email;

Dump json from PG

./manage.py dumpdata -e maillist --indent 1 > all_exclude_maillist.json
./manage.py dumpdata maillist --indent 1 > maillist.json

In MariaDB side

Create a new database:

CREATE DATABASE ec_bio_enzyme_com CHARACTER SET = 'utf8mb4' COLLATE = 'utf8mb4_unicode_ci';
CREATE USER ec_bio_enzyme_com@'%' identified by 'password';
GRANT ALL PRIVILEGES on ec_bio_enzyme_com.* to ec_bio_enzyme_com@'%';
GRANT ALL PRIVILEGES on test_ec_bio_enzyme_com.* to ec_bio_enzyme_com@'%';

Update DB information in settings.py. Then migrate the new db:

./manage.py migrate

The id of some old values are different form the migration, so we need to truncate data in four tables:

SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE TABLE auth_permission;
TRUNCATE TABLE django_content_type;
TRUNCATE TABLE django_site;
TRUNCATE TABLE auth_user;
SET FOREIGN_KEY_CHECKS = 1;

We only change the collation in email field, becuase there are so many emails are have upper and lower case in the same time:

ALTER TABLE trade_consumer MODIFY COLUMN email varchar(254) COLLATE utf8_bin NOT NULL;
ALTER TABLE maillist_recipient MODIFY COLUMN email varchar(254) COLLATE utf8_bin NOT NULL;

Load json to MariaDB

./manage.py loaddata all_exclude_maillist.json
./manage.py loaddata maillist.json

Verify row counts

Log table row counts in PG:

\o 'count_pg.sql';
SELECT concat('SELECT concat(''', relname, ''', '', '', count(*)) from ', relname, ';') FROM pg_stat_user_tables ORDER BY relname;
psql -U ec_bio_enzyme_com -W ec_bio_enzyme_com < count_pg.sql | grep ", " > pg.log

Check MariaDB table rows:

\T count_mysql.sql
SELECT concat('SELECT concat(''', TABLE_NAME, ''', '', '', count(*)) from ', TABLE_NAME, ';') FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'ec_bio_enzyme_com' order by TABLE_NAME;
./manage.py dbshell < count_mysql.sql | grep ", [0-9]" > my.log
diff -w pg.log my.log
    18c18
    < back_operator_workorderhistoryproduct, 25927
    ---
    > back_operator_workorderhistoryproduct, 25786
    31c31
    < db_cache, 124
    ---
    > db_cache, 0
    34c34
    < django_migrations, 53
    ---
    > django_migrations, 55
    94c94
    < trade_htmlcontent, 5949
    ---
    > trade_htmlcontent, 5948
    101c101
    < trade_product, 88
    ---
    > trade_product, 86

Except db_cache, django_migrations tables(dumpdata ignores those), we can see the different counts are only involved by 「is_suspend = true」.

Hooray~ It is Done!