How To Dump And Import Utf 8 Mysql Databases In A Safe Way

Published:
Last modified:

Overview

If you have a MySQL database with its character set encoded in UTF-8, then you need to treat it in a special way when dumping and restoring the database to be able to read its special characters.

Dumping a database

The common and quickest way of dumping a database with mysqldump does not treat utf-8 encoding right.

Avoid doing it like:

This is the correct way to dump a MySQL database safely:


$ mysqldump -uroot -p database -r utf8.dump
...

The -r or, the same,--result-file option, will create the output in ASCII format.

--result-file=file_name, -r file_name
       Direct output to the named file. The result file is created and its previous contents overwritten, even if an error
       occurs while generating the dump.

       This option should be used on Windows to prevent newline “\n” characters from being converted to “\r\n” carriage
       return/newline sequences.

Restoring a database

Usually you will restore a database simply inserting the data with an input file descriptor <, but this won’t handle utf-8 encoding properly, so this should be avoided:


$ mysql -uroot -p --default-character-set=utf8 database
mysql>  SET names 'utf8'
mysql>  SOURCE utf8.dump
...

Related Definitions

Character set

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set. Suppose that we have an alphabet with four letters: A, B, a, b. We give each letter a number: A = 0, B = 1, a = 2, b = 3. The letter A is a symbol, the number 0 is the encoding for A, and the combination of all four letters and their encodings is a character set.
Character Sets and Collations in General MySQL Docs

UTF-8

UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data. It is implemented according to RFC 3629, which describes encoding sequences that take from one to four bytes.
The utf8 Character Set (3-Byte UTF-8 Unicode Encoding) MySQL Docs

Reference

Uruguay
Marcelo Canina
I'm Marcelo Canina, a developer from Uruguay. I build websites and web-based applications from the ground up and share what I learn here.
comments powered by Disqus


Except as otherwise noted, the content of this page is licensed under CC BY-NC-ND 4.0 . Terms and Policy.

Powered by SimpleIT Hugo Theme

·