UTF-8 Conversion

gRegor
Posts: 738
Joined: Tue May 14, 2002 3:17 am
Location: Bellingham, WA

Post by gRegor » Mon Mar 03, 2014 9:13 am

I have had this in the works for . . . far too long. I'm happy to finally announce that the first version is out.

https://github.com/gRegorLove/nucleus-cms-to-utf8

The problem
Many legacy Nucleus installs have their tables' encoding set to latin1 or some other non-UTF-8 character set. "latin1" was the default on a lot of MySQL installs and, for quite some time, the Nucleus install SQL did not specify a character set, so the MySQL default was used.

This becomes a problem when UTF-8 characters are inserted into these tables and "coerced" into latin1: they often end up displaying as question marks or other odd characters.
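To make the symptom concrete, here is a minimal Python sketch of the two failure modes (odd characters and question marks). It is only an illustration of the encoding mismatch, not code from the converter, and the sample strings and variable names are made up for the example.

```python
# Illustration only: how UTF-8 text turns into odd characters or question
# marks when it is treated as latin1 somewhere along the way.

original = "café"

# The UTF-8 bytes of the text, as a client might send them.
utf8_bytes = original.encode("utf-8")

# Misreading those bytes as latin1 turns each multi-byte character into
# two or more unrelated characters.
misread = utf8_bytes.decode("latin-1")

# Characters that latin1 cannot represent at all are replaced with "?".
lossy = "日本語".encode("latin-1", errors="replace").decode("latin-1")

print(misread)  # cafÃ©
print(lossy)    # ???
```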

The fix
Converting a table to UTF-8 and converting the text without losing some characters can be quite a challenge, especially if you want to do it programmatically, as I did. Thankfully, I came across a PHP class that does a really good job of fixing improperly encoded UTF-8 characters.
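For anyone curious what "fixing improperly encoded UTF-8" means in practice, here is a rough Python sketch of the simplest case: text whose UTF-8 bytes were misread as latin1 once. This is not how the PHP class works internally (I have not reproduced its logic here), and the function name repair_mojibake is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as latin1', if that is what happened.

    Text that does not fit that pattern is returned unchanged.
    """
    try:
        # Turn the characters back into the bytes they were misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside latin1, or the bytes
        # are not valid UTF-8, so it was probably fine to begin with.
        return text


print(repair_mojibake("cafÃ©"))  # café
print(repair_mojibake("café"))   # café (already correct, left alone)
```

A real fixer has to handle harder cases than this, such as strings that mix correct and broken characters, which is why a dedicated library is worth using.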

Installing and Using
1. Back up your database.
2. Upload the utf8/ directory to your nucleus/ directory.
3. You backed up your database, right?
4. In your browser, open the URL to the nucleus/utf8/ directory on your site. E.g. example.com/nucleus/utf8/
5. Click the link on that page to initiate the conversion process.

Notes
While I have tested this quite a bit on my own Nucleus install, it should be considered experimental and you should absolutely back up your entire Nucleus database before trying it.

Depending on the size of your database, the conversion process could take quite some time.

There are a few Nucleus tables that do not have primary keys and thus cannot be converted by this program. (Note to future devs: fix this.) Also, if a table has no fields with a character data type, it will be skipped.
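The two skip rules above can be sketched roughly like this in Python. It is an illustration of the rules only, not the converter's actual code: the Table class, the skip_reason function, the example table names, and the list of character data types are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed list of MySQL character data types; the converter may use a different one.
CHARACTER_TYPES = {"char", "varchar", "tinytext", "text", "mediumtext", "longtext"}


@dataclass
class Table:
    name: str
    primary_key: List[str]   # names of the primary key columns, empty if none
    columns: Dict[str, str]  # column name -> data type


def skip_reason(table: Table) -> Optional[str]:
    """Return why a table would be skipped, or None if it can be converted."""
    if not table.primary_key:
        return "no primary key"
    if not any(t.lower() in CHARACTER_TYPES for t in table.columns.values()):
        return "no character columns"
    return None


# Hypothetical tables, not real Nucleus table definitions.
tables = [
    Table("example_posts", ["id"], {"id": "int", "body": "text"}),
    Table("example_links", [], {"a": "int", "name": "varchar"}),
    Table("example_counts", ["id"], {"id": "int", "hits": "int"}),
]

for table in tables:
    print(table.name, "->", skip_reason(table) or "convert")
```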

Unfortunately, this will likely be my last major contribution to Nucleus CMS for the foreseeable future. I will make bug fixes as necessary and consider other updates (send a GitHub pull request, please). I will check in here periodically. Other developers: obviously, feel free to take this code and modify it however you want.

Many thanks to Sebastián Grignoli for his forceutf8 package. Without it this would not work, and I probably would never have finished it on my own.
— gRegor
dis
Posts: 209
Joined: Mon Aug 19, 2002 3:56 am

Post by dis » Sun Jul 13, 2014 1:56 am

Just wanted to say thank you for this update, and for all your hard work on Nucleus over the years, gRegor!

gRegor wrote: "Unfortunately, this will likely be my last major contribution to Nucleus CMS for the foreseeable future."

:cry:

Post by gRegor » Sun Jul 13, 2014 7:49 pm

You're welcome! I hope it works well for you. Be sure to back up that database first. :)
— gRegor

Post by dis » Sun Jul 13, 2014 8:37 pm

hahaha i will! thanks again! :D