Monday, March 26, 2007

UTF-8 ... and why you should use it, part 2

Sadly, I've discovered that my previous post was missing a few pieces of information. For starters, when using Tomcat, you should add the attribute URIEncoding="UTF-8" to your connector definition. Next, add the attribute accept-charset="utf-8" to all your forms; this isn't strictly necessary, just good practice. Finally, make sure you have a filter defined in your web.xml that sets the character encoding of incoming requests to UTF-8. If you're using Spring, the easy way to do this is to place the following definitions in your web.xml:


<filter>
    <filter-name>charsetFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>charsetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

This filter will set the encoding on all your incoming requests for you. *Edit*: One additional note: if you're moving an existing database to UTF-8 (fairly likely, since new production databases aren't started that often), you'll also need to run an ALTER DATABASE statement in MySQL to set its default character set to UTF-8. Keep in mind that this only changes the default for tables created afterwards; existing tables keep their current character set until they are converted as well.
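
Going back to the first step: the connector definition lives in Tomcat's conf/server.xml. Here is a rough sketch of what it might look like. The port, protocol, and timeout values are placeholders I've assumed for illustration, so keep whatever your own connector already uses; the only attribute that matters here is URIEncoding.

```xml
<!-- conf/server.xml: hypothetical HTTP connector.
     port, protocol and connectionTimeout are assumed example values;
     URIEncoding is the attribute discussed in this post. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8" />
```

URIEncoding controls how Tomcat decodes the percent-encoded bytes in the URL, which includes GET query parameters. The filter covers the request body, so POSTed form data relies on that instead. That's why you want both.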
