Handling UTF-8 Quoted Printable strings

While working on VCFPorter (a project that will be out soon), I’ve run into the problem of trying to parse (and fully understand) how JavaScript handles UTF-8. While I thought I had a pretty good grasp on things, having to actually translate UTF-8 Quoted-Printable strings (part of the vCard spec) sent me scurrying to Google to look up how exactly to encode and decode UTF-8 strings in JavaScript. That’s always a good indicator that I don’t know enough about something.

While feeling a little frustrated that I didn’t understand JavaScript (and binary encodings) as well as I should have, I ran into the following great resources (some for the second or third time) that explain encoding, and more specifically UTF-8:

http://www.joelonsoftware.com/articles/Unicode.html (Joel writes fantastic articles, this is just one of many)

https://en.wikipedia.org/wiki/UTF-8 (detailed UTF-8 byte-by-byte description)

http://ecmanaut.blogspot.co.uk/2006/07/encoding-decoding-utf8-in-javascript.html (small code snippet showing how to translate to and from UTF-8 strings)

http://www.convertstring.com/EncodeDecode/QuotedPrintableDecode (for testing to find out what your string should look like)

Primarily, the last one exposed a really awesome trick (API designers hate him!) for decoding a UTF-8 encoded binary string:
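A minimal sketch of that trick, assuming it is the well-known pairing of `escape`/`unescape` with `encodeURIComponent`/`decodeURIComponent`. The function names `encodeUtf8` and `decodeUtf8` are placeholders for illustration:

```javascript
// Hypothetical helper names; the trick itself is the pairing of the built-ins.

// Turn a JavaScript string into a "binary string" in which every
// character holds one UTF-8 byte (char codes 0-255).
function encodeUtf8(s) {
  // encodeURIComponent percent-encodes the UTF-8 bytes ("%E8%97%A4"),
  // unescape turns each %XX back into a single character with that code.
  return unescape(encodeURIComponent(s));
}

// Turn a binary string of UTF-8 bytes back into a normal JavaScript string.
function decodeUtf8(bytes) {
  // escape percent-encodes each byte-character as %XX,
  // decodeURIComponent then interprets those bytes as UTF-8.
  return decodeURIComponent(escape(bytes));
}
```

Both `escape` and `unescape` are deprecated legacy functions, so treat this as a compact trick rather than a recommended API. `TextEncoder` and `TextDecoder` cover the same ground where they are available.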

These two functions are what did the trick for me (primarily the decoding). I was happy to find the answer, but I think it took too long. Also, there’s still quite a gap in going from:

=E8=97=A4=E6=A3=AE

to

藤森

Without further ado, here’s the code I had to write to accomplish this:
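As a rough sketch of what such a decoder can look like (not necessarily the VCFPorter implementation): the function name `decodeQuotedPrintableUtf8` is hypothetical, and the input is assumed to be plain ASCII Quoted-Printable text whose `=XX` escapes spell out UTF-8 bytes.

```javascript
// Hypothetical helper: decode a Quoted-Printable string whose bytes are UTF-8.
// Assumes the input contains only ASCII characters, as Quoted-Printable should.
function decodeQuotedPrintableUtf8(input) {
  const binary = input
    // A "=" at the very end of a line is a soft line break: drop it
    // together with the newline so the line continues.
    .replace(/=\r?\n/g, "")
    // "=XX" is one byte written as two hex digits; turn it into a
    // single character with that char code.
    .replace(/=([0-9A-Fa-f]{2})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    );

  // The result is a binary string of UTF-8 bytes; reinterpret it as UTF-8.
  return decodeURIComponent(escape(binary));
}

console.log(decodeQuotedPrintableUtf8("=E8=97=A4=E6=A3=AE")); // 藤森
```

Decoding the bytes first and interpreting them as UTF-8 afterwards matters: each of the two characters above is spread over three `=XX` escapes, so converting the escapes one at a time would produce six unrelated characters.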
