Handling UTF-8 Quoted Printable strings

While working on VCFPorter (a project that will be out soon), I've run into the problem of trying to parse (and fully understand) how JavaScript handles UTF-8. I thought I had a pretty good grasp on things, but having to actually translate UTF-8 Quoted Printable strings (part of the vCard spec) sent me scurrying to Google to look up exactly how to encode and decode UTF-8 strings in JavaScript. That's always a good indicator that I don't know enough about something.

Feeling a little frustrated that I didn't understand JavaScript (and binary encodings) as well as I should have, I ran into the following great resources (some for the second or third time) that explain encoding, and more specifically UTF-8:

http://www.joelonsoftware.com/articles/Unicode.html (Joel writes fantastic articles; this is just one of many)

https://en.wikipedia.org/wiki/UTF-8 (a detailed, byte-by-byte description of UTF-8)

http://ecmanaut.blogspot.co.uk/2006/07/encoding-decoding-utf8-in-javascript.html (a small code snippet showing how to convert to and from UTF-8 strings)

http://www.convertstring.com/EncodeDecode/QuotedPrintableDecode (handy for testing what your decoded string should look like)

Primarily the last one exposed a really awesome trick (API designers hate him!) for decoding a UTF-8 encoded binary string:
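As a rough sketch of that kind of trick, assuming the commonly cited pairing of the legacy `escape`/`unescape` functions with `encodeURIComponent`/`decodeURIComponent` (the names `encodeUtf8` and `decodeUtf8` are hypothetical), this is how a "binary string" holding one UTF-8 byte per character can be turned into a normal JavaScript string and back:

```javascript
// Sketch of the escape()/encodeURIComponent() trick for converting between a
// JavaScript string and a "binary string" of its UTF-8 bytes (one character
// per byte, codes 0-255). Function names are illustrative, not from a library.

// JS string -> binary string of UTF-8 bytes.
// encodeURIComponent percent-encodes each UTF-8 byte ("%E8%97%A4"), and
// unescape turns each "%XX" back into a single character with code 0xXX.
function encodeUtf8(str) {
  return unescape(encodeURIComponent(str));
}

// Binary string of UTF-8 bytes -> JS string.
// escape turns each non-ASCII character (0x80-0xFF) into "%XX", and
// decodeURIComponent reads those "%XX" sequences as UTF-8.
function decodeUtf8(binary) {
  return decodeURIComponent(escape(binary));
}

const bytes = "\xE8\x97\xA4\xE6\xA3\xAE"; // UTF-8 bytes of "藤森"
console.log(decodeUtf8(bytes));            // 藤森
console.log(encodeUtf8("藤森") === bytes);  // true
```

Note that `escape` and `unescape` are legacy functions, and `decodeURIComponent` throws a `URIError` when the bytes are not valid UTF-8, so in newer code `TextDecoder`/`TextEncoder` may be the better fit.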

These two functions are what did the trick for me (primarily the decoding one). I was happy to find the answer, but I think it took too long. Also, there's quite a difference going from:

=E8=97=A4=E6=A3=AE

to

藤森
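To make the step from `=E8=97=A4=E6=A3=AE` to 藤森 concrete, here is a minimal decoder sketch. It assumes the value uses `=XX` hex escapes and `=` soft line breaks as described in RFC 2045, that the underlying bytes are UTF-8, and that a global `TextDecoder` is available (modern browsers, Node.js 11+). `decodeQuotedPrintable` is a hypothetical name, not the code from this post. Each `=XX` becomes one byte, so the six escapes are the two three-byte UTF-8 sequences `E8 97 A4` (藤, U+85E4) and `E6 A3 AE` (森, U+68EE).

```javascript
// Sketch: decode a Quoted-Printable string whose bytes are UTF-8 (e.g. a
// vCard value marked ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8).
// Assumes a runtime with a global TextDecoder (modern browsers, Node.js 11+).
function decodeQuotedPrintable(input) {
  // Soft line breaks ("=" at the end of a line) are not part of the data.
  const unfolded = input.replace(/=\r?\n/g, "");

  const bytes = [];
  for (let i = 0; i < unfolded.length; i++) {
    const ch = unfolded[i];
    const hex = unfolded.slice(i + 1, i + 3);
    if (ch === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      // "=XX" encodes one raw byte in hexadecimal.
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits
    } else {
      // Any other character stands for itself; outside "=XX" escapes,
      // Quoted-Printable text is expected to be 7-bit ASCII.
      bytes.push(ch.charCodeAt(0));
    }
  }

  // Interpret the collected bytes as UTF-8 to get a JavaScript string.
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

console.log(decodeQuotedPrintable("=E8=97=A4=E6=A3=AE")); // 藤森
```

Collecting raw bytes first and decoding them as UTF-8 in one pass avoids treating each `=XX` escape as its own character, which is the mistake that turns 藤森 into mojibake.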

Without further ado, here's the code I had to write to accomplish this: