Unibabel

Naive Approach

var text = "Hello World!";
var textLen = text.length;
var buffer = new Uint8Array(textLen);
var index;

//
// Create the Buffer from the String
//
for (index = 0; index < textLen; index += 1) {
  // most browsers now support text.codePointAt(index), which is better
  // WARNING buffer[index] = text[index] WILL FAIL SILENTLY
  buffer[index] = text.charCodeAt(index);
}

//
// Create and print SHA-1 Hash
//
function printHex(sha1buf) {
  console.log(sha1buf);
  var hex = bufferToHex(sha1buf); // implementation in appendix
  console.log(hex);
}

function printErr(err) {
  console.error(err);
}

window.crypto.subtle.digest("SHA-1", buffer).then(printHex, printErr);
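
The warning inside the loop is easy to check for yourself. Here is a minimal sketch (the variable names are illustrative only): assigning a one-character string to a typed-array slot coerces it to a number, which gives NaN and is stored as 0, with no exception thrown.

```javascript
// Sketch: why `buffer[index] = text[index]` fails silently.
var text = "Hi";
var wrong = new Uint8Array(text.length);
var right = new Uint8Array(text.length);
var i;

for (i = 0; i < text.length; i += 1) {
  wrong[i] = text[i];            // "H" -> NaN -> stored as 0, no error
  right[i] = text.charCodeAt(i); // numeric code unit: 72, then 105
}

console.log(Array.from(wrong)); // [ 0, 0 ]
console.log(Array.from(right)); // [ 72, 105 ]
```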

The Unicode Problem

Now let's consider that we're in 2015, not 1986:

1-byte	~	a	0	!
2-byte	¶	¢	ε	ñ
3-byte	♥	☢	☃	‱
4-byte	𩶘	𐑶	𐐦	𝄢

Our users use weird symbols (💩) and, surprise surprise, they don't all speak ASCII English.

Also, check out these full-color symbols.

var radSnoBass = "I ♥ ☢ 𝄢!";              // .length is 9 (8 characters; 𝄢 counts twice)
Unibabel.strToUtf8Arr(radSnoBass).length; // 17 bytes

"I".length;                               // 1 char
Unibabel.strToUtf8Arr("I").length;        // 1 byte

"♥".length;                               // 1 char
Unibabel.strToUtf8Arr("♥").length;        // 3 bytes

"☢".length;                               // 1 char
Unibabel.strToUtf8Arr("☢").length;        // 3 bytes

"𝄢".length;                               // 2 chars WHAT!?!?
Unibabel.strToUtf8Arr("𝄢").length;        // 6 bytes (two surrogates, 3 bytes each; strict UTF-8 would be 4)

var happyBuf = Unibabel.strToUtf8Arr(radSnoBass);

So what's wrong with this? Well... a lot.

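JavaScript strings are sequences of UTF-16 code units (historically UCS-2). Not UTF-8. Not whole Unicode characters.
^^ Which means characters like 𝄢 take up two slots (a surrogate pair), and that's BAD!!!
It gives us a bad character count.
It gives us a bad byte count.
We can't loop over characters by index!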
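
To make those complaints concrete, here is a small sketch comparing the three different "lengths" of the same string. It assumes a runtime with ES2015 string iteration and the TextEncoder API; the variable names are illustrative.

```javascript
var text = "I♥☢𝄢";

// UTF-16 code units: what .length and charCodeAt() see
var codeUnits = text.length;

// Unicode code points: string iteration keeps surrogate pairs together
var codePoints = Array.from(text).length;

// UTF-8 bytes: what actually gets hashed or sent over the wire
var utf8Bytes = new TextEncoder().encode(text).length;

console.log(codeUnits, codePoints, utf8Bytes); // 5 4 11

// Looping over characters instead of code units
for (var ch of text) {
  console.log(ch, ch.codePointAt(0).toString(16));
}
```

Three different answers for one four-character string, which is why `text.length` is the wrong size for a byte buffer.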

Let's take a look:

var text = "I♥☢𝄢"; // aka "I Love Radioactive Bass"
var textLen = text.length;
var buffer = new Uint8Array(textLen);
var index;

for (index = 0; index < textLen; index += 1) {
  console.log('char[' + index + ']', text[index]
    , text.charCodeAt(index), text.codePointAt(index));

  buffer[index] = text.charCodeAt(index);

  console.log('buffer[' + index + ']', buffer[index]);
}

window.crypto.subtle.digest("SHA-1", buffer).then(printHex, printErr);
// BAD!: da548f7a00f799317d9ba6c03a6ee9d14065223d
// compare with `echo "I♥☢𝄢" | shasum`
// Good: d5bb644c3a9f517bec9c36400cbc449be271f65f
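
The hash comes out wrong because a Uint8Array keeps only the low 8 bits of each 16-bit code unit. A short sketch of what the loop above actually stores (the `units` array is added here just for display):

```javascript
var text = "I♥☢𝄢";
var buffer = new Uint8Array(text.length);
var units = [];
var i;

for (i = 0; i < text.length; i += 1) {
  units.push(text.charCodeAt(i).toString(16));
  buffer[i] = text.charCodeAt(i); // 0x2665 is cut down to 0x65, and so on
}

console.log(units);              // [ '49', '2665', '2622', 'd834', 'dd22' ]
console.log(Array.from(buffer)); // [ 73, 101, 34, 52, 34 ]
```

Those five bytes are the ASCII text `Ie"4"`, so that is what ends up being hashed instead of the original string.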

TextEncoder / TextDecoder

After writing this demo I found out about another new API:

var encoder = new TextEncoder(); // always encodes to UTF-8; takes no arguments
var buf = encoder.encode("!¶☢☃𩶘𝄢");
console.log(buf);

var decoder = new TextDecoder("utf-8");
var msg = decoder.decode(buf);
console.log(msg);
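
As a quick sanity check, here is a sketch that round-trips the test string from earlier and prints its UTF-8 bytes as hex. It assumes TextEncoder and TextDecoder are available as globals; the variable names are illustrative.

```javascript
var encoder = new TextEncoder();          // always UTF-8
var decoder = new TextDecoder("utf-8");

var bytes = encoder.encode("I♥☢𝄢");
var hex = Array.from(bytes, function (b) {
  return (b < 16 ? "0" : "") + b.toString(16);
}).join("");

console.log(bytes.length);          // 11
console.log(hex);                   // 49e299a5e298a2f09d84a2
console.log(decoder.decode(bytes)); // I♥☢𝄢
```

The resulting Uint8Array can be handed straight to `crypto.subtle.digest`, which accepts typed arrays as well as ArrayBuffers.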

Mozilla's Cross-Browser Solution

I took MDN's sample code and published it as unibabel on Bower.

How does it work?

Magic.
'Nuff said.

Check out the article (above) if you're really interested. It's a bunch of double and triple bit-shifting and other similar nonsense that I don't care to think about, but for which I am very grateful.
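
For the curious, the bit-shifting is less mysterious than it sounds. The following is a minimal sketch of standard UTF-8 encoding for a single code point. Note that `utf8EncodeCodePoint` and `toHex` are hypothetical helpers written for illustration, not Unibabel's or MDN's actual code.

```javascript
// Sketch: encode one Unicode code point as UTF-8 bytes.
function utf8EncodeCodePoint(cp) {
  if (cp < 0x80) {
    return [cp];                                   // 1 byte: 0xxxxxxx
  }
  if (cp < 0x800) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 2 bytes: 110xxxxx 10xxxxxx
  }
  if (cp < 0x10000) {
    return [                                       // 3 bytes: 1110xxxx + 2 continuation bytes
      0xe0 | (cp >> 12),
      0x80 | ((cp >> 6) & 0x3f),
      0x80 | (cp & 0x3f)
    ];
  }
  return [                                         // 4 bytes: 11110xxx + 3 continuation bytes
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f)
  ];
}

function toHex(bytes) {
  return bytes.map(function (b) { return b.toString(16); }).join(" ");
}

console.log(toHex(utf8EncodeCodePoint("♥".codePointAt(0)))); // e2 99 a5
console.log(toHex(utf8EncodeCodePoint("𝄢".codePointAt(0)))); // f0 9d 84 a2
```

Each continuation byte carries six bits of the code point, which is where all the `>> 6` and `& 0x3f` come from.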

bower install unibabel

var buffer = window.Unibabel.strToUtf8Arr("I♥☢𝄢");
console.log(buffer);

// Unibabel also supports base64 conversion, if you need it
window.Unibabel.arrToBase64(buffer); // SeKZpeKYou2gtO20og==

window.crypto.subtle.digest("SHA-1", buffer).then(printHex, printErr);
// YAY! da548f7a00f799317d9ba6c03a6ee9d14065223d

Can I encrypt in node and decrypt in the browser?

What do you get when you mix an Elephant with a Rhino?

Let's try!

aes-256-cbc

Appendix

function bufferToHex(buf) {
  // NOTE: digest() resolves with an ArrayBuffer, which can't be indexed directly.
  // A DataView reads it in place; new Uint8Array(buf) would also be a view, not a copy.
  var dv = new DataView(buf);
  var i;
  var len;
  var hex = '';
  var c;

  for (i = 0, len = dv.byteLength; i < len; i += 1) {
    c = dv.getUint8(i).toString(16);
    if (c.length < 2) {
      c = '0' + c;
    }
    hex += c;
  }

  return hex;
}
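
On newer runtimes the same conversion can be written more compactly. This is a sketch, assuming `Array.from` and `String.prototype.padStart` are available; `bufferToHexCompact` is an illustrative name, not part of Unibabel.

```javascript
function bufferToHexCompact(buf) {
  // A Uint8Array over an ArrayBuffer is a view; no bytes are copied
  return Array.from(new Uint8Array(buf), function (byte) {
    return byte.toString(16).padStart(2, "0");
  }).join("");
}

var sample = new Uint8Array([0, 15, 255, 73]).buffer;
console.log(bufferToHexCompact(sample)); // 000fff49
```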

Helpful Hints:

Unicode (!¶☢☃𩶘𝄢), WebCrypto, and You