Huffman coding is a variable-length coding algorithm, usually used for lossless compression.
Take a string like below as an example.
abbcccddddeeeeeffffff
How could we express it efficiently?
We know that 1 bit could express 2 cases: 0 and 1. 2 bit could express 4 case: 00, 01, 10, 11. With this technique, we could make a lookup table to map every unique character to a binary format. Because here we have 6 unique characters, so we need at lest 3 bits to express each character clearly.
a 000
b 001
c 010
d 011
e 100
f 101
g 110
So how many bits used for encoding this string in total? Because we have used 3 bits to express each character. So in total, number of bits is number of characters multiply 3.
21(number of characters) * 3(number of bits for encoding 1 character) = 63 bits
So this is the very naive way to encoding a string. Huffman coding a another way to tackle this problem.
The key idea is, instead of use fixed length bits to encode every characters, use less bits to express character more frequently seen.
Now let's see how huffman coding works.
To get a new mapping by huffman coding algorithm, the key step is to create a huffman tree by the input string. Then the new mapping table could be created from this huffman tree.
First, we need to compute every characters' frequency.
const m: Record<string, number> = {};
[...str].forEach(char => {
if (char in m) {
m[char] += 1;
} else {
m[char] = 1;
}
});
Next, we define a HuffmanTreeNode data structure, which is a pretty standard binary tree structure with a symbol element to save the character and a frequency element to save its frequency.
type HuffmanTreeNode = {
symbol: string;
frequency: number;
left: HuffmanTreeNode | null;
right: HuffmanTreeNode | null;
};
Next, we create an array of HuffmanTreeNode for every unqiue characters in the string.
Object.keys(m).map(char => ({ symbol: char, frequency: m[char], left: null, right: null }))
Later we need to push/pop HuffmanTreeNode according to its frequencies, so here, we use a min heap data structure to save this array.
const minHeap = new Heap<HuffmanTreeNode>(
Object.keys(m).map(char => ({ symbol: char, frequency: m[char], left: null, right: null })),
(e1, e2) => e1.frequency > e2.frequency
);
Now comes the key part. For each loop, we get two HuffmanTreeNodes with least frequencies(because its a min heap), then create a new node with these two nodes. The new node's symbol is not important, we just concat it here. The new node's frequency should be the sum of the 2 popped nodes. And lastly, make this new node as the parent of these 2 popped nodes, and push it into the heap.
while (minHeap.length > 1) {
const n1 = minHeap.pop() as HuffmanTreeNode;
const n2 = minHeap.pop() as HuffmanTreeNode;
minHeap.push({
symbol: n1.symbol + n2.symbol,
frequency: n1.frequency + n2.frequency,
left: n1,
right: n2
});
}
After this loop ends, the minHeap should be left with only one node, which is the root of the Huffman Tree.
function createHuffmanTree(str: string): HuffmanTreeNode {
// ...
return minHeap.pop() as HuffmanTreeNode;
}
const root = createHuffmanTree(str);
We can print this root node, and it looks like blew.
{
symbol: "decabf",
frequency: 21,
left: {
symbol: "de",
frequency: 9,
left: { symbol: "d", frequency: 4, left: null, right: null },
right: { symbol: "e", frequency: 5, left: null, right: null }
},
right: {
symbol: "cabf",
frequency: 12,
left: {
symbol: "cab",
frequency: 6,
left: { symbol: "c", frequency: 3, left: null, right: null },
right: { symbol: "ab", frequency: 3, left: [Object], right: [Object] }
},
right: { symbol: "f", frequency: 6, left: null, right: null }
}
}
With the above root node, next step is to walk the tree from root node to leaf node. Among the traversal process, if we go left child, use bit 0, if we go right child, use bit 1. Note that every leaf node stands for a unique character from the original string, so when we walk to the left node, we can get a unique bit array, which stands for the encoding for that character.
let mappings: Record<string, string> = {};
function getMappings(node: HuffmanTreeNode, cur: number[] = []) {
if (node.left === null || node.right === null) {
mappings[node.symbol] = cur.join("");
return;
}
getMappings(node.left, [...cur, 0]);
getMappings(node.right, [...cur, 1]);
}
getMappings(root);
Now, let see what the mappings looks like.
{ d: "00", e: "01", c: "100", a: "1010", b: "1011", f: "11" }
As you can see, the character a appear very few times, so its encoding use more bits. And the character f appears many times, to its encoding use less bits.
Now let's see how many bits used in total with this mappings.
char frequency bits total bits
a 1 1010 1 * 4
b 2 1011 2 * 4
c 3 100 3 * 3
d 4 00 4 * 2
e 5 01 5 * 2
f 6 11 6 * 2
total: 51 bits
So with huffman coding, we can express the same string with 51 bits, compared to 63 bits of naive coding.
See whole code:
class Heap<T> {
private arr: T[];
private comparator: (e1: T, e2: T) => boolean;
constructor(arr: T[], comparator: (e1: T, e2: T) => boolean) {
this.arr = arr;
this.comparator = comparator;
for (let i = this.arr.length - 1; i >= 0; i--) {
this.heapify(i);
}
}
private heapify(index: number) {
const leftChildIndex = 2 * index + 1;
const rightChildIndex = 2 * index + 2;
let parentIndex = index;
if (leftChildIndex < this.arr.length && this.comparator(this.arr[parentIndex], this.arr[leftChildIndex])) {
parentIndex = leftChildIndex;
}
if (rightChildIndex < this.arr.length && this.comparator(this.arr[parentIndex], this.arr[rightChildIndex])) {
parentIndex = rightChildIndex;
}
if (parentIndex !== index) {
this.swap(parentIndex, index);
this.heapify(parentIndex);
}
}
private swapWithParent(index: number) {
if (index === 0) return;
const parentIndex = ~~((index - 1) / 2);
if (this.comparator(this.arr[parentIndex], this.arr[index])) {
this.swap(parentIndex, index);
this.swapWithParent(parentIndex);
}
}
private swap(index1: number, index2: number) {
[this.arr[index1], this.arr[index2]] = [this.arr[index2], this.arr[index1]];
}
public push(element: T) {
this.arr.push(element);
this.swapWithParent(this.arr.length - 1);
}
public pop(): T | null {
if (this.arr.length === 0) return null;
const element = this.arr[0];
this.swap(0, this.arr.length - 1);
this.arr.splice(this.arr.length - 1, 1);
this.heapify(0);
return element;
}
public get length(): number {
return this.arr.length;
}
}
type HuffmanTreeNode = {
symbol: string;
frequency: number;
left: HuffmanTreeNode | null;
right: HuffmanTreeNode | null;
};
const str = "abbcccddddeeeeeffffff";
console.log(str.length);
function createHuffmanTree(str: string): HuffmanTreeNode {
// key: unique characters, value: frequency
const m: Record<string, number> = {};
[...str].forEach(char => {
if (char in m) {
m[char] += 1;
} else {
m[char] = 1;
}
});
const minHeap = new Heap<HuffmanTreeNode>(
Object.keys(m).map(char => ({ symbol: char, frequency: m[char], left: null, right: null })),
(e1, e2) => e1.frequency > e2.frequency
);
while (minHeap.length > 1) {
const n1 = minHeap.pop() as HuffmanTreeNode;
const n2 = minHeap.pop() as HuffmanTreeNode;
minHeap.push({
symbol: n1.symbol + n2.symbol,
frequency: n1.frequency + n2.frequency,
left: n1,
right: n2
});
}
return minHeap.pop() as HuffmanTreeNode;
}
const root = createHuffmanTree(str);
console.log(root);
let mappings: Record<string, string> = {};
function getMappings(node: HuffmanTreeNode, cur: number[] = []) {
if (node.left === null || node.right === null) {
mappings[node.symbol] = cur.join("");
return;
}
getMappings(node.left, [...cur, 0]);
getMappings(node.right, [...cur, 1]);
}
getMappings(root);
console.log(mappings);