820. Short Encoding of Words - Leetcode Solution

Code Implementation

class Solution:
    def minimumLengthEncoding(self, words):
        words = set(words)
        for word in list(words):
            for k in range(1, len(word)):
                words.discard(word[k:])
        return sum(len(word) + 1 for word in words)

class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        unordered_set<string> s(words.begin(), words.end());
        for (const string& word : words) {
            for (int k = 1; k < word.size(); ++k) {
                s.erase(word.substr(k));
            }
        }
        int res = 0;
        for (const string& word : s) {
            res += word.size() + 1;
        }
        return res;
    }
};

class Solution {
    public int minimumLengthEncoding(String[] words) {
        Set<String> s = new HashSet<>(Arrays.asList(words));
        for (String word : words) {
            for (int k = 1; k < word.length(); ++k) {
                s.remove(word.substring(k));
            }
        }
        int res = 0;
        for (String word : s) {
            res += word.length() + 1;
        }
        return res;
    }
}

var minimumLengthEncoding = function(words) {
    let s = new Set(words);
    for (let word of words) {
        for (let k = 1; k < word.length; ++k) {
            s.delete(word.slice(k));
        }
    }
    let res = 0;
    for (let word of s) {
        res += word.length + 1;
    }
    return res;
};

Problem Description

Given a list of words, we want to encode them as a single string in such a way that each word is a suffix of the string, and every word ends with a '#' character. The goal is to minimize the length of this encoded string. If a word is already a suffix of another word, it does not need to be encoded separately. The output is the minimum possible length of such an encoding.

Each word must appear as a suffix, ending at a '#'.
If one word is a suffix of another, we only need to encode the longer word.
All words are lowercase English letters.
There is only one valid answer for the minimum length, and words cannot be reused in the encoding.

Thought Process

At first glance, it may seem that we need to encode every word separately, simply joining them together with '#' characters. However, the key insight is that if a word is a suffix of another word, it does not need to be encoded on its own. For example, if "time" and "me" are both in the list, encoding "time#" already covers "me" as a suffix.

A brute-force approach might try to check for all possible pairs of words to see if one is a suffix of another, but this is inefficient. Instead, we can optimize by focusing only on the words that are not suffixes of any other word. Only these "root" words need to be encoded, as all their suffixes are automatically included.

Solution Approach

We start by putting all the words into a set. This allows for fast lookups and removals.
For each word in the original list, we generate all its possible non-empty suffixes (excluding the word itself).
We remove these suffixes from the set, because if a word is a suffix of another, it does not need to be encoded separately.
At the end, the set contains only those words that are not suffixes of any other word. These are the words that must appear as distinct entries in the encoding.
The minimum encoding length is the sum of the length of each remaining word plus one (for the '#' at the end).

This approach is efficient because:

Using a set gives us O(1) time for lookups and removals.
We avoid redundant checks by processing each word only once.
We never need to explicitly build the encoding string, just count the required length.

Example Walkthrough

Consider the input words = ["time", "me", "bell"].

Start with the set: {"time", "me", "bell"}
For "time", remove its suffixes: "ime", "me", "e". After removal, set is {"time", "bell"}
For "me", remove its suffix: "e". "me" is already removed, so nothing changes.
For "bell", remove its suffixes: "ell", "ll", "l". None of these are in the set, so set remains {"time", "bell"}
Final set: {"time", "bell"}
Minimum encoding: "time#bell#" (length = 5 + 5 = 10)

Both "me" and "e" are suffixes of "time", so they do not need to be encoded separately.

Time and Space Complexity

Brute-force: For each pair of words, check if one is a suffix of another. This is O(N^2 * L), where N is the number of words and L is the average word length.
Optimized approach (using set):
- We process each word and for each, generate its suffixes (up to L per word), so total time is O(N * L^2).
- Set operations are O(1) on average.
- Space complexity is O(N * L) due to storing all words and suffixes.

Summary

The shortest encoding of words problem is elegantly solved by removing all words that are suffixes of any other word, and only encoding the remaining words. The use of a set makes suffix removal efficient, and the final answer is simply the sum of the lengths of these root words plus one for each '#'. This approach avoids unnecessary computations and leverages the properties of sets for optimal performance.

AlgoMap.io

Want Help Cracking FAANG?