Want Help Cracking FAANG?

(Then click this)

×
Back to Question Bank

1410. HTML Entity Parser - Leetcode Solution

Code Implementation

class Solution:
    def entityParser(self, text: str) -> str:
        entities = {
            """: '"',
            "'": "'",
            "&": "&",
            ">": ">",
            "<": "<",
            "⁄": "/"
        }
        # Replace < with < (not itself)
        entities["<"] = "<"
        
        # Iterate and replace all entities
        for ent, ch in entities.items():
            text = text.replace(ent, ch)
        return text
      
class Solution {
public:
    string entityParser(string text) {
        vector<pair<string, string>> entities = {
            {"&quot;", "\""},
            {"&apos;", "'"},
            {"&amp;", "&"},
            {"&gt;", ">"},
            {"&lt;", "<"},
            {"&frasl;", "/"}
        };
        for (auto &ent : entities) {
            size_t pos = 0;
            while ((pos = text.find(ent.first, pos)) != string::npos) {
                text.replace(pos, ent.first.length(), ent.second);
                pos += ent.second.length();
            }
        }
        return text;
    }
};
      
class Solution {
    public String entityParser(String text) {
        String[][] entities = {
            {""", "\""},
            {"'", "'"},
            {"&", "&"},
            {">", ">"},
            {"<", "<"},
            {"⁄", "/"}
        };
        for (String[] ent : entities) {
            text = text.replace(ent[0], ent[1]);
        }
        return text;
    }
}
      
var entityParser = function(text) {
    const entities = {
        """: "\"",
        "'": "'",
        "&": "&",
        ">": ">",
        "<": "<",
        "⁄": "/"
    };
    for (let ent in entities) {
        // Replace all occurrences using split/join
        text = text.split(ent).join(entities[ent]);
    }
    return text;
};
      

Problem Description

The problem requires you to implement an HTML entity parser. You are given a string text that may contain certain HTML character entities (such as &quot; for " or &amp; for &). Your task is to replace every occurrence of these specific entities in the string with their corresponding characters, and return the resulting string.

The entities you need to handle are:

  • &quot; replaced by "
  • &apos; replaced by '
  • &amp; replaced by &
  • &gt; replaced by >
  • &lt; replaced by <
  • &frasl; replaced by /
Only these six entities need to be parsed and replaced. Other sequences that look similar should not be changed.

The key constraint is that all occurrences of these entities should be replaced in the order they appear, and only the above valid entities should be recognized.

Thought Process

The core of the problem is to search for specific substrings (the HTML entities) and replace them with their corresponding characters. At first glance, a brute-force approach might involve scanning the string repeatedly to find and replace each entity. However, this could be inefficient if not done carefully, especially if entities overlap or are nested.

Since only a small, fixed set of entities need to be recognized, and their replacements are always single characters, we can use a straightforward replacement strategy. The key is to ensure we do not accidentally replace similar-looking substrings that are not valid entities.

An efficient approach is to use a mapping (like a hash map or dictionary) from entity strings to their replacements, and then iterate over the input text, performing replacements for each entity. This avoids the complexity of parsing arbitrary HTML and focuses only on the known entities.

Solution Approach

Let's break down the solution step by step:

  1. Define the entity mapping:
    • Create a dictionary (or equivalent structure) that maps each entity string (like &quot;) to its replacement character (like ").
  2. Iterate and replace:
    • Loop over each entity in the mapping.
    • For each entity, replace all occurrences in the input text with its corresponding character.
    • This can be done using built-in string replacement methods, which are efficient for this use case.
  3. Return the result:
    • After all replacements are done, return the modified string as the answer.

This approach is simple, readable, and efficient, especially since the number of entities is very small.

Example Walkthrough

Let's walk through an example to see how the solution works.

Input: text = "x > y && x < y is always "false"."

  1. Start with the original text: x > y && x < y is always "false".
  2. Replace &quot; with ":
    • Now: x > y && x < y is always "false".
  3. Replace &apos; with ':
    • No change, since &apos; is not present.
  4. Replace &amp; with &:
    • Now: x > y && x < y is always "false".
  5. Replace &gt; with >:
    • Now: x > y && x < y is always "false".
  6. Replace &lt; with <:
    • Now: x > y && x < y is always "false".
  7. Replace &frasl; with /:
    • No change, since &frasl; is not present.
  8. Final Output: x > y && x < y is always "false".

Time and Space Complexity

Brute-force approach:

  • If you scanned the string character by character and tried to parse entities yourself, time complexity could be O(N * M), where N is the length of the string and M is the number of entities, especially if you repeatedly scan for each entity.
Optimized approach (current solution):
  • We loop through each entity (constant, only 6) and replace all occurrences in the string. Each replacement takes O(N), so total time is O(N), where N is the length of the input string.
  • Space complexity is O(N) due to the storage of the output string (since strings are immutable in many languages, replacements may create new strings).

The approach is efficient because the number of entities is small and each replacement is handled in linear time.

Summary

The HTML Entity Parser problem is a classic string manipulation task, where the challenge is to efficiently and accurately replace specific substrings (HTML entities) with their corresponding characters. By using a simple mapping and iterating over the fixed set of entities, we can solve the problem in linear time with minimal code. The solution avoids unnecessary complexity and leverages built-in string operations, making it both elegant and practical for real-world scenarios where only known entities need to be parsed.