class Solution:
    def entityParser(self, text: str) -> str:
        # "&amp;" is listed last so that the "&" it produces cannot
        # combine with following text into a new entity (e.g. "&amp;gt;").
        entities = {
            "&quot;": '"',
            "&apos;": "'",
            "&gt;": ">",
            "&lt;": "<",
            "&frasl;": "/",
            "&amp;": "&",
        }
        # Replace every occurrence of each entity, in insertion order
        for ent, ch in entities.items():
            text = text.replace(ent, ch)
        return text
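#include <string>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    string entityParser(string text) {
        // "&amp;" comes last so the "&" it produces is never parsed again
        vector<pair<string, string>> entities = {
            {"&quot;", "\""},
            {"&apos;", "'"},
            {"&gt;", ">"},
            {"&lt;", "<"},
            {"&frasl;", "/"},
            {"&amp;", "&"}
        };
        for (auto &ent : entities) {
            size_t pos = 0;
            // Replace each occurrence, then resume just past the inserted character
            while ((pos = text.find(ent.first, pos)) != string::npos) {
                text.replace(pos, ent.first.length(), ent.second);
                pos += ent.second.length();
            }
        }
        return text;
    }
};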
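class Solution {
    public String entityParser(String text) {
        // "&amp;" comes last so the "&" it produces is never parsed again
        String[][] entities = {
            {"&quot;", "\""},
            {"&apos;", "'"},
            {"&gt;", ">"},
            {"&lt;", "<"},
            {"&frasl;", "/"},
            {"&amp;", "&"}
        };
        // String.replace substitutes every occurrence of the literal entity
        for (String[] ent : entities) {
            text = text.replace(ent[0], ent[1]);
        }
        return text;
    }
}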
var entityParser = function(text) {
    // "&amp;" comes last so the "&" it produces is never parsed again
    const entities = {
        "&quot;": "\"",
        "&apos;": "'",
        "&gt;": ">",
        "&lt;": "<",
        "&frasl;": "/",
        "&amp;": "&"
    };
    for (let ent in entities) {
        // Replace all occurrences using split/join
        text = text.split(ent).join(entities[ent]);
    }
    return text;
};
The problem requires you to implement an HTML entity parser. You are given a string text that may contain certain HTML character entities (such as &quot; for " or &amp; for &). Your task is to replace every occurrence of these specific entities in the string with their corresponding characters, and return the resulting string.
The entities you need to handle are:
&quot; replaced by "
&apos; replaced by '
&amp; replaced by &
&gt; replaced by >
&lt; replaced by <
&frasl; replaced by /
The key constraint is that all occurrences of these entities should be replaced in the order they appear, and only the above valid entities should be recognized.
The core of the problem is to search for specific substrings (the HTML entities) and replace them with their corresponding characters. At first glance, a brute-force approach might involve scanning the string repeatedly to find and replace each entity. However, this could be inefficient if not done carefully, especially if entities overlap or are nested.
Since only a small, fixed set of entities need to be recognized, and their replacements are always single characters, we can use a straightforward replacement strategy. The key is to ensure we do not accidentally replace similar-looking substrings that are not valid entities.
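An efficient approach is to use a mapping (like a hash map or dictionary) from entity strings to their replacements, and then iterate over the entities, replacing every occurrence of each one in the text. This avoids the complexity of parsing arbitrary HTML and focuses only on the known entities. One detail matters: &amp; should be replaced last, because the & it produces could otherwise combine with the characters that follow into a new entity (for example, &amp;gt; would wrongly end up as > instead of &gt;).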
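As an alternative to repeated replacement, the same mapping can drive a single left-to-right pass. The sketch below is not the solution described in this article; it relies on Python's re.sub with a replacement function, and the names ENTITIES and parse_entities are hypothetical, chosen only for illustration. Because each match is consumed exactly once, text produced by a replacement is never parsed again, and look-alikes such as &ambassador; are left untouched.

```python
import re

# Hypothetical table: entity text -> replacement character
ENTITIES = {
    "&quot;": '"',
    "&apos;": "'",
    "&amp;": "&",
    "&gt;": ">",
    "&lt;": "<",
    "&frasl;": "/",
}

# One alternation built from the escaped entity strings
_PATTERN = re.compile("|".join(re.escape(ent) for ent in ENTITIES))


def parse_entities(text: str) -> str:
    """Replace each known entity in a single left-to-right pass."""
    return _PATTERN.sub(lambda m: ENTITIES[m.group(0)], text)
```

With this variant the order of the table entries does not matter, since no part of the output is scanned a second time.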
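Let's break down the solution step by step:
1. Build the mapping: map each entity string (like &quot;) to its replacement character (like ").
2. Replace each entity: for every entry in the mapping, replace all occurrences of the entity in the text with its character.
3. Return the result: once every entity has been processed, return the modified string.
This approach is simple, readable, and efficient, especially since the number of entities is very small.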
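When replacements are applied one entity at a time, the order of the passes can change the result. The following minimal sketch compares two orderings on an escaped entity; replace_in_order, AMP_FIRST, and AMP_LAST are hypothetical names used only for this comparison, and only two entities are included to keep it short.

```python
def replace_in_order(text: str, order: list[tuple[str, str]]) -> str:
    """Apply str.replace for each (entity, character) pair in the given order."""
    for ent, ch in order:
        text = text.replace(ent, ch)
    return text


# Hypothetical orderings used only for this comparison
AMP_FIRST = [("&amp;", "&"), ("&gt;", ">")]
AMP_LAST = [("&gt;", ">"), ("&amp;", "&")]

sample = "&amp;gt;"  # an escaped "&gt;", which should decode to the text "&gt;"

print(replace_in_order(sample, AMP_FIRST))  # ">" (decoded twice)
print(replace_in_order(sample, AMP_LAST))   # "&gt;"
```

Handling &amp; in the final pass avoids the double decoding, because none of the other replacement characters can form a new entity.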
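Let's walk through an example to see how the solution works.
Input: text = "x &gt; y &amp;&amp; x &lt; y is always &quot;false&quot;."
Start with the original string:
x &gt; y &amp;&amp; x &lt; y is always &quot;false&quot;.
Replace &quot; with ":
x &gt; y &amp;&amp; x &lt; y is always "false".
Replace &apos; with ':
&apos; is not present.
Replace &gt; with >:
x > y &amp;&amp; x &lt; y is always "false".
Replace &lt; with <:
x > y &amp;&amp; x < y is always "false".
Replace &frasl; with /:
&frasl; is not present.
Replace &amp; with & (done last, so that a freshly produced & cannot become part of another entity):
x > y && x < y is always "false".
Output: x > y && x < y is always "false".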
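Brute-force approach: scanning the string once per entity takes O(k · n) time, where n is the length of the text and k is the number of entities, plus O(n) space for the rebuilt string.
The approach is efficient because the number of entities is small (k = 6) and each replacement pass is handled in linear time, so the total time remains O(n).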
The HTML Entity Parser problem is a classic string manipulation task, where the challenge is to efficiently and accurately replace specific substrings (HTML entities) with their corresponding characters. By using a simple mapping and iterating over the fixed set of entities, we can solve the problem in linear time with minimal code. The solution avoids unnecessary complexity and leverages built-in string operations, making it both elegant and practical for real-world scenarios where only known entities need to be parsed.