929. Unique Email Addresses - Leetcode Solution

Code Implementation

class Solution:
    def numUniqueEmails(self, emails):
        unique_emails = set()
        for email in emails:
            local, domain = email.split('@')
            local = local.split('+')[0]
            local = local.replace('.', '')
            normalized = local + '@' + domain
            unique_emails.add(normalized)
        return len(unique_emails)

class Solution {
public:
    int numUniqueEmails(vector<string>& emails) {
        unordered_set<string> unique;
        for (const string& email : emails) {
            size_t at = email.find('@');
            string local = email.substr(0, at);
            string domain = email.substr(at);
            size_t plus = local.find('+');
            if (plus != string::npos)
                local = local.substr(0, plus);
            local.erase(remove(local.begin(), local.end(), '.'), local.end());
            unique.insert(local + domain);
        }
        return unique.size();
    }
};

class Solution {
    public int numUniqueEmails(String[] emails) {
        Set<String> unique = new HashSet<>();
        for (String email : emails) {
            String[] parts = email.split("@");
            String local = parts[0].split("\\+")[0].replace(".", "");
            String normalized = local + "@" + parts[1];
            unique.add(normalized);
        }
        return unique.size();
    }
}

var numUniqueEmails = function(emails) {
    const unique = new Set();
    for (let email of emails) {
        let [local, domain] = email.split("@");
        local = local.split("+")[0].replace(/\./g, "");
        unique.add(local + "@" + domain);
    }
    return unique.size;
};

Problem Description

You are given a list of email addresses as an array called emails. Every email address is composed of a local name and a domain name separated by the '@' sign.

There are two rules for normalizing emails:

Periods ('.') in the local name part are ignored.
Everything after a plus sign ('+') in the local name is ignored.

The domain name is left unchanged.

Your task is to find out how many unique email addresses there are after normalization. Uniqueness is determined by the normalized form of each email.

Constraints:

Each email will have exactly one '@' character.
All emails are valid according to the problem's definition.

Thought Process

At first glance, it might seem like we need to compare every email address to every other email to check for uniqueness, but the problem gives us a shortcut: normalization. By applying the given rules to each email, we can transform them into a standard form. If two emails normalize to the same string, they are considered the same.

A brute-force approach would involve comparing each email with every other, but that's inefficient. Instead, we can use a set (or hash set) to keep track of all the normalized emails. Sets automatically handle uniqueness for us, so we don't have to check manually.

The main challenge is to correctly parse and normalize each email according to the rules, and then count how many unique normalized emails we have.

Solution Approach

Let's break down the solution step by step:

Split Each Email: For every email in emails, split it into two parts at the '@' sign: the local name and the domain name.
Normalize the Local Name:
- If there is a '+' character in the local name, ignore everything after (and including) the plus sign.
- Remove all periods ('.') from the local name.
Recombine: Combine the normalized local name with the unchanged domain name, separated by '@', to get the normalized email.
Track Uniqueness: Insert each normalized email into a set. Since sets only keep unique items, duplicates are ignored automatically.
Count: The number of unique emails is simply the size of the set at the end.

We use a set (or hash set) because lookups and insertions are both O(1) on average, making this approach very efficient.

Example Walkthrough

Let's walk through an example with the following input:

["test.email+alex@leetcode.com", "test.e.mail+bob.cathy@leetcode.com", "testemail+david@lee.tcode.com"]

First email: "test.email+alex@leetcode.com"
- Local: "test.email+alex", Domain: "leetcode.com"
- Ignore after '+': "test.email"
- Remove '.': "testemail"
- Normalized: "testemail@leetcode.com"
Second email: "test.e.mail+bob.cathy@leetcode.com"
- Local: "test.e.mail+bob.cathy", Domain: "leetcode.com"
- Ignore after '+': "test.e.mail"
- Remove '.': "testemail"
- Normalized: "testemail@leetcode.com"
Third email: "testemail+david@lee.tcode.com"
- Local: "testemail+david", Domain: "lee.tcode.com"
- Ignore after '+': "testemail"
- Remove '.': "testemail" (no dots to remove)
- Normalized: "testemail@lee.tcode.com"

After normalization, we have two unique emails: "testemail@leetcode.com" and "testemail@lee.tcode.com". The answer is 2.

Time and Space Complexity

Brute-force approach: If we compared each email to every other, the time complexity would be O(N^2), where N is the number of emails.

Optimized approach:

Time Complexity: O(N * L), where N is the number of emails and L is the maximum length of an email. We process each character of each email exactly once.
Space Complexity: O(N * L) in the worst case, if all normalized emails are unique and stored in the set.

The use of a set ensures uniqueness with efficient O(1) insertions and lookups.

Summary

To solve the Unique Email Addresses problem, we normalize each email according to the specified rules and track the unique normalized emails using a set. This approach is efficient and elegant because it leverages string manipulation and the properties of sets to solve the problem in linear time. The key insight is that normalization reduces the problem to a simple uniqueness check, which sets handle naturally.

AlgoMap.io

Want Help Cracking FAANG?