433. Minimum Genetic Mutation - Leetcode Solution

Code Implementation

from collections import deque

class Solution:
    def minMutation(self, start: str, end: str, bank: list[str]) -> int:
        bank_set = set(bank)
        if end not in bank_set:
            return -1
        
        queue = deque()
        queue.append((start, 0))
        genes = ['A', 'C', 'G', 'T']
        
        while queue:
            current, steps = queue.popleft()
            if current == end:
                return steps
            for i in range(len(current)):
                for g in genes:
                    if g != current[i]:
                        mutated = current[:i] + g + current[i+1:]
                        if mutated in bank_set:
                            queue.append((mutated, steps + 1))
                            bank_set.remove(mutated)
        return -1

#include <string>
#include <vector>
#include <unordered_set>
#include <queue>
using namespace std;

class Solution {
public:
    int minMutation(string start, string end, vector<string>& bank) {
        unordered_set<string> bankSet(bank.begin(), bank.end());
        if (bankSet.find(end) == bankSet.end()) return -1;
        queue<pair<string, int>> q;
        q.push({start, 0});
        char genes[4] = {'A', 'C', 'G', 'T'};
        
        while (!q.empty()) {
            auto [current, steps] = q.front(); q.pop();
            if (current == end) return steps;
            for (int i = 0; i < current.size(); ++i) {
                char old = current[i];
                for (char g : genes) {
                    if (g != old) {
                        current[i] = g;
                        if (bankSet.count(current)) {
                            q.push({current, steps + 1});
                            bankSet.erase(current);
                        }
                    }
                }
                current[i] = old;
            }
        }
        return -1;
    }
};

import java.util.*;

class Solution {
    public int minMutation(String start, String end, String[] bank) {
        Set<String> bankSet = new HashSet<>(Arrays.asList(bank));
        if (!bankSet.contains(end)) return -1;
        Queue<String> queue = new LinkedList<>();
        queue.offer(start);
        int steps = 0;
        char[] genes = new char[]{'A','C','G','T'};
        
        while (!queue.isEmpty()) {
            int size = queue.size();
            for (int i = 0; i < size; i++) {
                String current = queue.poll();
                if (current.equals(end)) return steps;
                char[] currArr = current.toCharArray();
                for (int j = 0; j < currArr.length; j++) {
                    char old = currArr[j];
                    for (char g : genes) {
                        if (g != old) {
                            currArr[j] = g;
                            String mutated = new String(currArr);
                            if (bankSet.contains(mutated)) {
                                queue.offer(mutated);
                                bankSet.remove(mutated);
                            }
                        }
                    }
                    currArr[j] = old;
                }
            }
            steps++;
        }
        return -1;
    }
}

/**
 * @param {string} start
 * @param {string} end
 * @param {string[]} bank
 * @return {number}
 */
var minMutation = function(start, end, bank) {
    let bankSet = new Set(bank);
    if (!bankSet.has(end)) return -1;
    const genes = ['A','C','G','T'];
    let queue = [[start, 0]];
    while (queue.length) {
        let [current, steps] = queue.shift();
        if (current === end) return steps;
        for (let i = 0; i < current.length; i++) {
            for (let g of genes) {
                if (g !== current[i]) {
                    let mutated = current.slice(0, i) + g + current.slice(i+1);
                    if (bankSet.has(mutated)) {
                        queue.push([mutated, steps + 1]);
                        bankSet.delete(mutated);
                    }
                }
            }
        }
    }
    return -1;
};

Problem Description

You are given three inputs:

start: A string representing the starting gene sequence (e.g., "AACCGGTT").
end: The target gene sequence you want to reach.
bank: An array of valid gene sequences, where each string is 8 characters long and uses only the letters 'A', 'C', 'G', and 'T'.

Each mutation changes exactly one character in the gene sequence, and each intermediate sequence must be in the bank. Your task is to find the minimum number of mutations needed to mutate from start to end. If it is not possible, return -1.

Key constraints:

Each mutation must be a valid string from the bank.
You may not reuse sequences from the bank once visited.
There may be no valid solution, in which case you must return -1.

Thought Process

At first glance, the problem seems similar to finding the shortest path in a maze, where each "room" is a gene sequence and "doors" connect sequences that differ by one character and are present in the bank.

A brute-force approach would be to try all possible mutation paths, but this quickly becomes infeasible as the number of possible sequences grows exponentially. Instead, we notice that we are looking for the shortest transformation, which is a classic scenario for Breadth-First Search (BFS).

The challenge then becomes: how do we efficiently generate valid mutations, avoid cycles (reusing sequences), and ensure we do not waste time on invalid paths?

Solution Approach

We use Breadth-First Search (BFS) to systematically explore all possible gene mutations, level by level, to find the shortest path from start to end:

Represent the bank as a set: This allows O(1) lookups to check if a mutation is valid.
Use a queue for BFS: Each element in the queue is a tuple (current_sequence, mutation_steps_so_far).
For each sequence, generate all possible single-character mutations:
- For each position in the string, try replacing it with 'A', 'C', 'G', or 'T' (except the current character).
- If the new sequence is in the bank set, enqueue it and remove it from the set (to prevent revisiting).
Stop when you reach the end sequence: Return the number of mutation steps taken.
If the queue is exhausted without finding the end: Return -1, as no mutation path exists.

We use a set for the bank to allow fast checking and removal, ensuring each sequence is visited at most once.

Example Walkthrough

Let's use this example:

start = "AACCGGTT"
end = "AAACGGTA"
bank = ["AACCGGTA", "AACCGCTA", "AAACGGTA"]

Step-by-step:

Start at "AACCGGTT" (0 mutations).
Mutate one letter at a time, checking if the result is in the bank:
- "AACCGGTA" is one mutation away and in the bank. Enqueue ("AACCGGTA", 1).
Next, process "AACCGGTA" (1 mutation):
- "AAACGGTA" is one mutation away and in the bank. Enqueue ("AAACGGTA", 2).
- "AACCGCTA" is also one mutation away and in the bank, but does not lead directly to the end.
Process "AAACGGTA" (2 mutations):
- This matches the end sequence. Return 2.

Therefore, the minimum number of mutations is 2.

Time and Space Complexity

Brute-force approach:

Time: O(4^L * L!), where L is the gene length, due to the vast number of possible sequences and permutations.
Space: O(4^L), for storing all possible sequences in the worst case.

Optimized BFS approach:

Time: O(N * L * 4), where N is the number of genes in the bank and L is the length of the gene. For each of up to N sequences, we try mutating each position (L positions) to 3 other letters (since we skip the current one).
Space: O(N), since we store the bank set and the BFS queue can be at most O(N) in size.

The BFS approach is efficient and practical for the input constraints.

Summary

The Minimum Genetic Mutation problem is a shortest-path search where each gene sequence is a node and valid single-character mutations are edges. By using BFS, a set for fast lookup, and careful mutation generation, we efficiently find the minimum number of mutations or determine if the transformation is impossible. The approach is both elegant and practical, leveraging classic graph traversal techniques for a problem framed in genetics.

AlgoMap.io

Want Help Cracking FAANG?