The "Active Users" problem asks you to determine the number of users who were active in your system within a given period. You are given a list of user activity logs, where each log contains a user_id
and a timestamp
(representing when the user was active). Your task is to find out how many unique users were active at least once in the last k
days, given a reference day today
.
Key constraints:
(user_id, timestamp)
.today - k + 1
to today
(inclusive).
The first idea that comes to mind is to check every log and see if its timestamp falls within the last k
days, then collect the user_id
values. However, if we simply collect all logs and then count, we might double-count users active on multiple days.
To avoid double-counting, we need a way to keep track of unique users. Using a set data structure is ideal, since it automatically ignores duplicates. That way, every time we find a log within the window, we add the user_id
to the set. The final answer is the size of the set.
Brute-force would be to scan all logs, check the timestamp, and add to a set. But if the data is huge, can we do better? The main optimization is to only look at logs within the date window and ignore the rest as early as possible.
Let's break down the steps to solve this problem efficiently:
user_id
values.
today - k + 1
to today
, inclusive.
timestamp
is within the active window.
user_id
to the set.
Why use a set? A set gives O(1) average time complexity for add and check operations, and automatically handles duplicates.
Why iterate through all logs? We must check each log, since any could fall within the window and represent a unique user.
Let's say we have the following logs:
(1, 10)
(2, 11)
(1, 12)
(3, 8)
(2, 12)
today = 12
and k = 3
. The active window is 10
to 12
(inclusive).
(1, 10)
is in the window. Add user 1 to the set.(2, 11)
is in the window. Add user 2 to the set.(1, 12)
is in the window. User 1 is already in the set, so no change.(3, 8)
is not in the window. Ignore.(2, 12)
is in the window. User 2 is already in the set, so no change.The set now contains {1, 2}. The answer is 2.
Brute-force approach:
The solution is optimal because we cannot avoid looking at every log (they could all be in the window), and the set only grows as large as the number of unique users.
The "Active Users" problem is efficiently solved by using a set to track unique users who were active in a given date window. By scanning each log and adding user IDs from qualifying timestamps, we avoid double-counting and achieve optimal time and space complexity. The elegance comes from leveraging the set data structure, which handles duplicates and provides fast lookups, making the solution both simple and robust.
def count_active_users(logs, k, today):
"""
logs: List of tuples (user_id, timestamp)
k: integer, number of days for the window
today: integer, the reference day
"""
active_users = set()
window_start = today - k + 1
for user_id, timestamp in logs:
if window_start <= timestamp <= today:
active_users.add(user_id)
return len(active_users)
#include <vector>
#include <unordered_set>
using namespace std;
int countActiveUsers(vector<pair<int, int>>& logs, int k, int today) {
unordered_set<int> activeUsers;
int windowStart = today - k + 1;
for (auto& log : logs) {
int userId = log.first;
int timestamp = log.second;
if (timestamp >= windowStart && timestamp <= today) {
activeUsers.insert(userId);
}
}
return activeUsers.size();
}
import java.util.*;
public class Solution {
public int countActiveUsers(int[][] logs, int k, int today) {
Set<Integer> activeUsers = new HashSet<>();
int windowStart = today - k + 1;
for (int[] log : logs) {
int userId = log[0];
int timestamp = log[1];
if (timestamp >= windowStart && timestamp <= today) {
activeUsers.add(userId);
}
}
return activeUsers.size();
}
}
function countActiveUsers(logs, k, today) {
const activeUsers = new Set();
const windowStart = today - k + 1;
for (const [userId, timestamp] of logs) {
if (timestamp >= windowStart && timestamp <= today) {
activeUsers.add(userId);
}
}
return activeUsers.size;
}