Back to questions

Given an input list containing $n$ distinct numbers in the range 0 to $n$, return the only number in the range that is missing from the list.

For example, given , return . Because the input list has 3 elements in it, we expect to see the numbers 0 to 3 in there, but 2 is missing.

Another example: given , return . We return becuase the input list has 4 elements in it, so we expect to see the numbers 0 to 4 in there, but 0 itself is missing!

This is a fun interview problem, because it's simple to answer, but most people can't think of multiple ways to solve it.

If you breezed through this problem, can you think of ways to optimize the space or run-time complexity by utilizing a different approach? Or find a way to use Gauss's summation formula?

We can sort the input list, and then iterate through the sorted input. At the $i^{th}$ spot in the sorted input, we'd expect to see the value of $i$.

If that condition doesn't hold, then $i$ is our missing integer! And if we get to the end of the list, then the missing value is the last integer we'd expect (the legnth of the input list).

Sorting the input list has a time complexity of $O(n*log{n})$.

**Can you think of a more efficient solution, that runs faster (but might have to use more space)?**

We can first stash the entire input into a set. Then, we can iterate through all the expected numbers (the size of the input list + 1) and if it doesn't occur in the we've found our missing number:

This is efficient because Python sets are implemented as hash tables, similar to dictionaries, which use a hash function to map elements to specific buckets in the table. This hashing allows for very efficient lookup times because it reduces the search space to a specific bucket, making it possible to find an element in constant time on average ($O(1)$).

Overall, for this solution, the time complexity is $O(n)$ due to the linear pass through all $n$ integers. The space complexity is also $O(n)$ due to the hash set we are using.

Johann Karl Friedrich Gauss, also known as 'Top G', came up with a formula in elementary school to sum up the numbers $1, 2, 3, ..n$.

He invented this formula after the teacher tried to give him busy-work, and asked him to sum up the first 100 numbers.

Yung Gauss, like the boss he is, came up with this nifty formula instead: $1 + 2 + 3 + ... n = \frac{n * (n+1)}{2}$

No wonder he's the Top G of math.

*Okay, story time over*.

**Can you see the connection between this formula, and the missing integer problem at hand?**

We know that if the input list was $n$ elements long, and had all integers $0$ to $n$ inside it, the sum would be $\frac{n * (n+1)}{2}$. So, we can just compare the sum we expect from Gauss's formula, and the actual sum – the difference would reveal which number was missing.

Remember – we are Data Scientists – not Software Engineers. Coming up with a clever, mathematical-based solution like this during an onsite interview is an easy way to win the hearts and minds of your hiring committee.

Python