Histogram of Tweets (Twitter SQL Interview Question)

This is the same question as problem #6 in the SQL Chapter of Ace the Data Science Interview!

Assume you're given a table of Twitter tweet data. Write a query to obtain a histogram of tweets posted per user in 2022. Output the tweet count per user as the bucket and the number of Twitter users who fall into that bucket.

In other words, group the users by the number of tweets they posted in 2022 and count the number of users in each group.

Table:

Column Name	Type
tweet_id	integer
user_id	integer
msg	string
tweet_date	timestamp

Example Input:

tweet_id	user_id	msg	tweet_date
214252	111	Am considering taking Tesla private at $420. Funding secured.	12/30/2021 00:00:00
739252	111	Despite the constant negative press covfefe	01/01/2022 00:00:00
846402	111	Following @NickSinghTech on Twitter changed my life!	02/14/2022 00:00:00
241425	254	If the salary is so competitive why won’t you tell me what it is?	03/01/2022 00:00:00
231574	148	I no longer have a manager. I can't be managed	03/23/2022 00:00:00

Example Output:

tweet_bucket	users_num
1	2
2	1

Explanation:

Based on the example output, two users (148 and 254) posted one tweet each in 2022, and one user (111) posted two tweets in 2022. User 111's tweet from 12/30/2021 falls outside 2022, so it is not counted. The query groups the users by the number of tweets they posted and displays the number of users in each group.
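
To make the bucketing concrete, here is a minimal Python sketch (an illustration, not part of the original solution) that reproduces the example output from the example input with `collections.Counter`: it first counts each user's 2022 tweets, then counts how many users share each count. The function name `tweet_histogram` is hypothetical, and the `msg` column is dropped because it does not affect the result.

```python
from collections import Counter
from datetime import datetime

# Example input rows from the problem as (tweet_id, user_id, tweet_date).
# The msg column is omitted because it doesn't affect the histogram.
tweets = [
    (214252, 111, "12/30/2021 00:00:00"),
    (739252, 111, "01/01/2022 00:00:00"),
    (846402, 111, "02/14/2022 00:00:00"),
    (241425, 254, "03/01/2022 00:00:00"),
    (231574, 148, "03/23/2022 00:00:00"),
]


def tweet_histogram(rows, year=2022):
    """Return {tweet_bucket: users_num} for tweets posted in the given year."""
    # Step 1: count each user's tweets, keeping only rows from the target year.
    per_user = Counter(
        user_id
        for _, user_id, date_str in rows
        if datetime.strptime(date_str, "%m/%d/%Y %H:%M:%S").year == year
    )
    # Step 2: count how many users share each per-user tweet count.
    return dict(sorted(Counter(per_user.values()).items()))


print(tweet_histogram(tweets))  # {1: 2, 2: 1}
```

The two nested counts mirror the two `GROUP BY` levels the SQL solution needs: one by user, then one by tweet count.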

The dataset you are querying against may have different input and output; this is just an example!

Solution #1: Using a Subquery

First, we find the number of tweets each user posted in 2022: keep only the tweets whose tweet_date falls in 2022, group those records by user_id, and count the tweets in each group.
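
A minimal, runnable sketch of this first step follows. It assumes the table is named `tweets` (the page doesn't name it) and uses Python's built-in `sqlite3` module as a stand-in for the PostgreSQL database the question targets. Dates are stored as ISO-8601 text (another assumption) so that the half-open range `>= '2022-01-01' AND < '2023-01-01'` selects exactly the 2022 rows. In PostgreSQL, `EXTRACT(YEAR FROM tweet_date) = 2022` is a common alternative filter.

```python
import sqlite3

# In-memory SQLite database standing in for the real PostgreSQL table.
# The table name `tweets` and the ISO-8601 date strings are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, msg TEXT, tweet_date TEXT)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        (214252, 111, "Am considering taking Tesla private at $420. Funding secured.", "2021-12-30 00:00:00"),
        (739252, 111, "Despite the constant negative press covfefe", "2022-01-01 00:00:00"),
        (846402, 111, "Following @NickSinghTech on Twitter changed my life!", "2022-02-14 00:00:00"),
        (241425, 254, "If the salary is so competitive why won’t you tell me what it is?", "2022-03-01 00:00:00"),
        (231574, 148, "I no longer have a manager. I can't be managed", "2022-03-23 00:00:00"),
    ],
)

# Step 1: number of tweets each user posted in 2022.
# A half-open date range works in both SQLite and PostgreSQL.
STEP1_SQL = """
SELECT user_id, COUNT(tweet_id) AS tweet_count_per_user
FROM tweets
WHERE tweet_date >= '2022-01-01' AND tweet_date < '2023-01-01'
GROUP BY user_id
ORDER BY user_id
"""

for row in conn.execute(STEP1_SQL):
    print(row)
```

Running it prints `(111, 2)`, `(148, 1)` and `(254, 1)`: user 111's 2021 tweet is filtered out before counting.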

The output shows the number of tweets posted by each user in 2022:

user_id	tweet_count_per_user
111	2
148	1
254	1

Based on the output, in 2022 user 111 posted two tweets, while users 148 and 254 each posted one tweet.

Next, we wrap the per-user count query from the first step in a subquery, use its tweet_count_per_user column as the tweet bucket, and group by that column to count the number of users in each bucket.
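
Under the same assumptions as before (a table named `tweets`, SQLite standing in for PostgreSQL, ISO-8601 date strings), the subquery version might look like the sketch below. The subquery alias `total_tweets` is illustrative.

```python
import sqlite3

# Stand-in for the PostgreSQL table; the name `tweets` and the ISO-8601
# date strings are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, msg TEXT, tweet_date TEXT)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        (214252, 111, "Am considering taking Tesla private at $420. Funding secured.", "2021-12-30 00:00:00"),
        (739252, 111, "Despite the constant negative press covfefe", "2022-01-01 00:00:00"),
        (846402, 111, "Following @NickSinghTech on Twitter changed my life!", "2022-02-14 00:00:00"),
        (241425, 254, "If the salary is so competitive why won’t you tell me what it is?", "2022-03-01 00:00:00"),
        (231574, 148, "I no longer have a manager. I can't be managed", "2022-03-23 00:00:00"),
    ],
)

# Inner query: tweets per user in 2022.
# Outer query: each distinct per-user count becomes a bucket,
# and COUNT(user_id) gives the number of users in that bucket.
HISTOGRAM_SQL = """
SELECT tweet_count_per_user AS tweet_bucket,
       COUNT(user_id) AS users_num
FROM (
    SELECT user_id, COUNT(tweet_id) AS tweet_count_per_user
    FROM tweets
    WHERE tweet_date >= '2022-01-01' AND tweet_date < '2023-01-01'
    GROUP BY user_id
) AS total_tweets
GROUP BY tweet_count_per_user
ORDER BY tweet_bucket
"""

for row in conn.execute(HISTOGRAM_SQL):
    print(row)
```

The `ORDER BY` is not required by the question; it only makes the bucket order predictable.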

This query generates a histogram of the number of tweets per user in 2022. The output shows the tweet count per user as the tweet bucket and the number of Twitter users who fall into that bucket.

tweet_bucket	users_num
1	2
2	1

Solution #2: Using a CTE

Alternatively, we can use a Common Table Expression (CTE) instead of a subquery to compute the tweet counts.
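
A sketch of the CTE variant follows, under the same assumptions (a `tweets` table, SQLite standing in for PostgreSQL, ISO-8601 date strings). `tweet_counts` is a hypothetical CTE name. The CTE holds exactly the per-user counts from the first step, and the main query buckets them.

```python
import sqlite3

# Stand-in for the PostgreSQL table; the name `tweets` and the ISO-8601
# date strings are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, msg TEXT, tweet_date TEXT)"
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        (214252, 111, "Am considering taking Tesla private at $420. Funding secured.", "2021-12-30 00:00:00"),
        (739252, 111, "Despite the constant negative press covfefe", "2022-01-01 00:00:00"),
        (846402, 111, "Following @NickSinghTech on Twitter changed my life!", "2022-02-14 00:00:00"),
        (241425, 254, "If the salary is so competitive why won’t you tell me what it is?", "2022-03-01 00:00:00"),
        (231574, 148, "I no longer have a manager. I can't be managed", "2022-03-23 00:00:00"),
    ],
)

# The CTE computes the per-user 2022 tweet counts once;
# the main query then groups users by that count.
CTE_SQL = """
WITH tweet_counts AS (
    SELECT user_id, COUNT(tweet_id) AS tweet_count_per_user
    FROM tweets
    WHERE tweet_date >= '2022-01-01' AND tweet_date < '2023-01-01'
    GROUP BY user_id
)
SELECT tweet_count_per_user AS tweet_bucket,
       COUNT(user_id) AS users_num
FROM tweet_counts
GROUP BY tweet_count_per_user
ORDER BY tweet_bucket
"""

for row in conn.execute(CTE_SQL):
    print(row)
```

Both forms return the same rows. The CTE mainly reads top to bottom, in the same order as the explanation.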

Sourced from: Twitter

Difficulty: Easy
