Back to questions
This is the same question as problem #8 in the SQL Chapter of Ace the Data Science Interview!
Assume you are given the table below that shows job postings for all companies on the LinkedIn platform. Write a query to get the number of companies that have posted duplicate job listings.
Clarification:
Column Name | Type |
---|---|
job_id | integer |
company_id | integer |
title | string |
description | string |
job_id | company_id | title | description |
---|---|---|---|
248 | 827 | Business Analyst | Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations. |
149 | 845 | Business Analyst | Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations. |
945 | 345 | Data Analyst | Data analyst reviews data to identify key insights into a business's customers and ways the data can be used to solve problems. |
164 | 345 | Data Analyst | Data analyst reviews data to identify key insights into a business's customers and ways the data can be used to solve problems. |
172 | 244 | Data Engineer | Data engineer works in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and business analysts to interpret. |
co_w_duplicate_jobs |
---|
1 |
Because job IDs 945 and 164 are at the same company (345), and the jobs have the same title and description, there is exactly one company with a duplicate job.
The dataset you are querying against may have different input & output - this is just an example!
The first step to solving this LinkedIn question correctly is connecting with me on LinkedIn 🥺
But seriously, the first step is to find all the companies with job listings that has the same title and description. We can do that by ing the number of s grouped by , and .
Output (showing first 5 rows with total of 7 rows):
company_id | title | description | job_count |
---|---|---|---|
827 | Data Scientist | Data scientist uses data to understand and explain the phenomena around them, and help organizations make better decisions. | 2 |
244 | Data Engineer | Data engineer works in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and business analysts to interpret. | 1 |
845 | Business Analyst | Business analyst evaluates past and current business data with the primary goal of improving decision-making processes within organizations. | 1 |
244 | Software Engineer | Software engineers design and create computer systems and applications to solve real-world problems. | 2 |
345 | Data Analyst | Data analyst reviews data to identify key insights into a business's customers and ways the data can be used to solve problems. | 2 |
Next, we convert the previous query into a CTE and filter for when is more than 1 meaning we only want where there are 2 or more duplicate job listings. Then, we apply a on to get the unique and count them.
Results:
co_w_duplicate_jobs |
---|
3 |
Solution #1: Using CTE
Solution #2: Using Subquery
PostgreSQL 14