At LinkedIn, SQL is used to analyze customer behavior patterns, such as identifying trends in job searches, user engagement, and content sharing, as well as to manage large datasets related to job markets, including job postings, resumes, and user profiles. This is the reason why LinkedIn often uses SQL questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
So, to help you prepare, here's 8 LinkedIn SQL interview questions – can you solve them?
Given a table of candidates and their skills, you're tasked with finding the candidates best suited for an open Data Science job. You want to find candidates who are proficient in Python, Tableau, and PostgreSQL.
Write a query to list the candidates who possess all of the required skills for the job. Sort the output by candidate ID in ascending order.
Column Name | Type |
---|---|
candidate_id | integer |
skill | varchar |
candidate_id | skill |
---|---|
123 | Python |
123 | Tableau |
123 | PostgreSQL |
234 | R |
234 | PowerBI |
234 | SQL Server |
345 | Python |
345 | Tableau |
candidate_id |
---|
123 |
To solve this LinkedIn SQL Interview Question go to our FREE interactive coding environment!
The LinkedIn Creator team is seeking out individuals who have a strong influence on the platform, utilizing their personal profiles as a company or influencer page. To identify such power creators, we can compare the number of followers on their LinkedIn page with the number of followers on the company they work for. If a person's LinkedIn page has more followers than their company, we consider them to be a power creator.
Write a query to retrieve the profile IDs of these LinkedIn power creators ordered in ascending order based on their IDs.
Column Name | Type |
---|---|
profile_id | integer |
name | string |
followers | integer |
employer_id | integer |
profile_id | name | followers | employer_id |
---|---|---|---|
1 | Nick Singh | 92,000 | 4 |
2 | Zach Wilson | 199,000 | 2 |
3 | Daliana Liu | 171,000 | 1 |
4 | Ravit Jain | 107,000 | 3 |
5 | Vin Vashishta | 139,000 | 6 |
6 | Susan Wojcicki | 39,000 | 5 |
Column Name | Type |
---|---|
company_id | integer |
name | string |
followers | integer |
company_id | name | followers |
---|---|---|
1 | The Data Science Podcast | 8,000 |
2 | Airbnb | 700,000 |
3 | The Ravit Show | 6,000 |
4 | DataLemur | 200 |
5 | YouTube | 1,6000,000 |
6 | DataScience.Vin | 4,500 |
profile_id |
---|
1 |
3 |
4 |
5 |
To see hints and additional solutions try this LinkedIn SQL Interview question on our interactive coging environment.
Stored procedures are a lot like functions in programming. They're used to encapsulate and organize business logic into one unit of code, and they can accept multiple input parameters and return multiple output values.
For example, if you were a Data Analyst at LinkedIn working on a HR analytics project, you might create a stored procedure to calculate the average salary for a given department:
To call this stored procedure and find the average salary for the Data Analytics department you'd write the following query:
As a data analyst for LinkedIn, you are given a dataset containing details about user activity on job postings. Your task is to use SQL and write a query that calculates the average number of job applications by users for each month of every year.
Assume you have been provided with the following table:
application_id | user_id | job_id | apply_date |
---|---|---|---|
6871 | 876 | 4051 | 02/05/2022 |
9862 | 375 | 9562 | 02/11/2022 |
3793 | 962 | 3861 | 02/17/2022 |
8752 | 792 | 4051 | 03/26/2022 |
6217 | 481 | 9562 | 03/05/2022 |
Your output should be a table with the following format:
year | month | average_applications |
---|---|---|
2022 | 2 | 3 |
2022 | 3 | 2 |
Formulate a PostgreSQL query to solve the problem.
This query calculates the average number of job applications per month by each user. It uses the Window function 'COUNT' with 'PARTITION BY' to count the number of job applications and unique users for each year and month. The count of applications is then divided by the count of unique users to get the average applications.
For more window function practice, try this Uber SQL Interview Question on DataLemur's online SQL code editor:
Both left and right joins in SQL allow you to combine data from different tables based on a shared key or set of keys. For a concrete example of the difference between these two join types, say you had sales data exported from LinkedIn's Salesforce CRM stored in a PostgreSQL database, and had access to two tables: and .
: retrieves all rows from the left table (in this case, the sales table) and any matching rows from the right table (the table). If there is no match in the right table, NULL values will be returned for the right table's columns.
: retrieves all rows from the right table (in this case, the customers table) and any matching rows from the left table (the sales table). If there is no match in the left table, NULL values will be returned for the left table's columns.
You are provided with two tables namely and . The table contains data about each user including their and . The table contains information about the connections made by the users, including a , and .
Find the average number of connections per user on LinkedIn.
user_id | name |
---|---|
1 | John Smith |
2 | Jane Doe |
3 | Mark Johnson |
4 | Sarah Robinson |
connection_id | user_id | connected_user_id |
---|---|---|
101 | 1 | 2 |
102 | 1 | 3 |
103 | 1 | 4 |
104 | 2 | 1 |
105 | 2 | 3 |
106 | 3 | 1 |
107 | 3 | 2 |
108 | 3 | 4 |
109 | 4 | 1 |
PostgreSQL Query:
This query first counts the number of connections each user has by grouping the table by and counting the resulting groups. Then, it calculates the average of these counts to find the average number of connections per user. This result is then returned as .
To practice a very similar question try this interactive LinkedIn Duplicate Job Listings Question which is similar for manipulating LinkedIn data to find aggregated counts or this Twitter Histogram of Tweets Question which is similar for calculating user-based metrics.
A correlated sub-query is one that is linked to the outer query and cannot be executed on its own. It uses the outer query to filter or transform data by referencing a column from the outer query, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is independent of the outer query and can be executed on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
Correlated sub-queries are slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
Consider the following scenario for LinkedIn. As part of LinkedIn's recruitment analytics, they often need to find profiles where users have mentioned a specific skill in their profiles. This information is mostly used by the marketing team to strategize their campaigns or for recruiters wanting to find suitable candidates for a particular job.
For this question, write a SQL query that filters the User Profiles database to find profiles that contain "Python" as a skill. The "skills" column consists of a comma-separated string of various skills that a user has added to his/her profile.
user_id | first_name | last_name | skills |
---|---|---|---|
1001 | John | Doe | SQL, Python, Machine Learning |
1002 | Jane | Doe | Java, C++, Python |
1003 | Bob | Allen | C++, Java |
1004 | Alice | Johnson | Python, Machine Learning |
1005 | Sam | Green | HTML, CSS, JavaScript |
user_id | first_name | last_name | skills |
---|---|---|---|
1001 | John | Doe | Python |
1002 | Jane | Doe | Python |
1004 | Alice | Johnson | Python |
Here's a PostgreSQL query that solves this problem:
This query uses the keyword in its clause to filter rows where the column contains "Python". This is done using the pattern which matches any string that contains the substring "Python". The character is a wildcard that matches any sequence of characters, including an empty sequence. Therefore, if "Python" is anywhere in the skills list, that record will be returned by the SELECT statement.
The key to acing a LinkedIn SQL interview is to practice, practice, and then practice some more! Beyond just solving the above LinkedIn SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, and Meta.
Each exercise has hints to guide you, fully explained answers along with a discussion board to see how others solved it and best of all, there's an online SQL coding environment so you can easily right in the browser your query and have it graded.
To prep for the LinkedIn SQL interview it is also useful to solve SQL problems from other big-tech companies like:
Stay ahead of the competition with the latest AI-powered recruitment tools and trends on LinkedIn!
In case your SQL coding skills are weak, forget about diving straight into solving questions – refresh your SQL knowledge with this SQL tutorial for Data Analytics.
This tutorial covers SQL topics like sorting data with ORDER BY and handling NULLs in SQL – both of which pop up often during SQL interviews at LinkedIn.
In addition to SQL query questions, the other types of questions tested in the LinkedIn Data Science Interview include:
To prepare for LinkedIn Data Science interviews read the book Ace the Data Science Interview because it's got:
Don't forget about the behavioral interview – prepare for it with this behavioral interview question bank.