At Okta, SQL is used quite frequently for analyzing user authentication data and generating reports as part of its identity & security analytics service. Because of this, Okta almost always asks SQL query questions in interviews for Data Science, Analytics, and Data Engineering jobs.
As such, to help you practice for the Okta SQL interview, here are 8 Okta SQL interview questions – how many can you solve?
Your company, Okta, is a leading identity management company that provides software for user authentication. In this case, you have a table of user login dates, and the goal is to calculate the rolling 7-day average of daily accesses per user.
The table has the following fields:
Here is some sample data for the table:
user_id | login_date |
---|---|
101 | 2022-09-04 |
102 | 2022-09-04 |
101 | 2022-09-05 |
101 | 2022-09-05 |
102 | 2022-09-07 |
101 | 2022-09-08 |
102 | 2022-09-09 |
101 | 2022-09-09 |
The query should return the user_id, login_date and the 7-day rolling average of user logins on and before that date. An output example would be:
user_id | login_date | avg_login_7_days |
---|---|---|
101 | 2022-09-04 | 1.00 |
101 | 2022-09-05 | 1.50 |
101 | 2022-09-08 | 1.33 |
101 | 2022-09-09 | 1.25 |
102 | 2022-09-04 | 1.00 |
102 | 2022-09-07 | 1.00 |
102 | 2022-09-09 | 1.00 |
Here is a solution in PostgreSQL:
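One way to write it (the logins table's name isn't given in the prompt, so `user_logins` is an assumed name; the `RANGE` frame with an interval offset requires PostgreSQL 11+):

```sql
-- user_logins is an assumed table name (not given in the prompt)
WITH daily_logins AS (
  -- count logins per user per day
  SELECT user_id, login_date, COUNT(*) AS login_count
  FROM user_logins
  GROUP BY user_id, login_date
)
SELECT
  user_id,
  login_date,
  ROUND(AVG(login_count) OVER (
    PARTITION BY user_id
    ORDER BY login_date
    -- the current day plus the 6 days before it = a 7-day window
    RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
  ), 2) AS avg_login_7_days
FROM daily_logins
ORDER BY user_id, login_date;
```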
The inner query groups the table by user_id and login_date to count the number of logins per user per day. The outer query then calculates a rolling 7-day average of these counts using a window function, partitioned by user_id and ordered by login_date so that the averages are calculated separately for each user and only include dates on or before the current row. The RANGE frame clause in the window function specifies the 7-day window.
P.S. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur.
Okta is a company that provides identity and access management solutions for enterprise users. As a software engineer at Okta, you're on a team designing a feature to keep track of all the services each enterprise user has access to, and you need to make sure the database can handle this data efficiently.
Suppose you have two tables: one that keeps track of all the enterprise users, and one that keeps track of all the services provided by Okta.
user_id | username | firm_name | signup_date |
---|---|---|---|
1 | johndoe | Acme Corp | 2022-01-01 |
2 | janedoe | Beta Inc | 2022-02-05 |
3 | billsmith | Gamma LLC | 2022-03-15 |
service_id | service_name |
---|---|
1 | Single Sign-On |
2 | Adaptive MFA |
3 | API Access Management |
We also have a junction table, which records which services each user has access to.
user_id | service_id | grant_date |
---|---|---|
1 | 1 | 2022-01-02 |
1 | 3 | 2022-03-01 |
2 | 1 | 2022-02-06 |
2 | 3 | 2022-04-01 |
3 | 2 | 2022-03-20 |
The question is: Write a PostgreSQL query that will return the usernames and firm names of all users who have been granted access to the "Single Sign-On" service but not the "API Access Management" service.
Here is the PostgreSQL query that would return the desired result:
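A sketch of such a query, assuming the tables are named `users`, `services`, and `user_services` (the actual table names aren't given in the prompt):

```sql
-- users, services, and user_services are assumed table names
SELECT u.username, u.firm_name
FROM users u
JOIN user_services us ON u.user_id = us.user_id
JOIN services s      ON us.service_id = s.service_id
WHERE s.service_name = 'Single Sign-On'
  -- exclude anyone who also has 'API Access Management'
  AND u.user_id NOT IN (
    SELECT us2.user_id
    FROM user_services us2
    JOIN services s2 ON us2.service_id = s2.service_id
    WHERE s2.service_name = 'API Access Management'
  );
```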
This query first joins the user, service, and junction tables together to get a complete view of which users have access to which services. It then filters for users that have the 'Single Sign-On' service. The subquery finds the users who have the 'API Access Management' service, and the outer query excludes those users from the final results.
Normalization involves breaking up your tables into smaller, more specialized ones and using primary and foreign keys to define relationships between them. Not only does this make your database more flexible and scalable, it also makes it easier to maintain. Plus, normalization helps to keep your data accurate by reducing the chance of inconsistencies and errors.
The only downside is that your queries will now involve more joins, which can be slow and are often a database performance bottleneck.
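As an illustrative sketch, a denormalized user table that repeats the firm name on every row could be split into two tables linked by a foreign key (all table and column names here are hypothetical):

```sql
-- Hypothetical example of normalization: firms get their own table
CREATE TABLE firms (
  firm_id   SERIAL PRIMARY KEY,
  firm_name TEXT NOT NULL
);

CREATE TABLE users (
  user_id  SERIAL PRIMARY KEY,
  username TEXT NOT NULL,
  -- foreign key defines the relationship instead of repeating firm_name
  firm_id  INT REFERENCES firms (firm_id)
);
```

Renaming "Acme Corp" now means updating one row in `firms` rather than every user row, which is exactly the kind of inconsistency normalization prevents.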
Okta is a company that provides identity and access management solutions. One important metric for them could be the click-through rate from viewing a product description to requesting a demo.
Here is your task:
Given a table that logs every time a user views a product page and a table that logs when a user requests a demo, write a query to find the click-through rate for each product. Assume that a product is uniquely identified by a product_id.
view_id | user_id | view_date | product_id |
---|---|---|---|
120 | 1 | 01/01/2022 00:00:00 | 111 |
121 | 2 | 01/01/2022 00:00:00 | 112 |
122 | 3 | 01/02/2022 00:00:00 | 111 |
123 | 4 | 01/03/2022 00:00:00 | 113 |
124 | 5 | 01/03/2022 00:00:00 | 112 |
request_id | user_id | request_date | product_id |
---|---|---|---|
1 | 1 | 01/01/2022 00:05:00 | 111 |
2 | 2 | 01/02/2022 00:00:00 | 112 |
3 | 5 | 01/03/2022 00:15:00 | 112 |
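One way to compute this, assuming the tables are named `page_views` and `demo_requests` (the actual names aren't given in the prompt):

```sql
-- page_views and demo_requests are assumed table names
SELECT
  v.product_id,
  -- unique demo requesters divided by unique viewers, per product
  COUNT(DISTINCT r.user_id)::DECIMAL
    / COUNT(DISTINCT v.user_id) AS click_through_rate
FROM page_views v
LEFT JOIN demo_requests r
  ON v.user_id = r.user_id
 AND v.product_id = r.product_id
GROUP BY v.product_id;
```

The LEFT JOIN keeps products that were viewed but never generated a demo request, so their rate comes out as 0 instead of the product dropping from the results.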
This query joins the page views table with the demo requests table on user_id and product_id. For each product, it divides the count of unique users who requested a demo by the count of unique users who viewed that product. This gives the click-through rate for each product.
To practice another question about calculating rates, solve this TikTok SQL question within DataLemur's online SQL coding environment:
A database view is a virtual table that is created based on the results of a SELECT statement, and provides a customized version of your data that you can query just like a regular table.
You'd want to use a view for a few reasons: to simplify complex queries, to restrict access to sensitive columns, and to present a stable interface even if the underlying tables change.
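For example, a hypothetical view over a `customers` table could hide everything except Okta-domain customers:

```sql
-- Hypothetical view: expose only customers with an @okta.com email
CREATE VIEW okta_customers AS
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE email LIKE '%@okta.com';

-- The view can then be queried like any table:
SELECT * FROM okta_customers WHERE last_name = 'Smith';
```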
As an Okta administrator, you are tasked with observing the company's growth in terms of monthly new user creation. Okta being an identity management product, it's crucial to track user growth. Write a SQL query to find the number of users created each month.
user_id | user_created_date |
---|---|
1 | 2022-06-01 |
2 | 2022-06-04 |
3 | 2022-07-01 |
4 | 2022-07-03 |
5 | 2022-07-22 |
6 | 2022-07-23 |
7 | 2022-07-30 |
8 | 2022-08-01 |
9 | 2022-08-04 |
10 | 2022-08-20 |
year_month | avg_users_created |
---|---|
2022-06 | 2 |
2022-07 | 5 |
2022-08 | 3 |
With PostgreSQL, you can use the DATE_TRUNC function to truncate the date to the year-month level and the COUNT function to get the total number of users created in each month.
Here's the SQL query you would use:
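A sketch of the query, assuming the table is named `users` (the name isn't given in the prompt):

```sql
-- users is an assumed table name
SELECT
  -- truncate each date to its month, formatted as e.g. '2022-06'
  TO_CHAR(DATE_TRUNC('month', user_created_date), 'YYYY-MM') AS year_month,
  COUNT(user_id) AS avg_users_created
FROM users
GROUP BY year_month
ORDER BY year_month;
```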
This query groups the data by the year and month of user creation, then counts the number of users created in each of those groupings. The results are sorted by date for easier reading. This query helps Okta track its growth and understand new user creation month by month.
While knowing this answer is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at Okta should be at least aware of SQL vs. NoSQL databases.
Relational databases and non-relational (NoSQL) databases have some key differences, particularly in terms of how data is stored. Whereas relational databases have tables made up of rows and columns, NoSQL databases use various data models such as key-value, document, wide-column, and graph stores.
This added flexibility makes NoSQL databases well-suited for handling non-tabular data or data with a constantly changing format. However, this flexibility comes at the cost of ACID compliance, which is a set of properties (atomic, consistent, isolated, and durable) that ensure the reliability and integrity of data in a database. While most relational databases are ACID-compliant, NoSQL databases may not provide the same level of guarantees.
You are given a customer records database for Okta, a software company that provides identity and access management services. You are specifically interested in customer email addresses on the @okta.com domain and are told to filter down the records to this end.
Your database is structured as follows:
customer_id | first_name | last_name | email |
---|---|---|---|
001 | James | Smith | james.smith@okta.com |
002 | Lisa | Johnson | lisa.j@otherdomain.com |
003 | Robert | Williams | r.williams@okta.com |
004 | Maria | Brown | m.brown@abc.com |
005 | Charlie | Jones | charlie.okta@okta.com |
The task at hand is to write a SQL query to filter the customer records that have @okta.com as their email domain.
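One way to write it, assuming the table is named `customers` (the name isn't given in the prompt):

```sql
-- customers is an assumed table name
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE email LIKE '%@okta.com%';
```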
This query utilizes the LIKE operator in SQL to filter out rows from the 'Customers' table where the 'email' column contains the string '@okta.com'. The percentage symbols around '@okta.com' in the query are wildcards that match any sequence of characters. Therefore, this query will return all entries where the email address contains '@okta.com' anywhere in the string.
Expected Output:
customer_id | first_name | last_name | email |
---|---|---|---|
001 | James | Smith | james.smith@okta.com |
003 | Robert | Williams | r.williams@okta.com |
005 | Charlie | Jones | charlie.okta@okta.com |
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the earlier Okta SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Microsoft, Google, Amazon, and tech startups.
Each exercise has multiple hints and full answers, and crucially, there is an interactive SQL code editor so you can run your query right in the browser and have it checked.
To prep for the Okta SQL interview it is also wise to practice SQL problems from other tech companies like:
However, if your SQL coding skills are weak, don't jump right into solving questions – first go learn SQL with this free SQL for Data Analytics course.
Beyond writing SQL queries, the Okta Data Science Interview also tests other types of questions.
To prepare for Okta Data Science interviews, read the book Ace the Data Science Interview.