At Arista Networks, SQL is crucial for analyzing network data trends – so much so that they brag about being a leader in "Data-Driven Cloud Networking".
Because of this, Arista Networks almost always asks SQL problems during Data Science, Analytics, and Data Engineering job interviews.
So, if you're preparing for a SQL Assessment, here are 10 Arista Networks SQL interview questions to practice – can you answer them all?
Arista Networks is interested in analyzing their product sales. They would like you to write a SQL query to calculate the total monthly sales and quantity sold for each product.
Given two tables, sales and products:
sales_id | sale_date | product_id | quantity | sale_price |
---|---|---|---|---|
101 | 01/03/2021 | 301 | 10 | 100 |
102 | 01/05/2021 | 302 | 5 | 200 |
103 | 02/10/2021 | 301 | 15 | 100 |
104 | 02/20/2021 | 302 | 7 | 200 |
105 | 03/15/2021 | 301 | 20 | 100 |
product_id | product_name |
---|---|
301 | Switch |
302 | Router |
Arista wants the output to present the product name, month_of_sale (in 'yyyymm' format), total_quantity_sold, and total_sales_revenue. Sort the data by product name and then month of sale.
product_name | month_of_sale | total_quantity_sold | total_sales_revenue |
---|---|---|---|
Router | 202101 | 5 | 1000 |
Router | 202102 | 7 | 1400 |
Switch | 202101 | 10 | 1000 |
Switch | 202102 | 15 | 1500 |
Switch | 202103 | 20 | 2000 |
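One way to write this in PostgreSQL, assuming the tables are named sales and products as shown above:

```sql
SELECT
  p.product_name,
  TO_CHAR(s.sale_date, 'YYYYMM') AS month_of_sale,
  SUM(s.quantity) AS total_quantity_sold,
  SUM(s.quantity * s.sale_price) AS total_sales_revenue
FROM sales s
JOIN products p
  ON s.product_id = p.product_id
GROUP BY p.product_name, TO_CHAR(s.sale_date, 'YYYYMM')
ORDER BY p.product_name, month_of_sale;
```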
This PostgreSQL query joins the two tables sales and products on product_id. It then groups the result by product_name and the month of the sale. For each group, it calculates the sum of quantity to find the total quantity sold and the sum of quantity multiplied by sale_price to find the total sales revenue. The TO_CHAR function is used to format sale_date into 'yyyymm' format. The final result is sorted by product_name and month_of_sale.
To practice a related window function SQL problem on DataLemur's free online SQL coding environment, try this Amazon SQL question asked in a BI Engineer interview:
Arista Networks deploys a vast number of networking devices across its global infrastructure. Each device generates a significant amount of diagnostic data. Suppose you are tasked to design a PostgreSQL database to help Arista manage this information. More specifically, Arista needs to track how often each device reports an error each day and the severity of those errors.
Create appropriate tables and relationships to manage this. Then, write a SQL query that lists each device, the number of errors it reported on a specific day (let's say 2022-09-30), and the average severity of those errors.
Assume the concept of 'severity' is represented as an integer, where higher values mean more severe errors.
Sample input data might look like this:
device_id | device_name | model | installation_date |
---|---|---|---|
1 | Router_A | Model_X | 01/01/2020 |
2 | Switch_B | Model_Y | 02/02/2020 |
3 | Server_C | Model_Z | 03/03/2020 |
log_id | device_id | error_reported_date | error_severity |
---|---|---|---|
1 | 1 | 09/30/2022 | 4 |
2 | 2 | 09/30/2022 | 5 |
3 | 1 | 09/30/2022 | 7 |
4 | 3 | 09/30/2022 | 6 |
5 | 1 | 09/30/2022 | 5 |
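A possible schema and query, with table and column names taken from the sample data above:

```sql
CREATE TABLE devices (
  device_id         INT PRIMARY KEY,
  device_name       VARCHAR(100),
  model             VARCHAR(100),
  installation_date DATE
);

CREATE TABLE error_logs (
  log_id              INT PRIMARY KEY,
  device_id           INT REFERENCES devices(device_id),
  error_reported_date DATE,
  error_severity      INT
);

-- Errors per device and average severity for 2022-09-30
SELECT
  d.device_name,
  COUNT(*) AS error_count,
  AVG(e.error_severity) AS avg_severity
FROM devices d
JOIN error_logs e
  ON d.device_id = e.device_id
WHERE e.error_reported_date = '2022-09-30'
GROUP BY d.device_name;
```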
The SQL query joins the 'devices' and 'error_logs' tables on the 'device_id' column. It only includes rows where the 'error_reported_date' is '2022-09-30'. The 'GROUP BY' clause groups these rows by device_name, and for each group it calculates the total number of errors (using COUNT) and the average severity of those errors (using AVG).
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables.
Let's examine employee data from Arista Networks's HR database:
employees:
employee_id | first_name | last_name | manager_id |
---|---|---|---|
1 | Aubrey | Graham | 3 |
2 | Marshal | Mathers | 3 |
3 | Dwayne | Carter | 4 |
4 | Shawn | Carter | |
In this table, employee_id serves as the primary key and manager_id functions as a foreign key because it links to the employee_id of the employee's manager. This establishes a relationship between Arista Networks employees and their managers, allowing for easy querying to find an employee's manager or see which employees report to a specific manager.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, department_id and location_id foreign keys could be used to connect each employee to their respective department and location.
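As a sketch, the self-referencing relationship above could be declared like this in PostgreSQL (the table name and column types are assumptions):

```sql
CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  first_name  VARCHAR(50),
  last_name   VARCHAR(50),
  -- manager_id points back at employees.employee_id,
  -- so the database rejects a manager_id that doesn't exist
  manager_id  INT REFERENCES employees(employee_id)
);
```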
Arista Networks is interested in filtering their customer records to find customers experiencing network issues within a specific date range. The relevant result should present customer_id, customer_name, connectivity_issue (boolean if they had one, true or false) and issue_reported date.
Given the customers and issues tables formatted as below, write a SQL query that fulfills this requirement. The date range for the filtering should be between '2022-01-01' and '2022-06-30'.
customer_id | customer_name |
---|---|
101 | James |
102 | Emily |
103 | Anthony |
104 | Sophie |
105 | Liam |
issue_id | customer_id | connectivity_issue | issue_reported |
---|---|---|---|
5001 | 101 | true | 2022-02-01 |
6002 | 102 | false | 2022-03-15 |
7003 | 103 | true | 2022-05-10 |
8004 | 104 | false | 2022-07-20 |
9005 | 105 | true | 2021-12-22 |
The following PostgreSQL query should solve your problem:
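Assuming the tables are named customers and issues:

```sql
SELECT
  c.customer_id,
  c.customer_name,
  i.connectivity_issue,
  i.issue_reported
FROM customers c
JOIN issues i
  ON c.customer_id = i.customer_id
WHERE i.connectivity_issue = true
  AND i.issue_reported BETWEEN '2022-01-01' AND '2022-06-30';
```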
This query returns the id and name of all customers who had connectivity issues (true) reported between '2022-01-01' and '2022-06-30'. The JOIN in the query is used to combine rows from the customers and issues tables based on the matching customer_id. Then the WHERE clause filters for only those records where the connectivity issue happened within the specified date range.
Both the LEAD and LAG window functions are used to access a row at a specific offset from the current row.
However, the LEAD function retrieves a value from a row that follows the current row, whereas the LAG function retrieves a value from a row that precedes the current row.
Say you had a table of salary data for Arista Networks employees:
name | salary |
---|---|
Amanda | 130000 |
Brandon | 90000 |
Carlita | 80000 |
You could use the LEAD function to output the salary of each employee, along with the salary of the next highest-paid employee:
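Assuming the salary data lives in a table called employees:

```sql
SELECT
  name,
  salary,
  -- LEAD looks one row ahead in the salary-descending order
  LEAD(salary) OVER (ORDER BY salary DESC) AS next_salary
FROM employees
ORDER BY salary DESC;
```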
This would yield the following output:
name | salary | next_salary |
---|---|---|
Amanda | 130000 | 90000 |
Brandon | 90000 | 80000 |
Carlita | 80000 | NULL |
Swapping LEAD for LAG would get you the salary of the person who made just more than you:
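The LAG version, against the same assumed employees table:

```sql
SELECT
  name,
  salary,
  -- LAG looks one row back, i.e. at the next-higher salary
  LAG(salary) OVER (ORDER BY salary DESC) AS next_salary
FROM employees
ORDER BY salary DESC;
```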
name | salary | next_salary |
---|---|---|
Amanda | 130000 | NULL |
Brandon | 90000 | 130000 |
Carlita | 80000 | 90000 |
Arista Networks has a large database with information about their customers. The company is interested in finding any customer records where the customer's name starts with "Aris". Write a SQL query to find these customers.
customer_id | first_name | last_name |
---|---|---|
7 | Arista | Smith |
14 | Arista | Johnson |
3 | Alice | Brown |
22 | Arista | Taylor |
11 | Bob | Miller |
customer_id | first_name | last_name |
---|---|---|
7 | Arista | Smith |
14 | Arista | Johnson |
22 | Arista | Taylor |
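A query like this, assuming the table is named customers, produces that result:

```sql
SELECT *
FROM customers
WHERE first_name LIKE 'Aris%';
```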
This query starts by selecting all columns from the customers table. It then applies a WHERE condition that filters for only those rows where first_name begins with the string 'Aris'. The % sign is a wildcard that matches zero, one, or multiple characters, so any customer whose first name starts with 'Aris' will be included in the result.
Database normalization is the process of breaking down a table into smaller and more specific tables and defining relationships between them via foreign keys. This minimizes redundancy, and creates a database that's more flexible, scalable, and easier to maintain. It also helps to ensure the integrity of the data by minimizing the risk of data inconsistencies and anomalies.
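As a small illustration (the table and column names here are hypothetical), normalization might split a wide orders table that repeats customer details on every row into two related tables:

```sql
-- Before: orders(order_id, customer_name, customer_email, order_amount)
-- repeats customer details on every order row.

-- After: customer details live in one place, and each order
-- references them via a foreign key.
CREATE TABLE customers (
  customer_id    INT PRIMARY KEY,
  customer_name  VARCHAR(100),
  customer_email VARCHAR(100)
);

CREATE TABLE orders (
  order_id     INT PRIMARY KEY,
  customer_id  INT REFERENCES customers(customer_id),
  order_amount DECIMAL(10, 2)
);
```

Now a customer's email is stored once, so updating it can't leave some order rows with a stale copy.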
Given tables 'customers' and 'orders', write a SQL query to find the total number of orders and the total amount spent for each customer.
The table 'customers' has the following schema:
Example Input:
customer_id | first_name | last_name |
---|---|---|
101 | John | Doe |
102 | Jane | Doe |
103 | Mary | Johnson |
104 | James | Smith |
The table 'orders' has the following schema:
Example Input:
order_id | customer_id | order_amount | order_date |
---|---|---|---|
5101 | 101 | 250.50 | 07/20/2022 |
5102 | 102 | 175.00 | 08/12/2022 |
5103 | 103 | 120.00 | 07/26/2022 |
5104 | 101 | 300.00 | 07/30/2022 |
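A query along these lines would work:

```sql
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  COUNT(o.order_id) AS total_orders,
  SUM(o.order_amount) AS total_spent
FROM customers c
JOIN orders o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_orders DESC;
```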
In this query, we join the customers and orders tables on the customer_id field. We then use the GROUP BY clause to group the results by customer, and count the number of orders and calculate the total amount spent for each customer. The results are ordered by the total number of orders in descending order.
Since joins come up frequently during SQL interviews, try an interactive SQL join question from Spotify:
In Arista Networks' system monitoring database, each record logs the sent packets and received packets of a unique route in the network at a recorded time. The network team wants to know the total load (total sent packets + total received packets) on the network for every interval of 10 minutes. They also want this data to be rounded to the nearest integer, because fractional packets do not make sense. The total sent and received packets are in millions. You need to calculate this from the log table.
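One possible PostgreSQL approach – the table and column names (network_logs, log_time, sent_packets, received_packets) are assumptions, since the schema isn't shown:

```sql
SELECT
  -- Truncate to the hour, then add back the 10-minute bucket
  DATE_TRUNC('hour', log_time)
    + INTERVAL '10 minutes' * FLOOR(EXTRACT(MINUTE FROM log_time) / 10)
    AS interval_start,
  -- Total load per interval, rounded to the nearest integer
  ROUND(SUM(sent_packets + received_packets)) AS total_load
FROM network_logs
GROUP BY interval_start
ORDER BY interval_start;
```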
In this query, we first extract the 10-minute interval in which each record falls, then we group by this interval. The total load for each interval is the sum of the sent and received packets within that interval, rounded to the nearest integer.
To practice a very similar question try this interactive Amazon Server Utilization Time Question which is similar for calculating total utilization time over periods or this Stripe Repeated Payments Question which is similar for detecting repeated events within a certain timeframe.
Some similarities between unique and non-unique indexes include:

- Both speed up data retrieval by letting the database locate rows without scanning the whole table.
- Both can be built on a single column or on multiple columns.
- Both add some overhead to writes (INSERT, UPDATE, DELETE), since the index must be kept up to date.

Some differences between unique and non-unique indexes include:

- A unique index enforces that no two rows have the same value in the indexed column(s), while a non-unique index allows duplicates.
- Unique indexes are often created automatically to back PRIMARY KEY and UNIQUE constraints, whereas non-unique indexes are typically created explicitly for query performance.
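For illustration, here is how each kind of index might be created in PostgreSQL (the table and column names are hypothetical):

```sql
-- Unique index: rejects a second row with the same email
CREATE UNIQUE INDEX idx_customers_email
  ON customers (email);

-- Non-unique index: speeds up lookups by last name,
-- but duplicates are allowed
CREATE INDEX idx_customers_last_name
  ON customers (last_name);
```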
The key to acing an Arista Networks SQL interview is to practice, practice, and then practice some more! Besides solving the above Arista Networks SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each interview question has multiple hints, fully explained answers, and a discussion board to see how others solved it. Most importantly, there's an online SQL coding environment so you can run your query right in the browser and have it checked.
To prep for the Arista Networks SQL interview it is also a great idea to practice interview questions from other tech companies like:
However, if your SQL coding skills are weak, don't worry about jumping right into solving questions – improve your SQL foundations with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers things like creating summary stats with GROUP BY and handling strings – both of which pop up routinely in Arista Networks SQL interviews.
For the Arista Networks Data Science Interview, besides SQL questions, here are the other types of questions to practice:
The best way to prepare for Arista Networks Data Science interviews is by reading Ace the Data Science Interview. The book's got: