Workforce & Store Performance Analysis
A collaborative data analysis project combining SQL querying and Python visualisation to answer key business questions.
A collaborative data analysis project combining SQL querying and Python visualisation to answer key business questions.
This project was completed as part of a collaborative data analysis assignment using the AdventureWorks 2019 database. I acted as the group lead and was responsible for Questions 4 and 5, which focused on workforce sick leave and store performance.
To use SQL and Python to investigate:
The relationship between sick leave and job role (Q4)
The relationship between store trading duration and revenue (Q5)
The goal was to turn raw operational data into insights that support workforce planning and business strategy.
SQL (SSMS + AdventureWorks Database) - data extraction and aggregation
Python (Pandas, Matplotlib) - data analysis and visualisation
VS Code- scripting and analysis
CSV exports - data transfer between SQL and Python
Queried the AdventureWorks database to aggregate total sick leave hours by department
Filtered data to analyse roles within the Production Department
Created a query to calculate store trading duration and match it with total revenue
Exported SQL outputs to CSV files for further analysis in Python
Loaded CSV files in Pandas DataFrames
Sorted and filtered results to focus on the most relevant categories
Structured data for clear visual comparisons between roles, departments an store performance
Used horizontal bar charts to compare sick leave across job roles and to explore variation within production roles
Created a scatter plot to assess the relationship between store age and revenue
These were created in Python using Matplotlib after importing SQL query outputs using Pandas:
Figure 1 (Q4): Total sick leave hours by department showing significantly higher absences within production roles
Figure 2 (Q4): Sick leave distribution across Production roles, with technician positons contributing the highest total.
Figure 3 (Q3): Store age vs total revenue, showing no clear relationship between store longevity and performance
The Production department recorded significantly higher total sick leave hours than other departments.
Within Production, specific job titles showed much higher sick leave totals, suggesting role-related physical or environmental factors.
The scatter plot showed no clear relationship between how long a store has been operating and its total revenue.
Below are code snippets demonstrating how data was extracted using SQL and analysed using Python:
SELECT
d.Name AS Job_Type,
SUM(e.SickLeaveHours) AS Total_Sick_Hours
FROM HumanResources.Employee AS e
INNER JOIN HumanResources.EmployeeDepartmentHistory AS dh
ON e.BusinessEntityID = dh.BusinessEntityID
INNER JOIN HumanResources.Department AS d
ON dh.DepartmentID = d.DepartmentID
GROUP BY d.Name;
SELECT
d.Name AS Job_type,
e.JobTitle AS Job_Title,
SUM(e.SickLeaveHours AS Total_Sick_Hours
FROM HumanResources.Employee AS e
INNER JOIN HumanResources.EmployeeDepartmentHistory AS d
ON dh.DepartmentID = d.DepartmentID
WHERE d.Name = 'Production'
GROUP BY d.Name.e.JobTitle;
import pandas as pd
import matplotlib.pyplot as plt
sickleave = pd.read_csv("data/dickleave.csv")
sickleave = sickleave.sort_values("Total_Sick_Hours", ascending = False)
x = sickleave ['Job_Type']
y = sickleave ['Total_Sick_Hours']
plt.barh (x,y)
plt.xlabel ('Days of sick leave', size = 15)
plt.ylabel ('Job Title', size = 15)
plt.title ('The relationship between job roles and sick leave', size = 15)
plt.show ()
Import pandas as pd
import matplotlib.pyplot as plt
production = pd.read_csv("data/production.csv")
production = production.sort_values ("Total_Sick_Hours", ascending = False)
y = production ['Job_Title']
x = production ['Total_Sick_Hours']
plt.barh (x,y)
plt.xlabel ('Total amount of hours', size = 15)
plt.ylabel ('Job Title', size = 15)
plt.title ('Total Hours of sick leave in Production Department', size = 15)
plt.show ()
SELECT DISTINCT
SD.Name AS Store_Name,
2014 - SD.YearsOpened AS YearsOpenedToDate,
SOH.TotalDue
FROM Sales.vStoreWithDemographics AS SD
JOIN Sales.Customer AS C
ON SD.BusinessEntityID = C.StoreID
JOIN Sales.SalesOrderHeader AS SOH
ON C.CustomerID = SOH.CustomerID
ORDER BY YearsOpenedToDate;
import pandas as pd
import matplotlib.pyplot as plt
revenue_years_open = pd.read_csv("data/revenueyearsopen.csv")
Print(revenue_years_open)
y = revenue_years_open ['TotalDue']
x = revenue_years_open ['Years_Operating']
plt.scatter (x,y)
plt.ylabel ('Total Revenue', size = 15)
plt.xlabel ('Store's Operating Years', size = 15)
plt.title('Stores Operating years and total revenue')
plt.show()
Based on the analysis, the following recommendations were identified:
Review working conditions, workload, number of shifts and ergonomics in roles with consistently high sick leave in hopes of finding a cause.
Track sick leave patterns across all departments over time to identify any emerging issues.
Since store age alone doesn't explain revenue differences, investigate other factors such as location, customer demographics, or management practices.
This project strengthened my ability to work across multiple tools in one workflow, moving from SQL data extraction to Python-based analysis and visualisation. Acting as group lead also developed my communication and coordination skills, ensuring that technical findings were clearly presented and aligned across the team.
Translate business questions into database queries
Combining structured SQL outputs with flexible Python analysis
Choosing the right visual for the type of relationship being explored