Workforce & Store Performance Analysis

A collaborative data analysis project combining SQL querying and Python visualisation to answer key business questions.

Context

This project was completed as part of a collaborative data analysis assignment using the AdventureWorks 2019 database. I acted as the group lead and was responsible for Questions 4 and 5, which focused on workforce sick leave and store performance.

Aim

To use SQL and Python to investigate:

The relationship between sick leave and job role (Q4)
The relationship between store trading duration and revenue (Q5)

The goal was to turn raw operational data into insights that support workforce planning and business strategy.

Tools used

SQL (SSMS + AdventureWorks Database) - data extraction and aggregation
Python (Pandas, Matplotlib) - data analysis and visualisation
VS Code- scripting and analysis
CSV exports - data transfer between SQL and Python

Approach

🛠️Data Extraction (SQL)

Queried the AdventureWorks database to aggregate total sick leave hours by department
Filtered data to analyse roles within the Production Department
Created a query to calculate store trading duration and match it with total revenue
Exported SQL outputs to CSV files for further analysis in Python

💪Data Preparation (Python)

Loaded CSV files in Pandas DataFrames
Sorted and filtered results to focus on the most relevant categories
Structured data for clear visual comparisons between roles, departments an store performance

📊Visual Analysis

Used horizontal bar charts to compare sick leave across job roles and to explore variation within production roles
Created a scatter plot to assess the relationship between store age and revenue

Visualisations

These were created in Python using Matplotlib after importing SQL query outputs using Pandas:

Figure 1 (Q4): Total sick leave hours by department showing significantly higher absences within production roles

Figure 2 (Q4): Sick leave distribution across Production roles, with technician positons contributing the highest total.

Figure 3 (Q3): Store age vs total revenue, showing no clear relationship between store longevity and performance

Key Insights

Sick Leave Varies Strongly by Department

The Production department recorded significantly higher total sick leave hours than other departments.

Certain Roles Drive this Pattern

Within Production, specific job titles showed much higher sick leave totals, suggesting role-related physical or environmental factors.

Store Age Does Not Equal Higher Revenue

The scatter plot showed no clear relationship between how long a store has been operating and its total revenue.

SQL and Python code

Below are code snippets demonstrating how data was extracted using SQL and analysed using Python:

SQL Queries Q4

Sick leave amongst departments:

SELECT

d.Name AS Job_Type,

SUM(e.SickLeaveHours) AS Total_Sick_Hours

FROM HumanResources.Employee AS e

INNER JOIN HumanResources.EmployeeDepartmentHistory AS dh

ON e.BusinessEntityID = dh.BusinessEntityID

INNER JOIN HumanResources.Department AS d

ON dh.DepartmentID = d.DepartmentID

GROUP BY d.Name;

Roles within departments:

SELECT

d.Name AS Job_type,

e.JobTitle AS Job_Title,

SUM(e.SickLeaveHours AS Total_Sick_Hours

FROM HumanResources.Employee AS e

INNER JOIN HumanResources.EmployeeDepartmentHistory AS d

ON dh.DepartmentID = d.DepartmentID

WHERE d.Name = 'Production'

GROUP BY d.Name.e.JobTitle;

Python Code Q4

Department bar chart:

import pandas as pd

import matplotlib.pyplot as plt

sickleave = pd.read_csv("data/dickleave.csv")

sickleave = sickleave.sort_values("Total_Sick_Hours", ascending = False)

x = sickleave ['Job_Type']

y = sickleave ['Total_Sick_Hours']

plt.barh (x,y)

plt.xlabel ('Days of sick leave', size = 15)

plt.ylabel ('Job Title', size = 15)

plt.title ('The relationship between job roles and sick leave', size = 15)

plt.show ()

Production roles:

Import pandas as pd

import matplotlib.pyplot as plt

production = pd.read_csv("data/production.csv")

production = production.sort_values ("Total_Sick_Hours", ascending = False)

y = production ['Job_Title']

x = production ['Total_Sick_Hours']

plt.barh (x,y)

plt.xlabel ('Total amount of hours', size = 15)

plt.ylabel ('Job Title', size = 15)

plt.title ('Total Hours of sick leave in Production Department', size = 15)

plt.show ()

SQL Queries Q5

Store duration vs revenue:

SELECT DISTINCT

SD.Name AS Store_Name,

2014 - SD.YearsOpened AS YearsOpenedToDate,

SOH.TotalDue

FROM Sales.vStoreWithDemographics AS SD

JOIN Sales.Customer AS C

ON SD.BusinessEntityID = C.StoreID

JOIN Sales.SalesOrderHeader AS SOH

ON C.CustomerID = SOH.CustomerID

ORDER BY YearsOpenedToDate;

Python Code Q5

Store duration vs revenue:

import pandas as pd

import matplotlib.pyplot as plt

revenue_years_open = pd.read_csv("data/revenueyearsopen.csv")

Print(revenue_years_open)

y = revenue_years_open ['TotalDue']

x = revenue_years_open ['Years_Operating']

plt.scatter (x,y)

plt.ylabel ('Total Revenue', size = 15)

plt.xlabel ('Store's Operating Years', size = 15)

plt.title('Stores Operating years and total revenue')

plt.show()

Recommendations

Based on the analysis, the following recommendations were identified:

🔎Investigate High Sick Leave Roles

Review working conditions, workload, number of shifts and ergonomics in roles with consistently high sick leave in hopes of finding a cause.

📊Monitor Workforce Trends

Track sick leave patterns across all departments over time to identify any emerging issues.

🏬Focus on Performance Drivers

Since store age alone doesn't explain revenue differences, investigate other factors such as location, customer demographics, or management practices.

Reflections

This project strengthened my ability to work across multiple tools in one workflow, moving from SQL data extraction to Python-based analysis and visualisation. Acting as group lead also developed my communication and coordination skills, ensuring that technical findings were clearly presented and aligned across the team.

It was particularly valuable in learning how to:

Translate business questions into database queries
Combining structured SQL outputs with flexible Python analysis
Choosing the right visual for the type of relationship being explored

Back to Projects

Page updated

Google Sites

Report abuse