Exploring Global Population Trends
A Python analysis task using a population dataset to answer business-style questions through filtering, aggregation and visualisation.
This project was completed as part of the Generation UK Data Analyst programme, using a global population dataset. The objective was to explore demographic data across countries and continents and answer structured analytical questions using Python.
The aim of this project was to use Python to uncover regional population trends, identify outliers, and measure growth patterns over time to support demographic comparisons.
Imported CSV data into Pandas DataFrames
Inspected dataset structure and column types
Filtered data by year, continent, and population thresholds
Identified missing or zero-value population records
Calculated total population values using aggregate functions
Computed average population values for regional comparisons
Created metrics such as population growth between years
Used conditional logic to classify countries above or below regional averages
Created horizontal bar charts to compare population distribution across many countries
Used scatter plots to highlight population outliers
Selected visual formats based on dataset size and comparison goals
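The steps above can be sketched end to end on a tiny in-memory table. The column names (`country name`, `continent`, `year`, `population`) match those used in the snippets later in this write-up. The rows and figures are made-up placeholders, not values from the real dataset, and the project itself read its data from a CSV file rather than a string. Using `groupby(...).transform("mean")` for the regional average is one possible approach, not necessarily the one used in the original notebook.

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV file (values are placeholders).
csv_text = """country name,continent,year,population
Country A,South America,2007,120
Country B,South America,2007,80
Country C,South America,2007,0
Country D,Europe,2007,50
Country E,Europe,2007,150
"""

# Import the CSV data into a DataFrame and inspect its column types.
population = pd.read_csv(io.StringIO(csv_text))
print(population.dtypes)

# Filter by year, then flag zero-value records, which may indicate missing data.
pop_2007 = population[population["year"] == 2007].copy()
zero_records = pop_2007[pop_2007["population"] == 0]

# Compute each continent's average and attach it to every row of that continent.
pop_2007["regional_avg"] = (
    pop_2007.groupby("continent")["population"].transform("mean")
)

# Classify each country; a value exactly equal to the average counts as "below".
pop_2007["vs_average"] = (
    pop_2007["population"]
    .gt(pop_2007["regional_avg"])
    .map({True: "above", False: "below"})
)
print(pop_2007[["country name", "continent", "population", "vs_average"]])
```

Note that zero-value records pull the regional average down, so in practice it may be worth deciding how to treat them before classifying countries.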
The following visualisations were created to support key analytical questions and highlight patterns and outliers within the population data.
Figure 1: Country population in 2007, highlighting countries with populations above 1000.
Figure 2: Population distribution across Africa in 2010, highlighting substantial variation in population size.
Several countries recorded population values of zero in 2000, suggesting missing or incomplete data
Africa's population in 2010 was unevenly distributed, with a small number of countries accounting for a large share
Population levels across South America varied, with countries falling both above and below the regional average
Only a small number of countries exceeded a population of 1000 in 2007
Europe experienced an overall decline in population growth between 2000 and 2010
Selected Python code snippets demonstrating data filtering, aggregation and time-based analysis.
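import pandas as pd
import matplotlib.pyplot as plt

# `population` is assumed to be the DataFrame already loaded from the project's CSV file.
# Countries with a recorded population of zero in 2000 (possible missing data)
pop_2000 = population[population['year'] == 2000]
no_data = pop_2000[pop_2000['population'] == 0]
countries_no_data = no_data[['country name', 'continent']].drop_duplicates()
print(countries_no_data)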
# Total population of Africa in 2010
pop_year_continent = population[(population['year'] == 2010) & (population['continent'] == 'Africa')]
total_pop_2010 = pop_year_continent['population'].sum()
print(total_pop_2010)
# Horizontal bar chart of the ten most populous African countries in 2010
africa_2010 = population[(population['year'] == 2010) & (population['continent'] == 'Africa')]
africa_2010 = africa_2010.sort_values('population', ascending=False).head(10)
countries = africa_2010['country name']
populations = africa_2010['population']
plt.barh(countries, populations)
plt.xlabel('Population')
plt.ylabel('Country')
plt.title('Population across Africa in 2010')
plt.show()
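The scatter plot behind Figure 1 is not among the snippets shown here. A minimal sketch of one way to highlight countries above the 1000 threshold might look like this. The sample rows are placeholders, and plotting outliers as a separate, differently coloured series is an assumption rather than the method actually used in the project.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows (not real data); column names follow the snippets above.
population = pd.DataFrame({
    "country name": ["Country A", "Country B", "Country C", "Country D"],
    "year": [2007, 2007, 2007, 2007],
    "population": [300, 1250, 90, 1100],
})

threshold = 1000  # population cut-off mentioned for Figure 1
pop_2007 = population[population["year"] == 2007]
is_outlier = pop_2007["population"] > threshold

fig, ax = plt.subplots()

# Draw ordinary countries and outliers as two series so they get distinct colours.
ax.scatter(
    pop_2007.loc[~is_outlier, "country name"],
    pop_2007.loc[~is_outlier, "population"],
    label="At or below threshold",
)
ax.scatter(
    pop_2007.loc[is_outlier, "country name"],
    pop_2007.loc[is_outlier, "population"],
    color="red",
    label="Above threshold",
)

# A dashed reference line makes the cut-off visible on the chart.
ax.axhline(threshold, linestyle="--", color="grey")
ax.set_xlabel("Country")
ax.set_ylabel("Population")
ax.set_title("Country population in 2007")
ax.legend()
plt.show()
```

With many countries on the x-axis the labels would overlap, so rotating them or labelling only the outliers may be preferable on the full dataset.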
# Percentage change in Europe's total population between 2000 and 2010
europe_pop = population[population['continent'] == 'Europe']
europe_pop_2000 = europe_pop[europe_pop['year'] == 2000]['population'].sum()
europe_pop_2010 = europe_pop[europe_pop['year'] == 2010]['population'].sum()
growth = europe_pop_2010 - europe_pop_2000
growth_percent = growth / europe_pop_2000 * 100
print(growth_percent)
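# Growth per European country; the ascending sort returns the five lowest-growth countries
# (use ascending=False for the five highest)
pivot = europe_pop.pivot(index='country name', columns='year', values='population')
pivot['growth'] = pivot[2010] - pivot[2000]
top = pivot.sort_values(by='growth').head(5)
print(top)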
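The Europe-only growth calculation above could be generalised to every continent at once. The following is a sketch under stated assumptions: the sample rows are invented, and `pivot_table` with `aggfunc="sum"` is one possible approach, not necessarily how the original analysis was carried out.

```python
import pandas as pd

# Invented rows for illustration only; column names follow the snippets above.
population = pd.DataFrame({
    "country name": ["Country A", "Country A", "Country B", "Country B",
                     "Country C", "Country C"],
    "continent": ["Europe", "Europe", "Europe", "Europe", "Africa", "Africa"],
    "year": [2000, 2010, 2000, 2010, 2000, 2010],
    "population": [100, 90, 50, 45, 200, 250],
})

# Sum population per continent and year: one row per continent, one column per year.
totals = population.pivot_table(
    index="continent", columns="year", values="population", aggfunc="sum"
)

# Percentage growth = (later total - earlier total) / earlier total * 100.
totals["growth_percent"] = (totals[2010] - totals[2000]) / totals[2000] * 100
print(totals)
```

Unlike `pivot`, `pivot_table` aggregates duplicate index/column pairs instead of raising an error, which is why it suits continent-level totals.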
This project strengthened my ability to use Python, Pandas, and Matplotlib to explore large datasets. While I am still learning Python, this task demonstrated that I can translate analytical questions into code and communicate findings clearly through structured analysis and visuals.