ETL

Obesity and Per Capita Income - ETL

The goal of our project was to examine the relationship between a country’s economic conditions and its level of obesity. To measure this relationship, we used GDP per capita as an indicator of a country’s economic standing and the adult obesity rate, defined by BMI, as an indicator of the level of obesity in that country.

ETL Process
  • Data Import/Extract Sources
  • Data Transform
  • Data Integrity
  • Data Loading and Availability


Data Import/Extract Sources
Our dataset from Kaggle reports each country’s level of adult obesity. These values are given as percentages and show an average across the period 1975-2016. Our dataset from Wikipedia lists each country’s nominal GDP per capita in US dollars. The links to the two datasets are:
  • https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016
  • https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita
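The extraction step might be sketched in Pandas as follows. This is a minimal sketch under assumptions: the Kaggle CSV is assumed to have been downloaded locally (the file name `obesity-cleaned.csv` is hypothetical), and the Wikipedia table is scraped with `pandas.read_html`. Which table on the page holds the GDP figures, and what its headers are, may change as the page is edited.

```python
import pandas as pd

# Hypothetical local file name for the downloaded Kaggle CSV.
OBESITY_CSV = "obesity-cleaned.csv"
GDP_URL = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita"


def extract_obesity(source=OBESITY_CSV) -> pd.DataFrame:
    """Read the Kaggle obesity data from a CSV path or file-like object."""
    return pd.read_csv(source)


def extract_gdp(url: str = GDP_URL) -> pd.DataFrame:
    """Scrape the GDP-per-capita table from the Wikipedia page.

    pd.read_html returns one DataFrame per matching <table>; `match` keeps
    only tables whose text contains "Country". Taking the first match is an
    assumption about the current page layout. Requires lxml (or
    beautifulsoup4 + html5lib) to be installed.
    """
    tables = pd.read_html(url, match="Country")
    return tables[0]
```

Calling `extract_obesity()` and `extract_gdp()` returns the two raw DataFrames; inspecting `.head()` on each is a quick way to confirm the actual column names before transforming.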


Data Transform
The data was imported into Python and cleaned. The headers were renamed to more descriptive names, and some country names were adjusted so that they matched across both datasets. Finally, the two datasets were merged into a single Pandas DataFrame, and rows with null values were dropped.
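The transform steps can be sketched as below. The raw column names, the cleaned names (`country`, `obesity_pct`, `gdp_per_capita_usd`) and the country-name aliases are all hypothetical placeholders, and the sketch assumes the obesity data already has one row per country.

```python
import pandas as pd

# Hypothetical raw -> cleaned header names; the real headers in the Kaggle CSV
# and the scraped Wikipedia table may differ, so adjust these to match.
OBESITY_COLUMNS = {"Country": "country", "Obesity (%)": "obesity_pct"}
GDP_COLUMNS = {"Country/Territory": "country", "GDP per capita": "gdp_per_capita_usd"}

# Hypothetical examples of one country spelled differently in the two
# sources, mapped to a single spelling so the merge can match them.
COUNTRY_ALIASES = {
    "United States of America": "United States",
    "Russian Federation": "Russia",
}


def _clean(raw: pd.DataFrame, columns: dict) -> pd.DataFrame:
    """Rename and keep the mapped columns, then harmonize country names."""
    df = raw.rename(columns=columns)[list(columns.values())].copy()
    df["country"] = df["country"].str.strip().replace(COUNTRY_ALIASES)
    return df


def transform(obesity_raw: pd.DataFrame, gdp_raw: pd.DataFrame) -> pd.DataFrame:
    """Merge obesity and GDP data into one DataFrame without nulls.

    Assumes one obesity row per country (e.g. already averaged over years).
    """
    obesity = _clean(obesity_raw, OBESITY_COLUMNS)
    gdp = _clean(gdp_raw, GDP_COLUMNS)

    # Outer merge: a country found in only one source gets NaN in the other
    # source's column and is removed by dropna() below.
    merged = obesity.merge(gdp, on="country", how="outer")

    # Non-numeric entries (e.g. footnote markers) become NaN and are dropped too.
    for col in ("obesity_pct", "gdp_per_capita_usd"):
        merged[col] = pd.to_numeric(merged[col], errors="coerce")

    return merged.dropna().reset_index(drop=True)
```

Listing the rows with nulls right after the outer merge, before `dropna()`, is a quick way to spot country names that still need an alias.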

Data Integrity
The final dataset contained 165 countries with values for both obesity and GDP per capita, which was acceptable for further analysis. The data was saved in a PostgreSQL database called Obesity.
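One way to check integrity and save the merged data is sketched below. The column `country` and the table name `obesity_gdp` are hypothetical, and the connection string in the docstring is a placeholder that needs SQLAlchemy and a PostgreSQL driver such as psycopg2 installed.

```python
import pandas as pd


def check_integrity(df: pd.DataFrame) -> int:
    """Raise if the merged data has nulls or duplicate countries; return the country count."""
    if df.isna().any().any():
        raise ValueError("merged data still contains null values")
    if df["country"].duplicated().any():
        raise ValueError("at least one country appears more than once")
    return int(df["country"].nunique())


def load(df: pd.DataFrame, con, table: str = "obesity_gdp") -> None:
    """Write the merged data to `table`, replacing any previous copy.

    `con` is typically a SQLAlchemy engine, for example (placeholder
    credentials, requires sqlalchemy and psycopg2):
        from sqlalchemy import create_engine
        con = create_engine("postgresql://user:password@localhost:5432/Obesity")
    A sqlite3 connection also works, which is handy for local testing.
    """
    df.to_sql(table, con, if_exists="replace", index=False)
```

Running `check_integrity` before `load` means a bad merge fails loudly instead of being written to the database.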

Data Loading and Availability
Our client can access the database either from Python, by loading tables into Pandas DataFrames, or directly through PostgreSQL.
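Reading the data back into a DataFrame could look like this, assuming the merged data is stored in a table named `obesity_gdp` with the hypothetical columns shown. The same `SELECT` statement can also be run directly in `psql` or pgAdmin.

```python
import pandas as pd

# Hypothetical table and column names for the merged data.
QUERY = """
SELECT country, obesity_pct, gdp_per_capita_usd
FROM obesity_gdp
ORDER BY gdp_per_capita_usd DESC
"""


def fetch(con, query: str = QUERY) -> pd.DataFrame:
    """Run `query` and return the result as a pandas DataFrame.

    `con` is typically a SQLAlchemy engine for the Obesity database,
    e.g. create_engine("postgresql://user:password@localhost:5432/Obesity")
    with placeholder credentials.
    """
    return pd.read_sql(query, con)
```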


Used: Python, Pandas, PostgreSQL
Github: https://github.com/lumindak/ETL-project-Global-Obesity-data