Data Science for All Colombia (DS4A)

Data Science training program

What I learned...

In the DS4A program I learned hard skills like:

  • Using Python efficiently to perform ETL (extract, transform, load) processes.
  • Reinforcing my knowledge of SQL and learning how to host a PostgreSQL database in the cloud (AWS).
  • Using visualization tools such as matplotlib, seaborn and Dash to explore the data and get a better idea of its behavior and meaning.
  • Applying robust statistical methods to find correlations and causal relationships in the data.
  • Learning about different machine learning algorithms and how to choose the best model for a given task.
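As a small illustration of the ETL and SQL skills above, here is a minimal sketch of an extract-transform-load step. It is not the project's actual pipeline: the CSV layout (`municipality_code`, `age`, `employed`, `years_of_schooling`) and the rows are invented placeholders, and Python's built-in `sqlite3` stands in for the cloud PostgreSQL database. The aggregation is pushed into SQL with `GROUP BY`, so only one row per municipality comes back to Python.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: one row per person. Column names and values are
# invented for illustration; they are not real census data.
RAW_CSV = """municipality_code,age,employed,years_of_schooling
MUN_A,34,1,11
MUN_A,19,0,9
MUN_A,,1,16
MUN_B,45,1,5
MUN_B,28,1,11
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing age and cast numeric fields to int."""
    clean = []
    for row in rows:
        if not row["age"]:
            continue  # skip incomplete records
        clean.append((
            row["municipality_code"],
            int(row["age"]),
            int(row["employed"]),
            int(row["years_of_schooling"]),
        ))
    return clean


def load(conn, records):
    """Load: insert the cleaned records into a 'persons' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS persons ("
        "municipality_code TEXT, age INTEGER, employed INTEGER, "
        "years_of_schooling INTEGER)"
    )
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)", records)
    conn.commit()


def municipality_summary(conn):
    """Aggregate in SQL so only one row per municipality reaches Python."""
    query = """
        SELECT municipality_code,
               COUNT(*) AS people,
               AVG(employed) AS employment_rate,
               AVG(years_of_schooling) AS avg_schooling
        FROM persons
        GROUP BY municipality_code
        ORDER BY municipality_code
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    for row in municipality_summary(conn):
        print(row)
    # ('MUN_A', 2, 0.5, 10.0)
    # ('MUN_B', 2, 1.0, 8.0)
```

On a table with tens of millions of rows, letting the database do the grouping is usually far cheaper than loading every row into Python first.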

I also had the opportunity to strengthen soft skills such as:

  • Asking the right questions.
  • Addressing different types of problems with a scientific methodology.
  • Communicating assertively with coworkers from different professional backgrounds.
  • Teamwork under high-pressure deadlines.
  • Teaching others.

Final Project

In the DS4A program we had to present a final project. It had to be a product that helps solve a specific problem people face in the Colombian context.

About the project

My team and I did an in-depth analysis of inequality in Colombia. We built an interactive dashboard to explore the main causes of social inequality, based on the DANE 2018 population census. Using variables such as employment status, education, marital status, number of children, healthcare and geographic location, we applied an unsupervised clustering technique (K-Means) to group all 1,122 Colombian municipalities into just 5 clusters.
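The clustering idea can be sketched with a small, self-contained K-Means (Lloyd's algorithm) in plain Python. This is an illustration, not the project's code: it assumes each municipality has already been summarized as a numeric feature vector (the values below are invented placeholders, not DANE figures), z-scores every feature so no variable dominates the Euclidean distance, and uses a deterministic farthest-first initialization instead of random seeding so the example is reproducible. The toy data uses `k=2`; the project used 5 clusters, and in practice a library implementation such as scikit-learn's `KMeans` would typically be preferred.

```python
import math


def standardize(rows):
    """Z-score each column so features on different scales weigh equally."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from
    all centroids chosen so far (deterministic; assumes k <= len(points))."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-municipality features (invented values):
# [employment_rate, avg_years_of_schooling, health_coverage]
MUNICIPALITY_FEATURES = [
    [0.62, 9.5, 0.95],
    [0.60, 9.8, 0.93],
    [0.65, 10.1, 0.96],
    [0.41, 5.2, 0.70],
    [0.38, 4.9, 0.68],
    [0.44, 5.5, 0.72],
]

if __name__ == "__main__":
    labels, centroids = kmeans(standardize(MUNICIPALITY_FEATURES), k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Standardizing first matters here: without it, years of schooling (roughly 5 to 10) would outweigh rates between 0 and 1 in the distance calculation.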

It was an ambitious project. Querying a database with more than 45 million rows efficiently was difficult. Using Python and Dash to build an interactive dashboard that showed the insights we found was exciting. Having the dashboard and the database in the cloud, accessible from anywhere on the web, was amazing. It was an astonishing experience.

You can find the project repository here!