My Projects

Topics Extraction from Conference Speeches

The objective of this Unsupervised Learning Project is to propose an efficient way to extract key insights and topics from a collection of conference speeches. I implemented an unsupervised learning model using the K-means algorithm, created word cloud visualizations to represent the most frequent and relevant words in each cluster and, finally, used a topic modeling algorithm called Latent Dirichlet Allocation (LDA) to discover the underlying topics within the clusters.
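The clustering step can be sketched in plain NumPy as K-means (Lloyd's algorithm with k-means++ initialization). This is an illustrative sketch, not the project's code. It assumes the speeches have already been converted to numeric feature vectors (for example TF-IDF), and the function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # k-means++ initialization: first centroid uniformly at random, each next one
    # sampled with probability proportional to its squared distance to the nearest chosen centroid.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Each resulting cluster of speech vectors could then feed the per-cluster word clouds and LDA described above.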

Flight Delay Predictions

This project consists of two parts: an analysis of flight and weather datasets, and flight delay prediction. One dataset, retrieved from Kaggle, contains a year’s worth of delay information for all US flights; the other was gathered by web-scraping a weather website. My team and I implemented a Random Forest model to predict weather-induced airline delays and built a Streamlit application to provide an interactive user interface.

Principal Component Analysis with NumPy

This project applies PCA to a dataset without using popular machine learning libraries such as scikit-learn and statsmodels. The goal of this document is to gain a deeper understanding of PCA fundamentals using only functions from the NumPy library.
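The standard NumPy-only recipe is: center the data, compute the covariance matrix, take its eigendecomposition, sort by eigenvalue, and project. The sketch below is a generic version of that recipe, not the notebook's exact code. The function name `pca` and its return values are hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto their first n_components principal components.

    Returns (projected data, components, explained variance ratio)."""
    # 1. Center each feature (column) at zero mean.
    X_centered = X - X.mean(axis=0)
    # 2. Sample covariance matrix of the features.
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition; eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Reorder eigenpairs by decreasing eigenvalue (variance explained).
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    # 5. Project the centered data onto the selected components.
    return X_centered @ components, components, explained_ratio
```

`np.linalg.eigh` is used instead of `np.linalg.eig` because a covariance matrix is symmetric. This guarantees real eigenvalues and orthonormal eigenvectors.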

Online News Popularity Prediction 
This is a Supervised Learning Project whose objective is to predict the popularity of articles published by the Mashable website. The machine learning algorithms used for this project were Random Forest, Support Vector Classification (SVC) and K-Nearest Neighbors (KNN).
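Of the three algorithms, KNN is the easiest to show from scratch. The sketch below assumes Euclidean distance and an unweighted majority vote. It is a hypothetical illustration (`knn_predict` is not from the project) and not how the project necessarily implemented KNN.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Predict a label for each row of X_test by majority vote of its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for every test point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for neighbor_idx in nearest:
        labels, counts = np.unique(y_train[neighbor_idx], return_counts=True)
        # Ties go to the smallest label, because np.unique returns labels in sorted order.
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)
```

Because KNN relies on raw distances, features are usually scaled to comparable ranges before prediction.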

Shopper Segmentation 
The objective of this project is to segment shoppers from a given dataset. K-Means, Agglomerative and DBSCAN are the three unsupervised machine learning algorithms used for the project. At the end of the notebook, you can find an evaluation of these models comparing metrics such as ARS (Adjusted Rand Score), NMI (Normalized Mutual Information) and Average Score.
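Both ARS and NMI compare two labelings of the same points through their contingency table. The sketch below computes them in NumPy for illustration; the project itself may have used a library. NMI is normalized here by the arithmetic mean of the two entropies, which is one common convention (scikit-learn's default). The helper names are hypothetical.

```python
import numpy as np

def contingency(labels_true, labels_pred):
    """Count matrix: entry (i, j) = number of points in true class i and predicted cluster j."""
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table

def comb2(n):
    """Number of unordered pairs, n choose 2 (works elementwise on arrays)."""
    return n * (n - 1) / 2

def adjusted_rand_score(labels_true, labels_pred):
    table = contingency(labels_true, labels_pred)
    sum_ij = comb2(table).sum()              # pairs grouped together in both labelings
    sum_a = comb2(table.sum(axis=1)).sum()   # pairs together in the true labeling
    sum_b = comb2(table.sum(axis=0)).sum()   # pairs together in the predicted labeling
    expected = sum_a * sum_b / comb2(table.sum())
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def normalized_mutual_info(labels_true, labels_pred):
    table = contingency(labels_true, labels_pred)
    p_ij = table / table.sum()
    p_i, p_j = p_ij.sum(axis=1), p_ij.sum(axis=0)
    nz = p_ij > 0                            # skip empty cells, where 0 * log 0 = 0
    mi = (p_ij[nz] * np.log(p_ij[nz] / np.outer(p_i, p_j)[nz])).sum()
    h_true = -(p_i * np.log(p_i)).sum()
    h_pred = -(p_j * np.log(p_j)).sum()
    return mi / ((h_true + h_pred) / 2)
```

Both scores equal 1 when two labelings describe the same partition, even if the cluster IDs are renamed. ARS can also drop below 0 for worse-than-chance agreement.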

Women’s Legal Rights in the World
This is an analytic report on legal gender differentiation around the world. The analysis of data collected from 187 countries between 2009 and 2018 highlights inequities in laws and regulations.

Predicting Admissions to Master’s Degree Programs
This project uses a linear regression algorithm to predict the chance of admission of foreign students to Master’s degree programs at American colleges.
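A linear regression of this kind can be fitted with ordinary least squares. The sketch below is a generic NumPy version using `np.linalg.lstsq`. The function names are hypothetical and the features are left abstract, since the dataset's columns are not listed here.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns (intercept, coefficients)."""
    # Prepend a column of ones so the first weight acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return weights[0], weights[1:]

def predict(X, intercept, coefs):
    """Linear prediction for each row of X."""
    return intercept + X @ coefs
```

A plain linear model does not constrain its output to [0, 1]. When the target is a chance of admission, clipping predictions with `np.clip(pred, 0, 1)` is one simple safeguard.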

Experimental Design 
Using a dataset of historical sales for 45 stores located in different regions, this project is an experimental design that identifies the department with the highest lost sales, executes actions to reverse those numbers, and evaluates the results.

Book Recommender 
The book recommender was a good opportunity to explore neural networks. Two approaches were used to create this project: Matrix Factorization (using Keras) and Restricted Boltzmann Machines (using TensorFlow).
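The idea behind the matrix factorization approach can be sketched without Keras. Each user and each book gets a small latent vector, and their dot product is fitted to the observed ratings. This NumPy version uses full-batch gradient descent on squared error with L2 regularization. It is a stand-in for illustration, not the project's Keras model, and `factorize` and its hyperparameters are hypothetical.

```python
import numpy as np

def factorize(ratings, mask, n_factors=2, lr=0.01, reg=0.02, n_epochs=2000, seed=0):
    """Learn user factors U and book factors V so that U @ V.T approximates observed ratings.

    ratings: (n_users, n_books) array; mask: same shape, 1.0 where a rating is observed."""
    rng = np.random.default_rng(seed)
    n_users, n_books = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_books, n_factors))
    for _ in range(n_epochs):
        # Prediction error on observed entries only; unobserved entries contribute zero.
        error = mask * (U @ V.T - ratings)
        # Gradients of 0.5 * squared error + 0.5 * reg * (|U|^2 + |V|^2).
        grad_U = error @ V + reg * U
        grad_V = error.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V
```

After training, `U @ V.T` fills in scores for unrated books. Recommendations for a user could then be the highest-scoring books they have not rated.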