Valuable Matplotlib & Seaborn Visualization Handbook, Part II

This post summarizes the top 50 most valuable Matplotlib & Seaborn data visualizations in data science. It can serve as a data visualization handbook in which you can look up useful visualizations. The 50 visualizations are grouped into 7 application scenarios: Correlation, Deviation, Ranking, Distribution, Composition, Change, and Groups. The whole content is divided into three parts, and this post is Part II, which covers Ranking and Distribution.

more ...

Valuable Matplotlib & Seaborn Visualization Handbook, Part I

This post summarizes the top 50 most valuable Matplotlib & Seaborn data visualizations in data science. It can serve as a data visualization handbook in which you can look up useful visualizations. The 50 visualizations are grouped into 7 application scenarios: Correlation, Deviation, Ranking, Distribution, Composition, Change, and Groups. The whole content is divided into three parts, and this post is Part I, which covers the first two categories, Correlation and Deviation.

more ...

New Airbnb User Booking Prediction

The basic aim of this notebook is to predict the first destination country of new Airbnb users based on a historical dataset. This work involves a considerable amount of data cleansing. After the dataset is cleaned and preprocessed, I use the popular xgboost classifier as the prediction model, and grid search with 3-fold cross-validation to find the most suitable parameters for the classifier. If you are interested in finding out more about the dataset or the xgboost classifier, please read on.

more ...

Indian diabetes database modeling using Naive Bayes

We code a Naive Bayes classifier from scratch to better understand how the model is trained. At the end, we build another Naive Bayes classifier through the usual workflow with sklearn. In machine learning, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features. A naive Bayes classifier treats each feature as independent of the others, given the class. That is why it is referred to as "naive".
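To make the independence assumption concrete, here is a minimal from-scratch sketch. It is not the code from the post: it assumes a Gaussian likelihood for each feature, and the helper names (`fit_gaussian_nb`, `log_gaussian`, `predict`) and the tiny two-feature dataset are made up for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        # The "naive" step: per-feature log-likelihoods are simply summed,
        # i.e. features are treated as independent given the class.
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances)
        )
    return max(scores, key=scores.get)


# Hypothetical toy data: [glucose, bmi] -> 0 (negative) or 1 (positive).
X = [[85, 22.0], [90, 24.5], [88, 23.1], [150, 33.0], [160, 35.2], [155, 31.8]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict(model, [87, 23.0]))
print(predict(model, [158, 34.0]))
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.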

more ...

Churn Rate Prediction Using Neural Networks

The basic aim of this post is to predict customer churn for a certain bank, that is, which customers are going to leave the bank's service. Fully-connected neural networks with different architectures will be explored and compared. Since this is a binary prediction, logistic regression may also work pretty well. Feel free to give it a shot~

more ...