Author Archives

Logistic Regression
Objective: A quick way to learn how to apply logistic regression using R packages. Logistic regression is used when the dependent variable (Y) is binary, to test the relationship between Y and the independent variables (X). Source: http://ftp.ics.uci.edu/pub/machinelearningdatabases/pimaindiansdiabetes We need to…
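The post works in R on the Pima Indians diabetes data, and that code is not shown in this excerpt. As a minimal, language-neutral sketch of what fitting a logistic regression does, the Python below models P(Y = 1 | x) = 1 / (1 + e^-(b0 + b·x)) and fits b0 and b by gradient descent on the average log-loss. The toy data and the names `fit_logistic` and `predict_proba` are hypothetical, not from the post. In practice a library routine (for instance R's `glm(..., family = binomial)`) would be used instead of this loop.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(b0, b, x):
    # Estimated P(Y = 1 | x) for one observation x
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept b0 and coefficients b by batch gradient descent
    on the mean log-loss (negative log-likelihood divided by n)."""
    n, d = len(X), len(X[0])
    b0, b = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear predictor
            err = predict_proba(b0, b, xi) - yi
            g0 += err
            for j in range(d):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b

# Hypothetical toy data: one predictor, binary outcome. The classes overlap
# (x = 2.0 is a 1, x = 3.0 is a 0), so the maximum-likelihood estimates are finite.
X = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b = fit_logistic(X, y)
print(f"intercept={b0:.3f}, slope={b[0]:.3f}")
print(f"P(Y=1 | x=1.0) = {predict_proba(b0, b, [1.0]):.3f}")
print(f"P(Y=1 | x=4.0) = {predict_proba(b0, b, [4.0]):.3f}")
```

A positive slope means larger X raises the odds of Y = 1. Each unit increase in X multiplies the odds by e^slope.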

SMOTE Technique – Oversample Minority Class
Objective: Use the SMOTE (Synthetic Minority Over-sampling Technique) to oversample the minority class in your dataset. A quick look at the dataset: we will use the same breast cancer diagnosis dataset as in the previous blog post on PCA dimensionality reduction. This is a skewed dataset, with 37% for…
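The core idea of SMOTE is that it creates new minority points by interpolating rather than duplicating rows. A minimal Python sketch of that step, assuming continuous features and Euclidean distance, follows. The function name `smote`, its parameters and the sample points are hypothetical and are not the post's R code. Each synthetic point lies on the segment between a random minority sample and one of its k nearest minority neighbours.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours (Euclidean distance), and place the new
    point a random fraction of the way along the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority class with two features
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.9], [1.8, 2.2]]
new_points = smote(minority, n_new=4, k=3)
for p in new_points:
    print([round(v, 3) for v in p])
```

Because new points stay inside the region already occupied by the minority class, the classifier sees a denser minority region instead of exact copies. This is the usual argument for SMOTE over plain duplication.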

How to calculate the ratio for oversampling or undersampling?
We often need to deal with large imbalanced datasets, and it is common to have to analyze and build predictive models for the minority cases. There are many methods for handling such imbalanced data, and here is just…
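One way to derive the ratio is to require the minority class to make up a target share p of the resampled data. With M majority rows and m minority rows, oversampling the minority to m' rows gives m' / (M + m') = p, so m' = pM / (1 - p). Undersampling the majority to M' rows gives m / (M' + m) = p, so M' = m(1 - p) / p. A minimal Python sketch of that arithmetic follows. The function names and the 900/100 counts are hypothetical.

```python
def oversample_plan(n_majority, n_minority, target_share):
    """Minority rows to ADD (duplicated or synthetic) so the minority class is
    target_share of the result; all majority rows are kept.
    From m' / (M + m') = p  =>  m' = p * M / (1 - p)."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    new_minority = round(target_share * n_majority / (1 - target_share))
    added = max(new_minority - n_minority, 0)
    return added, 100 * added / n_minority  # count, and % of the original minority

def undersample_plan(n_majority, n_minority, target_share):
    """Majority rows to KEEP so the minority class is target_share of the
    result; all minority rows are kept.
    From m / (M' + m) = p  =>  M' = m * (1 - p) / p."""
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    kept = min(round(n_minority * (1 - target_share) / target_share), n_majority)
    return kept, 100 * kept / n_majority  # count, and % of the original majority

# Hypothetical imbalanced data: 900 majority rows, 100 minority rows (10% minority)
print(oversample_plan(900, 100, 0.5))    # (800, 800.0): add 800 minority rows
print(undersample_plan(900, 100, 0.5))   # keep 100 majority rows
print(oversample_plan(900, 100, 0.25))   # (200, 200.0)
print(undersample_plan(900, 100, 0.25))  # keep 300 majority rows
```

The percentage form is handy because some resampling tools take the amount of over- or undersampling as a percentage of the original class size rather than as a row count.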

PCA Dimensionality Reduction
Objective: To reduce the dimensionality of a dataset, for instance from 3D to 2D, while keeping its trends and patterns. Some preprocessing is required, such as feature scaling/mean normalization. Dataset: Breast Cancer Wisconsin (Diagnostic). Source: UCI Machine Learning Repository. df <- read.csv("data.csv", header=TRUE)…
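The two steps named above are mean normalization/feature scaling followed by projection onto the leading principal components. The sketch below shows both in plain Python. The post itself uses R, where a routine such as `prcomp` would normally do this. Here the eigenvectors come from power iteration with deflation, a simple swapped-in technique chosen to keep the example dependency-free. The function names and the simulated 3-D data are hypothetical.

```python
import math
import random

def standardize(X):
    """Mean-normalize each column, then scale it to unit (sample) standard deviation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]

def covariance(Z):
    """Sample covariance matrix of already-centred data Z."""
    n, d = len(Z), len(Z[0])
    return [[sum(row[i] * row[j] for row in Z) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenpairs(C, k, iters=1000, seed=0):
    """Largest k eigenpairs of a symmetric positive semi-definite matrix,
    by power iteration followed by deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate so the next power iteration finds the next-largest component
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs

def pca(X, k):
    """Project rows of X onto the first k principal components of the standardized data."""
    Z = standardize(X)
    pairs = top_eigenpairs(covariance(Z), k)
    scores = [[sum(zj * vj for zj, vj in zip(z, v)) for _, v in pairs] for z in Z]
    return scores, [lam for lam, _ in pairs]

# Hypothetical 3-D data: x2 is almost a multiple of x1, x3 is independent noise
rng = random.Random(1)
X = []
for _ in range(200):
    t = rng.gauss(0, 1)
    X.append([t, 2 * t + rng.gauss(0, 0.1), rng.gauss(0, 1)])

scores, variances = pca(X, k=2)  # 3-D -> 2-D
print(len(scores), len(scores[0]))  # 200 2
print([round(v, 3) for v in variances])
```

Standardizing first matters because, without it, a feature measured on a larger scale would dominate the covariance matrix and therefore the first component.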

Factor Analysis Using a Simple Dataset
Author: dotoku. Title: Application of Factor Analysis in Marketing Research. Factor analysis is considered one of the data reduction methods, and there are several ways to conduct it, such as principal axis factoring, maximum likelihood, generalized least squares…
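The post's own analysis is not shown in this excerpt. The Python below is a hedged sketch of the first extraction method listed, iterated principal axis factoring. The loop puts communality estimates on the diagonal of the correlation matrix, takes the leading eigenpairs, sets loadings to eigenvector × √eigenvalue, recomputes the communalities, and repeats until they stop changing. The initial communality estimate used here is each variable's largest absolute correlation, one simple choice; squared multiple correlations are another common one. The simulated one-factor data and all function names are hypothetical. A real analysis would use a tested library, for instance R's built-in `factanal`, which fits by maximum likelihood.

```python
import math
import random

def correlation_matrix(X):
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in X) / (n - 1)
            for j in range(d)] for i in range(d)]
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)] for i in range(d)]

def largest_eigenpairs(A, k, iters=500):
    """Largest (algebraic) k eigenpairs of symmetric A by shifted power iteration.
    A reduced correlation matrix can have negative eigenvalues, so A is shifted by
    s*I (s = largest absolute row sum, a Gershgorin bound) to make every eigenvalue
    non-negative; the shift is subtracted again from the reported eigenvalues."""
    d = len(A)
    s = max(sum(abs(x) for x in row) for row in A)
    B = [[A[i][j] + (s if i == j else 0.0) for j in range(d)] for i in range(d)]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        mu = sum(v[i] * B[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((mu - s, v))
        B = [[B[i][j] - mu * v[i] * v[j] for j in range(d)] for i in range(d)]  # deflate
    return pairs

def principal_axis_factoring(R, n_factors=1, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring on correlation matrix R.
    Returns (loadings, communalities)."""
    d = len(R)
    # Initial communalities: largest absolute correlation with any other variable
    h2 = [max(abs(R[i][j]) for j in range(d) if j != i) for i in range(d)]
    for _ in range(max_iter):
        # Reduced correlation matrix: communalities replace the 1s on the diagonal
        Rr = [[h2[i] if i == j else R[i][j] for j in range(d)] for i in range(d)]
        pairs = largest_eigenpairs(Rr, n_factors)
        loadings = [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(d)]
        new_h2 = [sum(l * l for l in row) for row in loadings]
        converged = max(abs(a - b) for a, b in zip(new_h2, h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

# Hypothetical survey-style data: four items driven by one common factor
rng = random.Random(42)
true_loadings = [0.9, 0.8, 0.7, 0.6]
X = []
for _ in range(500):
    f = rng.gauss(0, 1)
    X.append([l * f + rng.gauss(0, math.sqrt(1 - l * l)) for l in true_loadings])

loadings, h2 = principal_axis_factoring(correlation_matrix(X), n_factors=1)
print([round(abs(row[0]), 2) for row in loadings])
print([round(c, 2) for c in h2])
```

The estimated loadings should land near the simulated 0.9, 0.8, 0.7 and 0.6. Maximum likelihood and generalized least squares differ mainly in the criterion used to fit the same loading model.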

SQL – DROP OR DELETE
DROP deletes an entire table or view from a database, whereas DELETE removes only the rows/records in a table or view, and its structure remains. So be careful out there when you want to…
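The difference is easy to see with an in-memory SQLite database through Python's standard `sqlite3` module. This is a minimal sketch. The table `orders` and its rows are hypothetical, and other database systems may add their own variations on these statements.

```python
import sqlite3

def table_names(conn):
    # Tables currently defined in the SQLite schema
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("pen",), ("ink",), ("pad",)])

# DELETE with a WHERE clause removes only the matching rows
conn.execute("DELETE FROM orders WHERE item = 'ink'")
rows_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# DELETE without WHERE empties the table, but the table itself remains
conn.execute("DELETE FROM orders")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
tables_after_delete = table_names(conn)

# DROP removes the table definition together with all its data
conn.execute("DROP TABLE orders")
tables_after_drop = table_names(conn)

print(rows_left, rows_after_delete, tables_after_delete, tables_after_drop)  # 2 0 ['orders'] []
```

After the DROP, any query against `orders` fails with "no such table" until the table is created again. After a DELETE, the empty table can simply be filled again.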

SQL – CONSTRAINT
Constraints can be used to restrict the data stored in an instance of a database. An instance is stored only if it satisfies all the constraint conditions stated in the database schema. Examples of constraints: PRIMARY…
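A short demonstration with SQLite via Python's `sqlite3` module shows rows being rejected when they break a constraint declared in the schema. This is a sketch only. The `customer` table, its columns and the age rule are hypothetical, and the exact error text differs between SQLite versions. The example uses PRIMARY KEY together with NOT NULL, UNIQUE and CHECK, which are standard SQL constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,        -- unique row identifier
        email TEXT    NOT NULL UNIQUE,    -- must be present and distinct
        age   INTEGER CHECK (age >= 18)   -- domain rule on allowed values
    )
""")

def try_insert(row):
    """Attempt one insert and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back if an error is raised
            conn.execute("INSERT INTO customer (id, email, age) VALUES (?, ?, ?)", row)
        return "stored"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"

candidates = [
    (1, "ann@example.com", 30),   # valid
    (1, "bob@example.com", 25),   # duplicate PRIMARY KEY
    (2, None, 25),                # violates NOT NULL
    (3, "ann@example.com", 40),   # violates UNIQUE
    (4, "cy@example.com", 15),    # violates CHECK
    (5, "di@example.com", 41),    # valid
]
results = [try_insert(row) for row in candidates]
for row, outcome in zip(candidates, results):
    print(row, "->", outcome)
```

Only the two valid rows end up in the table. Enforcing these rules in the schema means every application writing to the database is held to them, not just the ones that remember to check.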