Machine Learning File List - Feature Engineering

Name Description Date
Exploratory Data Analysis (EDA) & Pre-Processing
MICE.html MICE (Multivariate Imputation by Chained Equations) is one of the packages most commonly used by R users for imputation. Creating multiple imputations, rather than a single imputation (such as the mean), accounts for the uncertainty in the missing values. MICE assumes that the missing data are Missing at Random (MAR), which means that the probability that a value is missing depends only on the observed values and can be predicted from them. Here is a link to other methods. 6/29/2017
BIG UPDATE - Pre-Processing.html One of several documents I plan on the topic of preparing data before developing machine learning models. I spend more time collecting, cleaning, pre-processing and feature engineering data than I ever do building the models. 90% of my time is spent preparing the data. 8/19/2017
ScalingAndSkew.html More depth on scaling and skew. Originally created to demonstrate the Box-Cox Transformation. 7/1/2017
OutliersSpatialSign.html A unique outlier transformation that is easy to perform. I need to try this more! 7/1/2017
ImputeMissingData1.html Two basic methods to impute missing data. There will be more articles on this topic. 7/1/2017
tidyr.html Learn how to use tidyr to tidy your data - an essential R operation. 7/1/2017
Feature Engineering
featureSelectionCaret.html Selecting the right features in your data can mean the difference between mediocre performance with long training times and great performance with short training times. The caret R package provides tools that automatically report on the relevance and importance of attributes in your data and even select the most important features for you.
Model Selection, Building & Performance
IntroLinearRegression.html This document was part of the Duke Statistics Course I took on Coursera. The materials were so good that I provide them here for you to learn simple linear regression. I simply could not develop something as useful as this. I hope you like this as a learning tool as much as I did. 7/2/2017
kMeansClustering.html An introduction to K-Means Clustering 7/2/2017
NeuralNetworkBasic.html A basic, easy-to-understand neural network model using neuralnet 7/14/2017
Keras & CNN in R!! The ONLY R gap - CNNs - is now resolved. Use Keras in R! 8/18/2017
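
Example: the spatial sign transformation listed under OutliersSpatialSign.html

The files above are written in R, but the idea behind the spatial sign transformation is small enough to sketch in plain Python. This is only an illustration, not the code from the linked document. It assumes the usual recipe: center and scale each column, then divide each row by its Euclidean length so every sample lands on the unit sphere and extreme points are pulled in. The function name spatial_sign and the sample data are made up for this example.

```python
import math
import statistics


def spatial_sign(rows):
    """Project each sample onto the unit sphere after standardizing columns.

    Assumes every column has at least two values and non-zero spread
    (a constant column would cause a division by zero).
    """
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]

    result = []
    for row in rows:
        # Center and scale each predictor first, so no column dominates.
        z = [(x - m) / s for x, m, s in zip(row, means, sds)]
        # Divide by the Euclidean norm of the standardized row.
        norm = math.sqrt(sum(v * v for v in z))
        result.append([v / norm if norm > 0 else 0.0 for v in z])
    return result


# Hypothetical data: the last row is an obvious outlier.
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [100.0, -50.0]]
transformed = spatial_sign(data)
for original, projected in zip(data, transformed):
    print(original, "->", [round(v, 3) for v in projected])
```

After the transformation the outlier is the same distance from the center as every other sample, which is why the method is considered a simple way to limit the influence of outliers.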