Machine Learning

Applying R & Statistics To Predict

Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or “predictive modeling”, clustering, and finding predictive patterns. These tasks are learned from available data, gathered for example through experience or instruction. The premise is that feeding this experience back into a task will improve performance over time. The ultimate goal is for the learning to become automatic, so that humans like ourselves don’t need to intervene any more.

Machine Learning File List - Feature Engineering

Name	Description	Date
*Exploratory Data Analysis (EDA) & Pre-Processing*
MICE.html	MICE (Multivariate Imputation via Chained Equations) is one of the packages most commonly used by R users for imputation. Creating multiple imputations, as opposed to a single imputation (such as the mean), accounts for the uncertainty in missing values. MICE assumes that the missing data are Missing at Random (MAR), which means that the probability that a value is missing depends only on the observed values and can be predicted from them. Here is a link to other methods.	6/29/2017
BIG UPADTE - Pre-Processing.html	One of several documents I plan to write on preparing data before developing machine learning models. I spend more time collecting, cleaning, pre-processing and feature engineering data than I ever do building the models. 90% of my time is spent preparing the data.	8/19/2017
ScalingAndSkew.html	More depth on scaling and skew. Originally created to demonstrate the Box-Cox transformation.	7/1/2017
OutliersSpatialSign.html	A unique outlier transformation that is easy to perform. I need to try this more!	7/1/2017
ImputeMissingData1.html	Two basic methods to impute missing data. There will be more articles on this topic.	7/1/2017
tidyr.html	Learn how to use tidyr to tidy your data - an essential R operation.	7/1/2017
*Feature Engineering*
featureSelectionCaret.html	Selecting the right features in your data can mean the difference between mediocre performance with long training times and great performance with short training times. The caret R package provides tools that automatically report on the relevance and importance of attributes in your data and even select the most important features for you.
*Model Selection, Building & Performance*
IntroLinearRegression.html	This document was part of the Duke Statistics course I took on Coursera. The materials were so good that I provide them here for you to learn simple linear regression. I simply could not develop something as useful as this. I hope you like it as a learning tool as much as I did.	7/2/2017
kMeansClustering.html	An introduction to K-Means Clustering	7/2/2017
NeuralNetworkBasic.html	A basic, easy-to-understand neural network model using neuralnet	7/14/2017
Keras & CNN in R!!	The ONLY R gap - CNNs - is now resolved. Use Keras in R!	8/18/2017
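
To make the spatial sign transformation listed under OutliersSpatialSign.html a little more concrete, here is a minimal sketch. It is written in plain Python purely for illustration and is not taken from the linked document; in R, the caret package offers the same idea through preProcess(method = "spatialSign"). The function name spatial_sign and the sample data are hypothetical. The idea is to center and scale each column, then divide each row by its Euclidean norm so that every sample lands on the unit sphere, which pulls extreme points in toward the rest of the data.

```python
import math


def spatial_sign(rows):
    """Center and scale each column, then project every row onto the unit sphere."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Sample standard deviation (n - 1 denominator); fall back to 1.0 for constant columns.
    sds = []
    for c, m in zip(cols, means):
        sd = math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
        sds.append(sd if sd > 0 else 1.0)

    result = []
    for row in rows:
        z = [(v - m) / s for v, m, s in zip(row, means, sds)]
        norm = math.sqrt(sum(x * x for x in z))
        # A row sitting exactly at the column means has norm 0 and is left at the origin.
        result.append([x / norm if norm > 0 else 0.0 for x in z])
    return result


# Hypothetical data: four ordinary points and one extreme outlier.
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0], [50.0, 40.0]]
signed = spatial_sign(data)
for original, transformed in zip(data, signed):
    print(original, "->", [round(x, 3) for x in transformed])
```

After the transform every row has length 1, so the extreme last row sits no farther from the origin than any other sample. Only its direction is retained, which is why the method is considered a gentle way to limit the influence of outliers.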