Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks include concept learning, function learning (or “predictive modeling”), clustering, and finding predictive patterns. These tasks are learned from available data, observed through experience or instruction, for example. The premise is that incorporating this experience improves performance on the task over time. The ultimate goal is for that improvement to become automatic, so that humans no longer need to intervene.
Machine Learning File List - Feature Engineering
|Exploratory Data Analysis (EDA) & Pre-Processing
|MICE (Multivariate Imputation via Chained Equations) is one of the packages most commonly used by R users for imputing missing values. Creating multiple imputations, rather than a single imputation (such as the mean), accounts for the uncertainty in the missing values. MICE assumes that the data are Missing at Random (MAR), which means that the probability that a value is missing depends only on the observed values and can be predicted from them. Here is a link to other methods.
|BIG UPDATE - Pre-Processing.html
|One of several documents I plan on the topic of preparing data before developing machine learning models. I spend more time collecting, cleaning, pre-processing and feature engineering data than I ever do building the models. 90% of my time is spent preparing the data.
|More depth on scaling and skew. Originally created to demonstrate the Box-Cox transformation.
|A unique outlier transformation that is easy to perform. I need to try this more!
|Two basic methods to impute missing data. There will be more articles on this topic.
|Learn how to use tidyr to tidy your data - an essential R operation.
|Selecting the right features in your data can mean the difference between mediocre performance with long training times and great performance with short training times. The caret R package provides tools that automatically report on the relevance and importance of attributes in your data and even select the most important features for you.
|Model Selection, Building & Performance
|This document was part of the Duke statistics course I took on Coursera. The materials were so good that I provide them here for you to learn simple linear regression. I simply could not develop something as useful as this. I hope you like this as a learning tool as much as I did.
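The single-imputation idea mentioned in this section (filling a gap with one value such as the mean) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain Python rather than the R packages named above, it assumes mean and median as the two strategies (the list does not say which two methods the linked article covers), and the function name `impute` and the sample `ages` are hypothetical.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep observed values as they are; fill only the gaps.
    return [fill if v is None else v for v in values]

# Hypothetical sample column with two missing entries and one large value.
ages = [22, None, 35, 41, None, 90]
mean_filled = impute(ages, "mean")
median_filled = impute(ages, "median")
```

Because the large value pulls the mean upward, the two strategies fill the gaps with different numbers. Either way, every gap receives the same single value, which is the uncertainty that multiple imputation is meant to account for.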
|An introduction to K-Means Clustering
|A basic, easy-to-understand neural network model using neuralnet
|Keras & CNN in R!!
|The ONLY R gap - CNNs - is now resolved. Use Keras in R!
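For readers who want a feel for what the K-Means Clustering entry covers, here is a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm). It is written in plain Python for illustration and is not taken from the linked document. The function name `kmeans`, the sample `points`, and the choice to seed the centroids with the first k points are all assumptions made to keep the example short and deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of (x, y) tuples.

    Assumption: the first k points seed the centroids so the sketch is
    deterministic; practical implementations use random or k-means++ starts.
    """
    centroids = list(points[:k])
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two loose groups, one near (1, 1) and one near (9, 9).
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
```

The loop stops as soon as an update leaves every centroid unchanged, so well-separated data like this converges in a couple of passes.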