Quick Hits

What is R?

The Second Machine Age is coming to life through the explosive growth of data science and machine learning. How do many of these technologies get built? Often with R, a powerful programming language that provides the statistical, data munging, visualization, and machine learning capabilities behind many of these advances. Start learning R.

What is Data Science?

Data science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data in various forms, whether structured or unstructured. It is a continuation of data analysis fields such as statistics, data mining, and predictive analytics. Data science has become a fourth approach to scientific discovery, in addition to experimentation, modeling, and computation.

What is Machine Learning?

Machine learning is a field of science that focuses on mathematically describing patterns in data. It draws on a skill set that combines computer science, statistics, operations research, engineering, and business insight and strategy, and it can have a substantial impact on a business.

What is Deep Learning (Neural Nets)?

Deep learning is a subfield of machine learning concerned with artificial neural networks, algorithms inspired by the structure and function of the brain. Do yourself a favor and take a minute to learn the difference between deep learning, machine learning, and AI. Just because the general public uses the terms interchangeably does not mean you should!

Key Concepts

A good learning method should have both low variance and low bias. Variance is the amount by which a method's predictions would change if it were fitted to a different training set. The more flexible the method, the higher the variance. Bias is the error introduced by approximating a real-life problem with a simpler model. More flexible models result in less bias, so lowering one kind of error tends to raise the other.
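One way to see how flexibility moves bias and variance in opposite directions is a small simulation. The sketch below is written in Python with only the standard library. The true function, noise level, training-set size, and the two toy models (a constant predictor and a one-nearest-neighbour predictor) are all illustrative assumptions chosen for the demonstration.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed "real life" relationship that the models try to approximate.
    return math.sin(2 * math.pi * x)


def make_training_set(rng, n=20, noise_sd=0.3):
    # Draw n noisy observations of true_f on the interval [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def predict_constant(xs, ys, x0):
    # Rigid model: ignore x0 and always predict the mean response.
    return statistics.fmean(ys)


def predict_nearest(xs, ys, x0):
    # Flexible model: copy the response of the closest training point.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def bias_sq_and_variance(predict, x0, n_sets=2000, seed=0):
    # Refit the model on many independent training sets and examine
    # how its prediction at the single point x0 behaves across them.
    rng = random.Random(seed)
    preds = [predict(*make_training_set(rng), x0) for _ in range(n_sets)]
    bias = statistics.fmean(preds) - true_f(x0)
    return bias ** 2, statistics.pvariance(preds)


for name, model in [("constant", predict_constant),
                    ("nearest neighbour", predict_nearest)]:
    b2, var = bias_sq_and_variance(model, x0=0.25)
    print(f"{name}: squared bias = {b2:.3f}, variance = {var:.3f}")
```

Under these assumptions the constant model should show the larger squared bias and the nearest-neighbour model the larger variance, which is the pattern described above.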

Learning Methods

Supervised learning builds a model for predicting or estimating an output based on one or more inputs. Unsupervised learning involves inputs but no supervising outputs; the goal is to learn the relationships and structure within the data. (There is no response variable to supervise the analysis.)
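The contrast can be sketched with two toy routines in Python (standard library only). The function names, the data, and the choice of least-squares line fitting and a one-dimensional two-cluster k-means are illustrative assumptions. The first routine is given both inputs and outputs; the second sees inputs only.

```python
import statistics


def fit_line(xs, ys):
    # Supervised: the known outputs ys guide the fit (ordinary least squares).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def two_means(values, n_iter=20):
    # Unsupervised: no outputs are given, so look for structure in the
    # inputs alone. Assumes at least two distinct values.
    lo, hi = min(values), max(values)  # initial cluster centres
    for _ in range(n_iter):
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = statistics.fmean(group_lo), statistics.fmean(group_hi)
    return lo, hi


# Hypothetical data: paired inputs and outputs for the supervised case.
hours = [1, 2, 3, 4, 5]
score = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(hours, score)
print(f"predicted score for 6 hours: {slope * 6 + intercept:.2f}")

# Hypothetical data: unlabeled measurements for the unsupervised case.
measurements = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
print("cluster centres:", two_means(measurements))
```

The line fit can be checked against the known scores, while the clustering has no right answer to compare with; it only reports the grouping it finds.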

Fundamental Data Science Tools

RStudio

RStudio is a free and open source integrated development environment (IDE) for R, a programming language for statistical computing and graphics.

R Markdown

Dynamic documents for R. R Markdown is an authoring format that enables easy creation of dynamic documents, presentations, and reports from R. It combines the core syntax of Markdown (an easy-to-write plain text format) with embedded R code chunks that are run so their output can be included in the final document.

RStudio's Shiny

Shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R. Shiny helps you turn your analyses into interactive web applications without requiring HTML, CSS, or JavaScript knowledge.

Data Science

“R for Data Science” teaches you how to get your data into R, get it into the most useful structure, transform it, visualise it, and model it. You’ll learn how to clean data and draw plots, and many other things besides. Find the best practices for doing each of these things with R. Learn how to use the grammar of graphics, literate programming, and reproducible research to save time.