O'Reilly - Practical Data Science with R, Video Edition
by Nina Zumel, John Mount | Released March 2014
"A unique and important addition to any data scientist's library." Jim Porzak, Cofounder Bay Area R Users Group Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. It shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support. Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics. Inside: Data science for the business professional Statistical analysis using the R language Project lifecycle, from planning to delivery Numerous instantly familiar use cases Keys to effective data presentations This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed. Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com. Covers the process end-to-end, from data exploration to modeling to delivering the results. Nezih Yigitbasi, Intel Full of useful gems for both aspiring and experienced data scientists. Fred Rahmanian, Siemens Healthcare Hands-on data analysis with real-world examples. Highly recommended. Dr. Kostas Passadis, IPTO NARRATED BY JOSEF GAGNIER Show and hide more
- Chapter 1. The data science process 00:06:59
- Chapter 1. Stages of a data science project 00:08:10
- Chapter 1. Modeling 00:09:17
- Chapter 1. Setting expectations 00:06:56
- Chapter 2. Loading data into R 00:07:00
- Chapter 2. Using R on less-structured data 00:04:04
- Chapter 2. Working with relational databases 00:07:30
- Chapter 2. Loading data from a database into R 00:07:36
- Chapter 3. Exploring data 00:04:58
- Chapter 3. Typical problems revealed by data summaries 00:07:08
- Chapter 3. Spotting problems using graphics and visualization 00:04:20
- Chapter 3. Visually checking distributions for a single variable 00:09:32
- Chapter 3. Visually checking relationships between two variables 00:11:07
- Chapter 4. Managing data 00:08:10
- Chapter 4. Data transformations 00:09:39
- Chapter 4. Sampling for modeling and validation 00:08:30
- Chapter 5. Choosing and evaluating models 00:06:17
- Chapter 5. Solving scoring problems 00:07:04
- Chapter 5. Evaluating models 00:12:07
- Chapter 5. Evaluating scoring models 00:06:26
- Chapter 5. Evaluating probability models 00:07:48
- Chapter 5. Evaluating ranking models 00:03:56
- Chapter 5. Validating models 00:05:22
- Chapter 5. Ensuring model quality 00:10:04
- Chapter 6. Memorization methods 00:07:09
- Chapter 6. Building single-variable models 00:07:25
- Chapter 6. Using cross-validation to estimate effects of overfitting 00:04:10
- Chapter 6. Building models using many variables 00:08:28
- Chapter 6. Using nearest neighbor methods 00:03:18
- Chapter 6. Using Naive Bayes 00:07:29
- Chapter 6. Summary 00:04:04
- Chapter 7. Linear and logistic regression 00:11:43
- Chapter 7. Building a linear regression model 00:09:17
- Chapter 7. Finding relations and extracting advice 00:05:05
- Chapter 7. Reading the model summary and characterizing coefficient quality 00:05:15
- Chapter 7. Statistics as an attempt to correct bad experimental design 00:08:19
- Chapter 7. Using logistic regression 00:06:39
- Chapter 7. Building a logistic regression model 00:08:03
- Chapter 7. Finding relations and extracting advice from logistic models 00:04:39
- Chapter 7. Reading the model summary and characterizing coefficients 00:06:30
- Chapter 7. Null and residual deviances 00:07:27
- Chapter 7. Logistic regression takeaways 00:03:35
- Chapter 8. Unsupervised methods 00:09:22
- Chapter 8. Hierarchical clustering with hclust() 00:08:17
- Chapter 8. Picking the number of clusters 00:04:42
- Chapter 8. The k-means algorithm 00:06:41
- Chapter 8. Association rules 00:07:08
- Chapter 8. Mining association rules with the arules package 00:09:53
- Chapter 8. Association rule takeaways 00:02:18
- Chapter 9. Exploring advanced methods 00:05:41
- Chapter 9. Using bagging to improve prediction 00:04:33
- Chapter 9. Using random forests to further improve prediction 00:07:09
- Chapter 9. Using generalized additive models (GAMs) to learn non-monotone relationships 00:07:24
- Chapter 9. Extracting the nonlinear relationships 00:06:36
- Chapter 9. Using kernel methods to increase data separation 00:10:08
- Chapter 9. Using an explicit kernel on a problem 00:04:25
- Chapter 9. Using SVMs to model complicated decision boundaries 00:08:04
- Chapter 9. Trying an SVM on artificial example data 00:07:07
- Chapter 9. Support vector machine takeaways 00:03:12
- Chapter 10. Documentation and deployment 00:03:59
- Chapter 10. Using knitr to produce milestone documentation 00:06:19
- Chapter 10. Using knitr to document the buzz data 00:06:09
- Chapter 10. Using comments and version control for running documentation 00:06:42
- Chapter 10. Using version control to record history 00:05:18
- Chapter 10. Using version control to explore your project 00:06:02
- Chapter 10. Using version control to share work 00:08:17
- Chapter 10. Deploying models 00:08:35
- Chapter 11. Producing effective presentations 00:05:29
- Chapter 11. Summarizing the project’s goals 00:07:10
- Chapter 11. Presenting your model to end users 00:06:00
- Chapter 11. Presenting your work to other data scientists 00:07:49