O'Reilly - Debugging Apache Spark - 9781492039174
by Holden Karau | Released November 2018 | ISBN: 9781492039167


Apache Spark is an extremely powerful general-purpose distributed system that also happens to be extremely difficult to debug. This video, designed for intermediate-level Spark developers and data scientists, looks at some of the most common (and baffling) ways Spark can explode (e.g., out-of-memory exceptions, unbalanced partitioning, strange serialization errors, and errors inside your own code) and then provides a set of remedies for keeping those blow-ups under control. You'll pick up techniques for improving your own logging (and reducing your dependence on Spark's verbose logs); learn how to deal with fuzzy data; discover how to connect and use a debugger in a distributed environment; and learn to tell which Spark error messages are actually relevant.

• Understand why Spark is difficult to debug, the types of Spark failures, and how to recognize them
• Explore the differences between debugging single-node and distributed systems
• Learn the best debugging techniques for Spark and a framework for debugging

Holden Karau is an open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. She is an in-demand speaker at O'Reilly Media's Strata + Hadoop conferences, a committer on the Apache Spark, SystemML, and Mahout projects, and the author of multiple O'Reilly titles, including High Performance Spark and Learning Spark. She holds a bachelor's degree in math and computer science from the University of Waterloo.
  1. Debugging Apache Spark
    • Introduction 00:09:15
    • A Quick Re-cap of Spark's Design 00:09:01
    • Finding Your Logs in Spark (and Finding the Right Ones) 00:17:13
    • The DAG (Not to Be Confused with Dog) and Query Plan 00:12:49
    • Finding the Root Cause of an Error in Spark with Lazy Evaluation 00:19:48
    • A Summary of Common Spark Errors 00:04:49
    • Diagnosing Key-Skew Problems with Spark 00:15:38
    • Out of Memory Exceptions in Spark 00:07:54
    • Reading JVM Stack Traces for Non-JVM Developers 00:16:08
    • Serialization Errors in Spark 00:20:02
    • It's Not Always Spark's Fault: Debugging Errors inside of Transformations 00:05:24
    • Adding Your Own Logging and Using Accumulators 00:02:51
    • Attaching Remote Debuggers to Spark 00:02:18
    • Next Steps: Testing and Monitoring 00:02:58
