O'Reilly - Strata Data Conference 2017 - New York, New York
by O'Reilly Media Inc. | Released September 2017 | ISBN: 9781491976319
The Strata Data Conference New York 2017 gathered more than 340 of the world's brightest data visionaries, practitioners, and strategists to speak about today's most effective big data technologies, techniques, and business practices. This video compilation gives you a best-in-house seat for each of the conference's 21 keynotes, 22 tutorials, and 189 individual sessions, which covered AI, predictive analytics, security, cloud strategy, data engineering, Hadoop, machine learning, IoT, stream processing, visualization, and more.
It includes all of the exclusive programming delivered at the conference's Strata Business Summit (aka the missing MBA for data-driven business), with 25 sessions specifically tailored for C-suite executives, business leaders, and strategists. You'll hear Microsoft researcher danah boyd's revelations of the unseen ways our data systems are being gamed; author Cathy O'Neil's (Weapons of Math Destruction) exposé of the ways mathematical models shape our future; and Tim O'Reilly's (O'Reilly Media) call to action on business's role in choosing an AI-centered future that works best for everyone. You'll get to see Evan Levy (SAS) describe the five essential components of a data strategy, Atul Dalmia (American Express) recount how Amex drove enterprise adoption of big data, and Tobi Bosede (Johns Hopkins) explain Apache Spark's usefulness in futures trading. And you'll also get privileged access to eleven "Executive Briefings" from top attorneys, like Alysa Hutnik (Kelley Drye & Warren LLP) on the best legal practices for making data work, and top execs, like Bill Schmarzo (Dell EMC) on how to properly determine the economic value of your business data.
Every Findata session, Strata's deep dive into the finance world's most disruptive data technologies, is part of this compilation. You'll hear insider reports by quants and strategists such as Jason Morton (Ascendant), Robert Passarella (Protégé Partners), Jike Chong (Tsinghua University), Jessica Stauth (Quantopian), Abraham Thomas (Quandl), Tanvi Singh (Credit Suisse), José Ribau (CIBC), and Leigh Drogen (Estimize) on topics ranging from crowd-sourced investment research and algorithms for modeling real estate offerings to big data techniques that detect financially manipulative practices such as spoofing and layering.
In addition, you'll gain access to dozens of case studies detailing Spotify's transition from data centers to the cloud; Salesforce's Einstein AI platform; geospatial big data analysis at Uber; Comcast's use of Apache Avro for end-to-end data governance; Danske Bank's use of AI to fight financial fraud; the Portland Trail Blazers' use of Azure machine learning to boost ticket sales; FINRA's implementation of a data lake in the AWS cloud; Cloudera's guide to using Hadoop and machine learning to spot cybersecurity incidents at scale; and Zocdoc, a HIPAA-friendly online doctor marketplace and booking tool.
And, like all Strata video compilations, this one is packed with revelations about big data's emerging technologies. Get this compilation and you'll be one of the first to learn about Apache Parquet, TimescaleDB, Stanford University's Weld, Twitter Heron, relational storage, Prophet probabilistic programming, PyTextRank natural language processing, the Elastic Data Platform, and much more.
- A best-in-house seat for all of Strata NY 2017's keynotes, tutorials, and sessions
- Total access to the exclusive Strata Business Summit and its Executive Briefings
- Complete entry to the FinData Day sessions on the most disruptive data technologies in finance
- Deep dives into data practices at American Express, Dell EMC, Credit Suisse, Danske Bank, and SAS
- Visionary keynotes by Ziya Ma (Intel), Cesar Delgado (Apple Siri), and Manuela Veloso (Carnegie Mellon)
- Explorations of emerging tech like Apache Parquet, Twitter Heron, and Stanford University's Weld
- Best practice overviews from Cloudera, MapR, Microsoft, Nvidia, Cisco, Google Cloud, SAS, and AWS
- Explorations of Apache Parquet, Kylin, Arrow, Apex, Atlas, Avro, Kudu, Kafka, Flink, Beam, and Griffin
- Intensives on AI, stream processing, cloud strategy, visualization, data engineering, and architecture
- All of it available on Safari
- Keynotes
- Journey to consolidation - Mike Olson (Cloudera), Cesar Delgado (Apple) 00:11:09
- White Collar Crime Risk Zones - Sam Lavigne (The New Inquiry) 00:05:00
- A whole new way to think about your next-gen applications (sponsored by MapR Technologies) - Anil Gadre (MapR) 00:10:30
- The age of machine learning - Ben Lorica (O'Reilly Media) 00:09:05
- Teaching databases to learn in the world of AI (sponsored by MemSQL) - Nikita Shamgunov (MemSQL) 00:04:51
- Music, the window into your soul - Christine Hung (Spotify) 00:11:36
- Unleashing intelligence and data analytics at scale (sponsored by Intel) - Ziya Ma (Intel) 00:06:13
- Data science for the most vulnerable at UNICEF Innovation - Manuel García-Herranz (UNICEF Office of Innovation) 00:09:12
- The US EPA: Digital transformation through data science - Robin Thottungal (US Environmental Protection Agency) 00:09:12
- Emotional arithmetic: A deep dive into how machine learning and big data help you understand customers in real time (sponsored by Google) - Chad W. Jennings (Google) 00:45:37
- A tale of two cafeterias: Focus on the line of business - Tanvi Singh (Credit Suisse) 00:08:33
- How the IoT and machine learning keep America truckin' - Mike Olson (Cloudera), Terry Kline (Navistar) 00:09:58
- Will AI help save the snow leopard? (sponsored by Microsoft) - Joseph Sirosh (Microsoft) 00:12:08
- Human-AI interaction: Autonomous service robots - Manuela Veloso (Carnegie Mellon University) 00:14:25
- Edge to enterprise: New challenges and opportunities (sponsored by Cisco) - Raghunath Nambiar (Cisco) 00:04:49
- Your data is being manipulated. - danah boyd (Microsoft Research | Data & Society) 00:13:16
- WTF? What's the future and why it's up to us - Tim O'Reilly (O'Reilly Media) 00:16:11
- Big data & the Cloud
- Cloud data lakes: Analytic data warehouses in the cloud - John Hitchingham (FINRA) 00:41:22
- A deep dive into Apache Kafka core internals - Jun Rao (Confluent) 00:45:20
- Rethinking data marts in the cloud: Common architectural patterns for analytics - Greg Rahn (Cloudera) 00:39:50
- Spotify in the cloud: The next evolution of data at Spotify - Josh Baer (Spotify), Alison Gilles (Spotify) 00:45:56
- Lessons from an AWS migration - Chris Mills (The Meet Group) 00:55:54
- Automating cloud cluster deployment: Beyond the book - Bill Havanki (Cloudera) 00:41:50
- Analytics at Wikipedia - Andrew Otto (Wikimedia Foundation), Fangjin Yang (Imply) 00:39:41
- From notebooks to cloud native: A modern path for data-driven applications - Michael McCune (Red Hat) 00:41:55
- How to successfully run data pipelines in the cloud - Jennifer Wu (Cloudera), Philip Langdale (Cloudera), Kostas Sakellis (Cloudera) 00:35:55
- Deploying deep learning to assist the digital pathologist - Jon Fuller (KNIME), Olivia Klose (Microsoft) 00:46:12
- Artificial Intelligence
- Fighting financial fraud at Danske Bank with artificial intelligence - Nadeem Gulzar (Danske Bank Group), Sune Askjær (Think Big Analytics, a Teradata Company) 00:41:48
- What is deep learning? - Mikio Braun (Zalando SE) 00:27:09
- Practical deep learning for understanding images - Leo Dirac (Amazon Web Services) 00:44:12
- Data Engineering & Architecture
- Apache Spark in the hands of data scientists - Neelesh Srinivas Salian (Stitch Fix) 00:38:36
- State-of-the-art robot predictive maintenance with real-time sensor data - Mateusz Dymczyk (H2O.ai), Mathieu Dumoulin (MapR Technologies) 00:36:25
- Stream all the things! - Dean Wampler (Lightbend) 00:29:58
- Geospatial big data analysis at Uber - Zhenxiao Luo (Uber), Wei Yan (Uber) 00:34:22
- How T-Mobile built a massive-scale network performance management platform on Hadoop - Travis Bakeman (T-Mobile) 00:40:53
- When boring is awesome: Making PostgreSQL scale for time series data - Michael Freedman (TimescaleDB | Princeton) 00:39:41
- Mistakes were made, but not by us: Lessons from a year of supporting Apache Kafka - Dustin Cote (Confluent) 00:32:33
- A brave new world in mutable big data: Relational storage - Todd Lipcon (Cloudera) 00:41:27
- Extending Spark ML: Adding your own tools and algorithms - Holden Karau (IBM), Seth Hendrickson (Cloudera) 00:41:21
- Project Rainier: Saving lives one insight at a time - Marc Carlson (Seattle Children's Research Institute), Sean Taylor (Seattle Children's Research Institute) 00:41:30
- The journey to Einstein: Building a multitenancy AI platform that powers hundreds of thousands of businesses - Simon Chan (Salesforce) 00:38:41
- Why containers and microservices need streaming data - Paul Curtis (MapR Technologies) 00:43:30
- Scaling database and analytic workloads with Apache Kudu - Zbigniew Baranowski (CERN) 00:43:30
- End-to-end data discovery and lineage in a heterogeneous big data environment with Apache Atlas and Avro - Barbara Eckman (Comcast) 00:40:33
- Stream analytics with SQL on Apache Flink - Fabian Hueske (data Artisans) 00:38:54
- Working within the Hadoop ecosystem to build a live-streaming data pipeline - Stephen Devine (Big Fish Games), Kalah Brown (Big Fish Games) 00:30:09
- An open source architecture for the IoT - Dave Shuman (Cloudera), James Kirkland (Red Hat) 00:37:08
- Low-latency streaming: Twitter Heron on Infiniband - Karthik Ramasamy (Streamlio), Supun Kamburugamuve (Indiana University) 00:38:33
- Solving data cleaning and unification using human-guided machine learning - Ihab Ilyas (University of Waterloo | Tamr) 00:40:07
- Using ML to solve failure problems with ML and AI apps in Spark - Adrian Popescu (Unravel Data Systems), Shivnath Babu (Unravel Data Systems) 00:32:54
- PyTextRank: Graph algorithms for enhanced natural language processing - Paco Nathan (O'Reilly Media) 00:41:29
- Deep learning for recommender systems - Mo Patel (Teradata), Junxia Li (Think Big Analytics) 00:36:45
- Using R and Spark to analyze data on Amazon S3 - Edgar Ruiz (RStudio) 00:39:50
- Griffin: Fast-tracking model development in Hadoop - Steven Totman (Cloudera), Faraz Rasheed (TD Bank) 00:41:45
- Tensor abuse in the workplace - Ted Dunning (MapR Technologies) 00:41:46
- Considerations for hardware-accelerated machine learning platforms - Mike Pittaro (Dell EMC) 00:40:11
- Real-time image classification: Using convolutional neural networks on real-time streaming data - Josh Patterson (Skymind), Kirit Basu (StreamSets) 00:33:43
- How machine learning with open source tools helps everyone build better products - Michelle Casbon (Qordoba) 00:38:32
- Anomaly detection on live data - Arun Kejariwal (MZ), Francois Orsini (MZ), Dhruv Choudhary (MZ) 00:44:04
- Strata Business Summit
- Enterprise digital transformation using big data - Atul Dalmia (American Express) 00:41:57
- Big data analysis of futures trades - Tobi Bosede (Johns Hopkins) 00:38:44
- Spark clinical surveillance: Saving lives and improving patient care - Charles Boicey (Clearsense) 00:41:27
- What I learned from teaching 1,500 analytics students - Jerrard Gaertner (University of Toronto School of Continuing Studies) 00:42:00
- Data science for good: Benefit the world and your business at the same time - Derek Ruths (CAI) 00:36:44
- A tale of two cafeterias: Focus on the line of business - Tanvi Singh (Credit Suisse) 00:08:33
- Learning from customers, keeping humans in the loop - Elsie Kenyon (Nara Logics) 00:31:05
- Putting data to work: How to optimize workforce staffing to improve organization profitability - Francesca Lazzeri (Microsoft), Hong Lu (Microsoft) 00:37:36
- Executive Briefing: Determining the economic value of your data (EvD) - Bill Schmarzo (EMC) 00:41:16
- Executive Briefing: Managing successful data projects—Technology selection and team building - Ted Malaska (Blizzard Entertainment), Jonathan Seidman (Cloudera) 00:36:06
- Executive Briefing: Preparing your infrastructure for AI - Edd Wilder-James (Google) 00:40:53
- Executive Briefing: Legal best practices for making data work - Alysa Z. Hutnik (Kelley Drye & Warren LLP) 00:39:50
- Executive Briefing: Talking to machines—Natural language today - Hilary Mason (Fast Forward Labs) 00:39:48
- Executive Briefing: From data insights to action—Developing a data-driven company culture - Ashish Verma (Deloitte) 00:43:13
- Executive Briefing: Conversational marketing for brands—Why it's better to talk to your customers than monitor them - Andy Mauro (Automat) 00:39:36
- Executive Briefing: Data ecosystem strategy - Jason McIntyre (Accenture), Mark Milazzo (Accenture) 00:45:18
- Executive Briefing: Analytics centers of excellence as a way to accelerate big data adoption by business - Carme Artigas (Synergic Partners) 00:41:26
- WTF? What's the future and why it's up to us - Tim O'Reilly (O'Reilly Media) 00:16:11
- The real project of AI ethics - Joanna Bryson (University of Bath | Princeton Center for Information Technology Policy) 00:13:17
- The EOI framework for big data analytics to drive business impact at scale - Michael Li (LinkedIn), Chi-Yi Kuan (LinkedIn) 00:39:40
- From the weeds to the stars: How and why to think about bigger problems - David Boyle (BBC Worldwide) 00:41:49
- 20 Netflix-style principles and practices to get the most out of your data platform - Kurt Brown (Netflix) 00:37:29
- The pitfalls of running a self-service big data platform - Sander Kieft (Sanoma Media) 00:45:35
- The five dysfunctions of a data engineering team - Jesse Anderson (Big Data Institute) 00:38:56
- How to hire and test for data skills: A one-size-fits-all interview kit - Tanya Cashorali (TCB Analytics) 00:41:29
- Machine Learning & Data Science
- Deep learning in practice - Mikio Braun (Zalando SE) 00:39:11
- Probabilistic programming in finance using Prophet - Justin Bleich (Coatue Management) 00:39:56
- Differentiating by data science - Eric Colson (Stitch Fix) 00:38:18
- A spike in sales is not always good news: On the importance of learning the relationships between time series metrics at scale - Inbal Tadeski (Anodot) 00:24:02
- Building a Rosetta Stone for business data - Matthew Roche (Microsoft), Jennifer Marie Stevens (Microsoft) 00:40:59
- Learning location: Real-time feature extraction for mobile analytics - Sander Pick (Set), Andrew Hill (Set), Carson Farmer (Set) 00:34:53
- Automatic comments moderation with ModBot at the Washington Post - Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post) 00:39:09
- GPU-accelerating a deep learning anomaly detection platform - Joshua Patterson (NVIDIA), Michael Balint (NVIDIA), Satish Varma Dandu (NVIDIA) 00:38:11
- Challenges in using machine learning to direct healthcare services - Brian Dalessandro (Zocdoc) 00:58:38
- Interpretable AI: Not just for regulators - Patrick Hall (H2O.ai | George Washington University), Sri Satish (H2O.ai) 00:42:03
- Boosting Spark MLlib performance with rich optimization algorithms - Seth Hendrickson (Cloudera), DB Tsai (Netflix) 00:32:00
- When models go rogue: Hard-earned lessons about using machine learning in production - David Talby (Pacific AI) 00:38:03
- Visualization & user experience
- Improve business decision making with the science of human perception - Sebastian Gutierrez (DashingD3js.com) 00:42:24
- Design for nondesigners: Increasing revenue, usability, and utility within data analytics products - Brian O'Neill (Designing for Analytics) 00:39:30
- Text analytics and new visualization techniques - Richard Brath (Uncharted Software), Scott Langevin (Uncharted Software) 00:38:06
- Interactive data exploration and analysis at enterprise scale - Sean Kandel (Trifacta), Kaushal Gandhi (Trifacta) 00:36:20
- Discovering insights in financial data with immersive reality - John Horcher (Virtual Cove) 00:30:58
- Seeing everything so managers can act on anything: The IoT in DHL Supply Chain operations - Javier Esplugas (DHL Supply Chain), Kevin Parent (Conduce) 00:36:01
- Stream processing & analytics
- Exactly once, more than once: Apache Kafka, Heron, and Apache Apex - Dean Wampler (Lightbend), Jun Rao (Confluent), Karthik Ramasamy (Streamlio), Pramod Immaneni (DataTorrent) 00:39:22
- Realizing the promise of portability with Apache Beam - Reuven Lax (Google) 00:41:10
- Foundations of streaming SQL; or, How I learned to love stream and table theory - Tyler Akidau (Google) 00:46:24
- MacroBase: A search engine for fast data streams - Sahaana Suri (Stanford University) 00:33:48
- Streaming visual analytics: What's possible today and what's coming tomorrow - Shant Hovsepian (Arcadia Data) 00:46:47
- Spark & beyond
- HDFS on Kubernetes: Lessons learned - Kimoon Kim (Pepperdata) 00:40:29
- SETL: An efficient and predictable way to do Spark ETL - Thiruvalluvan M G (Aqfer) 00:33:50
- The business case for AI, Spark, and friends - Edd Wilder-James (Google) 00:30:49
- Security
- Machine learning to spot cybersecurity incidents at scale - Eddie Garcia (Cloudera) 00:35:56
- An authenticated journey through big data security at Walmart - Matt Bolte (Walmart), Toni LeTempt (Walmart) 00:40:41
- Confounding factors galore: Using software ecosystem data to risk-rate code - J. C. Herz (Ion Channel) 00:39:06
- Sponsored
- Data science platforms: Your key to actionable analytics (sponsored by DataScience.com) - William Merchan (DataScience.com) 00:28:11
- Building the IoT data lifecycle (sponsored by Cisco) - Han Yang (Cisco Systems) 00:36:52
- Building a real-time feedback loop for education (sponsored by MemSQL) - David Mellor (Curriculum Associates) 00:47:58
- Real-time recommendation engines using SAS technology (sponsored by SAS) - Juthika Khargharia (SAS) 00:38:10
- The essentials for digital growth (sponsored by MapR) - Jack Norris (MapR Technologies) 00:38:09
- Enabling data science self-service with the Elastic Data Platform (sponsored by Dell EMC) - Bala Chandrasekaran (Barclays) 00:39:26
- How the separation of compute and storage impacts your big data analytics way of life (sponsored by Micro Focus Security and Big Data Analytics) - Deepak Majeti (Vertica) 00:40:06
- Architect and operationalize your enterprise data lake (sponsored by Zaloni) - Ben Sharma (Zaloni), Carlos Matos (AIG) 00:42:33
- (Big) data team productivity: A balancing act (sponsored by Dataiku) - Kenneth Sanford (Dataiku) 00:36:15
- Orchestrating your complex data pipeline across your enterprise (sponsored by SAP) - Michelle Mensing (SAP) 00:33:01
- The converging world of big data and the IoT (sponsored by Pentaho) - Chuck Yarbrough (Pentaho) 00:35:38
- Using an AI-driven approach to managing data lakes in the cloud or on-premises (sponsored by Informatica) - Murthy Mathiprakasam (Informatica), Sravan Kasarla (Fidelity Investments) 00:38:35
- Building enterprise OLAP on Hadoop in finance with Apache Kylin (sponsored by Kyligence) - Luke Han (Kyligence) 00:36:30
- Hybrid data lakes: Unlocking the inevitable (sponsored by Cask) - Jonathan Gray (Cask) 00:42:54
- Protect IoT data and monetize it with analytics (sponsored by Micro Focus Security and Big Data Analytics) - Phil Sewell (Micro Focus) 00:42:17
- Smarter business apps with a modern GPU database (sponsored by Kinetica) - Mate' Radalj (Kinetica) 00:35:18
- Data science beyond the sandbox (sponsored by Anaconda) - Peter Wang (Anaconda) 00:40:35
- Big data, location analytics, and geoenrichment to drive better business outcomes (sponsored by Pitney Bowes) - Tim McKenzie (Pitney Bowes) 00:39:16
- AIG: Creating a data-driven customer service organization (sponsored by Talend) - Kevin Stallings (AIG) 00:48:46
- Powering business outcomes with data science in a connected world (sponsored by Hortonworks) - Piet Loubser (Hortonworks) 00:26:37
- How JW Player is powering the online video revolution with data analytics (sponsored by Snowflake Computing) - Rick Okin (JW Player) 00:37:32
- Continuous integration at scale: Streaming 50 billion events per day for real-time feedback with Kafka and Spark (sponsored by Pure Storage) - Ivan Jibaja (Pure Storage) 00:39:59
- Deploying to the edge, bringing AI everywhere (sponsored by Microsoft) - Matt Winkler (Microsoft) 00:38:52
- Extend on-premises Hadoop and Spark deployments across data centers and the cloud, including Microsoft Azure (sponsored by Microsoft and WANdisco) - Jagane Sundar (WANdisco), Pranav Rastogi (Microsoft) 00:32:50
- Adaptive analytics: Transitioning from legacy systems to a modern platform with MicroStrategy and Cloudera (sponsored by MicroStrategy) - Alex Gutow (Cloudera), David Harsh (MicroStrategy) 00:44:51
- Key big data architectural considerations for deploying in the cloud and on-premises (sponsored by NetApp) - Karthikeyan Nagalingam (NetApp) 00:40:06
- Deploying an automated data platform, from data ingestion to consumption: A real-world enterprise example (sponsored by Infoworks) - Ramesh Menon (Infoworks) 00:38:21
- A comprehensive, enterprise-grade, open Hadoop solution from Hewlett Packard Enterprise (sponsored by Hewlett Packard Enterprise) - Bob Patterson (Hewlett Packard Enterprise (HPE)) 00:17:27
- A governance checklist for making your big data into trusted data (sponsored by Syncsort) - Keith Kohl (Syncsort) 00:27:07
- Analytics everywhere, from things to cities (sponsored by Cisco) - Raghunath Nambiar (Cisco) 00:04:49
- A whole new way to think about your next-gen applications (sponsored by MapR Technologies) - Anil Gadre (MapR) 00:10:30
- The essentials for digital growth (sponsored by MapR) - Jack Norris (MapR Technologies) 00:38:09
- Emotional arithmetic: A deep dive into how machine learning and big data help you understand customers in real time (sponsored by Google) - Chad W. Jennings (Google), Eric Schmidt (Google) 00:45:37
- Other Data topics
- Pomegranate: Flexible probabilistic modeling for Python - Jacob Schreiber (University of Washington) 00:24:43
- The cognitive design principles of interactive analytics - Mike Driscoll (Metamarkets) 00:36:50
- What can we learn from 750 billion GitHub events and 42 TB of code? - Felipe Hoffa (Google) 00:40:20
- Data futures: Exploring the everyday implications of increasing access to our personal data - Daniel Goddemeyer (OFFC NYC), Dominikus Baur (Freelance) 00:37:36
- Messaging, storage, or both: The real-time story of Pulsar and Apache DistributedLog - Matteo Merli (Streamlio), Sijie Guo (Streamlio) 00:38:10
- NoScope: Querying videos 1,000x faster with deep learning - Daniel Kang (Stanford University) 00:26:48
- Benefits of big data geoenrichment for better business outcomes - Rose Winterton (Pitney Bowes) 00:23:51
- IIoT data fusion: Bridging the gap from data to value - Alexandra Gunderson (Arundo Analytics) 00:41:42
- How to build a digital twin - Lloyd Palum (Vnomics) 00:58:55
- Show me my data, and I’ll tell you who I am. - Majken Sander (TimeXtender) 00:35:48
- Business operations in Expedia through real time metric trends, predictions, correlations and anomaly detection - Brandon O'Brien (Expedia, Inc) 00:26:16
- GDPR: Getting your data ready for heavy, new EU privacy regulations - Steven Ross (Cloudera), Mark Donsky (Cloudera) 00:36:39
- Creating a DevOps practice for analytics - Bob Eilbacher (Caserta) 00:42:11
- The columnar roadmap: Apache Parquet and Apache Arrow - Julien Le Dem (Apache Parquet) 00:42:22
- How the Portland Trail Blazers increase conversion rates with Azure Machine Learning - Audrey Spencer-Alvarado (Portland Trail Blazers) 00:25:54
- Learning meaning from web-scale big data - Gerard de Melo (Rutgers University) 00:27:48
- Topic modeling openNASA data - Noemi Derzsy (Rensselaer Polytechnic Institute) 00:39:53
- Implementing Hadoop to save lives - Tony McAllister (Be the Match (National Marrow Donor Program)) 00:36:53
- How Pinterest uses machine learning to achieve ~200M monthly active users - Yunsong Guo (Pinterest) 00:28:40
- Data programming: Creating large training sets quickly - Alex Ratner (Stanford University) 00:31:38
- Machine learning for healthcare data - Katherine Heller (Duke University) 00:29:12
- How to leverage the cloud for business solutions - Jim Scott (MapR Technologies) 00:26:51
- Filling in missing data with generalized low-rank models - Madeleine Udell (Cornell University) 00:30:03
- Efficient neural networks for perception for autonomous vehicles - Bichen Wu (UC Berkeley) 00:28:18
- The context of contacts: Seeking root causes of racial disparity in Texas traffic-summons fines - Nick Selby (CJX, Inc. | Midlothian Police Department) 00:34:27
- Using data to play (and forecast) the future - Parisa Foster (Play The Future) 00:24:22
- Failures of gradient-based deep learning - Shaked Shammah (Hebrew University) 00:26:44
- Findata welcome - Alistair Croll (Solve For Interesting), Robert Passarella (Protégé Partners) 00:02:08
- What is AI? - Melanie Warrick (Google) 00:28:17
- Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it - Mike Olson (Cloudera) 00:40:07
- Deep learning for understanding language and holding conversations - Alan Nichol (Rasa) 00:21:02
- Data science is for everyone: Making data science work in low-tech environments - Derek Ruths (CAI) 00:25:27
- Implementing a successful real-time project - Javier Esplugas (DHL Supply Chain), Kevin Parent (Conduce) 00:27:44
- Empowering quants to trade faster: From Excel files to data packages - Aneesh Karve (Quilt) 00:30:32
- Creating public value through data collaboratives - Natalia Adler (UNICEF HQ) 00:26:12
- Increasing velocity, accuracy and learning at scale - Sarah Manning (Etsy) 00:24:03
- The Solutions Showcase Theater
- Driven Enterprise; turning business on its head 00:10:06
- Virtualizing Big Data – Real World Customer Architectures 00:15:08
- Big Data Anti-Patterns 00:07:53
- Interactive BI on Hadoop 00:08:36
- Seamless integration of data preparation, exploration, modeling and deployment 00:15:53
- 5 Ways Object Storage Helps Machines Learn More 00:10:17
- Deep Learning on big images - how to revolutionize an industry 00:09:13
- Data Driven Banking 00:13:47
- New Era of AI-enabled Real-time Big Data Analytics 00:12:24
- An Open Source Architecture for IoT 00:11:35
- Deep learning Solutions Powered by Intel and BigDL 00:10:05
- Operations for Analytics and the Cloud: What Your Administrators Won't Tell You 00:07:05
- Composable infrastructure lets Clearsense move from public to private cloud 00:10:14
- The Future of Interactive Exploration At Scale 00:09:29
- Stream analytics on GCP: How Traveloka’s multi-cloud, fully-managed data stack keeps the focus on revolutionizing human mobility 00:10:41
- An ultra-scalable Full SQL Full ACID database with analytical capabilities 00:10:09
- Dstillery: 150 Billion Transactions per Day on Lenovo Servers 00:10:14
- AI to Mine Enterprise Voice Data 00:11:09
- Business context: the linchpin to any big data solution 00:07:57
- Act Locally, Learn Globally: A NEW WAY TO Harness Data from the Edge to the Cloud and Back 00:11:48
- Today's Best Architecture for Fast Data 00:09:38
- From Curing Cancer to Game of Drones – How Search & Big Data are Solving Global Issues 00:07:54
- It’s Like Amazon. But for Data 00:11:49
- Case Study: Scaling Data Science with an enterprise ready platform 00:09:56
- Real-Time Machine Learning with TensorFlow, Kafka and MemSQL 00:11:45
- Inside the Cisco IT Hadoop Journey: Secure, Scale, and Derive Value with Big Data 00:10:58
- Doing BI on Azure HDInsight 00:10:34
- The New Enterprise Data Brokers 00:11:05
- Better data lineage for the financial industry with graph databases 00:07:55
- Empowering Quants to Trade Faster: from Excel Files to Data Packages 00:10:32
- The uncomfortable truth about deploying and scaling ML in production 00:07:45
- How to Do Data Engineering in the Cloud 00:16:57
- The Benefits and Imperative of using Trusted Data Objects when ingesting data into EDH 00:10:20
- Smart data discovery helps financial services firm streamline data relationships for greater value 00:08:51
- Making Self-Service BI a Reality with Intelligent Data Discovery 00:10:09
- Objective and Collaborative Data Science with CrewSpark 00:08:57
- Quantify your intuition with cognitive decision-making 00:08:44
- Using Apache Arrow, Parquet, And Calcite To Bring Self-Service To Your Entire Data Analytics Stack 00:03:23
- A Comprehensive Enterprise-grade Hadoop Solution from Hewlett Packard Enterprise 00:05:51
- Ask me anything sessions
- Ask me anything: Running data science in the enterprise and architecting data platforms - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science) 00:40:33
- Tutorials
- Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 1 00:52:33
- Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 2 00:38:43
- Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 3 00:55:49
- Architecting a data platform - John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science) - Part 4 00:29:44
- Managing data science in the enterprise - John Akred (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science) - Part 1 00:56:03
- Managing data science in the enterprise - John Akred (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science) - Part 2 00:59:56
- Managing data science in the enterprise - John Akred (Silicon Valley Data Science), Heather Nelson (Silicon Valley Data Science) - Part 3 01:03:06
- Building real-time data pipelines with Apache Kafka - Ian Wrigley (StreamSets) - Part 1 00:42:03
- Building real-time data pipelines with Apache Kafka - Ian Wrigley (StreamSets) - Part 2 00:47:22
- Building real-time data pipelines with Apache Kafka - Ian Wrigley (StreamSets) - Part 3 00:39:44
- Building real-time data pipelines with Apache Kafka - Ian Wrigley (StreamSets) - Part 4 00:40:04
- Deep learning for recommender systems - Mo Patel (Teradata), Junxia Li (Think Big Analytics) - Part 1 00:40:15
- Deep learning for recommender systems - Mo Patel (Teradata), Junxia Li (Think Big Analytics) - Part 2 00:43:50
- Deep learning for recommender systems - Mo Patel (Teradata), Junxia Li (Think Big Analytics) - Part 3 00:59:44
- Natural language understanding at scale with spaCy, Spark ML, and TensorFlow - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 1 00:46:25
- Natural language understanding at scale with spaCy, Spark ML, and TensorFlow - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 2 00:40:57
- Natural language understanding at scale with spaCy, Spark ML, and TensorFlow - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 3 00:31:02
- Natural language understanding at scale with spaCy, Spark ML, and TensorFlow - David Talby (Pacific AI), Claudiu Branzan (G2 Web Services), Alexander Thomas (Indeed) - Part 4 00:46:38
- Machine learning in R - Jared Lander (Lander Analytics) - Part 1 00:50:30
- Machine learning in R - Jared Lander (Lander Analytics) - Part 2 00:38:44
- Machine learning in R - Jared Lander (Lander Analytics) - Part 3 00:48:06
- Machine learning in R - Jared Lander (Lander Analytics) - Part 4 00:34:22
- A practitioner’s guide to Hadoop security for the hybrid cloud - Mark Donsky (Cloudera), Manish Ahluwalia (NerdWallet), Andre Araujo (Cloudera), Syed Rafice (Cloudera) - Part 1 00:57:14
- A practitioner’s guide to Hadoop security for the hybrid cloud - Mark Donsky (Cloudera), Manish Ahluwalia (NerdWallet), Andre Araujo (Cloudera), Syed Rafice (Cloudera) - Part 2 00:51:16
- A practitioner’s guide to Hadoop security for the hybrid cloud - Mark Donsky (Cloudera), Manish Ahluwalia (NerdWallet), Andre Araujo (Cloudera), Syed Rafice (Cloudera) - Part 3 00:34:30
- A practitioner’s guide to Hadoop security for the hybrid cloud - Mark Donsky (Cloudera), Manish Ahluwalia (NerdWallet), Andre Araujo (Cloudera), Syed Rafice (Cloudera) - Part 4 00:43:44
- Getting started with TensorFlow - Yufeng Guo (Google), Amy Unruh (Google) - Part 1 00:42:30
- Getting started with TensorFlow - Yufeng Guo (Google), Amy Unruh (Google) - Part 2 00:39:47
- Getting started with TensorFlow - Yufeng Guo (Google), Amy Unruh (Google) - Part 3 00:27:46
- Getting started with TensorFlow - Yufeng Guo (Google), Amy Unruh (Google) - Part 4 00:49:49
- A deep dive into running data engineering workloads in AWS - Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera) - Part 1 00:38:05
- A deep dive into running data engineering workloads in AWS - Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera) - Part 2 00:45:43
- A deep dive into running data engineering workloads in AWS - Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera) - Part 3 00:33:08
- A deep dive into running data engineering workloads in AWS - Jennifer Wu (Cloudera), Fahd Siddiqui (Cloudera), Paul George (Cloudera), Eugene Fratkin (Cloudera) - Part 4 00:31:18
- A deep dive into deep learning with Keras - Julia Lintern (Metis) - Part 1 00:42:19
- A deep dive into deep learning with Keras - Julia Lintern (Metis) - Part 2 00:49:51
- Building big data applications on Azure - Pranav Rastogi (Microsoft) - Part 1 00:45:12
- Building big data applications on Azure - Pranav Rastogi (Microsoft) - Part 2 00:45:20
- Building big data applications on Azure - Pranav Rastogi (Microsoft) - Part 3 00:34:11
- Building big data applications on Azure - Pranav Rastogi (Microsoft) - Part 4 00:40:49
- Unraveling data with Spark using deep learning and other algorithms from machine learning, Part 1 00:38:21
- Unraveling data with Spark using deep learning and other algorithms from machine learning, Part 2 00:35:31
- Unraveling data with Spark using deep learning and other algorithms from machine learning, Part 3 00:34:24
- Unraveling data with Spark using deep learning and other algorithms from machine learning, Part 4 00:39:13
- Spark Camp
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 1 00:23:51
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 2 00:41:17
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 3 00:44:48
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 4 00:32:10
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 5 00:53:33
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 6 00:52:14
- Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML, Part 7 00:35:32
- Findata Day
- How machine learning is used in fintech - Bradford Cross (DCVC) 00:32:55
- Rumpelstiltskin and the financial markets - Robert Passarella (Protégé Partners) 00:28:06
- Detecting a spoofing overlay - Jason Morton (Ascendant) 00:23:07
- Deploying AI in mobile-first consumer-facing financial products: A tale of two cycles - Jike Chong (Tsinghua University | Acorns) 00:31:36
- Crowdsourced alpha: The future of investment research - Leigh Drogen (Estimize) 00:28:06
- The limits of human cognition - Bob Levy (Virtual Cove, Inc.) 00:19:31
- Oh buoy! How data science improves shipping intelligence for hedge funds - Abraham Thomas (Quandl) 00:27:04
- Mapping cities through data to model risk in retail and real estate - Vincent-Charles Hodder (Local Logic) 00:25:37
- From segmentation to personalization for compliance risk monitoring: A segment of one - Tanvi Singh (Credit Suisse) 00:31:45
- Delivering alpha: Artificial intelligence in capital markets investing - Michael Beal (Data Capital Management) 00:34:32