Oreilly - O'Reilly Strata Data Conference 2019 - New York, New York

by O'Reilly Media, Inc. | Released September 2019 | ISBN: 9781492050674

https://www.oreilly.com/library/view/oreilly-strata-data/9781492050681/

The 2019 Strata Data Conference NYC, the biggest Big Data conference in the world, was a massive success. Packed with thousands of attendees, Strata gathered the world's top data practitioners to provide expert guidance on the tools and technologies you need to make your data strategies and projects work today. This video compilation holds the best of Strata NYC 2019's keynotes, tutorials, and technical sessions. Looking for a head start on the data techniques and technologies you need to succeed? This compilation points the way forward by offering you hours of material to study and absorb at your own pace.

The compilation includes Data Engineering and Architecture sessions, where you'll learn how to select the right type of data infrastructure and architecture to streamline your workflows, reduce costs, and scale your data analysis; Data Science, Machine Learning (ML), and AI sessions, where you'll learn how to use text mining, real-time analytics, large-scale anomaly detection, and other techniques to discover the hidden insights in your data; and all of the best talks from the Strata Business Summit, where you'll get an insider's look at the processes and technologies some of the world's most successful companies used to develop their own data strategies.

Highlights include:
- A front-row seat at the best keynotes, tutorials, and technical sessions from 2019's Strata Data Conference NYC, with hundreds of hours of material to study and absorb at your own pace.
- Keynote speeches from Big Data's most inspiring business visionaries, such as Sara Menker (CEO, Gro Intelligence), Cassie Kozyrkov (Chief Decision Scientist, Google Cloud), Swatee Singh (VP Big Data/ML, American Express), and Robert D. Thomas (GM, IBM Data and AI).
- Deep-dive tutorials, including Jules Damji's (Databricks) sold-out session on managing the complete ML lifecycle with MLflow; Karthik Ramasamy's (Streamlio) review of serverless streaming architectures and algorithms for the enterprise; and Mark Donsky (Okera) on how to secure your data lakes to meet the rigors of CCPA privacy regulations.
- Data Engineering and Architecture sessions, including Navinder Pal Singh Brar (Walmart Labs) on building multitenant data processing and model inferencing platforms with Kafka Streams; Paige Roberts (Vertica) on the whys and hows of putting large stateful applications into containers and Kubernetes; Tomer Levi (Fundbox) on using AWS Step Functions, Docker containers, and ECS Fargate to build serverless data workflow platforms; and Tomer Shiran (Dremio) on how to build best-in-class data lakes on AWS and Azure.
- Data Science, ML, and AI sessions, including John Allen on how Deutsche Bank uses AI and ML to drive revenues; Anirudh Koul (Microsoft) on how to bring deep learning to smartphones; Moty Fania's description of Intel's AI-driven sales cycle support platform; and Stavros Kontopoulos's (Lightbend) review of the best practices for deploying online ML-based streaming applications.
- Unrestricted access to all of the Business Summit's exclusive Executive Briefings, technical sessions, and tutorials, where data experts like Ross Schalmo (GE Aviation), Andrew Reiskind (Mastercard), Alex Beutel (Google Brain), Susan Israel (Loeb & Loeb), and others reveal the data strategies deployed by the world's most successful companies.
- FinData Day: sessions detailing how Wells Fargo ECS, Capital One, TBC Bank, Tokyo Century USA, and other financial industry leaders use data to analyze risk, detect fraud, predict payments, and improve customer experience.
- Dozens of sessions on Data Law and Ethics, Security and Privacy, Automation in Data Science and Data, Business Analytics and Visualization, Streaming and IoT, and Data Culture and Organization.

Keynotes
- Highlights from the Keynotes at Strata Conference, New York 2019 00:22:43
- Recent trends in data and machine learning technologies - Ben Lorica (O'Reilly Media) 00:10:52
- Everything is connected and the clock is ticking: AI and big ag data for food security - Sara Menker & Nemo Semret (Gro Intelligence) 00:14:32
- The future of Google Cloud data processing (sponsored by Google Cloud) - James Malone (Google) 00:10:07
- AI isn't magic. It’s computer science. - Robert Thomas (IBM), Tim O'Reilly (O'Reilly Media) 00:20:40
- Unleash the power of data at scale (sponsored by Intel) - Jeremy Rader (Intel) 00:05:14
- How disruptive tech is reshaping the financial services industry - Swatee Singh (American Express) 00:11:58
- Cisco Data Intelligence Platform (sponsored by Cisco) - Siva Sivakumar (Cisco) 00:05:15
- Interactive sports analytics - Patrick Lucey (Stats Perform) 00:12:57
- Staying safe in the AI era - Cassie Kozyrkov (Google) 00:20:49
- Unlocking the value of your data (sponsored by IBM) - Daniel Hernandez (IBM) 00:09:17
- Delivering the enterprise data cloud - Arun Murthy (Cloudera) 00:09:53
- Postrevolutionary big data: Promoting the general welfare (sponsored by Io-Tahoe) - Barbara Eckman (Comcast) 00:05:54
- RL in real life: Bringing reinforcement learning to the enterprise (sponsored by Microsoft) - Edward Jezierski (Microsoft) 00:05:35
- Strata Data Awards 00:01:54
- Say what? The ethical challenges of designing for humanlike interaction - Jonathan Foster (Microsoft) 00:15:56
- Data Science Pioneers: Conquering the next frontier, a documentary investigating the future of data science (sponsored by Dataiku) - Jed Dougherty (Dataiku) 00:03:47
- Data sonification: Making music from the yield curve - Alan Smith (Financial Times) 00:20:34
Sponsored
- DevOps in the cloud: Deploy, monitor, manage and automate (sponsored by Impetus) - Amit Assudani (Impetus) 00:39:26
- Semantics and graph data models in the enterprise data fabric (sponsored by Cambridge Semantics) - Barbara Petrocelli (Cambridge Semantics) 00:37:25
- How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE (BlueData)) - Anant Chintamaneni (HPE (BlueData)), Matt Maccaux (HPE (BlueData)) 00:41:14
- How Nuveen rapidly integrated ESG data to advance its platform value (sponsored by Zaloni) - Ben Sharma (Zaloni), Santanu Sengupta (Nuveen) 00:34:22
- The future? Data, AI, and multicloud: It’s time to modernize (sponsored by IBM) - Madhu Kochar (IBM) 00:37:44
- Navigating the Transition to a Data First Enterprise: an Intel perspective (sponsored by Intel) - Jeremy Rader (Intel) 00:39:14
- AI/ML on Oracle Cloud with Kinetica and H2O.ai (sponsored by Oracle Cloud Infrastructure) - Ben Lackey (Oracle) 00:36:41
- Next-generation serverless data architecture for insights at the speed of thought (sponsored by Actian) - Paul Wolmering (Actian Corporation) 00:27:03
- Deliver personalized experiences and content like Xbox with Cognitive Services Personalizer (sponsored by Microsoft) - Edward Jezierski (Microsoft), Jackie Nichols (Microsoft) 00:41:14
- Take the bias out of big data insights with augmented analytics (sponsored by Kyligence) - Dong Li (Kyligence), Hongbin Ma (Kyligence) 00:43:23
- Mastercard and Pitney Bowes: Creating a data-driven business (sponsored by Pitney Bowes) - Olga Lagunova (Pitney Bowes), John Derrico (Mastercard) 00:37:53
- See what others can’t with spatial analysis and data science (sponsored by Esri) - Alberto Nieto (Esri), Shannon Kalisky (Esri) 00:41:51
- 10 things to know about running and migrating Hadoop to GCP (sponsored by Google Cloud) - Blake DuBois (Google Cloud) 00:38:01
- Solve tomorrow’s business challenges with a modern data warehouse (sponsored by Matillion) - Daniel D'Orazio (Matillion) 00:38:47
- The key to climbing the AI ladder (sponsored by IBM) - Daniel Hernandez (IBM) 00:34:25
- Powering the future with data intelligence (sponsored by Collibra) - Jim Cushman (Collibra), Piyush Jain (Progressive) 00:41:47
- Migrating Hadoop analytics to Spark in the cloud without disruption (sponsored by WANdisco) - Paul Scott-Murphy (WANdisco) 00:36:24
- Architecting a data analytics service both in the public cloud and in the on-premise private cloud: ETL, BI, and machine learning (sponsored by SK Holdings) - Jungwook Seo (SK Holdings) 00:36:34
- The end of applications: How data collaboration is changing everything (sponsored by Cinchy) - Dan DeMers (Cinchy) 00:42:29
- Running AI workloads in containers (sponsored by BMC Software) - Darren Chinen (Malwarebytes) 00:31:56
- Bringing together machine and human intelligence in business applications at enterprise scale (sponsored by SAP) - Kevin Poskitt (SAP), Andreas Wesselmann (SAP) 00:39:18
- Mass migration: Tales of moving on-premises Hadoop to Google Cloud (sponsored by Google Cloud) - James Malone (Google) 00:39:45
- Clean the swamp: Gain greater visibility, speed, and governance with data ops (sponsored by Hitachi Vantara) - Chuck Yarbrough (Hitachi Vantara) 00:32:25
- Building a fast, scalable, efficient operational analytics and reporting application using MemSQL, Docker, Airflow, and Prometheus (sponsored by MemSQL) - Praveen Chitrada (Akamai Technologies) 00:40:19
- Data science isn't just another job (sponsored by Anaconda) - Peter Wang (Anaconda) 00:36:10
- The ugly truth about making analytics actionable (sponsored by SAS) - Diana Shaw (SAS) 00:38:02
- Organizing the chaos of healthcare with smart data discovery (sponsored by Io-Tahoe) - Charles Boicey (Clearsense) 00:25:35
- The future of Hadoop in an era of exponentially growing data (sponsored by SQream) - David Leichner (SQream) 00:34:50
Expo Hall
- Feature engineering with Spark NLP to accelerate clinical trial recruitment - Saif Addin Ellafi (John Snow Labs), Scott Hoch (BlackBox Engineering) 00:32:19
- Mind the semantic gap: How "talking semantics" can help you perform better data science - Panos Alexopoulos (Textkernel) 00:36:39
- Toward more fine-grained sentiment and emotion analysis of text - Gerard de Melo (Rutgers University) 00:43:09
- Search logs + machine learning = autotagged inventory - John Berryman (Eventbrite) 00:37:40
- ML is not enough: Decision automation in the real world - Brian Keng (Rubikloud) 00:36:21
- Why AI fails: Overcoming AI challenges (sponsored by IBM) - Brittany Bogle (IBM) 00:28:42
- Handtrack.js: Building gesture-based interactions in the browser using TensorFlow - Victor Dibia (Cloudera Fast Forward Labs) 00:30:09
- Machine learning for streaming data: Practical insights - Heitor Murilo Gomes (Télécom ParisTech), Albert Bifet (Télécom ParisTech) 00:38:27
Data Science, Machine Learning, & AI
- Scalable anomaly detection with Spark and SOS - Jeroen Janssens (Data Science Workshops) 00:41:53
- Scaling Apache Spark at Facebook - Sameer Agarwal (Facebook), Ankit Agarwal (Facebook Inc.) 00:39:03
- Deep learning technologies for giant hogweed eradication - Naoto Umemori (NTT DATA), Masaru Dobashi (NTT DATA) 00:36:32
- Getting to know the elephant: Real-time debugging and visualization for deep learning - Shital Shah (Microsoft Research) 00:32:15
- Working with time series: Denoising and imputation frameworks to improve data density - Anjali Samani (CircleUp) 00:39:54
- Harnessing graph-native algorithms to enhance machine learning: A primer - Brandy Freitas (Pitney Bowes) 00:40:23
- Practical feature engineering - Ted Dunning (MapR) 00:38:49
- Deep learning on Apache Spark at CERN’s Large Hadron Collider with Analytics Zoo - Sajan Govindan (Intel) 00:35:05
- Real-time anomaly detection on observability data using neural networks - Keshav Peswani (Expedia Group), Ashish Aggarwal (Expedia Group) 00:43:10
- Soss: Lightweight probabilistic programming in Julia - Chad Scherrer (Metis) 00:26:09
- Data science and the business of Major League Baseball - Aaron Owen (Major League Baseball), Matthew Horton (Major League Baseball), Josh Hamilton (Major League Baseball) 00:41:20
- Predicting Criteo’s internet traffic load using Bayesian structural time series models - Hamlet Jesse Medina Ruiz (Criteo) 00:41:48
- Handling data gaps in time series using imputation - Alfred Whitehead (Klick), Clare Jeon (Klick) 00:32:49
- Causal inference 101: Answering the crucial "why" in your analysis - Subhasish Misra (Walmart Labs) 00:36:45
- Data need not be a moat: Mixed formal learning enables zero- and low-shot learning - Sandra Carrico (GLYNT) 00:32:06
- Learning asset naming patterns to find risky unmanaged devices - Ryan Foltz (Exabeam) 00:29:18
- Building a machine learning framework to measure TV advertising attribution - Fei Wang (CarGurus) 00:28:56
- Lightning-fast time series modeling and prediction: (S)ARIMA on steroids - Meir Toledano (Anodot) 00:41:38
- From whiteboard to production: A demand forecasting system for an online grocery shop - Robert Pesch (inovex), Robin Senge (inovex) 00:42:35
- Spark on Kubernetes for data science - Jordan Volz (Dataiku) 00:59:40
- When Holt-Winters is better than machine learning - Anais Dotis (InfluxData) 00:26:38
- Learning with limited labeled data - Shioulin Sam (Cloudera Fast Forward Labs) 00:37:03
- A practical guide to algorithmic bias and explainability in machine learning - Alejandro Saucedo (The Institute for Ethical AI & Machine Learning) 00:40:48
- Deploying end-to-end deep learning pipelines with ONNX - Nick Pentreath (IBM) 00:40:31
- Improving OCR quality of documents using generative adversarial networks - Nagendra Shishodia (EXL), Chaithanya Manda (EXL), Solmaz Torabi (EXL) 00:39:13
- Deep learning on mobile - Meher Kasam (Square), Anirudh Koul (Microsoft) 00:39:07
- An introduction to machine learning on graphs - David Mack (Octavian) 00:36:37
- How machine learning meets optimization - Jari Koister (FICO) 00:41:13
- Data science versus engineering: Does it really have to be this way? - Ann Spencer (Domino), Paco Nathan (Derwen), Amy Heineike (Primer), Chris Wiggins (NYT | Columbia) 00:40:19
Data Engineering and Architecture
- The why and how of data lineage - Neelesh Salian (Stitch Fix) 00:29:55
- The hitchhiker’s guide to the cloud: Architecting for the cloud through customer stories - Sushant Rao (Cloudera) 00:39:17
- Sharing is caring: Using Egeria to establish true enterprise metadata governance - Wim Stoop (Cloudera), Srikanth Venkat (Cloudera) 00:39:10
- Problems taking AI to production and how to fix them - Jim Scott (NVIDIA) 00:38:35
- Enabling big data and AI workloads on the object store at DBS Bank - Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio) 00:37:00
- Your cloud, your ML, but more and more scale? How SurveyMonkey did it - Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey) 00:40:26
- Turning big data into knowledge: Managing metadata and data relationships at Uber's scale - Kaan Onuk (Uber), Luyao Li (Uber), Atul Gupte (Uber) 00:34:45
- The evolution of metadata: LinkedIn’s story - Shirshanka Das (LinkedIn), Mars Lan (LinkedIn) 00:41:25
- How to performance-tune Spark applications in large clusters - Bo Yang (Uber), Omkar Joshi (Uber) 00:40:30
- Apache Hadoop 3.x state of the union and upgrade guidance - Wangda Tan (Cloudera), Wei-Chiu Chuang (Cloudera) 00:38:13
- Time travel for data pipelines: Solving the mystery of what changed - Sunil Goplani (Intuit), Sandeep Uttamchandani (Intuit), Shradha Ambekar (Intuit) 00:37:59
- From raw data to informed intelligence: Democratizing data science and ML at Uber - Atul Gupte (Uber) 00:37:30
- Now you see me; now you compute: Building event-driven architectures with Apache Kafka - Michael Noll (Confluent) 00:40:03
- Downscaling: The Achilles heel of autoscaling Spark clusters - Prakhar Jain (Qubole), Sourabh Goyal (Qubole) 00:35:09
- Improving Spark by taking advantage of disaggregated architecture - Chenzhao Guo (Intel) 00:22:56
- Securing your cloud data lake with a "defense in depth" approach - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:41:12
- Bridging the gap between big data computing and high-performance computing - Supun Kamburugamuve (Indiana University) 00:28:31
- Lessons learned from scaling the tech stack of a modern analytics platform - Scott Castle (Sisense) 00:37:26
- Performant time series data management and analytics with PostgreSQL - Michael Freedman (TimescaleDB | Princeton University) 00:39:27
- Protecting the healthcare enterprise from PHI breaches using streaming and NLP - Jeff Zemerick (Mountain Fog) 00:37:29
- A productive data science platform: Beyond a hosted-notebooks solution at LinkedIn - Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn) 00:44:43
- Scaling data engineers - Evgeny Vinogradov (Yandex.Money) 00:40:56
- Your easy move to serverless computing and radically simplified data processing - Gil Vernik (IBM) 00:35:32
- The case for a common metadata layer for machine learning platforms - Max Neunhöffer (ArangoDB), Joerg Schad (Suki) 00:28:24
- Kubernetes for stateful MPP systems - Paige Roberts (Vertica), Deepak Majeti (Vertica) 00:42:35
- Fast data with the KISSS stack - Bas Geerdink (ING) 00:39:33
- Managing your Kafka in an explosive growth environment - Alon Gavra (AppsFlyer) 00:42:20
- Building a multitenant data processing and model inferencing platform with Kafka Streams - Navinder Pal Singh Brar (Walmart Labs) 00:40:46
- Finding your needle in a haystack - Naghman Waheed (Bayer Crop Science), John Cooper (Bayer) 00:39:12
- Orchestrating data workflows using a fully serverless architecture - Tomer Levi (Fundbox) 00:42:06
- Using Spark for crunching astronomical data on the LSST scale - Petar Zecevic (SV Group) 00:36:20
- Where's my lookup table? Modeling relational data in a denormalized world - Rick Houlihan (Amazon Web Services) 00:41:08
- Creating an extensible 100+ PB real-time big data platform by unifying storage and serving - Reza Shiftehfar (Uber) 00:41:49
Findata Day
- Findata Day welcome - Alistair Croll (Solve For Interesting) 00:17:38
- Assumed risk versus actual risk: The new world of behavior-based risk modeling - Viridiana Lourdes (Ayasdi) 00:40:42
- Creating a data-driven team culture - Brian Lynch (TD Bank Group) 00:30:13
- Creating a data culture at a 150-year-old nonprofit - Dan Barker (RSA Security) 00:24:03
- How S&P’s Trucost empowered analysts with modern, interactive data reporting tools - Rochelle March (Trucost) 00:22:12
- The future of stablecoin - Catherine Gu (Stanford University) 00:21:22
- Anti-smuggling, trade insurance, and trade finance: Risk management via machine learning - Peter Swartz (Altana Trade) 00:24:43
- Banking on change: Data collaboration and enterprise financial services - Karan Jaswal (Cinchy) 00:26:56
- Democratization of data science: Using machine learning to build credit risk models - Moto Tohda (Tokyo Century (USA)) 00:31:37
Strata Business Summit
- Executive Briefing: Why machine-learned models crash and burn in production and what to do about it - David Talby (Pacific AI) 00:37:22
- Improve your data science ROI with a portfolio and risk management lens - Brian Dalessandro (SparkBeyond) 00:45:36
- Embrace complexity: The new rules of AI - Janet Haven (Data & Society) 00:40:21
- Executive Briefing: Top 10 big data blunders - Michael Stonebraker (Tamr) 00:40:53
- Turning petabytes of data from millions of vehicles into open data with Geotab - Felipe Hoffa (Google), Bob Bradley (Geotab) 00:38:58
- Executive Briefing: Usable machine learning—Lessons from Stanford and beyond - Peter Bailis (Sisu | Stanford University) 00:38:19
- Executive Briefing: Understanding the cult of prediction - Farrah Bostic (The Difference Engine) 00:46:46
- Enabling 5G use cases through location intelligence - Tim McKenzie (Pitney Bowes) 00:37:16
- Combining creativity and analytics - David Boyle (Audience Strategies) 00:40:14
- Executive Briefing: Data catalogs—Concepts, capabilities, and key platforms - Andrew Brust (Blue Badge Insights | ZDNet) 00:42:46
- What does the public say? A computational analysis of regulatory comments - Vlad Eidelman (FiscalNote) 00:28:33
- Executive Briefing: Making intelligent insights at the edge—The demise of big data? - Alasdair Allan (Babilim Light Industries) 00:38:13
- Looking beyond the binary: How the lack of sufficient gender data impacts users? - Brindaalakshmi K (Independent Consultant) 00:41:58
- Communication breakdown: Facing machine learning’s all-too-human failure - James Kotecki (Infinia ML) 00:41:44
- Executive Briefing: Unpacking AutoML - Paco Nathan (derwen.ai) 00:47:04
- T-Mobile's journey to turn crowdsourced big data into actionable insights - Alex Yoon (T-Mobile) 00:42:24
- Executive Briefing: Say what? The ethical challenges of designing for humanlike interaction - Jonathan Foster (Microsoft) 00:39:12
- Migrating millions of users from voice- and email-based customer support to a chatbot - Madhu Gopinathan (MakeMyTrip), Sanjay Mohan (MakeMyTrip) 00:41:48
- Executive Briefing: What it takes to use machine learning in fast data pipelines - Dean Wampler (Lightbend) 00:42:30
- Combining creativity and analytics - David Boyle (Audience Strategies) 00:03:46
- Executive Briefing: Big data in the era of heavy worldwide privacy regulations - Mark Donsky (Okera) 00:40:44
Data Case Studies
- Practicing data science: A collection of case studies - Rosaria Silipo (KNIME) 00:26:18
- Implementing ML models into production at Statistics Canada - Richard Evans (Statistics Canada) 00:27:49
- Social services 2.0: Atar and the future of (social) work - Muhammed Idris (Capria VC | TeraCrunch) 00:25:56
- Spotify Wrapped: Product, design, and deadlines - Leah Xu (Spotify) 00:25:19
- Predictive maintenance: How does data science revolutionize the world of machines? - Victoriya Kalmanovich (Navy) 00:27:00
- Driving adoption of data - Moderated by: David Boyle (Audience Strategies) Panelists: Richard Evans (Statistics Canada), Leah Xu (Spotify), Victoriya Kalmanovich (Navy), Moise Convolbo (Rakuten) 00:26:53
- Gaining new insight into online customer behavior using AI - Moise Convolbo (Rakuten) 00:24:54
- The attribution problem - Tusharadri Mukherjee (Lenovo) 00:28:13
- Optimizing the ROI of a geospatial platform in the cloud - Martin Mendez-Costabel (Bayer Crop Science) 00:29:01
- AI and health: Achieving regulatory compliance - Gloria Macia (Roche AG) 00:26:20
- From isolated to connected: The metamorphosis of Revibe - Gwen Campbell (Revibe Technologies) 00:28:54
Security and Privacy
- Apache Metron: Open source cybersecurity at scale - Carolyn Duby (Cloudera) 00:53:47
- Data security and privacy anti-patterns - Steven Touw (Immuta) 00:48:54
- Parquet modular encryption: Confidentiality and integrity of sensitive column data - Gidon Gershinsky (IBM) 00:40:48
- Fair, privacy-preserving, and secure ML - Mikio Braun (Zalando) 00:40:06
- Regulations and the future of data - Andrew Burt (Immuta), Brenda Leong (Future of Privacy Forum), Boris Segalis (Cooley), Susan Israel (Loeb & Loeb, LLP) 00:43:24
- Are your privacy practices auditor approved? - Mark Hinely (KirkpatrickPrice) 00:42:01
- Protect your private data in your Hadoop clusters with ORC column encryption - Owen O'Malley (Cloudera) 00:44:01
Streaming and IoT
- Stream processing beyond streaming data - Stephan Ewen (Ververica) 00:39:48
- Trill: The crown jewel of Microsoft’s streaming pipeline explained - James Terwilliger (Microsoft Corporation), Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research) 00:34:16
- HBase 2.0 and beyond - Krishna Maheshwari (Cloudera) 00:46:10
- Online machine learning in streaming applications - Stavros Kontopoulos (Lightbend), Debasish Ghosh (Lightbend) 00:35:37
- Posttransaction processing using Apache Pulsar at Narvar - Karthik Ramasamy (Streamlio), Anand Madhavan (Narvar) 00:39:22
- SK Telecom's 5G network monitoring and 3D visualization on streaming technologies - Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom) 00:45:55
Automation in Data Science and Data
- Building an AI platform: Key principles and lessons learned - Moty Fania (Intel) 00:42:06
- Challenges faced in machine learning infrastructure in traditional large enterprises - Venkata Gunnu (Comcast), Harish Doddi (Datatron) 00:38:34
- Automating ML model training and deployments via metadata-driven data, infrastructure, feature engineering, and model management - Mumin Ransom (Comcast), Nick Pinckernell (Comcast) 00:36:35
- The new SDLC: CI/CD in the age of machine learning - Diego Oppenheimer (Algorithmia) 00:34:20
Business Analytics and Visualization
- Building a best-in-class data lake on AWS and Azure - Tomer Shiran (Dremio), Jacques Nadeau (Dremio) 00:37:19
- Supercharging Elasticsearch for extended Knowledge Graph use cases - Giovanni Tummarello (Siren) 00:34:59
- Intelligent design patterns for cloud-based analytics and BI - Shant Hovsepian (Arcadia Data) 00:43:27
- ThirdEye: LinkedIn’s business-wide monitoring platform - Akshay Rai (LinkedIn) 00:40:41
Culture and Organization
- Executive Briefing: Creating a center for data science from scratch—Lessons from nonprofit research - Gayle Bieler (RTI International) 00:45:54
- An in-depth look at the data science career: Defining roles, assessing skills - Usama Fayyad (Open Insights & OODA Health, Inc.), Hamit Hamutcu (Analytics Center) 00:42:20
- Executive Briefing: Building a culture of self-service from predeployment to continued engagement - Jonathan Tudor (GE Aviation), Ross Schalmo (GE Aviation) 00:39:35
Law and Ethics
- Purposefully designing technology for civic engagement - Audrey Lobo-Pulo (Phoensight), Annette Hester (National Energy Board, Canada) 00:38:50
Pop-up Talks
- From Data Lake to Strategic Asset - Arnon Shimoni (SQream) 00:07:40
- Cloud Native Analytics Modernization - Rupesh Dandekar (Deloitte) 00:12:12
- Disease Detection on X-Rays with Deep Learning on Spark & Analytics Zoo - Sajan Govindan (Intel) 00:10:06
- Why do half of the Top 10 banks in North America choose MemSQL? - Mike Czabator (MemSQL) 00:08:10
- How to get value from the 80% of data you're not using and make it easier to get started with Enterprise AI - Zachary Jarvinen (OpenText) 00:11:16
- Dash: Operationalizing Python & R models at scale - Chris Parmer (Plotly) 00:11:41
- The End of Applications: How Data Collaboration Changes Everything - Dan DeMers (Cinchy) 00:11:07
- Distributed SQL: A cloud-native evolution of the database - Jim Walker (Cockroach Labs) 00:11:46
- Privacy-compliant datasets with higher analytical value for Data Science - Ravi Pather (CryptoNumerics) 00:09:36
- High Speed Clients: The Missing Link in Big Data Performance - Jerod Johnson (CData Software) 00:09:36
- Real Time Applications in Financial Services - Michael Linchitz (InterSystems) 00:10:23
- An analytics ecosystem to accelerate enterprises’ Analytics and AI journey - Satyamoy Chatterjee (Analyttica Datalab) 00:09:46
- How TimescaleDB enables Grillo to monitor earthquakes in real-time using the power of time-series data - Ajay Kulkarni (Timescale) 00:09:51
- Creating a collaborative environment between data science, IT, and developer teams - Kevin Poskitt (SAP) 00:11:25
- Cazena Introduces the SaaS Data Lake - Lovan Chetty (Cazena) 00:09:41
- Using Dremio and Python Dash to Process and Visualize IoT Data - Ryan Murray (Dremio) 00:08:14
- Smart Hybrid Bursting with AWS EMR Spark & Alluxio - Dipti Borkar (Alluxio), Vitaliy Baklikov (Alluxio) 00:08:27
Tutorials
- Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 1 00:41:47
- Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 2 00:45:11
- Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 3 00:36:53
- Deep learning methods for natural language processing - Garrett Hoffman (StockTwits) - Part 4 00:40:23
- Learning Presto: SQL on anything - Matt Fuller (Starburst) - Part 1 00:33:06
- Learning Presto: SQL on anything - Matt Fuller (Starburst) - Part 2 00:34:45
- Learning Presto: SQL on anything - Matt Fuller (Starburst) - Part 3 00:39:00
- Learning Presto: SQL on anything - Matt Fuller (Starburst) - Part 4 00:37:28
- Kafka and Streams Messaging Manager (SMM) crash course - Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera) - Part 1 00:49:01
- Kafka and Streams Messaging Manager (SMM) crash course - Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera) - Part 2 00:46:55
- Kafka and Streams Messaging Manager (SMM) crash course - Purnima Reddy Kuchikulla (Cloudera), Dan Chaffelson (Cloudera) - Part 3 00:45:45
- Cloudera Edge Management in the IoT - Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera), Andre Araujo (Cloudera) - Part 1 00:54:35
- Cloudera Edge Management in the IoT - Purnima Reddy Kuchikulla (Cloudera), Timothy Spann (Cloudera), Abdelkrim Hadjidj (Cloudera), Andre Araujo (Cloudera) - Part 2 00:49:10
- Managing the complete machine learning lifecycle with MLflow - Jules Damji (Databricks) - Part 1 00:45:17
- Managing the complete machine learning lifecycle with MLflow - Jules Damji (Databricks) - Part 2 00:45:54
- Managing the complete machine learning lifecycle with MLflow - Jules Damji (Databricks) - Part 3 01:00:26
- Serverless streaming architectures and algorithms for the enterprise - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley) - Part 1 00:41:46
- Serverless streaming architectures and algorithms for the enterprise - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley) - Part 2 00:55:09
- Serverless streaming architectures and algorithms for the enterprise - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley) - Part 3 00:50:44
- Serverless streaming architectures and algorithms for the enterprise - Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (RISELab, UC Berkeley) - Part 4 00:38:17
- Running multidisciplinary big data workloads in the cloud with CDP - James Morantus (Cloudera), Tony Huinker (Cloudera) - Part 1 00:46:22
- Running multidisciplinary big data workloads in the cloud with CDP - James Morantus (Cloudera), Tony Huinker (Cloudera) - Part 2 00:41:30
- Running multidisciplinary big data workloads in the cloud with CDP - James Morantus (Cloudera), Tony Huinker (Cloudera) - Part 3 00:47:21
- Running multidisciplinary big data workloads in the cloud with CDP - James Morantus (Cloudera), Tony Huinker (Cloudera) - Part 4 00:57:41
- Deep learning from scratch - Bruno Goncalves (Data For Science, Inc) - Part 1 00:55:02
- Deep learning from scratch - Bruno Goncalves (Data For Science, Inc) - Part 2 00:55:26
- Deep learning from scratch - Bruno Goncalves (Data For Science, Inc) - Part 3 01:01:14
- Foundations for successful data projects - Ted Malaska (Capital One), Jonathan Seidman (Cloudera) - Part 1 00:40:12
- Foundations for successful data projects - Ted Malaska (Capital One), Jonathan Seidman (Cloudera) - Part 2 00:44:21
- Foundations for successful data projects - Ted Malaska (Capital One), Jonathan Seidman (Cloudera) - Part 3 00:41:49
- Foundations for successful data projects - Ted Malaska (Capital One), Jonathan Seidman (Cloudera) - Part 4 00:43:38
- Sketching data and other magic tricks - Sophie Watson (Red Hat), William Benton (Red Hat) - Part 1 00:48:18
- Sketching data and other magic tricks - Sophie Watson (Red Hat), William Benton (Red Hat) - Part 2 00:50:49
- Efficient ML engineering: Tools and best practices - Sourav Dey (Manifold), Jakov Kucan (Manifold) - Part 1 00:49:30
- Efficient ML engineering: Tools and best practices - Sourav Dey (Manifold), Jakov Kucan (Manifold) - Part 2 00:27:44
- Efficient ML engineering: Tools and best practices - Sourav Dey (Manifold), Jakov Kucan (Manifold) - Part 3 00:45:25
- Efficient ML engineering: Tools and best practices - Sourav Dey (Manifold), Jakov Kucan (Manifold) - Part 4 00:44:15
- Building and leading a successful AI practice for your organization - Rossella Blatt Vital (Wonderlic) - Part 1 00:41:23
- Building and leading a successful AI practice for your organization - Rossella Blatt Vital (Wonderlic) - Part 2 00:34:22
- Building and leading a successful AI practice for your organization - Rossella Blatt Vital (Wonderlic) - Part 3 00:47:33
- Building and leading a successful AI practice for your organization - Rossella Blatt Vital (Wonderlic) - Part 4 00:40:28
- Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture) - Part 1 00:45:58
- Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture) - Part 2 00:37:51
- Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture) - Part 3 00:43:20
- Natural language understanding at scale with Spark NLP - David Talby (Pacific AI), Alex Thomas (John Snow Labs), Saif Addin Ellafi (John Snow Labs), Claudiu Branzan (Accenture) - Part 4 00:34:27
- Introduction to natural language processing in Python - Alice Zhao (Metis) - Part 1 00:51:03
- Introduction to natural language processing in Python - Alice Zhao (Metis) - Part 2 00:34:19
- Introduction to natural language processing in Python - Alice Zhao (Metis) - Part 3 00:48:44
- Introduction to natural language processing in Python - Alice Zhao (Metis) - Part 4 00:30:29
- Hands-on machine learning with Kafka-based streaming pipelines - Boris Lublinsky (Lightbend), Dean Wampler (Lightbend) - Part 1 00:53:24
- Hands-on machine learning with Kafka-based streaming pipelines - Boris Lublinsky (Lightbend), Dean Wampler (Lightbend) - Part 2 00:52:45
- Hands-on machine learning with Kafka-based streaming pipelines - Boris Lublinsky (Lightbend), Dean Wampler (Lightbend) - Part 3 00:59:01
- Getting ready for CCPA: Securing data lakes for heavy privacy regulation - Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera) - Part 1 00:46:57
- Getting ready for CCPA: Securing data lakes for heavy privacy regulation - Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera) - Part 2 00:50:41
- Getting ready for CCPA: Securing data lakes for heavy privacy regulation - Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera) - Part 3 00:50:23
- Getting ready for CCPA: Securing data lakes for heavy privacy regulation - Mark Donsky (Okera), Lars George (Okera), Michael Ernest (Dataiku), Ifigeneia Derekli (Cloudera) - Part 4 00:51:25
- Real-time SQL stream processing at scale with Apache Kafka and KSQL - Viktor Gamov (Confluent) - Part 1 00:59:45
- Real-time SQL stream processing at scale with Apache Kafka and KSQL - Viktor Gamov (Confluent) - Part 2 00:55:50
- Real-time SQL stream processing at scale with Apache Kafka and KSQL - Viktor Gamov (Confluent) - Part 3 00:54:46