The New Stack Podcast

Apache Spark for Artificial Intelligence and AI 2.0

Episode Summary

Today on The New Stack Context we talk with Garima Kapoor, COO and co-founder of MinIO, about using Spark at scale for Artificial Intelligence and Machine Learning (AI/ML) workloads on Kubernetes. The Apache and Hadoop ecosystem hasn’t had much overlap with Kubernetes in the past, but as we learned at KubeCon in Seattle last November, that is quickly changing. As Iguazio’s Yaron Haviv wrote in a contributed article on TNS titled “Will Kubernetes Sink the Hadoop Ship?”:

“Early adopters are realizing that they can run their big data stack (Spark, Presto, Kafka, etc.) on Kubernetes in a much simpler manner. Furthermore, they can run all of the cool post-Hadoop AI and data science tools like Jupyter, TensorFlow, PyTorch or custom Docker containers on the same cluster.”

Fast forward to now: we are approaching the Spark + AI Summit, which Databricks is putting on in San Francisco next week, and we are curious. How is Spark being used in cloud-native architectures these days, with the likes of MinIO -- the open-source, container-native object store -- to, say, create machine learning data pipelines on Kubernetes? What is driving this trend toward high-performance object stores? Kapoor breaks down the trends in the first half of the show.
