In this article, I'll explain the basic toolset required to write standard Data Analysis programs in a containerized environment using Docker. As always, my approach is to make your programs portable and platform independent. Let's first understand briefly what I mean by toolset and what I'm going to package in the Docker container.

PySpark - PySpark programming is the collaboration of Apache Spark and Python. It is a Python API built to interact with Apache Spark. Since it's written in Python, you can use other Python modules to be an efficient Data Analyst.

Apache Spark - It is a very popular framework for handling and working with Big Data. It is almost 100x faster than traditional large-scale data processing frameworks.

Jupyter Notebook - It is an open-source web application mostly used by Data Analysts / Engineers to write code, mathematical equations, data visualizations, etc.

NumPy - It is a Python library used to work with multi-dimensional arrays, matrices, high-level mathematical functions, etc.

Now let's dig into the technical details and see how to set up a local environment which supports PySpark, Jupyter Notebook and NumPy.

Create a new folder on your system, e.g. c:\code\pyspark-jupyter, or whatever name you want to give it.

Create a file in that folder and call it docker-compose.yaml with the content given below:

version: "3"
services:
  pyspark:
    image: "jupyter/all-spark-notebook"
    volumes:
      - c:/code/pyspark-data:/home/jovyan
    ports:
      - "8888:8888"
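The steps above can be sketched as a short shell session. This is a minimal sketch: it uses a Unix-style path for illustration (the article's example path is c:\code\pyspark-jupyter on Windows), and the actual launch command is left commented out since it requires Docker to be installed and running:

```shell
# Create the project folder (the name is arbitrary).
mkdir -p ~/code/pyspark-jupyter
cd ~/code/pyspark-jupyter

# Write the docker-compose.yaml described above.
cat > docker-compose.yaml <<'EOF'
version: "3"
services:
  pyspark:
    image: "jupyter/all-spark-notebook"
    volumes:
      - c:/code/pyspark-data:/home/jovyan
    ports:
      - "8888:8888"
EOF

# Start the container in the background (requires Docker):
# docker compose up -d
#
# Jupyter Notebook should then be reachable at http://localhost:8888;
# the access token it asks for is printed in the container logs
# (docker compose logs pyspark).
```

The volume mapping mounts a host folder onto /home/jovyan (the notebook user's home directory in the Jupyter Docker Stacks images), so notebooks you create survive container restarts.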