Installing Docker and Creating a Container with Python, Apache NiFi, and Apache ZooKeeper

Introduction

    In today's fast-paced world of software development and deployment, efficiency and consistency are paramount. Docker, a leading containerization platform, has revolutionized the way we build, package, and distribute applications. It offers a lightweight, portable, and efficient solution for managing and running applications in isolated environments known as containers.

    Whether you're a developer, system administrator, or IT professional, understanding Docker and how to use it can greatly enhance your workflow. This comprehensive guide will take you through the process of installing Docker on a Linux operating system and creating a Docker container that includes Python, Apache NiFi, and Apache ZooKeeper. By the end, you'll have the knowledge and skills to harness the power of containerization for your projects.

    What is Docker?

    Before we dive into the installation and setup, let's briefly explore what Docker is and why it has become a cornerstone of modern software development and deployment.

    Docker is an open-source platform designed to automate the deployment, scaling, and management of applications within lightweight, portable, and self-sufficient containers. These containers encapsulate everything an application needs to run, including code, libraries, dependencies, and runtime environments. Unlike traditional virtual machines, Docker containers share the host OS kernel, which makes them extremely efficient and fast to start.

    Docker provides an environment that enables developers to build, test, deploy, and run applications by using containerization technology.

    The key benefits of Docker include:

    1. Isolation: Containers run in isolated environments, ensuring that applications don't interfere with each other.
    2. Consistency: Docker ensures that an application behaves the same way in development, testing, and production environments.
    3. Portability: Containers can run on any system that supports Docker, making it easy to move applications between different environments.
    4. Scalability: Docker makes it simple to scale applications up or down as needed, adapting to varying workloads.

    In essence, Docker simplifies the development and deployment process, making it easier to package, distribute, and run applications consistently, whether you're working on a small-scale project or managing complex microservices architectures.

Prerequisites

  • You have installed the latest version of Docker Desktop.
  • You have installed a Git client.
  • You have an IDE or a text editor for editing files. I normally use Visual Studio Code, but nano or vi work as well. On Ubuntu systems, you can use 'gedit'.
  • On Windows systems, you may need to install Windows Subsystem for Linux (WSL) to run Linux-style commands and access Docker.
  • To open WSL, search for the Linux distribution of your choice in the search bar and click it. A new command prompt opens, where you may have to provide the username and password that you set when installing WSL.

Docker installation

  • Visit the Docker download page and choose the version compatible with your operating system.
  • After the installer has downloaded to your local system, install it by following the instructions on that page.
  • Once Docker is installed, launch Docker Desktop. You will see the Docker icon in the system tray when it is running. Next, create an account and log in to it in Docker Desktop.
  • Create a Dockerfile:
    • Create a directory on your local system where you'll work on your Docker project. You can use the command prompt for this.
    • Inside your project directory, create a file named 'Dockerfile' (without a file extension) using a text editor or the command prompt.
    • Open the 'Dockerfile' in your preferred text editor.
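    • In the 'Dockerfile', define the instructions to build your container, then save the file. For example:

          # Start from a small Node.js base image
          FROM node:18-alpine
          # Run the remaining instructions from /app inside the image
          WORKDIR /app
          # Copy the project files into the image
          COPY . .
          # Install production dependencies only
          RUN yarn install --production
          # Document the port the app listens on
          EXPOSE 3000
          # Start the app when the container runs
          CMD ["node", "src/index.js"]
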
    • Build the image with the following command:
          docker build -t my-image .
    • Start an app container:
          docker run -d -p 3000:3000 my-image
    • After a few seconds, open your web browser to http://localhost:3000.
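
    The "after a few seconds" wait can also be scripted. The following is a minimal sketch, assuming the container publishes port 3000 as in the `docker run` command above; the helper name `wait_for_http` and its timeout values are hypothetical and not part of Docker.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status such as 404.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the app is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (assumes the container from the steps above is running):
#   wait_for_http("http://localhost:3000")
```

    The function returns True as soon as any HTTP response arrives, so it only tells you that something is listening, not that the app is healthy.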
  • Installation via the command line on a Linux operating system (Ubuntu):

    Installation of JupyterLab, Apache NiFi, and ZooKeeper on an EC2 machine

    We can do this in the following steps:

    Step-1: Creating a customized docker-compose.yml file.

    A docker-compose.yml file is a configuration file used with Docker Compose, a tool that simplifies the management of multi-container Docker applications. This file defines the services, networks, and volumes for your application, allowing you to specify how different Docker containers interact and share resources.

    • To create a setup that includes JupyterLab, ZooKeeper, and NiFi with the specified ports, you can use a "docker-compose.yml" file to define and run these services together.
    • Create a docker-compose.yml file with the following content and save it to your working directory:

          version: "3.6"
          volumes:
            shared-workspace:
              name: "hadoop-distributed-file-system"
              driver: local
          services:
            jupyterlab:
              image: jupyter/datascience-notebook  # Other options: jupyter/base-notebook, jupyter/scipy-notebook, or jupyter/all-spark-notebook
              container_name: jupyterlab
              ports:
                - "4888:8888"  # Jupyter listens on 8888 inside the container by default
                - "4040:4040"
                - "8050:8050"
              volumes:
                - /path/on/host/jupyterlab:/path/in/container
            zookeeper:
              image: bitnami/zookeeper
              container_name: zookeeper
              ports:
                - "2181:2181"
              environment:
                - ALLOW_ANONYMOUS_LOGIN=yes
              volumes:
                - /path/on/host/zookeeper:/path/in/container
            nifi:
              image: apache/nifi:1.14.0  # Use an appropriate NiFi image
              container_name: nifi
              ports:
                - "2080:2080"
              environment:
                - NIFI_WEB_HTTP_PORT=2080
                - NIFI_CLUSTER_IS_NODE=true
                - NIFI_CLUSTER_NODE_PROTOCOL_PORT=2084
                - NIFI_ZK_CONNECT_STRING=zookeeper:2181
                - NIFI_ELECTION_MAX_WAIT=1 min
                - NIFI_SENSITIVE_PROPS_KEY=vvvvvvvvvvvv
              volumes:
                - /path/on/host/nifi:/path/in/container

      • `version` specifies the version of the Docker Compose file format.
      • Under the `services` section, we define three services: jupyterlab, zookeeper, and nifi.
      • Replace `/path/on/host` with the path on your host (EC2 instance) where you want to create the volumes.
      • For each service, `image` specifies the Docker image to use.
      • `container_name` sets the name for the container.
      • `ports` maps the host machine's ports to the container's ports. The format is `host_port:container_port`.
      • For the zookeeper service, we set the environment variable ALLOW_ANONYMOUS_LOGIN to "yes" to allow anonymous logins. Adjust environment variables as needed for your specific ZooKeeper and NiFi configurations.
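
    To make the `host_port:container_port` format concrete, here is a small sketch that splits such an entry into its two numbers. The function `parse_port_mapping` is a hypothetical helper written for this illustration, not part of Docker Compose, and it handles only the short two-number form. The `8080:80` entry is a made-up example; `2181:2181` is the ZooKeeper mapping from the file.

```python
def parse_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a short-form 'host_port:container_port' entry into two integers."""
    host_port, sep, container_port = mapping.partition(":")
    if not sep or not host_port.isdigit() or not container_port.isdigit():
        raise ValueError(f"expected 'host_port:container_port', got {mapping!r}")
    return int(host_port), int(container_port)


for entry in ["2181:2181", "8080:80"]:
    host, container = parse_port_mapping(entry)
    # Traffic arriving at the host port is forwarded to the container port.
    print(f"host port {host} -> container port {container}")
```

    The first number is the port you type into the browser or client on the host side; the second is the port the process inside the container actually listens on.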

    Step-2: Transfer the docker-compose.yml File

    • Open a terminal and navigate to the directory where docker-compose.yml is located.
    • Transfer the `docker-compose.yml` file to your EC2 instance using `scp` or a similar method. Replace [`your-key.pem`] with the path to your private key and [`your-ec2-instance-ip`] with the public IP or DNS name of your instance. If your instance runs an Ubuntu AMI, the default user is typically `ubuntu` rather than `ec2-user`.
    • Run the following command to copy the file:
          scp -i [your-key.pem] docker-compose.yml ec2-user@[your-ec2-instance-ip]:/path/to/destination
                        

        Note that the `.pem` file is the private key of your EC2 key pair, which you download when you create the key pair in your AWS account. Save this file in a secure place and never share it with anyone.

        The `{"Key": "ExampleKey", "Value": "ExampleValue"}` pairs that appear in AWS key pair examples are optional tags attached to the key pair, not the contents of the `.pem` file.
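
        Keeping the key private also applies to file permissions: OpenSSH-based clients such as `ssh` and `scp` typically refuse a private key file that other users on the machine can access. The sketch below, which assumes a Linux or macOS host, checks and tightens those permissions. The helper names `is_private` and `restrict_key_file` are hypothetical, and `your-key.pem` is the same placeholder used in the `scp` command.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any permission on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_key_file(path: str) -> None:
    """Make the file readable by its owner only (same effect as `chmod 400`)."""
    os.chmod(path, stat.S_IRUSR)


# Example (placeholder path):
#   if not is_private("your-key.pem"):
#       restrict_key_file("your-key.pem")
```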

    • To start the containers, run the following command on the EC2 instance, from the directory that contains docker-compose.yml:
          docker-compose up
    • JupyterLab and Apache NiFi can then be accessed in your browser:
          JupyterLab at: http://ip_address:4888/lab
          NiFi at: http://ip_address:2080/nifi/
    • Here `ip_address` is the public IP address of the EC2 machine.
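
    If a page does not load, it helps to check first whether the published ports are reachable at all. The following is a minimal sketch using plain TCP connections. The port numbers are the host ports from the compose file, while `port_is_open`, `check_services`, and `SERVICE_PORTS` are hypothetical names introduced for this example. A closed port can mean either that the container is not running or that the instance's firewall rules block the port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


def check_services(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Map each service name to whether its host port accepts connections."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


# Host ports published by the compose file in this guide.
SERVICE_PORTS = {"jupyterlab": 4888, "zookeeper": 2181, "nifi": 2080}

# Example (replace ip_address with the EC2 machine's IP):
#   print(check_services("ip_address", SERVICE_PORTS))
```

    A successful TCP connection only shows that something accepts connections on that port; it does not confirm that JupyterLab or NiFi has finished starting.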