Spark is a useful big data tool, but setting up a Spark cluster is usually difficult.
In the past, I have tried running Spark in single-node mode. Now, as the technology has advanced, I can use Docker and Kubernetes to test a Spark cluster instead of using Oracle VirtualBox.
If you want to try out Kubernetes, I recommend trying it on GCP first. Kubernetes is already set up there, so you can learn the concepts before you have to troubleshoot an incorrect setup. GCP also has a nice tutorial that walks you through creating a service with the help of Kubernetes.
I then turned to the book Mastering Kubernetes by Gigi Sayfan. Reading the first two chapters is enough to set up a Spark cluster on Kubernetes. I am writing this because some of the content I found in books and on the web (Stack Overflow, forums) is outdated.
Setting up Kubernetes on Windows
VirtualBox
Kubectl
Minikube
Downloading the latest executables is important; otherwise you end up troubleshooting incompatibility issues.
Putting the downloaded executables in C:\Windows may work, but I suggest creating a folder, putting Kubectl and Minikube inside it, and adding this folder to the Windows %PATH% variable.
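A quick way to confirm that the folder really ended up on %PATH% is to look the executables up by name. The following is only a minimal sketch in Python, assuming the downloads kept the names used in this post (kubectl and minikube-windows-amd64); the helper names find_tools and missing_tools are made up for illustration.

```python
import shutil

# Executable names as used in this post; adjust them if you renamed the downloads.
TOOLS = ["kubectl", "minikube-windows-amd64"]


def find_tools(names=TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]
```

Calling missing_tools() in a new terminal should return an empty list once the folder is on %PATH%. On Windows, shutil.which also tries the extensions listed in PATHEXT, so the .exe suffix can be left off.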
I am using Windows PowerShell.
To build a cluster, run minikube-windows-amd64.exe start.
You can always delete your cluster with minikube-windows-amd64.exe delete and then remove all the files inside C:\Users\USER\.minikube
Now you have a decent VM in VirtualBox (by default 2 vCPUs and 4 GB of RAM).
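If you would rather not depend on the defaults, the VM size can be passed explicitly when the cluster is started. Below is a rough sketch, assuming the executable name used in this post is on %PATH%; the function names are hypothetical, while --cpus and --memory are options of minikube start and delete is the minikube subcommand that removes the cluster.

```python
import subprocess

# Executable name as used in this post; assumed to be on PATH.
MINIKUBE = "minikube-windows-amd64.exe"


def start_cmd(cpus=2, memory_mb=4096):
    """Build the command that starts a cluster with an explicit VM size."""
    return [MINIKUBE, "start", "--cpus", str(cpus), "--memory", str(memory_mb)]


def delete_cmd():
    """Build the command that deletes the cluster and its VM."""
    return [MINIKUBE, "delete"]


def run(cmd):
    """Echo a command, then run it; raises CalledProcessError on failure."""
    print("> " + " ".join(cmd))
    subprocess.run(cmd, check=True)
```

For example, run(start_cmd(cpus=4, memory_mb=8192)) asks for a larger VM, and run(delete_cmd()) tears it down again. Keeping the command builders separate from run makes it easy to print a command and check it before anything is executed.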
To see the Kubernetes dashboard, first run minikube-windows-amd64.exe addons list to make sure that the dashboard addon is enabled.
If it is not, run minikube-windows-amd64.exe addons enable dashboard. You should then be able to run minikube-windows-amd64.exe dashboard and use the URL it provides to access the dashboard.
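The dashboard steps can also be scripted so that they always run in the same order. This is just an illustrative sketch, assuming the executable name used in this post is on %PATH%; dashboard_steps and run_steps are hypothetical helper names, and the --url option asks minikube to print the dashboard URL rather than open a browser.

```python
import subprocess

# Executable name as used in this post; assumed to be on PATH.
MINIKUBE = "minikube-windows-amd64.exe"


def dashboard_steps():
    """Return the commands, in order, that bring up the dashboard."""
    return [
        [MINIKUBE, "addons", "list"],                 # show the state of every addon
        [MINIKUBE, "addons", "enable", "dashboard"],  # turn the dashboard addon on
        [MINIKUBE, "dashboard", "--url"],             # print the dashboard URL
    ]


def run_steps(steps):
    """Run each command in turn, stopping at the first one that fails."""
    for cmd in steps:
        print("> " + " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_steps(dashboard_steps()) runs all three commands. Depending on the Minikube version, the last command may keep running until you interrupt it, so it is placed at the end of the list.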
For the time being, you are forbidden from accessing the “Settings” page. Please go to this URL to set up an administrator account and log in to the Kubernetes dashboard.
SSH to the Minikube machine does not work in PowerShell; you have to use it from a normal CMD window.
At this point, you can access both the Kubernetes dashboard and the Kubernetes host.