HoK is Hadoop on Kubernetes, It helps you to deploy Hadoop stack on Kubernetes
Kubernetes is now proven technology to deploy and distribute modules quickly and efficiently. Many cloud vendors are now offering Hadoop as a service. Companies are moving towards the model where they want to provision an instance of service on the fly and use it for analytics. Usually, it takes weeks to provision a production-ready Hadoop cluster. This platform is container-native platform serves as the Backbone for all other analytical services. This provides the user to spawn Hadoop cluster using the self-serve portal, which helps them to onboard the team quickly and efficiently so their developers can start using the cluster as soon as they join the team
This Helm Chart requires Helm 2.
Depending of the version of the HoKStack, it supports the following platforms:
HoKStack Helm Chart version | Kubernetes | OpenShift Container Platform |
---|---|---|
v1.0.0 | 1.11+ | 3.11+ |
v1.0.0 | 1.11+ | 3.11+ |
The Hok need saperet namespace which can be team-name
or value-stream
anything unique.
It deploys all the Hadoop components in statefulsets.
To install the HokStack via Helm run the following command:
$ helm repo add hok https://raw.githubusercontent.com/hokstack/hok-helm/master/repos/stable
To update simply update your helm repositories and check the latest version
$ helm repo update
You can then check for the latest version by searching your Helm repositories for the HokStack
$ helm search hok
You can then check for the latest version by searching your Helm repositories for the HokStack
$ helm install hok/hokstack --name hok-team1 --set teamname=team1 --namespace team1
You can then check for the latest version by searching your Helm repositories for the HokStack
$ helm install hok/hokstack --name hok-team2 --set teamname=team2 --set metacontroller.crds.create=false --namespace team2
Remove OneAgent custom resources and clean-up all remaining OneAgent Operator specific objects:
$ helm remove <release name> -n <namespace>
By default it will install following statefulsets.
- postgres
- ambariserver
- masternode
- datanode
3 replica
- edgenode
- dante-proxy
- kdcserver
optional
- metacontroller
- HDFS
- Yarn
- Mapreduce
- Tez
- Hive
- Oozie
- ZooKeeper
- Kafka
- Spark
- Spark2
hdfsyarn.json contains the ambari blueprints, add componenets in the host_groups
to add more HDP componenets.
In the stack, you will find dante-proxy pod is running its SOCKS5 proxy running in container using that you can access the Ambari UI and other Hadoop UIs.
Get the NodePort of dante-proxy using below command
$ kubectl get svc dante-proxy
Enter this port and Node IP address to the web-browser in proxy section, preferably Firefox
as nowadays Chrome manages my orgnisations.
Acces cluster using the edgenode port you can login into that using
kubectl exec -it edgenode-0 bash
Aslo SSH is enabled on the edgenode-0 it can be accessed
The following table lists the configurable parameters of the HELM chart and their default values.
Global values
Parameter | Description | Default |
---|---|---|
teamname |
Team name value to seprate the install | team1 |
Parameter | Description | Default |
---|---|---|
postgres.enabled |
Flag to enable and disable Postgres SQL install | true |
postgres.name |
The name used in various properties | postgres |
postgres.replicaCount |
Postgres pod replica count | 1 |
postgres.image.repository |
Image repo URL It uses custom postgres image |
index.docker.io/hokstack/postgresql |
postgres.persistentVolume.enabled |
The PVC enable flag to control volumes | true |
postgres.storageClassName |
The storage class to be used for PVC | gp2 - In the case of EKS |
postgres.persistentVolume.accessModes |
PVC access mode | ReadWriteOnce |
postgres.persistentVolume.storage |
The storage size to be allocated to pod | 10Gi |
Parameter | Description | Default |
---|---|---|
ambariserver.enable |
Flag to enable and disable Ambari Server Install | true |
ambariserver.name |
The name used in various properties | ambariserver |
ambariserver.componentName |
The component name used in various properties | ambari |
ambariserver.image.repository |
Image repo URL It uses custom ambari image |
index.docker.io/hokstack/hdp |
ambariserver.persistentVolume.enabled |
The PVC enable flag to control volumes | true |
ambariserver.persistentVolume.storageClassName |
The storage class to be used for PVC | gp2 In the case of EKS |
ambariserver.persistentVolume.accessModes |
PVC access mode | ReadWriteOnce |
ambariserver.persistentVolume.storage |
The storage size to be allocated to pod | 1Gi |
Parameter | Description | Default |
---|---|---|
masternode.enable |
Flag to enable and disable Master Node Install | true |
masternode.name |
The name used in various properties | masternode |
masternode.componentName |
The component name used in various properties | master |
masternode.image.repository |
Image repo URL It uses custom hdp image |
index.docker.io/hokstack/hdp |
masternode.persistentVolume.enabled |
The PVC enable flag to control volumes | true |
masternode.persistentVolume.storageClassName |
The storage class to be used for PVC | gp2 In the case of EKS |
masternode.persistentVolume.accessModes |
PVC access mode | ReadWriteOnce |
masternode.persistentVolume.storage |
The storage size to be allocated to pod | 10Gi |
masternode.persistentVolume.mountPath |
The storage mount path will be used by HDFS | /hadoop |
Parameter | Description | Default |
---|---|---|
datanode.enable |
Flag to enable and disable Data Node Install | true |
datanode.name |
The name used in various properties | datanoe |
datanode.componentName |
The component name used in various properties | data |
datanode.image.repository |
Image repo URL It uses custom hdp image |
index.docker.io/hokstack/hdp |
datanode.persistentVolume.enabled |
The PVC enable flag to control volumes | true |
datanode.persistentVolume.storageClassName |
The storage class to be used for PVC | gp2 In the case of EKS |
datanode.persistentVolume.accessModes |
PVC access mode | ReadWriteOnce |
datanode.persistentVolume.storage |
The storage size to be allocated to pod | 10Gi |
datanode.persistentVolume.mountPath |
The storage mount path will be used by HDFS | /hadoop |
Parameter | Description | Default |
---|---|---|
edgenode.enable |
Flag to enable and disable Edge Node Install | true |
edgenode.name |
The name used in various properties | edgenode |
edgenode.componentName |
The component name used in various properties | edge |
edgenode.image.repository |
Image repo URL It uses custom hdp image |
index.docker.io/hokstack/hdp |
edgenode.persistentVolume.enabled |
The PVC enable flag to control volumes | true |
edgenode.persistentVolume.storageClassName |
The storage class to be used for PVC | gp2 In the case of EKS |
edgenode.persistentVolume.accessModes |
PVC access mode | ReadWriteOnce |
edgenode.persistentVolume.storage |
The storage size to be allocated to pod | 10Gi |
edgenode.persistentVolume.mountPath |
The storage mount path will be used by HDFS | /hadoop |
Parameter | Description | Default |
---|---|---|
metacontroller.enable |
Flag to enable and disable Metacontroller Install | true |
metacontroller.name |
The name used in various properties | metacontroller |
metacontroller.image.repository |
Image repo URL | metacontroller/metacontroller |
metacontroller.rbac.create |
RBAC creation for metacontroller | true |
metacontroller.serviceAccount.create |
Service Account creation for metacontroller | true |
metacontroller.crds.create |
CRDs install for metacontroller | truw |
Parameter | Description | Default |
---|---|---|
kdc.enable |
Flag to enable and disable Edge Node Instal | true |
kdc.name |
The name used in various properties | kdcserve |
kdc.componentName |
The component name used in various properties | kdc |
kdc.realmName |
The kerberos REALM name | DEMO.COM |
kdc.image.repository |
Image repo URL | index.docker.io/hokstack/kdc |
kdc.persistentVolume.enabled |
The PVC enable flag to control volumes | true |
kdc.persistentVolume.storageClassName |
The storage class to be used for PVC | gp2 In the case of EKS |
kdc.persistentVolume.accessModes |
PVC access mode | ReadWriteOnce |
kdc.persistentVolume.storage |
The storage size to be allocated to pod | 10Gi |
Full and up-to-date documentation can be found in the comments of the values.yaml
file.
HokStack is maintained by:
Initially inspired from https://www.cloudera.com/tutorials/getting-started-with-hdp-sandbox.html
Currently we work on the code in our free time, any assistance is highly appreciated. Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests.
HoKStack is under Apache 2.0 license. See LICENSE.md for details.