To give our big-data architecture a single SQL entry point that serves multiple engines (Spark, Flink, Trino) and can query Hive, ClickHouse, and other stores at the same time, we need a unified SQL gateway that runs across engines and platforms. This post documents an initial step toward that goal (the remaining parts are still in progress); since I could not find any existing write-up on running Kyuubi on K8s, I am recording the process here.
| Component | Version |
| --- | --- |
| Kyuubi | v1.6.0 |
| Spark | v3.3.0 |
| CDH | v6.2.1 |
1. Modify the Spark configuration files
Edit spark-env.sh and add the following (the paths are the future in-container paths):
export HADOOP_CONF_DIR=/opt/spark/conf:/opt/spark/conf
export YARN_CONF_DIR=/opt/spark/conf:/opt/spark/conf
3. Edit the initialization script
(the content below needs to be added; *** marks site-specific values)
# make the CDH cluster hosts resolvable later on
echo " ***.***.***.*** " >> /etc/hosts
# configuration file required for Kerberos authentication
echo "***" > /etc/krb5.conf
# authenticate inside the image
kinit -kt /opt/spark/work-dir/hive.keytab hive/***@****.****.****
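Putting those pieces together, a complete run.sh might look like the sketch below. Every concrete value in it (IPs, hostnames, the EXAMPLE.COM realm) is an illustrative placeholder for the masked values above, not the actual cluster configuration.

```bash
#!/usr/bin/env bash
# Hypothetical run.sh -- all hosts, IPs and the realm are placeholders.
set -ex

# Make the CDH cluster hosts resolvable from inside the pod.
cat >> /etc/hosts <<'EOF'
10.0.0.1 cdh-master-01.example.com cdh-master-01
10.0.0.2 cdh-worker-01.example.com cdh-worker-01
EOF

# Minimal krb5.conf so the Kerberos client can find the KDC.
cat > /etc/krb5.conf <<'EOF'
[libdefaults]
  default_realm = EXAMPLE.COM
[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
    admin_server = kdc.example.com
  }
EOF

# Authenticate with the keytab baked into the image.
kinit -kt /opt/spark/work-dir/hive.keytab hive/cdh-master-01.example.com@EXAMPLE.COM
```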
The key piece: run.sh, an initialization script executed when the driver and executor start (777 permissions are used purely out of convenience). The modified entrypoint.sh:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# echo commands to the terminal output
set -ex
# Check whether there is a passwd entry for the container UID
#myuid=$(id -u)
myuid=0
mygid=$(id -g)
# turn off -e for getent because it will return error code in anonymous uid case
set +e
uidentry=$(getent passwd $myuid)
set -e
# If there is no passwd entry for the container UID, attempt to create one
if [ -z "$uidentry" ] ; then
if [ -w /etc/passwd ] ; then
echo "$myuid:x:$myuid:$mygid:${SPARK_USER_NAME:-anonymous uid}:$SPARK_HOME:/bin/false" >> /etc/passwd
else
echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
fi
fi
if [ -z "$JAVA_HOME" ]; then
JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
fi
SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
env | grep SPARK_JAVA_OPT_ | sort -t_ -k4 -n | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt
if [ -n "$SPARK_EXTRA_CLASSPATH" ]; then
SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_EXTRA_CLASSPATH"
fi
if ! [ -z ${PYSPARK_PYTHON+x} ]; then
export PYSPARK_PYTHON
fi
if ! [ -z ${PYSPARK_DRIVER_PYTHON+x} ]; then
export PYSPARK_DRIVER_PYTHON
fi
# If HADOOP_HOME is set and SPARK_DIST_CLASSPATH is not set, set it here so Hadoop jars are available to the executor.
# It does not set SPARK_DIST_CLASSPATH if already set, to avoid overriding customizations of this value from elsewhere e.g. Docker/K8s.
if [ -n "${HADOOP_HOME}" ] && [ -z "${SPARK_DIST_CLASSPATH}" ]; then
export SPARK_DIST_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"
fi
if ! [ -z ${HADOOP_CONF_DIR+x} ]; then
SPARK_CLASSPATH="$HADOOP_CONF_DIR:$SPARK_CLASSPATH";
fi
if ! [ -z ${SPARK_CONF_DIR+x} ]; then
SPARK_CLASSPATH="$SPARK_CONF_DIR:$SPARK_CLASSPATH";
elif ! [ -z ${SPARK_HOME+x} ]; then
SPARK_CLASSPATH="$SPARK_HOME/conf:$SPARK_CLASSPATH";
fi
case "$1" in
driver)
shift 1
chmod 777 /opt/spark/work-dir/run.sh
/bin/bash /opt/spark/work-dir/run.sh
cat /etc/hosts
CMD=(
"$SPARK_HOME/bin/spark-submit"
--conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS"
--deploy-mode client
"$@"
)
;;
executor)
shift 1
chmod 777 /opt/spark/work-dir/run.sh
/bin/bash /opt/spark/work-dir/run.sh
cat /etc/hosts
CMD=(
${JAVA_HOME}/bin/java
"${SPARK_EXECUTOR_JAVA_OPTS[@]}"
-Xms$SPARK_EXECUTOR_MEMORY
-Xmx$SPARK_EXECUTOR_MEMORY
-cp "$SPARK_CLASSPATH:$SPARK_DIST_CLASSPATH"
org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBackend
--driver-url $SPARK_DRIVER_URL
--executor-id $SPARK_EXECUTOR_ID
--cores $SPARK_EXECUTOR_CORES
--app-id $SPARK_APPLICATION_ID
--hostname $SPARK_EXECUTOR_POD_IP
--resourceProfileId $SPARK_RESOURCE_PROFILE_ID
--podName $SPARK_EXECUTOR_POD_NAME
)
;;
*)
echo "Non-spark-on-k8s command provided, proceeding in pass-through mode..."
CMD=("$@")
;;
esac
# Execute the container CMD under tini for better hygiene
exec /usr/bin/tini -s -- "${CMD[@]}"
The changes made in the Spark Dockerfile:

- Change the source of the openjdk base image (optional, but with a poor network the image may not pull)
- Change the Debian package mirror (same reason as above)
- Install vim, sudo, net-tools, lsof, bash, tini, libc6, libpam-modules, krb5-user, libpam-krb5, libpam-ccreds, libkrb5-dev, libnss3, procps, and so on (to make working inside the container easier later)
- Copy the files under conf to /opt/spark/conf
- Copy the keytab file to /opt/spark/work-dir
- Copy the initialization script run.sh, which modifies /etc/hosts after the container comes up
- Set spark_uid to 0 (root) (required in order to modify the hosts file)

The resulting Dockerfile:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
ARG java_image_tag=8-jre-slim
FROM ***.***.***.***/bigdata/openjdk:${java_image_tag}
#ARG spark_uid=185
ARG spark_uid=0
# Before building the docker image, first build and make a Spark distribution following
# the instructions in https://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
    sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
    sed -i 's/http:\/\/security.\(.*\)/https:\/\/security.\1/g' /etc/apt/sources.list && \
    sed -i s@/security.debian.org/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    sed -i s@/deb.debian.org/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt-get install -y vim sudo net-tools lsof bash tini libc6 libpam-modules krb5-user libpam-krb5 libpam-ccreds libkrb5-dev libnss3 procps && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    mkdir -p /opt/hadoop && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/*
COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY kubernetes/dockerfiles/spark/decom.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
#COPY hadoop/conf /opt/hadoop/conf
COPY conf /opt/spark/conf
COPY data /opt/spark/data
COPY hive.keytab /opt/spark/work-dir
COPY run.sh /opt/spark/work-dir
ENV SPARK_HOME /opt/spark
WORKDIR /opt/spark/work-dir
RUN chmod 777 /opt/spark/work-dir
RUN chmod a+x /opt/decom.sh
RUN chmod 777 /opt/spark/work-dir/run.sh
ENTRYPOINT [ "/opt/entrypoint.sh" ]
# Specify the User that the actual main process will run as
USER ${spark_uid}
# build the image (run from the top level of the Spark distribution)
./bin/docker-image-tool.sh -t v3.3.0 build
# re-tag the image
docker tag spark:v3.3.0 ***.***.***.***/bigdata/spark:v3.3.0
# push the image to the internal registry (company-hosted)
docker push ***.***.***.***/bigdata/spark:v3.3.0
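Before moving on, a quick sanity check that the image contains what the entrypoint expects (a sketch; the registry address is the masked internal one used above):

```bash
# Confirm run.sh and the keytab landed in the work dir of the pushed image.
docker run --rm --entrypoint /bin/bash \
  ***.***.***.***/bigdata/spark:v3.3.0 \
  -c 'ls -l /opt/spark/work-dir/run.sh /opt/spark/work-dir/hive.keytab'
```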
Building the Kyuubi 1.6.0 image
1. Kyuubi itself needs no configuration-file changes; the project offers a more convenient mechanism (kyuubi-configmap.yaml, shown below).
2. Write the initialization script run.sh:
mkdir -p /etc/.kube
chmod 777 /etc/.kube
cp /opt/kyuubi/config /etc/.kube/config
# the crucial step that makes kubectl usable
echo "export KUBECONFIG=/etc/.kube/config" >> /etc/profile
export KUBECONFIG=/etc/.kube/config
source /etc/profile
# kubectl is mirrored on the intranet for easy download
wget http://***.***.***.***/yum/k8s/kubectl
chmod +x ./kubectl
mv ./kubectl /usr/bin/
# check that kubectl was installed successfully
kubectl version --client
# same hosts/Kerberos bootstrap as in the Spark image
echo "***" >> /etc/hosts
echo "***" > /etc/krb5.conf
kinit -kt /opt/kyuubi/hive.keytab hive/***@HADOOP.****.***
The two lines below are then added to bin/kyuubi so that run.sh executes before the server starts (they appear again in the `run)` case of the full script that follows):

chmod 777 /opt/kyuubi/run.sh
/bin/bash /opt/kyuubi/run.sh
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
## Kyuubi Server Main Entrance
CLASS="org.apache.kyuubi.server.KyuubiServer"
function usage() {
echo "Usage: bin/kyuubi command"
echo " commands:"
echo " start - Run a Kyuubi server as a daemon"
echo " restart - Restart Kyuubi server as a daemon"
echo " run - Run a Kyuubi server in the foreground"
echo " stop - Stop the Kyuubi daemon"
echo " status - Show status of the Kyuubi daemon"
echo " -h | --help - Show this help message"
}
if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
usage
exit 0
fi
function kyuubi_logo() {
source ${KYUUBI_HOME}/bin/kyuubi-logo
}
function kyuubi_rotate_log() {
log=$1;
if [[ -z ${KYUUBI_MAX_LOG_FILES} ]]; then
num=5
elif [[ ${KYUUBI_MAX_LOG_FILES} -gt 0 ]]; then
num=${KYUUBI_MAX_LOG_FILES}
else
echo "Error: KYUUBI_MAX_LOG_FILES must be a positive number, but got ${KYUUBI_MAX_LOG_FILES}"
exit -1
fi
if [ -f "$log" ]; then # rotate logs
while [ ${num} -gt 1 ]; do
prev=$(expr ${num} - 1)
[ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
num=${prev}
done
mv "$log" "$log.$num";
fi
}
export KYUUBI_HOME="$(cd "$(dirname "$0")"/..; pwd)"
if [[ $1 == "start" ]] || [[ $1 == "run" ]]; then
. "${KYUUBI_HOME}/bin/load-kyuubi-env.sh"
else
. "${KYUUBI_HOME}/bin/load-kyuubi-env.sh" -s
fi
if [[ -z ${JAVA_HOME} ]]; then
echo "Error: JAVA_HOME IS NOT SET! CANNOT PROCEED."
exit 1
fi
RUNNER="${JAVA_HOME}/bin/java"
## Find the Kyuubi Jar
if [[ -z "$KYUUBI_JAR_DIR" ]]; then
KYUUBI_JAR_DIR="$KYUUBI_HOME/jars"
if [[ ! -d ${KYUUBI_JAR_DIR} ]]; then
echo -e "nCandidate Kyuubi lib $KYUUBI_JAR_DIR doesn't exist, searching development environment..."
KYUUBI_JAR_DIR="$KYUUBI_HOME/kyuubi-assembly/target/scala-${KYUUBI_SCALA_VERSION}/jars"
fi
fi
if [[ -z ${YARN_CONF_DIR} ]]; then
KYUUBI_CLASSPATH="${KYUUBI_JAR_DIR}/*:${KYUUBI_CONF_DIR}:${HADOOP_CONF_DIR}"
else
KYUUBI_CLASSPATH="${KYUUBI_JAR_DIR}/*:${KYUUBI_CONF_DIR}:${HADOOP_CONF_DIR}:${YARN_CONF_DIR}"
fi
cmd="${RUNNER} ${KYUUBI_JAVA_OPTS} -cp ${KYUUBI_CLASSPATH} $CLASS"
pid="${KYUUBI_PID_DIR}/kyuubi-$USER-$CLASS.pid"
function start_kyuubi() {
if [[ ! -w ${KYUUBI_PID_DIR} ]]; then
echo "${USER} does not have 'w' permission to ${KYUUBI_PID_DIR}"
exit 1
fi
if [[ ! -w ${KYUUBI_LOG_DIR} ]]; then
echo "${USER} does not have 'w' permission to ${KYUUBI_LOG_DIR}"
exit 1
fi
if [ -f "$pid" ]; then
TARGET_ID="$(cat "$pid")"
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo "$CLASS running as process $TARGET_ID Stop it first."
exit 1
fi
fi
log="${KYUUBI_LOG_DIR}/kyuubi-$USER-$CLASS-$HOSTNAME.out"
kyuubi_rotate_log ${log}
echo "Starting $CLASS, logging to $log"
nohup nice -n "${KYUUBI_NICENESS:-0}" ${cmd} >> ${log} 2>&1 < /dev/null &
newpid="$!"
echo "$newpid" > "$pid"
# Poll for up to 5 seconds for the java process to start
for i in {1..10}
do
if [[ $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
break
fi
sleep 0.5
done
sleep 2
# Check if the process has died; in that case we'll tail the log so the user can see
if [[ ! $(ps -p "$newpid" -o comm=) =~ "java" ]]; then
echo "Failed to launch: ${cmd}"
tail -2 "$log" | sed 's/^/ /'
echo "Full log in $log"
else
echo "Welcome to"
kyuubi_logo
fi
}
function run_kyuubi() {
echo "Starting $CLASS"
nice -n "${KYUUBI_NICENESS:-0}" ${cmd}
}
function stop_kyuubi() {
if [ -f ${pid} ]; then
TARGET_ID="$(cat "$pid")"
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo "Stopping $CLASS"
kill "$TARGET_ID" && rm -f "$pid"
for i in {1..20}
do
sleep 0.5
if [[ ! $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
break
fi
done
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo "Failed to stop kyuubi after 10 seconds, try 'kill -9 ${TARGET_ID}' forcefully "
else
kyuubi_logo
echo "Bye!"
fi
else
echo "no $CLASS to stop"
fi
else
echo "no $CLASS to stop"
fi
}
function check_kyuubi() {
if [[ -f ${pid} ]]; then
TARGET_ID="$(cat "$pid")"
if [[ $(ps -p "$TARGET_ID" -o comm=) =~ "java" ]]; then
echo "Kyuubi is running (pid: $TARGET_ID)"
else
echo "Kyuubi is not running"
fi
else
echo "Kyuubi is not running"
fi
}
case $1 in
(start | "")
start_kyuubi
;;
(restart)
echo "Restarting Kyuubi"
stop_kyuubi
start_kyuubi
;;
(run)
chmod 777 /opt/kyuubi/run.sh
/bin/bash /opt/kyuubi/run.sh
run_kyuubi
;;
(stop)
stop_kyuubi
;;
(status)
check_kyuubi
;;
(*)
usage
;;
esac
The changes made in the Kyuubi Dockerfile:

- Change the source of the openjdk base image
- Change the Debian package mirror
- Install wget, vim, sudo, net-tools, lsof, bash, tini, libc6, libpam-modules, krb5-user, libpam-krb5, libpam-ccreds, libkrb5-dev, libnss3, procps, and so on
- Copy the keytab file to /opt/kyuubi
- Copy the initialization script run.sh, which modifies /etc/hosts after the container comes up
- Set the user to 0 (root) (either root or 0 works)

The resulting Dockerfile:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Usage:
# 1. use ./build/dist to make binary distributions of Kyuubi or download a release
# 2. Untar it and run the docker command below
# docker build -f docker/Dockerfile -t repository/kyuubi:tagname .
# Options:
# -f this docker file
# -t the target repo and tag name
# more options can be found with -h
ARG BASE_IMAGE=***.***.***.***/bigdata/openjdk:8-jre-slim
ARG spark_provided="spark_builtin"
FROM ${BASE_IMAGE} as builder_spark_provided
ONBUILD ARG spark_home_in_docker
ONBUILD ENV SPARK_HOME ${spark_home_in_docker}
FROM ${BASE_IMAGE} as builder_spark_builtin
ONBUILD ENV SPARK_HOME /opt/spark
ONBUILD RUN mkdir -p ${SPARK_HOME}
ONBUILD COPY spark-binary ${SPARK_HOME}
FROM builder_${spark_provided}
ARG kyuubi_uid=10009
USER root
ENV KYUUBI_HOME /opt/kyuubi
ENV KYUUBI_LOG_DIR ${KYUUBI_HOME}/logs
ENV KYUUBI_PID_DIR ${KYUUBI_HOME}/pid
ENV KYUUBI_WORK_DIR_ROOT ${KYUUBI_HOME}/work
RUN set -ex && \
    sed -i 's/http:\/\/deb.\(.*\)/https:\/\/deb.\1/g' /etc/apt/sources.list && \
    sed -i 's/http:\/\/security.\(.*\)/https:\/\/security.\1/g' /etc/apt/sources.list && \
    sed -i s@/security.debian.org/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    sed -i s@/deb.debian.org/@/mirrors.aliyun.com/@g /etc/apt/sources.list && \
    apt-get update && \
    apt-get install -y wget vim sudo net-tools lsof bash tini libc6 libpam-modules krb5-user libpam-krb5 libpam-ccreds libkrb5-dev libnss3 procps && \
    useradd -u ${kyuubi_uid} -g root kyuubi && \
    mkdir -p ${KYUUBI_HOME} ${KYUUBI_LOG_DIR} ${KYUUBI_PID_DIR} ${KYUUBI_WORK_DIR_ROOT} && \
    chmod ug+rw -R ${KYUUBI_HOME} && \
    chmod a+rwx -R ${KYUUBI_WORK_DIR_ROOT} && \
    rm -rf /var/cache/apt/*
COPY bin ${KYUUBI_HOME}/bin
COPY jars ${KYUUBI_HOME}/jars
COPY beeline-jars ${KYUUBI_HOME}/beeline-jars
COPY externals/engines/spark ${KYUUBI_HOME}/externals/engines/spark
COPY hive.keytab /opt/kyuubi
COPY config /opt/kyuubi
COPY run.sh /opt/kyuubi
WORKDIR ${KYUUBI_HOME}
CMD [ "./bin/kyuubi", "run" ]
#USER ${kyuubi_uid}
USER root
# build the image
./bin/docker-image-tool.sh -S /opt/spark -b BASE_IMAGE=***.***.***.***/bigdata/spark:v3.3.0 -t v1.6.0 build
# re-tag the image
docker tag kyuubi:v1.6.0 ***.***.***.***/bigdata/kyuubi:v1.6.0
# push the image to the internal registry
docker push ***.***.***.***/bigdata/kyuubi:v1.6.0
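As with the Spark image, a quick sanity check that the build carried everything along (a sketch; the registry address is the masked internal one used above):

```bash
# Verify the embedded Spark, the init script and the keytab are in the image.
docker run --rm --entrypoint /bin/bash \
  ***.***.***.***/bigdata/kyuubi:v1.6.0 \
  -c 'ls -d /opt/spark && ls -l /opt/kyuubi/run.sh /opt/kyuubi/hive.keytab'
```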
kyuubi-configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ****-bd-k8s
  name: kyuubi-defaults
data:
  kyuubi-env.sh: |
    export SPARK_HOME=/opt/spark
    export SPARK_CONF_DIR=${SPARK_HOME}/conf
    export HADOOP_CONF_DIR=${SPARK_HOME}/conf:${SPARK_HOME}/conf
    export KYUUBI_PID_DIR=/opt/kyuubi/pid
    export KYUUBI_LOG_DIR=/opt/kyuubi/logs
    export KYUUBI_WORK_DIR_ROOT=/opt/kyuubi/work
    export KYUUBI_MAX_LOG_FILES=10
  kyuubi-defaults.conf: |
    #
    ## Kyuubi Configurations
    #
    # kyuubi.authentication NONE
    # kyuubi.frontend.bind.host localhost
    # kyuubi.frontend.bind.port 10009
    #
    # Details in https://kyuubi.apache.org/docs/latest/deployment/settings.html
    kyuubi.authentication=KERBEROS
    kyuubi.kinit.principal=hive/****-****-****-****@****.****.****
    kyuubi.kinit.keytab=/opt/kyuubi/hive.keytab
    # A very important setting: without it, clients may fail to connect via
    # hostname after the Kyuubi service starts; false makes Kyuubi publish
    # the connection URL by IP instead
    kyuubi.frontend.connection.url.use.hostname=false
    kyuubi.engine.share.level=USER
    kyuubi.session.engine.idle.timeout=PT1H
    kyuubi.ha.enabled=true
    kyuubi.ha.zookeeper.quorum=***.***.***.***:2181,***.***.***.***:2181,***.***.***.***:2181
    kyuubi.ha.zookeeper.namespace=kyuubi_on_k8s
    spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf
    spark.kubernetes.trust.certificates=true
    spark.kubernetes.file.upload.path=hdfs:///user/spark/k8s_upload
kyuubi-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ****-bd-k8s
  name: kyuubi-deployment-example
  labels:
    app: kyuubi-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kyuubi-server
  template:
    metadata:
      labels:
        app: kyuubi-server
    spec:
      imagePullSecrets:
        - name: harbor-pull
      containers:
        - name: kyuubi-server
          # TODO: replace this with the stable tag
          image: ***.***.***.***/bigdata/kyuubi:v1.6.0
          #image: apache/kyuubi:master-snapshot
          imagePullPolicy: Always
          env:
            - name: KYUUBI_JAVA_OPTS
              value: -Dkyuubi.frontend.bind.host=0.0.0.0
          ports:
            - name: frontend-port
              containerPort: 10009
              protocol: TCP
          volumeMounts:
            - name: kyuubi-defaults
              mountPath: /opt/kyuubi/conf
      volumes:
        - name: kyuubi-defaults
          configMap:
            name: kyuubi-defaults
          #secret:
          #  secretName: kyuubi-defaults
kyuubi-service.yaml:

apiVersion: v1
kind: Service
metadata:
  namespace: ****-bd-k8s
  name: kyuubi-example-service
spec:
  ports:
    # The default NodePort range is 30000-32767; to change it,
    # edit kube-apiserver.yaml (usually under /etc/kubernetes/manifests/)
    # and add or change the line 'service-node-port-range=1-32767' under kube-apiserver
    - nodePort: 30009
      # same as containerPort in the pod yaml
      port: 10009
      protocol: TCP
  type: NodePort
  selector:
    # same as the pod label
    app: kyuubi-server
Apply the three manifests:

kubectl apply -f docker/kyuubi-configmap.yaml
kubectl apply -f docker/kyuubi-deployment.yaml
kubectl apply -f docker/kyuubi-service.yaml
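Once applied, it's worth confirming that the server pod is running and the NodePort is exposed before trying a client. A minimal check (the namespace is the masked one used throughout this post):

```bash
# Confirm the Kyuubi server pod is Running and peek at its startup log.
kubectl -n ****-bd-k8s get pods -l app=kyuubi-server
kubectl -n ****-bd-k8s logs deployment/kyuubi-deployment-example | tail -20
# Confirm the NodePort service exists.
kubectl -n ****-bd-k8s get svc kyuubi-example-service
```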
Finally, connect through beeline; all Spark-on-K8s properties are passed inline in the JDBC URL:

./bin/beeline -u 'jdbc:hive2://***.***.***.***:30009/default;principal=hive/***.***.***.***@HADOOP.****.TECH?spark.master=k8s://https://****.****.****/****/****/****;spark.submit.deployMode=cluster;spark.kubernetes.namespace=****-bd-k8s;spark.kubernetes.container.image.pullSecrets=harbor-pull;spark.kubernetes.authenticate.driver.serviceAccountName=flink;spark.kubernetes.trust.certificates=true;spark.kubernetes.executor.podNamePrefix=kyuubi-on-k8s;spark.kubernetes.container.image=***.***.***.***/bigdata/spark:v3.3.0;spark.dynamicAllocation.shuffleTracking.enabled=true;spark.dynamicAllocation.enabled=true;spark.dynamicAllocation.maxExecutors=10;spark.dynamicAllocation.minExecutors=5;spark.executor.instances=5;spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf' "$@"
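Because the connect command forwards extra arguments via "$@", a one-shot smoke test can pass a statement directly. The first statement triggers the Spark engine pod bootstrap on K8s, so expect some startup latency (a sketch; connect-kyuubi.sh is a hypothetical wrapper script holding the beeline command above):

```bash
# connect-kyuubi.sh is assumed to contain the beeline command shown above.
./connect-kyuubi.sh -e 'show databases;'
# In another terminal, watch the engine driver/executor pods come up.
kubectl -n ****-bd-k8s get pods -w
```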
This article is reposted from 刘振业 via the Apache Kyuubi WeChat account. Original link: https://mp.weixin.qq.com/s/KK2I5pclU6QqgSw49FCKHg.