If you have not set up Spark yet, see the previous tutorial: https://abytelalala.cn/index.php/2024/06/26/%e5%9f%ba%e4%ba%8eubentu%e4%b8%8adocker%ef%bc%8chbase%e7%9a%84spark%e9%83%a8%e7%bd%b2/
First, enter the hadoop01 container:
docker exec -it hadoop01 bash
su -
apt-get update --fix-missing
apt install python3-pip
pip --version
pip install --upgrade six==1.15
pip install happybase
mkdir /py  # create a script directory for convenience
exit
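The happybase package installed above talks to HBase through its Thrift server (which must be running, default port 9090). A minimal sketch of writing one row, assuming a table with a column family already exists; the helper names here are my own, not part of happybase:

```python
def to_hbase_row(values):
    """happybase requires bytes keys and values; encode a dict of
    'family:qualifier' -> value into the form table.put() expects."""
    return {k.encode(): str(v).encode() for k, v in values.items()}

def write_row(host, table_name, row_key, values):
    """Hypothetical helper: write one row via the HBase Thrift API."""
    import happybase  # imported here so the encoding helper stays stdlib-only
    conn = happybase.Connection(host)
    table = conn.table(table_name)
    table.put(row_key.encode(), to_hbase_row(values))
    conn.close()
```

For example, `write_row("hadoop01", "users", "row1", {"cf:name": "Alice"})` would store one cell under column family `cf`.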
Now we can submit a job; here we use Python as the example.
Suppose we have a script example.py as follows:
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("ExampleApp").getOrCreate()
    data = [("Alice", 1), ("Bob", 2), ("Cathy", 3)]
    df = spark.createDataFrame(data, ["Name", "Value"])
    df.show()
    spark.stop()

if __name__ == "__main__":
    main()
Then create a py directory on the host and manually place the script in it, then copy it into the container:
docker cp /home/cust/py/example.py hadoop01:/py/  # copy the script into the container
Then run it directly on the master node:
/usr/local/spark/bin/spark-submit --master spark://192.168.1.110:7077 --executor-memory 1G --total-executor-cores 2 /py/example.py
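If you submit jobs often, the command above can also be assembled from a script. A minimal sketch using the same paths and master URL as in this tutorial (the function name and defaults are my own):

```python
import shlex

# Paths and master URL from this tutorial; adjust to your cluster.
SPARK_SUBMIT = "/usr/local/spark/bin/spark-submit"
MASTER = "spark://192.168.1.110:7077"

def build_submit_command(script, executor_memory="1G", total_cores=2):
    """Build the spark-submit argument list for a Python script."""
    return [
        SPARK_SUBMIT,
        "--master", MASTER,
        "--executor-memory", executor_memory,
        "--total-executor-cores", str(total_cores),
        script,
    ]

# Print the full command; pass the list to subprocess.run() to execute it.
print(shlex.join(build_submit_command("/py/example.py")))
```

Keeping the arguments in a list (rather than one long string) avoids shell-quoting mistakes when you later hand the command to subprocess.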