Assumptions: the Java environment variables are already configured; the Spark package spark-3.0.0-bin-hadoop3.2.tgz and jackson-databind-2.10.1.jar have been downloaded; the Python version is 3.6–3.8.
Extract the Spark package to the D:\spark folder, and create a jara folder and a log folder in that same folder; the jar file goes into the jara folder.
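The folder layout above can be created with a short script; a minimal sketch (the `make_layout` helper and its `root` argument are illustrative, not part of the original setup; the log subfolders are the ones referenced in the config below):

```python
import os

def make_layout(root):
    """Create the jara and log folder layout described above under `root`."""
    subdirs = (
        "jara",                                # holds jackson-databind-2.10.1.jar
        os.path.join("log", "logs"),           # SPARK_LOG_DIR
        os.path.join("log", "event_logs"),     # spark.eventLog.dir
        os.path.join("log", "history_logs"),   # spark.history.fs.logDirectory
    )
    for sub in subdirs:
        os.makedirs(os.path.join(root, sub), exist_ok=True)

# make_layout(r"D:\spark")  # root folder used in this guide
```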
Copy spark-defaults.conf.template to spark-defaults.conf (Spark only reads the file without the .template suffix) and add the following:
spark.driver.extraClassPath D:\spark\jara\jackson-databind-2.10.1.jar
spark.executor.extraClassPath D:\spark\jara\jackson-databind-2.10.1.jar
spark.executor.memory 4G
spark.driver.memory 2G
spark.executor.cores 2
spark.driver.cores 1
spark.eventLog.dir file:///D:/spark/log/event_logs
spark.history.fs.logDirectory file:///D:/spark/log/history_logs
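Note that spark.eventLog.dir only takes effect when event logging is switched on; it defaults to off, so the following line presumably belongs in the same spark-defaults.conf:

```
spark.eventLog.enabled true
```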
Add the following to spark-env.cmd (on Windows, Spark loads spark-env.cmd rather than spark-env.sh; the SET lines below are Windows batch syntax). Replace the Java path with the actual installation path on your machine:
SET JAVA_HOME=D:\java\java8
SET HADOOP_HOME=D:\hadoop\bin\winutils-master\hadoop-3.0.0
SET PATH=%PATH%;%HADOOP_HOME%\bin
SET SPARK_LOG_DIR=D:\spark\log\logs
SET SPARK_EVENT_LOG_DIR=D:\spark\log\event_logs
SET SPARK_HISTORY_OPTS=-Dspark.history.fs.logDirectory=D:\spark\log\history_logs
SET SPARK_EXECUTOR_MEMORY=4G
SET SPARK_DRIVER_MEMORY=2G
SET SPARK_EXECUTOR_CORES=2
SET SPARK_DRIVER_CORES=1
Configure the Spark and Hadoop environment variables, then run:
D:\spark\spark-3.0.0-bin-hadoop3.2\spark-3.0.0-bin-hadoop3.2\bin\spark-submit --master local[*] --executor-memory 1G --total-executor-cores 2 --conf spark.pyspark.python=C:\Users\张阿春\AppData\Local\Programs\Python\Python36\python.exe "D:\car_data_clean\hadoop\hadoop_call_computation\example.py"
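This long command is easier to maintain as a small launcher script; a minimal Python sketch (the `build_submit_cmd` helper is illustrative; the paths are the ones used above):

```python
import subprocess

SPARK_SUBMIT = r"D:\spark\spark-3.0.0-bin-hadoop3.2\spark-3.0.0-bin-hadoop3.2\bin\spark-submit"
PYTHON_EXE = r"C:\Users\张阿春\AppData\Local\Programs\Python\Python36\python.exe"
APP = r"D:\car_data_clean\hadoop\hadoop_call_computation\example.py"

def build_submit_cmd():
    """Assemble the spark-submit argument list used in this guide."""
    return [
        SPARK_SUBMIT,
        "--master", "local[*]",
        "--executor-memory", "1G",
        "--total-executor-cores", "2",
        "--conf", f"spark.pyspark.python={PYTHON_EXE}",
        APP,
    ]

# subprocess.run(build_submit_cmd(), check=True)  # launch the job
```

One caveat: --total-executor-cores is only honored by standalone and Mesos masters; with --master local[*] it is effectively ignored, and the core count comes from local[*] itself.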