Spark cluster optimization configuration

iServer's distributed analysis service is built on the Spark computing platform and provides distributed GIS analysis and processing. Its performance depends on the hardware environment, the Spark cluster environment, and the size of the data being analyzed, so to get the best performance you need to tune the configuration for each scenario. Commonly used optimization methods include the following:

  1. When Spark runs, it launches executors to perform tasks. You can improve data processing efficiency by adjusting the memory allocated to each executor in the Spark configuration file (iServer's built-in Spark defaults to 4 GB) to suit your machine's resources. The method is:
  1. When the analysis produces a large amount of result data, the Spark cluster's master node consumes significant hardware resources while it collects the results from the worker nodes and stores them in a local file or in iServer DataStore. To improve analysis efficiency, you can apply the following optimized configuration:
  1. Spark provides two main scheduling modes: FIFO (first in, first out) and FAIR (fair scheduling). Spark itself defaults to FIFO, whereas iServer's built-in Spark runtime uses FAIR, which can process multiple analysis jobs concurrently. Choose the mode that suits your situation. The method is:
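
For the executor memory item above, the corresponding standard Spark property is `spark.executor.memory`. The Python sketch below is a hypothetical helper, not part of iServer or Spark: it converts Spark-style memory strings such as `4g` to MiB, checks that the executors you plan to run on one worker fit within that worker's memory, and renders the property in the whitespace-separated `spark-defaults.conf` layout. The function names, the 16 GB worker, and the two-executors-per-worker layout are illustrative assumptions; check your own iServer installation for the actual configuration file and its location.

```python
# Hypothetical helpers for sizing spark.executor.memory; not an iServer or Spark API.

# Multipliers from a Spark-style size suffix to MiB.
_UNITS_MIB = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def to_mib(size: str) -> float:
    """Convert a memory string such as '512m', '4g' or '4gb' to MiB."""
    s = size.strip().lower()
    if len(s) > 1 and s.endswith("b") and s[-2] in _UNITS_MIB:
        s = s[:-1]  # accept "4gb" as well as "4g"
    if not s or s[-1] not in _UNITS_MIB:
        raise ValueError(f"expected a k/m/g/t suffix, got {size!r}")
    return float(s[:-1]) * _UNITS_MIB[s[-1]]


def fits_on_worker(executor_memory: str, executors_per_worker: int,
                   worker_memory: str) -> bool:
    """True if the planned executor heaps fit in one worker's memory.

    This ignores JVM and off-heap overhead, so leave some headroom in practice.
    """
    return to_mib(executor_memory) * executors_per_worker <= to_mib(worker_memory)


def render_spark_defaults(props: dict[str, str]) -> str:
    """Render properties as 'key value' lines, the spark-defaults.conf layout."""
    return "".join(f"{key} {value}\n" for key, value in props.items())


if __name__ == "__main__":
    # Illustrative numbers: raise executors from the stated 4g default to 6g
    # on an assumed 16g worker that runs two executors.
    planned = "6g"
    if fits_on_worker(planned, executors_per_worker=2, worker_memory="16g"):
        print(render_spark_defaults({"spark.executor.memory": planned}), end="")
```

Keeping the sizing check separate from the rendering makes it easy to reject a value before it ever reaches a configuration file.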
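
For the large-result item above: in Spark, the results that an action such as `collect` returns to the driver are limited by the standard `spark.driver.maxResultSize` property (1g by default, 0 for unlimited; jobs whose serialized results exceed it are aborted), and a high limit can cause out-of-memory errors depending on `spark.driver.memory` (also 1g by default). The Python sketch below is a hypothetical pre-flight check built on those two properties; the function name and the idea of estimating the result size in advance are assumptions, and which values iServer's built-in Spark actually uses is not stated here.

```python
# Hypothetical pre-flight check for collecting large results on the driver.
# spark.driver.maxResultSize and spark.driver.memory are standard Spark
# properties; this function and its inputs are illustrative assumptions.

GIB = 1024 ** 3


def check_result_collection(estimated_result_bytes: int,
                            max_result_size_bytes: int = 1 * GIB,
                            driver_memory_bytes: int = 1 * GIB) -> list[str]:
    """Return warnings if collecting the estimated result may fail on the driver.

    Defaults mirror Spark's 1g defaults for both properties.
    """
    warnings = []
    # A limit of 0 disables the spark.driver.maxResultSize check.
    if max_result_size_bytes and estimated_result_bytes > max_result_size_bytes:
        warnings.append("result exceeds spark.driver.maxResultSize; the job would be aborted")
    # Collected results must also fit in the driver's heap.
    if estimated_result_bytes > driver_memory_bytes:
        warnings.append("result is larger than spark.driver.memory; out-of-memory errors are likely")
    return warnings


if __name__ == "__main__":
    # A 3 GiB result with both properties left at 1g triggers both warnings.
    for message in check_result_collection(3 * GIB):
        print(message)
```

Raising only `spark.driver.maxResultSize` without also giving the driver more memory moves the failure from an aborted job to a possible out-of-memory error, which is why the sketch checks both values.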
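
The difference between the two scheduling modes in the third item can be illustrated with a deliberately simplified model: one task slot and several jobs made of equal-length tasks, all submitted at time 0. FIFO runs each job's tasks to completion before starting the next job, while the "FAIR" policy here simply alternates between jobs that still have pending tasks (round-robin). Spark's real fair scheduler is richer (pools, weights, minimum shares), so treat this Python sketch as an illustration of the idea rather than Spark's algorithm; the `finish_times` function is hypothetical. In a Spark application the mode itself is selected with the standard `spark.scheduler.mode` property (`FIFO` or `FAIR`).

```python
from collections import deque


def finish_times(job_sizes: list[int], policy: str) -> list[int]:
    """Return the time unit at which each job finishes on a single task slot.

    job_sizes[i] is the number of unit-length tasks in job i, submitted at
    time 0 in index order. policy is "FIFO" or "FAIR" (simplified round-robin).
    """
    if policy not in ("FIFO", "FAIR"):
        raise ValueError(f"unknown policy: {policy}")
    remaining = list(job_sizes)
    finished = [0] * len(job_sizes)
    queue = deque(i for i, n in enumerate(job_sizes) if n > 0)
    clock = 0
    while queue:
        job = queue.popleft()
        clock += 1                 # run one task of this job
        remaining[job] -= 1
        if remaining[job] == 0:
            finished[job] = clock
        elif policy == "FIFO":
            queue.appendleft(job)  # FIFO: keep running the same job until it is done
        else:
            queue.append(job)      # FAIR: let the next waiting job run a task first
    return finished


if __name__ == "__main__":
    # A long analysis job (6 tasks) submitted just before a short one (2 tasks).
    for policy in ("FIFO", "FAIR"):
        print(policy, finish_times([6, 2], policy))
```

Running it prints `FIFO [6, 8]` and `FAIR [8, 4]`: under FIFO the short job waits behind the long one until time 8, while under the round-robin policy it finishes at time 4 and the long job finishes at 8 instead of 6. That trade-off is what lets concurrently submitted analysis jobs all make progress.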