Abstract:
Quality control is essential for producing high-quality meteorological observation data, and spatial consistency tests allow finer-grained quality control. The classical spatial consistency test assumes that meteorological elements are distributed continuously and uniformly, so it is not effective in complex weather systems. This paper proposes a new spatial consistency test method based on data mining technology. The temperature, humidity and 2-minute average wind speed data of five adjacent stations are used as the input of a random forest regression model, which outputs the predicted temperature at the test station. After multiple tests, the mean squared error between the predicted and actual values is calculated. The results show that the random forest regression algorithm outperforms the spatial regression test algorithm for temperature, humidity and 2-minute average wind speed. Moreover, thanks to its high speed and automatic threshold setting, the random forest method effectively reduces the time complexity of the algorithm and meets real-time operational requirements. These advantages favor the application of the random forest algorithm in meteorological data quality control.
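The prediction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the forest is a small standard-library version written only for illustration, and all names (`synth`, `fit_forest`, `predict_forest`), the hyperparameters, and the training-mean baseline are assumptions. The input layout follows the abstract: temperature, humidity and 2-minute average wind speed from five adjacent stations (15 features) are used to predict the temperature at the test station, and the mean squared error is then computed.

```python
import random
from statistics import mean

def best_split(X, y, features):
    """Return (feature, threshold) minimising the children's squared error, or None."""
    best, best_score = None, float("inf")
    total, n = sum(y), len(y)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # no threshold separates identical values
            # SSE = sum(y^2) - S_left^2/n_left - S_right^2/n_right, so minimise this:
            score = -(left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k))
            if score < best_score:
                best, best_score = (f, (lo + hi) / 2), score
    return best

def grow_tree(X, y, depth, rng, n_try):
    """Grow a regression tree; each node considers n_try randomly chosen features."""
    if depth == 0 or len(y) < 5:
        return mean(y)  # leaf: average target value
    split = best_split(X, y, rng.sample(range(len(X[0])), n_try))
    if split is None:
        return mean(y)
    f, thr = split
    left = [i for i in range(len(y)) if X[i][f] <= thr]
    right = [i for i in range(len(y)) if X[i][f] > thr]
    if not left or not right:
        return mean(y)
    return (f, thr,
            grow_tree([X[i] for i in left], [y[i] for i in left], depth - 1, rng, n_try),
            grow_tree([X[i] for i in right], [y[i] for i in right], depth - 1, rng, n_try))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=30, depth=6, seed=0):
    """Train n_trees trees, each on a bootstrap sample (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    n_try = max(1, len(X[0]) // 3)  # heuristic: about one third of the features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # sample with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], depth, rng, n_try))
    return forest

def predict_forest(forest, x):
    """Forest prediction is the average of the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)

def synth(n, rng):
    """Synthetic stand-in for real observations (assumption, not real station data)."""
    X, y = [], []
    for _ in range(n):
        t, h, w = rng.uniform(-5, 30), rng.uniform(20, 100), rng.uniform(0, 12)
        row = []
        for _ in range(5):  # five adjacent stations, three elements each
            row += [t + rng.gauss(0, 0.5), h + rng.gauss(0, 3), w + rng.gauss(0, 0.8)]
        X.append(row)
        y.append(t + rng.gauss(0, 0.3))  # temperature at the test station
    return X, y

data_rng = random.Random(42)
X_train, y_train = synth(300, data_rng)
X_test, y_test = synth(100, data_rng)

forest = fit_forest(X_train, y_train)
preds = [predict_forest(forest, x) for x in X_test]

# Mean squared error between predicted and actual temperature
mse_rf = mean((p - t) ** 2 for p, t in zip(preds, y_test))
# Naive baseline for comparison only: always predict the training mean
mse_base = mean((mean(y_train) - t) ** 2 for t in y_test)
print(f"random forest MSE: {mse_rf:.3f}, training-mean baseline MSE: {mse_base:.3f}")
```

In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally replace the hand-written forest. The baseline here is only a sanity check and is not the spatial regression test that the paper compares against.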