The Spatial Consistency Test Method for the Three Elements of Meteorology Based on Data Mining

  • Abstract: Quality control is an essential step in ensuring high-quality meteorological observation data, and spatial consistency testing enables finer-grained quality control. Classical spatial consistency tests assume that meteorological elements are distributed continuously and uniformly, so they perform poorly in complex weather systems. This paper therefore proposes a new spatial consistency test method based on data mining. The temperature, humidity and 2 min average wind speed data of five neighboring stations are each used as input to a random forest regression model, which outputs the predicted value for the test station (for example, the predicted temperature). The mean squared error between the predicted and observed values is then computed over repeated trials. The experiments show that random forest regression outperforms the spatial regression test algorithm for temperature, humidity and 2 min average wind speed. The random forest algorithm also runs quickly and does not require thresholds to be set manually. It effectively reduces the time complexity of the algorithm, and its run time meets real-time operational requirements. These results provide strong support for applying the random forest algorithm to meteorological data quality control.
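The workflow summarized in the abstract (neighbor-station observations in, a predicted value for the test station out, scored by mean squared error over repeated trials) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the synthetic data in `make_data`, the station offsets, the tree settings (`max_depth`, `min_leaf`, `n_try`), the number of trees and trials, and the mean-only baseline, which is a trivial reference and not the spatial regression test.

```python
import random
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def best_split(X, y, features, min_leaf):
    """Return (cost, feature, threshold) with the lowest summed squared error, or None."""
    n, total, total_sq = len(y), sum(y), sum(v * v for v in y)
    best = None
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = left_sq = 0.0
        for k in range(1, n):  # k = number of samples sent to the left child
            prev, nxt = order[k - 1], order[k]
            left_sum += y[prev]
            left_sq += y[prev] ** 2
            if k < min_leaf or n - k < min_leaf or X[prev][f] == X[nxt][f]:
                continue
            right_sum = total - left_sum
            cost = (left_sq - left_sum ** 2 / k) + (total_sq - left_sq - right_sum ** 2 / (n - k))
            if best is None or cost < best[0]:
                best = (cost, f, X[prev][f])
    return best


def grow_tree(X, y, rng, depth=0, max_depth=6, min_leaf=5, n_try=2):
    """Grow one regression tree; each split looks at n_try randomly chosen features."""
    split = None
    if depth < max_depth and len(y) >= 2 * min_leaf:
        split = best_split(X, y, rng.sample(range(len(X[0])), n_try), min_leaf)
    if split is None:
        return fmean(y)  # leaf: mean target of the samples that reached it
    _, f, thr = split
    lo = [i for i in range(len(y)) if X[i][f] <= thr]
    hi = [i for i in range(len(y)) if X[i][f] > thr]
    args = (rng, depth + 1, max_depth, min_leaf, n_try)
    return (f, thr,
            grow_tree([X[i] for i in lo], [y[i] for i in lo], *args),
            grow_tree([X[i] for i in hi], [y[i] for i in hi], *args))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, thr, lo, hi = node
        node = lo if row[f] <= thr else hi
    return node


def fit_forest(X, y, n_trees=30, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return fmean(predict_tree(tree, row) for tree in forest)


def make_data(n, seed):
    """Synthetic stand-in: five neighbor temperatures (degC) and the test-station value."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        regional = rng.uniform(-5.0, 30.0)  # hypothetical shared regional signal
        X.append([regional + off + rng.gauss(0, 0.5) for off in (-1.0, 0.5, 2.0, -0.5, 1.5)])
        y.append(regional + rng.gauss(0, 0.5))
    return X, y


def run_trial(seed):
    """One trial: train on 300 samples, score the forest and a mean-only baseline on 100."""
    X, y = make_data(400, seed)
    forest = fit_forest(X[:300], y[:300], seed=seed)
    preds = [predict_forest(forest, row) for row in X[300:]]
    baseline = [fmean(y[:300])] * len(preds)  # trivial reference, not the spatial regression test
    return mse(y[300:], preds), mse(y[300:], baseline)


results = [run_trial(seed) for seed in range(3)]  # repeated trials, averaged below
rf_mse = fmean(r[0] for r in results)
baseline_mse = fmean(r[1] for r in results)
print(f"random forest MSE: {rf_mse:.3f}, mean-only baseline MSE: {baseline_mse:.3f}")
```

The forest is written out by hand only to keep the sketch dependency-free. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally take its place, and the same loop could be run separately for humidity and 2 min average wind speed.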

     
