Application of the Statistical Error and Quantitative Performance Measures in the Evaluation Process of Short-Term Air Quality Forecasts for Krakow



Introduction
Currently, the results of mathematical air quality modelling serve many purposes (e.g. global and local pollution prevention strategies, industrial emissions reduction, actions of local governments). Additionally, in the past few years, owing to the continuous development of digital technology, prognostic models have been used to produce publicly available short-term air quality forecasts. This information is of particular importance in densely populated cities, which frequently struggle with poor air quality. Issuing alerts about possible exceedances of the air quality standards [8] helps to mitigate adverse effects on the health of residents [6].
Krakow, located in south-eastern Poland and with more than 750 000 inhabitants [3], has recently been classified as one of the most polluted European cities due to excessive concentrations of PM2.5 [11]. In 2011, on behalf of the Marshal Office of Małopolska Voivodeship, an air quality forecasting system based on the results of the global multiscale chemical weather modelling system GEM-AQ was developed, and it has been fully operational since then [23, 24]. The air quality forecasts for Małopolska Voivodeship and its major cities are prepared with a modelling system created by the EkoForecast foundation, which uses a global multiscale chemical weather model, GEM-AQ [9].
In order to ensure the correct functioning of the short-term air pollution forecasting system, an appropriate assessment at every stage of this application is essential. In general, the validation process of a mathematical model consists of scientific, operational and statistical evaluation. Scientific evaluation requires a thorough knowledge of the model's basis. It examines the accuracy of the described physical and chemical processes, as well as other assumptions in the model. Operational evaluation considers user-related issues connected with, among others, the user interface, error checking of data and internal model diagnostics. Statistical assessment focuses mainly on the comparison between forecasted and measured values. This method is rather intuitive and may not provide precise reasons for the divergence between forecasted and measured values, but it gives information about the nature and severity of possible errors [2, 7, 14]. Due to its simplicity, this assessment should be carried out regularly to ensure reliable air quality forecasts and to improve the general performance of the modelling system.
Basic statistical analysis for assessing model accuracy may be performed with typical error measures and correlation coefficients (Pearson, Spearman), which are applicable in many fields (economics, weather and air quality prediction). These can be extended with a set of quantitative performance measures suggested by the U.S. Environmental Protection Agency as a basis for air quality model evaluation. To facilitate the interpretation of the values obtained, one can begin by plotting the data in different ways (scatter, quantile-quantile, residual or conditional scatter plots) [4, 5].
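The difference between the two correlation coefficients mentioned above can be illustrated with a minimal sketch. It assumes the textbook definitions (Pearson on the raw values, Spearman as Pearson applied to average ranks); the function names and the sample concentrations are hypothetical and are not taken from the evaluated system.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical daily averages: the forecast follows the ordering of the
# observations but strongly overshoots on the last day.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
forecast = [12.0, 25.0, 33.0, 60.0, 200.0]

r_pearson = pearson(observed, forecast)
r_spearman = spearman(observed, forecast)
```

In this made-up series the ranks agree perfectly, so the Spearman coefficient is insensitive to the single large overshoot, whereas the Pearson coefficient is reduced by it.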

Description of the Short-Term Air Quality Forecasting System for Krakow
Short-term air quality forecasts for the area of Krakow are based on GEM-AQ (Global Environmental Multiscale - Air Quality), a deterministic system that models atmospheric dynamics and chemistry. It was developed by extending the operational weather prediction model GEM with air quality chemistry processes (i.e. transport, deposition, emission, limited wet chemistry). The GEM model was originally created by the Meteorological Service of Canada (MSC) and is presently used for weather prediction over Canada. The current chemical mechanism in the GEM-AQ model comprises 50 gas-phase compounds, 116 chemical reactions and 19 photolysis reactions. The CAM module (Canadian Aerosol Model) implemented in this model describes 5 aerosol types (sulphate, sea-salt, organic carbon, black carbon and soil dust) and their physicochemical reactions. Gas-phase chemistry is based on the modified ADOM model (Acid Deposition and Oxidants Model). The chemical module operates "on-line": advection of the chemical compounds is performed at each timestep. Advection and vertical diffusion processes are computed with the semi-Lagrangian scheme derived from the GEM model. The variable resolution capability makes it possible to perform high-resolution local simulations based on the global runs [9, 12, 15].
Calculations in the short-term air quality forecast system for Małopolska Voivodeship are performed in a two-stage run. Results of the global simulation with a variable resolution of 0.135° over Central Europe (3-D meteorological fields, chemical composition of the atmosphere) are used as initial and boundary conditions for the nested run. The nested simulation covers the territory of Poland with a resolution of 0.05° (Fig. 1). The forecast horizon is 75 hours (from 21:00 UTC on the previous day) [24]. The emission input data have been prepared from the EMEP inventories with 0.5° resolution for 2011 and 2012 [19, 20]. The emission rates of coarse PM, PM2.5, nitrogen oxides, sulphur dioxide, carbon monoxide and NMVOCs have been used in the calculations.
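The 75-hour horizon can be checked with simple date arithmetic. The sketch below assumes that days are counted in UTC and uses a hypothetical issue date; the function names are illustrative and do not come from the forecasting system itself.

```python
from datetime import date, datetime, timedelta, timezone


def forecast_window(issue_day, horizon_hours=75):
    """Start and end of a run that begins at 21:00 UTC on the previous day."""
    midnight = datetime(issue_day.year, issue_day.month, issue_day.day,
                        tzinfo=timezone.utc)
    start = midnight - timedelta(hours=3)  # 21:00 UTC on the previous day
    end = start + timedelta(hours=horizon_hours)
    return start, end


def full_utc_days(start, end):
    """Calendar days (UTC) that lie entirely inside the window [start, end]."""
    days = []
    day = start.date()
    while True:
        day_start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        day_end = day_start + timedelta(days=1)
        if day_end > end:
            break
        if day_start >= start:
            days.append(day)
        day += timedelta(days=1)
    return days


# Hypothetical issue date, chosen only for illustration.
start, end = forecast_window(date(2014, 4, 1))
covered = full_utc_days(start, end)
```

Under these assumptions the run spans 3 spin-up hours plus 72 hours, i.e. exactly three complete days for which daily averages can be computed.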
The results of the analysed air quality forecasting system are accessible via the Wrota Małopolski web service of the Marshal Office [24] and are available for selected cities within Małopolska Voivodeship. Forecasts are presented as daily averaged concentrations of the following compounds for the next three days: PM10, PM2.5, nitrogen dioxide (NO₂), sulphur dioxide (SO₂), carbon monoxide (CO) and ozone (O₃). An additional statistical correction of the forecast data is performed to obtain better results [19, 20, 24].
In addition, maps of the distribution of the Common Air Quality Index (CAQI) over Małopolska Voivodeship are presented on this website. This indicator was created in the CITEAIR project and has been used by the Air Quality in Europe web service [1] to assess air quality in more than 100 European cities since 2006. Its value is computed from the three main pollutants in Europe: PM10, nitrogen dioxide (NO₂) and ozone (O₃), which can be extended by additional substances: PM2.5, sulphur dioxide (SO₂) and carbon monoxide (CO) [1, 10]. Therefore, accurate and reliable forecasting of these concentrations is particularly important, as they determine the overall air quality in the region.

Description of the ex post Forecast Error Measurements
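The parameters applied in this study indicate differences between the actual and forecasted values and have been selected from a larger set of variables. They are described below [17-20, 22, 25, 26]:
- Mean Bias Error: $\mathrm{MBE} = \frac{1}{m}\sum_{\tau=1}^{m}\left(C_{P,\tau} - C_{O,\tau}\right)$
- Mean Percentage Error: $\mathrm{MPE} = \frac{100\%}{m}\sum_{\tau=1}^{m}\frac{C_{P,\tau} - C_{O,\tau}}{C_{O,\tau}}$
- Mean Absolute Bias Error: $\mathrm{MABE} = \frac{1}{m}\sum_{\tau=1}^{m}\left|C_{P,\tau} - C_{O,\tau}\right|$
- Mean Absolute Percentage Error: $\mathrm{MAPE} = \frac{100\%}{m}\sum_{\tau=1}^{m}\frac{\left|C_{P,\tau} - C_{O,\tau}\right|}{C_{O,\tau}}$
- Root Mean Square Error: $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{\tau=1}^{m}\left(C_{P,\tau} - C_{O,\tau}\right)^{2}}$
where $C_P$ and $C_O$ denote the predicted and observed concentration, respectively, and $m$ is the total number of observations in the time series ($\tau = 1, 2, \ldots, m$). Ideally, the value of MBE should be equal or close to zero. A positive or negative value indicates overprediction or underprediction by the prognostic model, respectively. A significant difference between RMSE and MABE points to the presence of large discrepancies between observed and predicted concentrations [2, 26].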
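The error measures named above can be computed in a few lines. The sketch assumes the conventional definitions (bias taken as predicted minus observed, so that positive values mean overprediction, and percentage errors expressed relative to the observed value); the function name and the sample concentrations are hypothetical.

```python
import math


def error_measures(predicted, observed):
    """Return MBE, MPE, MABE, MAPE and RMSE for paired concentrations.

    Assumes bias = predicted - observed and strictly positive observations
    (they appear in the denominators of the percentage errors).
    """
    m = len(observed)
    diffs = [p - o for p, o in zip(predicted, observed)]
    return {
        "MBE": sum(diffs) / m,
        "MPE": 100.0 * sum(d / o for d, o in zip(diffs, observed)) / m,
        "MABE": sum(abs(d) for d in diffs) / m,
        "MAPE": 100.0 * sum(abs(d) / o for d, o in zip(diffs, observed)) / m,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / m),
    }


# Hypothetical daily average concentrations in ug/m3.
predicted = [30.0, 50.0, 20.0, 90.0]
observed = [20.0, 40.0, 25.0, 50.0]

stats = error_measures(predicted, observed)
```

For this made-up series RMSE is noticeably larger than MABE, because one day carries a much larger error than the others; this is the kind of gap the text interprets as a sign of large individual discrepancies.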

Description of Quantitative Performance Measures (US EPA)
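In order to assess the general performance of the analysed prognostic model, a set of quantitative measures developed in 1993 by the National Environmental Research Institute (NERI) of Denmark was applied in this study. They have been recommended by the US Environmental Protection Agency as an air quality model evaluation tool and were later implemented in the BOOT software package (Statistical Model Evaluation Software Package, Version 2.0). These measures include [4, 5, 16, 21]:
a) Fractional Bias: $\mathrm{FB} = \dfrac{\overline{C_O} - \overline{C_P}}{0.5\left(\overline{C_O} + \overline{C_P}\right)}$
b) Geometric Mean Bias: $\mathrm{MG} = \exp\left(\overline{\ln C_O} - \overline{\ln C_P}\right)$
c) Normalised Mean Square Error: $\mathrm{NMSE} = \dfrac{\overline{\left(C_O - C_P\right)^{2}}}{\overline{C_O}\,\overline{C_P}}$
d) Geometric Variance: $\mathrm{VG} = \exp\left(\overline{\left(\ln C_O - \ln C_P\right)^{2}}\right)$
e) Fraction within a factor of two: $\mathrm{FAC2}$ = fraction of data that satisfy $0.5 \le C_P / C_O \le 2.0$
where $C_P$ and $C_O$ denote the predicted and observed concentration, respectively, the overbar denotes the average over all observations, and $m$ is the total number of observations in the time series ($\tau = 1, 2, \ldots, m$). An ideal model would have MG, VG and FAC2 = 1.0, and FB and NMSE = 0.0. Based on the evaluation of many models with many field data sets, model acceptance criteria for these measures were developed [4, 21]:
- the fraction within a factor of two should be equal to or greater than 50% (FAC2 ≥ 0.5),
- |FB| < 0.3 or 0.7 < MG < 1.3,
- the values of NMSE and VG should be less than 1.5 and 4, respectively.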
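As an illustration, the five measures can be sketched as follows. The formulas used here are the definitions commonly given for the BOOT package and are an assumption of this sketch (observed minus predicted in FB, observed over predicted in MG, so that overprediction gives FB < 0 and MG < 1, in line with the values reported in this paper); the function name and data are hypothetical.

```python
import math


def performance_measures(predicted, observed):
    """Return FB, MG, NMSE, VG and FAC2 for paired, strictly positive data."""
    m = len(observed)
    mean_o = sum(observed) / m
    mean_p = sum(predicted) / m
    # ln(C_O) - ln(C_P) for every pair, used by MG and VG.
    log_ratios = [math.log(o) - math.log(p)
                  for p, o in zip(predicted, observed)]
    within_factor_two = sum(1 for p, o in zip(predicted, observed)
                            if 0.5 <= p / o <= 2.0)
    return {
        "FB": (mean_o - mean_p) / (0.5 * (mean_o + mean_p)),
        "MG": math.exp(sum(log_ratios) / m),
        "NMSE": (sum((o - p) ** 2 for p, o in zip(predicted, observed)) / m)
                / (mean_o * mean_p),
        "VG": math.exp(sum(r * r for r in log_ratios) / m),
        "FAC2": within_factor_two / m,
    }


# Hypothetical daily average concentrations in ug/m3.
predicted = [30.0, 50.0, 20.0, 90.0]
observed = [20.0, 40.0, 25.0, 50.0]

measures = performance_measures(predicted, observed)
ideal = performance_measures(observed, observed)
```

Calling the function with identical series reproduces the ideal values quoted above (MG, VG and FAC2 equal to 1, FB and NMSE equal to 0), which is a convenient sanity check for any implementation.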

Statistical Evaluation of Short-Term Forecasts for Krakow
In this study a brief comparison of forecasted and observed concentrations of selected air pollutants in Krakow was carried out. Forecasts of the upcoming daily average concentrations of PM10, PM2.5, SO₂, NO₂ and O₃ were systematically collected from April 2014 to March 2015 from the Wrota Małopolski web service of the Marshal Office of Małopolska Voivodeship [24]. These values were subsequently compared with the corresponding measurements recorded over the same period at the urban background measuring station (Bujaka St., Krakow). This station belongs to the monitoring network of the Voivodeship Environmental Protection Inspectorate in Krakow and has been operational since 2010. Continuous measurements at this station are performed for the following substances: PM10, PM2.5, nitrogen oxides (NOₓ), nitrogen dioxide (NO₂), nitric oxide (NO), sulphur dioxide (SO₂) and ozone (O₃). It is located in the southern part of Krakow (geographic coordinates: 50.010575; 19.949189) in a residential area. There are no significant emission sources in the area surrounding the station [13].
The time series of daily average modelled and measured concentrations of the analysed air pollutants are presented below (Figs 2-6). They are supplemented by the values of Spearman correlation coefficients (Tab. 1) and the results of the statistical evaluation of the examined modelling system (Tabs 2, 3). Because the observed concentrations vary during the year, the error statistics have also been calculated separately for the non-heating season (April to September) and the heating season (October to March).
PM10 concentrations are modelled properly by the evaluated forecasting system, as evidenced by the high correlation coefficient (0.79) and the value of FAC2 = 0.801. However, over the whole analysed period the model tends to slightly overestimate the observed concentrations of PM10 (MBE = 12.70 μg/m³). Higher errors have been obtained during the heating season, with a maximum of 163 μg/m³ (on 7 December 2014).
In the case of PM2.5 predictions the overestimation tendency of the model is similar to that observed for PM10 forecasts. It is, however, consistently lower in value (MBE = 6.97 μg/m³), which is accompanied by a slightly higher correlation (r = 0.82). It is important to note that the mean absolute percentage errors are comparable (MAPE = 61.17% for PM10 and 63.38% for PM2.5), so the better representation of PM2.5 observations may be only apparent.
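The seasonal division used for the error statistics can be sketched as below, using the month ranges given in the text (April to September as the non-heating season, October to March as the heating season). The record layout, function names and sample values are hypothetical, and only the mean bias error is computed to keep the example short.

```python
from collections import defaultdict
from datetime import date

# Month numbers of the heating season as defined in the text (October to March).
HEATING_MONTHS = {10, 11, 12, 1, 2, 3}


def season(day):
    """Classify a calendar day as 'heating' or 'non-heating'."""
    return "heating" if day.month in HEATING_MONTHS else "non-heating"


def mean_bias_by_season(records):
    """Mean bias (forecast - observed) per season and for the whole period.

    `records` is an iterable of (day, forecast, observed) tuples.
    """
    diffs = defaultdict(list)
    for day, forecast, observed in records:
        diffs[season(day)].append(forecast - observed)
        diffs["whole period"].append(forecast - observed)
    return {key: sum(values) / len(values) for key, values in diffs.items()}


# Hypothetical daily averages in ug/m3, not data from the study.
records = [
    (date(2014, 4, 15), 30.0, 25.0),
    (date(2014, 7, 1), 22.0, 20.0),
    (date(2014, 12, 7), 120.0, 90.0),
    (date(2015, 2, 10), 80.0, 70.0),
]

bias = mean_bias_by_season(records)
```

Keeping a "whole period" group next to the two seasons mirrors the way the statistics are reported here, with annual values accompanied by seasonal ones.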
Daily average sulphur dioxide concentrations measured at the Bujaka St. station only rarely exceed 20 μg/m³, whereas the model predictions indicate the possible occurrence of significantly higher values during the analysed year. Consequently, the error statistics are large, with MBE = 14.13 μg/m³ and MAPE = 382.74%. Accordingly, the quantitative performance measures have unsatisfactory values (FB = −1.148; MG = 0.250; FAC2 = 0.180) with respect to the assumed model quality criteria.
The variation of the forecasted nitrogen dioxide concentrations during the non-heating season is not coherent with the measured values, which lowers the Spearman correlation coefficient (r = 0.49). For most of the observed days the evaluated prognostic model overestimates NO₂ concentrations. The values of the error statistics are nevertheless not particularly high (MAPE = 54.80%), which allows the model acceptance criteria to be met (FB = −0.254; MG = 0.8; FAC2 = 0.793).
Large discrepancies between modelled and measured ozone concentrations are particularly evident during the non-heating season. Over the warm months of the year the GEM-AQ modelling system produces highly overestimated predictions for this air pollutant (MBE = 63.95 μg/m³ in the non-heating season), yet the mean percentage error remains similar in the heating season (MPE = 191.55% in the non-heating season and 194.88% in the heating season). As a result, the correlation coefficient denotes a moderate relationship between forecasted and measured values (r = 0.52). Although the model performs better during the heating season (FAC2 = 0.524), the overall error statistics indicate insufficient reliability of the short-term forecasts of ozone in the EkoForecast system.
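The comparison with the acceptance criteria can be reproduced from the values quoted above for sulphur dioxide and nitrogen dioxide. The sketch checks only the bias criterion (|FB| < 0.3 or 0.7 < MG < 1.3) and the FAC2 criterion; NMSE and VG are not quoted in the text, so they are left out. Treating FAC2 = 0.5 as acceptable is an assumption, and the function names are hypothetical.

```python
def meets_bias_criterion(fb, mg):
    """Bias criterion: |FB| < 0.3 or 0.7 < MG < 1.3."""
    return abs(fb) < 0.3 or 0.7 < mg < 1.3


def meets_fac2_criterion(fac2):
    """FAC2 criterion; the boundary value 0.5 is assumed to pass."""
    return fac2 >= 0.5


def assess(measures):
    """Evaluate the two criteria for one pollutant."""
    return {
        "bias": meets_bias_criterion(measures["FB"], measures["MG"]),
        "fac2": meets_fac2_criterion(measures["FAC2"]),
    }


# Values quoted in the text for the whole analysed period.
reported = {
    "SO2": {"FB": -1.148, "MG": 0.250, "FAC2": 0.180},
    "NO2": {"FB": -0.254, "MG": 0.8, "FAC2": 0.793},
}

results = {pollutant: assess(values) for pollutant, values in reported.items()}
```

With these inputs the sulphur dioxide forecasts fail both checks while the nitrogen dioxide forecasts pass both, which matches the interpretation given in the text.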

Summary and Conclusions
The main objective of this analysis was to perform a statistical evaluation of the results of a short-term air quality forecasting system for the area of Krakow. The available forecast data from the Wrota Małopolski web service [24] were compared with the urban background measurements at the station located at Bujaka St. in Krakow over the period from April 2014 to March 2015, with the heating (October to March) and non-heating (April to September) seasons taken into account.
In general, the analysed forecasts tend to overestimate the actual concentrations recorded at the urban background station, which is particularly noticeable during the non-heating season (in the case of sulphur dioxide and ozone). The results obtained indicate good reliability of the forecasted daily average concentrations of PM10 and PM2.5, as evidenced by the high Spearman correlation coefficients (0.79 and 0.82, respectively); these forecasts also meet the model acceptance criteria [4]. Although the short-term forecasts of nitrogen dioxide are rather accurate in terms of the error statistics, the correlation coefficient indicates only a moderate correlation between the modelled and measured values (r = 0.49). This might be due to the higher variability of the forecasted values compared with the actual observations during the analysed time series. In the case of sulphur dioxide and ozone forecasts, the values of the statistical measures indicate large discrepancies with respect to the measurement data, which qualifies the assessed model as insufficiently accurate in predicting the upcoming concentrations of these compounds.
It is important to emphasize that a general evaluation of the analysed modelling system has been published annually for the area of Małopolska Voivodeship since 2010. Those reports [19, 20] point to the ongoing process of improving the model configuration using more recent input data. The extended statistical evaluation presented in this paper is consistent with those reports for most of the pollutants concerned, showing generally good reliability of the predicted values with slight overestimation noticeable during the non-heating season (especially for PM10 and PM2.5). Discrepancies in the results of the assessment for ozone predictions may be related to a different averaging time of the data used for the evaluation, or to the availability of more accurate modelling data for the location of the Kurdwanów urban background monitoring station in Krakow.
The conducted analysis highlights the importance of a publicly available air quality forecasting system and its limitations associated with spatial accuracy and data quality. Furthermore, discrepancies in the forecasted air pollutant concentrations are likely to affect the reliability of the Common Air Quality Index [1, 10], which is calculated on the basis of these values and published on the Wrota Małopolski website for general information about current air quality. An additional examination of the model characteristics is recommended to identify the causes and nature of the observed errors adequately, in order to improve the air quality management system in Krakow [22].

Fig. 1. Visualization of the global (a) and nested (b) run in the GEM-AQ model for the air quality forecasting system in Małopolska Voivodeship. Source: [19, 20]

Table 3. Selected quantitative performance measures (US EPA) for GEM-AQ model evaluation, daily average concentrations [μg/m³]