Welcome to the Third Annual 2009 AMS AI Contest

Sponsored by the
American Meteorological Society Committee on Artificial Intelligence Applications to Environmental Science

Note: Due date for predictions changed to 23:59 MST Jan 7, 2010.

"Uncertainty is thus a fundamental characteristic of weather, seasonal climate, and hydrological prediction, and no forecast is complete without a description of its uncertainty."
... from Completing the Forecast: Characterizing and Communicating Uncertainty for Better Decisions Using Weather and Climate Forecasts, National Academies Press.

A forecast that includes an expression of uncertainty or a probability can be much more useful than the forecast of a single value. For this reason, the 2009 AMS Artificial Intelligence Competition will be based on forecasting a probability. The challenge for this year's competition is to predict the probability of turbulence exceeding a specific threshold.

The AMS Artificial Intelligence contest is open to all, and is intended to promote the study of statistical artificial intelligence techniques applied to meteorology.

Turbulence Prediction: Problem Description

The AI Contest this year is focused on predicting atmospheric turbulence that affects aviation. It uses a dataset collected during summer months (June - September) in which convectively-induced turbulence (CIT)--turbulence in and around thunderstorms--is particularly prevalent, though mountain-wave turbulence (MWT) and clear-air turbulence (CAT) are also present. Studies have suggested that CIT is responsible for over 60% of turbulence-related aircraft accidents; thus, accurate real-time turbulence diagnoses that include CIT could improve airline safety and also help mitigate the significant delays that now frequently afflict the national airspace system during periods of widespread convection.

The mechanisms for the generation and propagation of atmospheric turbulence, and CIT in particular, are a topic of current research and are still only partially understood. However, the likelihood of CIT is thought to be related to the proximity (vertical and horizontal), intensity, depth and extent of convection as well as the state of the atmosphere around the storm. It seems plausible that an empirical model that uses numerical weather prediction model data to get an indication of larger-scale environmental conditions and satellite, radar reflectivity, and lightning observations that indicate the extent and severity of the storms and associated clouds could have good skill in predicting turbulence. MWT could similarly be modeled based on location (e.g., the presence of rough topography) as well as environmental conditions reflected in numerical weather prediction model data, and CAT may be predicted based on environmental conditions. Observations of turbulence generally entail pilot reports or automated reports. Automated reports of eddy dissipation rate (EDR, a measure of atmospheric turbulence) produced every minute by a collection of commercial aircraft will provide the "truth" data for this contest.

Entering the AMS AI Contest

Anybody may enter as follows.

We will announce the winners after receiving the papers.

Judging Criteria

Probabilistic forecasts have two important attributes: reliability and resolution. For a probabilistic forecast to be reliable, the observed frequency of an event should agree with the forecast probability. For example, when a forecast of 20% is made, one should observe the event 20% of the time. When this holds for every forecast value, the forecast is considered reliable.
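The reliability check described above can be sketched as a small table that compares, for each probability bin, the mean forecast with the observed event frequency. This is an illustrative sketch only: the function name, the equal-width binning, and the toy data are assumptions, not part of the contest scoring.

```python
from collections import defaultdict

def reliability_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins and compare each
    bin's mean forecast with the observed event frequency in that bin."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        # Clamp p = 1.0 into the top bin instead of creating an extra one.
        k = min(int(p * n_bins), n_bins - 1)
        bins[k].append((p, o))
    table = []
    for k in sorted(bins):
        pairs = bins[k]
        n = len(pairs)
        mean_forecast = sum(p for p, _ in pairs) / n
        observed_freq = sum(o for _, o in pairs) / n
        table.append((mean_forecast, observed_freq, n))
    return table

# Toy data (invented for illustration): five 20% forecasts, one event.
forecasts = [0.2, 0.2, 0.2, 0.2, 0.2]
outcomes = [0, 0, 1, 0, 0]
for mean_forecast, observed_freq, n in reliability_table(forecasts, outcomes):
    print(f"forecast {mean_forecast:.2f}  observed {observed_freq:.2f}  n={n}")
```

A reliable forecast shows the two frequencies close to each other in every bin that holds enough cases.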

However, a reliable forecast is not necessarily a useful forecast. Always forecasting the long-term chance of an event occurring would yield a reliable forecast, but one of limited utility. For this reason, we also need to consider the resolution of a forecast. A forecast with perfect resolution will always correctly forecast either 0% or 100%. A completely random forecast, or a constant forecast such as the climatological average probability, has no resolution.

To reward both reliability and resolution, the forecasts in this competition will be assessed using the Brier Skill Score (BSS). The Brier Skill Score combines reliability, resolution, and observational uncertainty. The reliability component is the weighted mean squared difference between each forecast probability and the observed frequency of the event when that probability was forecast. This component should be minimized. The resolution component is the weighted mean squared difference between those observed frequencies and the climatological frequency of the event. This value should be maximized, which happens when forecasts are either 0% or 100%, in correct proportion to the climatological frequency.
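As a rough illustration of how such a score can be computed, the sketch below uses one common formulation: BSS = 1 - BS / BS_ref, with the sample climatology as the reference forecast, together with the standard decomposition BS = reliability - resolution + uncertainty. The function names and toy data are assumptions, and the judges' exact scoring procedure may differ in detail.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)

def brier_skill_score(forecasts, outcomes):
    """BSS relative to always forecasting the sample climatology.
    Undefined (division by zero) if every outcome is identical."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = base_rate * (1 - base_rate)  # Brier score of the constant base_rate forecast
    return 1.0 - brier_score(forecasts, outcomes) / bs_ref

def decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping cases by
    distinct forecast value, so that BS = reliability - resolution + uncertainty."""
    n = len(outcomes)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    reliability = sum(len(obs) * (p - sum(obs) / len(obs)) ** 2
                      for p, obs in groups.items()) / n
    resolution = sum(len(obs) * (sum(obs) / len(obs) - base_rate) ** 2
                     for obs in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

# Toy data, invented for illustration.
forecasts = [0.1, 0.1, 0.1, 0.1, 0.8, 0.8]
outcomes = [0, 0, 0, 1, 1, 1]
rel, res, unc = decomposition(forecasts, outcomes)
print("BS :", brier_score(forecasts, outcomes))
print("BSS:", brier_skill_score(forecasts, outcomes))
print("rel - res + unc:", rel - res + unc)
```

Under this formulation a forecast that always issues the sample climatology scores a BSS of 0, and a perfect forecast scores 1.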

It should be noted that the Brier Skill Score is not without its weaknesses. The value of the BSS, like all skill scores, is dependent on the sample climatology. Different climatologies will result in different scores. In this competition, everyone will be using the same sample dataset, so comparing scores is appropriate. Also, with these two components, a single Brier Skill Score can be the result of different combinations of resolution and reliability components. In the real world, different uses will have specific requirements for resolution and reliability.

Further information about calculating the Brier Skill Score can be found in the following references.

Training Dataset Format

The training dataset contains 103990 data rows and 136 columns. The dataset is in ASCII format with comma-separated values. The columns are described below under "Variables".

The ismog column is the predictand variable. The peak_edr column is provided for your information but is not to be used as a predictor variable; it is not found in the test dataset.

The remaining variables might, or might not, be useful in predicting ismog.

All variables can contain missing values, encoded as "NA".
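A minimal sketch of reading such a file with "NA" handling, using only the Python standard library. It assumes the file starts with a header row of column names and that all cells are numeric; neither is confirmed here. The names feature_a and feature_b are hypothetical stand-ins for real predictor columns.

```python
import csv
import io

def parse_value(text):
    """Convert one CSV cell to float, mapping the 'NA' marker to None."""
    text = text.strip()
    return None if text == "NA" else float(text)

def load_rows(handle):
    """Read a comma-separated file with a header row into a list of dicts."""
    reader = csv.DictReader(handle)
    return [{name: parse_value(cell) for name, cell in row.items()}
            for row in reader]

# Tiny stand-in for the training file (values invented for illustration).
sample = io.StringIO(
    "feature_a,feature_b,peak_edr,ismog\n"
    "1.5,NA,0.05,0\n"
    "NA,3.0,0.35,1\n"
)
rows = load_rows(sample)

# Exclude the predictand and peak_edr from the predictor list.
predictors = [name for name in rows[0] if name not in ("peak_edr", "ismog")]

# Count missing values per predictor before deciding how to handle them.
missing = {name: sum(row[name] is None for row in rows) for name in predictors}
print(predictors, missing)
```

For the real file, `load_rows(open(path, newline=""))` would take the place of the in-memory sample.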

Test Dataset Format

The test dataset contains 50127 data rows in a format similar to the training dataset, but without the peak_edr and ismog columns.

The contest problem is to predict, for each data line (row) in the test dataset, the probability that ismog is true. This is the same as the probability that peak_edr >= 0.25.

Contest Entry Format

All contest entries must be in the following format. The entry must be an uncompressed ASCII file having 50127 lines, the same as the number of data rows in the test dataset. Each line (row) should have two columns separated by a comma. The columns are:
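Before submitting, the structural requirements stated above (ASCII text, 50127 lines, two comma-separated columns per line) can be checked with a short script. This is a sketch of such a check, not an official validator; the function name is an assumption, and it deliberately does not inspect what the two columns contain.

```python
def validate_entry(text, expected_rows=50127):
    """Return a list of problems found in an entry; an empty list means
    no structural problem was found."""
    problems = []
    if not text.isascii():
        problems.append("file contains non-ASCII characters")
    lines = text.splitlines()
    if len(lines) != expected_rows:
        problems.append(f"expected {expected_rows} lines, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        if len(line.split(",")) != 2:
            problems.append(f"line {number}: expected 2 comma-separated columns")
            break  # report only the first malformed line
    return problems

# Toy two-line entry (invented), checked against a matching row count.
print(validate_entry("1,0.5\n2,0.1\n", expected_rows=2))
```

For a real entry, the file contents would be read into a string first and checked with the default `expected_rows`.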

Variables

In the AI Contest dataset, collocated observation and model-derived variables have been extracted for each aircraft EDR measurement. The object of this contest is to use these variables to predict the probability that the measured turbulence is moderate-or-greater (MoG). The 'target' or 'truth' variable to be predicted is ismog, which is 0 (false) if the EDR measurement (peak_edr, also included) reflects null or light turbulence, and 1 (true) if it meets or exceeds the threshold for MoG turbulence (peak_edr >= 0.25). The peak_edr and ismog fields are provided in the training dataset, but not in the testing dataset.

The NWP model, satellite and radar fields surrounding the plane's EDR measurement location have been used to calculate potential predictor variables that indicate a plane's distance from various intensity levels of storms and clouds, as well as environmental characteristics at the measurement point. These variables may have skill individually or in combination. However, the satellite or radar readings are often missing or null; those field values are labeled 'NA' in the dataset. Since MoG turbulence is quite rare, the proportion of null to positive instances in both training and testing datasets has been manipulated for the purposes of this contest by removing 2/3 of the null report instances.

In the comma-separated value (CSV) training and testing data files, each line represents an instance with predictor and predictand variables associated with an aircraft turbulence measurement. A line number is also provided for each instance. Below are brief descriptions of the variables.
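The effect of removing 2/3 of the null instances on the frequency of MoG events can be worked out with simple arithmetic. The sketch below assumes that "null" means every instance with ismog = 0 and that the removal was uniform; the 1% starting rate is invented purely for illustration, since the true rate is not stated here. Because both the training and testing datasets were treated the same way, no correction is needed for the contest itself; the conversion only matters when relating contest probabilities to unmanipulated data.

```python
def rate_after_subsampling(original_rate, null_keep_fraction=1 / 3):
    """Event frequency after keeping only a fraction of the null cases."""
    positives = original_rate
    nulls = (1 - original_rate) * null_keep_fraction
    return positives / (positives + nulls)

def rate_before_subsampling(subsampled_rate, null_keep_fraction=1 / 3):
    """Invert rate_after_subsampling: recover the frequency in the full data."""
    positives = subsampled_rate
    nulls = (1 - subsampled_rate) / null_keep_fraction
    return positives / (positives + nulls)

# Hypothetical example: a 1% event rate before the null cases were thinned.
q = rate_after_subsampling(0.01)
print(f"rate in thinned data: {q:.4f}")
print(f"recovered original rate: {rate_before_subsampling(q):.4f}")
```

With one third of the nulls kept, an original rate p becomes 3p / (1 + 2p), so rare events appear roughly three times as frequent in the thinned data.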

Fields in the training set only: (last 2 fields)

Predictor fields in both training and testing sets: (listed in order)

Airplane information at time the EDR measurement was recorded:
Lightning information:
Satellite radiance channels from the NOAA GOES imager:
NEXRAD radar-derived storm intensity and proximity information:
NWP model-derived fields:
The following fields are provided by or calculated from the output of the Rapid Update Cycle (RUC) numerical weather prediction model analysis. The values are linearly interpolated from the model grid to the location of the EDR measurement. Most are 3-D fields, but some are 2-D. Individual descriptions are not provided.
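To illustrate what linear interpolation from a grid to a point involves, here is a sketch of bilinear interpolation on a regular 2-D grid with unit spacing. It is an assumption-laden illustration, not the procedure actually used to build the dataset: the real RUC grid, map projection, and any vertical interpolation for 3-D fields are not described here, and the function name and toy grid are invented.

```python
def bilinear(grid, x, y):
    """Interpolate grid[j][i] (row j = y index, column i = x index) at
    fractional grid coordinates (x, y). Assumes unit grid spacing and
    0 <= x <= ncols - 1, 0 <= y <= nrows - 1."""
    i0, j0 = int(x), int(y)
    # Keep the upper neighbours inside the grid at the far edges.
    i1 = min(i0 + 1, len(grid[0]) - 1)
    j1 = min(j0 + 1, len(grid) - 1)
    fx, fy = x - i0, y - j0
    # Interpolate along x on the two bracketing rows, then along y.
    bottom = grid[j0][i0] * (1 - fx) + grid[j0][i1] * fx
    top = grid[j1][i0] * (1 - fx) + grid[j1][i1] * fx
    return bottom * (1 - fy) + top * fy

# Toy 2x2 field (invented values); the centre is the mean of the corners.
field = [[0.0, 10.0],
         [20.0, 30.0]]
print(bilinear(field, 0.5, 0.5))
```

At a grid point the function returns the grid value itself, and between points it blends the four surrounding values by distance.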

Questions

If you have questions please email them to:
  ams-ai-2009@rap.ucar.edu
Questions and answers will be posted on this web site for all to read. Please check this site for updates before sending your question.

Acknowledgements

We'd like to thank the UCAR Research Applications Laboratory for their support in running the forecasting contest.

The small print

Currently there is no prize offered other than recognition. If your organization is interested in sponsoring, please contact us at ams-ai-2009@rap.ucar.edu. Past sponsorships have been $1000, which covers prizes for the first few places. Sponsors get recognized in the AMS conference agenda, on the contest web site, and at the paper presentations.

The decision of the judges is final. The AMS, UCAR, and all persons and organizations associated with the contest have no liability for any actions associated with the contest. Any communications with us regarding the contest may be published.

Version: 2009-08-24 A