We used the “Airline On-Time Performance Data” database from the Bureau of Transportation Statistics (BTS), which records the on-time performance of domestic flights operated by large air carriers. The database is available on the BTS website.
We selected the 23 variables of interest out of 109 and restricted the time range to January 2012 through December 2016. In total, our dataset contains 29,722,792 observations of 23 variables, where each observation represents one flight. The variables fall into three types: indicator, continuous, and categorical. They include summary information on flight performance and descriptive information about the flights.
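The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dataset: the column names (`year`, `month`, `carrier`, `arr_delay`, `tail_num`) and the sample rows are hypothetical, and the real extract keeps 23 of 109 columns rather than the four shown here.

```python
import csv
import io

# Hypothetical subset of column names to keep (the real selection has 23).
KEEP = ["year", "month", "carrier", "arr_delay"]

def select_rows(lines, keep=KEEP, first=(2012, 1), last=(2016, 12)):
    """Yield one dict per flight, restricted to `keep`, for flights whose
    (year, month) falls within the inclusive range [first, last]."""
    for row in csv.DictReader(lines):
        period = (int(row["year"]), int(row["month"]))
        if first <= period <= last:
            yield {name: row[name] for name in keep}

# Made-up sample data, for illustration only.
sample = io.StringIO(
    "year,month,carrier,arr_delay,tail_num\n"
    "2011,12,AA,5,N1\n"
    "2012,1,DL,20,N2\n"
    "2016,12,UA,-3,N3\n"
    "2017,1,WN,0,N4\n"
)
rows = list(select_rows(sample))
```

Comparing `(year, month)` tuples keeps the range check readable and makes both endpoints inclusive, which matches the January 2012 to December 2016 window.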
A flight is counted as “delayed” if it arrived at least 15 minutes later than its scheduled arrival time.
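As a minimal sketch, the delay rule can be written as a small predicate. It assumes the threshold is inclusive (15 minutes or more counts as delayed), which is how BTS states the rule; the function name `is_delayed` and the sample times are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Threshold from the definition above; assumed inclusive (>= 15 minutes).
DELAY_THRESHOLD = timedelta(minutes=15)

def is_delayed(scheduled_arrival: datetime, actual_arrival: datetime) -> bool:
    """Return True if the flight arrived at least 15 minutes after schedule."""
    return actual_arrival - scheduled_arrival >= DELAY_THRESHOLD

# Made-up times, for illustration only.
scheduled = datetime(2016, 12, 1, 14, 0)
on_time = is_delayed(scheduled, datetime(2016, 12, 1, 14, 10))  # 10 minutes late
late = is_delayed(scheduled, datetime(2016, 12, 1, 14, 15))     # exactly 15 minutes late
```

An early arrival gives a negative difference, so it is never flagged as delayed.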
nas_delay: includes Air Traffic Control (ATC), bird strikes, closed runways, etc.
late_aircraft_delay: a previous flight with the same aircraft arrived late, which caused the present flight to depart late.