Our model is based on BERT. It reduces the gap between the F1 scores of the models reported in the original dataset paper and the human upper bound by 30% and 50% in relative terms.
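A relative gap reduction is commonly read as the share of the distance between the baseline score and the human upper bound that the new model closes: (F1_model - F1_baseline) / (F1_human - F1_baseline). The short sketch below assumes that reading. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def relative_gap_reduction(baseline_f1: float, model_f1: float, human_f1: float) -> float:
    """Return the fraction of the baseline-to-human F1 gap closed by the model.

    Assumes all three scores are on the same scale (e.g. 0-100) and that the
    human upper bound is strictly above the baseline.
    """
    gap = human_f1 - baseline_f1
    if gap <= 0:
        raise ValueError("human upper bound must exceed the baseline score")
    return (model_f1 - baseline_f1) / gap


# Hypothetical scores for illustration only (not results from the paper):
# baseline 60.0, model 69.0, human 90.0 -> 9 of the 30 missing points closed.
print(f"{relative_gap_reduction(60.0, 69.0, 90.0):.0%}")  # prints "30%"
```

Under this reading, a 50% relative reduction means the model lands halfway between the reported baseline and human performance, regardless of the absolute size of the gap.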