Packing Input Frame Context in Next-Frame Prediction Models for Video Generation