Beyond Gradient Averaging in Parallel Optimization