Challenges in evaluation