The Reliability of Test Scores

Indicators of quality

  • Validity
  • Reliability
  • Utility
  • Fairness

Question: how are they all interrelated?

Validity

  • Depends on the PURPOSE
  • E.g. a ruler may be a valid instrument for measuring length, but not a valid one for measuring volume
  • Measuring what the test is supposed to measure
  • Matter of degree (how valid?)
  • Specific to a particular purpose!
  • Must be inferred from evidence; cannot be directly measured

Reliability

  • Consistency in the type of result a test yields
    • Time & space
    • Participants
  • Results need not be perfectly identical, but they should be very close (see the numeric sketch after this list)
  • When someone says you are a ‘reliable’ person, what do they really mean?
  • Are you a reliable person? 🙂
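
Reliability is usually quantified rather than judged impressionistically. As a rough sketch of the "consistency across time" idea above, the Python snippet below estimates test-retest reliability as the correlation between two sittings of the same test by the same learners; the scores and variable names are invented purely for illustration and are not taken from these notes.

  # A minimal test-retest sketch (hypothetical scores): the same learners take
  # the same test twice, and consistency is estimated as the Pearson
  # correlation between the two sets of scores.
  import numpy as np

  first_sitting  = np.array([72, 85, 60, 91, 78, 66, 88, 74])   # scores at time 1
  second_sitting = np.array([70, 83, 63, 89, 80, 64, 90, 71])   # same learners, time 2

  # Values near 1.0 indicate highly consistent (reliable) scores;
  # values near 0 indicate little consistency.
  reliability = np.corrcoef(first_sitting, second_sitting)[0, 1]
  print(f"Estimated test-retest reliability: {reliability:.2f}")

The same idea extends to internal-consistency estimates such as Cronbach's alpha, which treats the items within a single test as the repeated "sittings".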

What do you think…?

  • Forced-choice assessment forms are high in reliability, but weak in validity (true/false)
  • Performance-based assessment forms are high in both validity and reliability (true/false)
  • A test item is said to be unreliable when most students answer it wrongly (true/false)
  • When a test contains items that do not represent the content covered during instruction, it is known as an unreliable test (true/false)
  • Test items that do not successfully measure the intended learning outcomes (objectives) are invalid items (true/false)
  • Assessments that do not represent student learning well enough are definitely invalid and unreliable (true/false)
  • A valid test can sometimes be unreliable (true/false)
    • If a test is valid, it is reliable! Reliability comes as a by-product: a test cannot measure what it is supposed to measure unless it first measures consistently (see the note below)
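
For criterion-related validity, classical test theory offers a formal version of this point (not stated in these notes; added here as a supporting sketch): the observed validity coefficient of a test X against a criterion Y is capped by the reliabilities of both measures,

  r_{XY} \le \sqrt{ r_{XX'} \, r_{YY'} } \le \sqrt{ r_{XX'} }

so a test with low reliability cannot show high criterion validity, while a highly reliable test may still be invalid for a given purpose.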