"

6 Assessment Development


The work of the assessment development team is harder to explain with a filmmaking metaphor than the work of some of the other Center for Learning and Technology teams. But when we focus on the purpose of assessments in our courses, it comes down to providing evidence that a course does what it claims to do. For a movie, some of that proof comes from reviews by film critics and by everyday people who post on review sites, explaining why they do or do not recommend seeing the movie. Within the studio, filmmakers judge their success by the financials: the box office numbers. Everyone hopes for Titanic or Avatar, but poor planning or execution can result in Heaven's Gate or Gigli.

The assessment development team  oversees the design, development, and  maintenance of all course assessments.  Each assessment developer (AD) works  collaboratively with subject matter experts  (SMEs), instructional designers (IDs),  instructional technologists (ITs), and  instructional services personnel in guiding  the creation of customized assessment  strategies for each course. The assessment  development team follows the principles,  standards, and strategies described in the  following sections.

Assessment Development Mission and Goals

Within the Center for Learning and Technology (CLT), the assessment development team  guides the development of course exams and quizzes by providing expertise in specialized  activities related to assessment strategy and assessment development. We bring this  expertise into the:

  • Development of customized assessment plans for courses
  • Implementation of unique technology-based tools to enhance assessment
  • Introduction of student-facing support materials on course assessments, test prep, and  general testing topics

By collaborating with the other specialists on a course development project (IDs, SMEs, school administrators, and other ADs), we lead the development of assessment tools and assist in the articulation of course and modular objectives as well as in-course assessments, such as discussion and assignment prompts. Our goal is to develop items and prompts, utilizing item-writing and item-development best practices, that assess at the appropriate level of content and of Bloom's Taxonomy (as outlined in the course and module objectives). We strive to assess what students have learned, not merely what they remember. Therefore, as course levels permit, we assess students' ability to apply, analyze, or evaluate what they have learned. Generally, we try to avoid "Trivial Pursuit" types of facts (items that ask for specific dates, years, or names) unless there is a clear and rational argument for inclusion.

What is an Assessment at CLT?

Before discussing why the assessment development team is such an integral part of the CLT, let's first discuss what we mean by an assessment. An assessment is any activity that measures how well a student can demonstrate mastery of a content area. In the CLT, the assessments we create are most often categorized as written assignments, reflections, blogs, discussion forums, projects, quizzes, or exams. These individual assessments can also be categorized by style as formative or summative. We will discuss these two concepts later, but in short, a formative assessment primarily serves as a learning and retention tool while a summative assessment serves primarily as an evaluation tool. Regardless of the assessment type, we strive to assess what students have learned, not just what they remember.


The assessment development team at Thomas Edison State University is made up of assessment professionals who bring a level of expertise not typically found in a university setting. Thus we bring to assessment strategy a perspective focused on both content and assessment expertise, not exclusively on content (as is often the case with assessments developed only by faculty).

Overall Assessment Strategy

Whether developing new courses or revising existing ones, when incorporating assessment  activities into courses, our main goal is to collaborate with the development team to develop  valid, fair, and reliable course-specific assessments, based on external industry standards for testing and assessment. In the next sections, we’ll explore each of these three essential  attributes and also touch on those external standards. First, it is important to emphasize why  we do things the way we do.

Subject matter experts (SMEs) are, by definition, experts in their field of study. An SME fully understands the nuanced differences between concepts, can offer customized feedback based on students' various levels of mastery of content, can inherently differentiate between those concepts that students must know versus those that are supplementary to the subject matter, and can act in the role of coach, mentor, cheerleader, tutor, enforcer, stickler, or any other role that students require. But the one thing an SME generally is not is an expert assessment developer. Yet scrupulous care in developing assessments is essential to our model because of who we are. Our course generation is centralized, our sections are non-customizable (standardized), and it still remains important to legitimize online learning. So creating valid, reliable, and fair assessments is more of a necessity for us than for brick-and-mortar institutions, which have more latitude regarding assessments.

When an SME is teaching his or her own section in a brick-and-mortar institution, the  assessments can be tailored to lectures and can highlight specific examples discussed  in the classroom setting; that is, they can focus on the individuality that instructors are  afforded in a face-to-face setting. At Thomas Edison State University, the centralized nature  of development and standardized delivery of course content requires a different approach  to assessment. The approach adopted by the assessment development team is similar to  the development procedure used by professional testing companies, statewide assessment  programs, and even licensure/certification exams. The overall course content, across sections  and semesters, is the star of the show, not the instructor with his or her unique delivery of the  content. Course and module objectives define the breadth and depth of the content on which  students are assessed. The textbook offers a standardized presentation of the course content.  Therefore, the assessment structure is standardized for content and strictly aligned with the  textbook, guided by the collaboratively developed objectives. Items within the assessments  must also be standardized for grammar, voice, structure, and clarity.


By bringing the SME and an assessment developer (AD) together as integral parts of the full development team, we are able to create assessments that are the best of both worlds. They are equal parts course-specific content (similar to an in-class, SME-developed assessment) and standardized, best-practice assessment development techniques designed to improve validity, fairness, and reliability. As the SME brings her or his content expertise to the development project, the AD provides expertise in evidence-centered design, item development, test assembly from blueprints, statistical analysis, and implementation experience via learning-management and item-banking software.

A Quick Primer: Validity, Reliability, and Fairness


Before diving too deeply into what assessment developers do, it is important to unpack each of the three main themes mentioned earlier: validity, reliability, and fairness. All of these concepts are interrelated, and it takes an experienced AD, using best practices and standards-based procedures, to develop assessment activities that are strong in all three areas. But what do we mean by each of these concepts? And why are they so important? Since it is easy to place the blame for lower-than-expected exam scores on the assessment itself, the AD team uses multiple external standards and industry best practices as the basis for our operational procedures. But some principles are more prominent than others. Open nearly any assessment-related standards publication, and you'll find the principles of validity, reliability, and fairness mentioned within the first five, if not the first three, sections of that publication.

Validity

Validity is the fundamental consideration in developing assessments because it refers to the degree to which evidence and theory support the interpretations of test scores for the proposed use of the test. The AD and SME collaboratively develop summative and formative assessments, and those assessments implicitly make certain claims about what their scores mean.

By definition, summative and formative assessments should claim very different things.  The goal of a midterm exam, for example, which is summative, is to assess the extent  of the student’s achievement of requisite knowledge, skills, and abilities and the level  of mastery of course and module objectives related to the material presented in the  first half of the course. Instead of having to say that mouthful every time, we use the  term construct to mean the concepts, content, and characteristics that the assessment  is designed to measure. So an assessment with strong construct validity is one that  is designed to measure the content addressed in a specific portion of the course. As  we will discuss later, development of an assessment specifications document (also  known as a blueprint) that highlights the strict alignment to objectives and topics is an  indispensable tool that can verify construct validity.

Reliability

Reliability refers to the consistency in the performance of a particular assessment from  one testing instance to another. When the content of a course does not change from  one semester to another, the assessment must be able to measure consistently from  semester to semester. Reliability is closely linked to validity in that the ability to validly  interpret scores on an assessment is dependent on how reliably those scores are  attained. Below, we will look at an important concept directly connected to reliability:  error.

What any assessment is trying to measure is a student's true score for that assessment: a hypothetical average score over an infinite set of replications of the test. One of the easiest ways to understand this aspect of reliability is through the classical test theory formula:

Observed Score = True Score + Error

The two types of errors of which we need to be aware are systematic errors and  random errors.

  • Systematic errors are those that affect the performance of individuals or groups  in a consistent manner. Questions with incorrect answer keys, questions with  syntax, grammar, or mechanical errors, or questions that are developed with  inconsistencies in difficulty or complexity can contribute to systematic errors.
  • Random errors are unpredictable and may be attributed to the student taking the assessment or to an external source. Student-related sources of random error include variation in motivation, attention, interest, or the inconsistent application of skills/knowledge. Sources of error external to the test taker include inconsistent testing and scoring procedures.

On a perfectly reliable assessment, the observed score would equal the student's true score (indicating the exact measure of the student's mastery of the content). Creating an assessment with zero error, however, is practically impossible. Knowing that we can't reduce the error to zero, how can we at least diminish that variable in the equation? A number of standard procedures can eliminate as much error as possible. We'll touch on those later on.
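To make the distinction concrete, here is a minimal simulation (a sketch in Python with invented numbers, not a CLT tool): random error averages out across hypothetical replications of a test, while systematic error, such as a miskeyed item, shifts every replication in the same direction.

```python
import random

TRUE_SCORE = 82.0        # hypothetical true score, in percent
RANDOM_ERROR_SD = 5.0    # spread of random error (motivation, attention, etc.)
MISKEYED_PENALTY = -3.0  # systematic error: a miskeyed item costs every attempt

def observed_score(bias=0.0):
    """One simulated testing instance: observed = true + error."""
    return TRUE_SCORE + bias + random.gauss(0, RANDOM_ERROR_SD)

def mean_over_replications(n, bias=0.0):
    return sum(observed_score(bias) for _ in range(n)) / n

random.seed(1)
# Random error averages out over many hypothetical replications...
print(f"random error only:     {mean_over_replications(10_000):.1f}")  # ~82.0
# ...but systematic error shifts every replication the same way.
print(f"plus systematic error: {mean_over_replications(10_000, MISKEYED_PENALTY):.1f}")  # ~79.0
```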

Fairness

Fairness is the third important principle in  assessment development. Fairness, like  reliability, is separate from but intimately  connected to validity. When we talk  about fairness, we mean that during  design, development, administration, and  scoring of our assessments, the diversity  of the student population is taken into  account so that all subgroups of students  are treated comparably, regardless  of differences in characteristics.

Assessments developed within the CLT  are designed to eliminate language and  content that is sexist, racist, generally  offensive, or culturally insensitive. Note  that certain specific sensitive content  (such as questions related to enslaved  people or sexual abuse) may be included  in an assessment because these  questions are related to course content;  in this case all students would have  been exposed to the same material in an  equivalent way and for academic reasons.

But fairness is important because students who are in a stressful testing session may easily become distracted by assessment items that are offensive or emotionally charged. This may be a special concern with adult learners, whose test anxiety may be elevated compared to that of traditional college students. In the end, a fair test will not advantage or disadvantage individuals because of differences in characteristics.

The exam must also be free from unclear  or poorly written prompts and should not  include upsetting or controversial material  (unless, as noted above, it is necessitated by the actual course content). Students  who are already anxious about taking  an exam do not need to be confused  by badly written questions, upset by  unnecessarily controversial questions, or  distracted in any other way by the exam  itself. Differences that can be attributed  to construct-irrelevant variance—like  the issues above—can create unfair  disadvantages to specific individual test  takers, or even groups of test takers. The  AD team guides the SME through the  item development process, instinctively  keeping validity, reliability, and fairness in  mind at all times.

Assessment Strategy and Structure


The first time that the AD and the SME are brought together is during the project kickoff call, also attended by the instructional designer (ID) and the instructional technologist (IT). During that call, the AD not only describes the collaborative nature of assessment development but also brings to the conversation initial suggestions about the types of assessments and the types of items that could potentially work best with the course in development. We also welcome SMEs' own ideas about how best to assess student learning. While subsequent conversations result in a specific assessment strategy for the course, much of the groundwork is laid during this initial call.

The other critical role that the AD performs at the inception of the project is to help finalize the course and module objectives, in collaboration with the ID and SME. These objectives are presented in the Basic Course Outline (BCO), which (once it is approved) forms the basis for the rest of the development project. A set of collaboratively written objectives defines what students are expected to master during the course. Objectives combine verbs indicating the complexity or depth of learning (using Bloom's Taxonomy) with statements about the content to be learned. Course objectives must be purposeful, clear, and concise so that the assessments designed later in the project can be directly aligned to both the content coverage and the depth/complexity of the learning. A shared focus on aligning assessments back to objectives allows the development team to achieve a high degree of validity, ensuring that the assessments are measuring what we claim they are measuring.

Assessment strategies can be as varied as the courses themselves, but most courses at  Thomas Edison State University include at least some of the following course activities:  discussion forums, written assignments, blogs, projects, papers, quizzes, and exams. While  ADs play a role in the development of many types of course activities, they bring their  expertise most directly to the development of quizzes and exams. Quizzes are designed to be  formative assessments, while exams are considered to be summative assessments.

Formative Assessments

What is the rationale for including formative  assessments, like quizzes, in our courses?  For one thing, our students are not sitting  in a classroom. Think back to when you  were taking classes. You read the assigned  textbook sections, highlighted those  concepts and terms you thought were  important, and jotted down notes along the  way. After all of that studying, you completed  the assignment that was due for the next  class session. In class, your instructor  may have reviewed the practice exercises  and the written assignment, emphasizing  those topics that he or she deemed most  important to the coursework. You may have  done in-class group work on specific types  of problems or concepts in order to gain  extra practice on the important content.  And then, of course, there was always one  classmate who you could count on to ask,  “Will we need to know this for the exam?”  And while Thomas Edison State University  students do the reading, highlighting, note  taking, and written assignments, they do  not get the extra in-class benefits. How do  students know if they are on the right track?  In which areas are they strong or weak?  Where is the “just-in-time” feedback that an  instructor might give if a concept seems to  be misunderstood? And…is this going to be  on the exam? Formative quizzes can fill in  this gap.

Another rationale has to do with the fact  that humans just forget things, especially  things they are introduced to for the  first time. In order to be able to retrieve  something, we must first learn it and then  not forget it. Therefore, the ability to retrieve  information is a function of both learning  and non-forgetting. Think of it using this  formula:

Retrieval = Learning – Forgetting 

If there were a way to eliminate the "forgetting" part of that equation, then we would be able to retrieve everything we have learned. While it is unrealistic to think that we can eliminate forgetting completely, there is one proven way to avoid sliding down the forgetting curve: to practice retrieving through repetition.

Repetition. But what is the best way to practice retrieving through repetition? Should students reread the section of the textbook? That might help, but rereading is a very passive repetition tactic. It is also the easiest to do, so it is likely used often by students. However, an active style of repetition (one in which the student is forced to actively retrieve information) will probably be more effective. In 1890, philosopher, psychologist, and educator William James wrote, "A curious peculiarity of our memory is that things are impressed better by active than by passive repetition." While James had no empirical evidence to back this statement at the time, studies performed over the next 125 years proved that his assertion was correct.
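One common way to picture the forgetting curve and the effect of active retrieval is an exponential decay whose rate slows as memory is strengthened. The sketch below is illustrative only; the decay form and the "stability doubles per retrieval" rule are assumptions borrowed from spaced-repetition models, not anything prescribed by the handbook.

```python
import math

def retention(days_elapsed, stability):
    """Ebbinghaus-style forgetting curve: retention decays exponentially,
    and more slowly as memory stability grows."""
    return math.exp(-days_elapsed / stability)

# One passive reading: stability stays low, so two weeks later
# most of the material has slid down the forgetting curve.
print(f"Day 14, no retrieval practice:  {retention(14, 5):.0%}")   # ~6%

# Assume each successful retrieval (e.g., a quiz attempt) doubles
# stability -- a common simplification in spaced-repetition models.
stability = 5
for _ in range(3):      # three active retrieval attempts
    stability *= 2
print(f"Day 14, after three retrievals: {retention(14, stability):.0%}")  # ~70%
```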


The AD and SME can bring both of these  threads together by adding formative  quizzes to the course development project,  especially in introductory-level, broad  survey courses or in Gen Ed courses that  may appeal to students from a wide range  of disciplines. Quizzes can be used as a  pre-learning activity (taken before studying  to see what students will be expected  to know and what prior knowledge they  may already have) or as a post-learning  activity (taken after studying to gauge how  well students learned the content and to  identify gaps in their knowledge). Ideally,  students can use quizzes as both pre- and  post-learning activities. Quizzing provides  a multitude of benefits for online learners.  Besides identifying gaps in knowledge and  encouraging studying, frequent quizzing  allows learners to:

  • Practice retrieval skills to aid in later  retention of content
  • Better organize their knowledge
  • Reduce interference when learning  new material
  • Improve transfer of knowledge to new contexts
  • Facilitate retrieval of material that  wasn’t even assessed

Quiz Advantages. Quizzes are planned as low-stakes activities (typically accounting for a maximum of 15% of the course grade) that help students reinforce concepts, vocabulary, theories, and other building-block content that must be mastered before moving on to higher-order skills such as applying, analyzing, synthesizing, evaluating, or creating. This gets to the heart of organization of knowledge. Take, for example, an objective that states that students will be able to critically analyze strengths and weaknesses of various personality theories. Students have to first know who the theorists are, what theories they are associated with, and the specific strengths and weaknesses of each theory before they can complete the complex objective.


The lower Bloom's Taxonomy levels (remembering and understanding), along with the fundamental aspects of higher-order levels, are ideal for formative assessments. Large question pools allow for randomized presentation each time a student attempts a quiz, which helps build a framework (or schema) around the content being assessed in the quiz. This also reduces the chance that students are merely memorizing the answers to the questions on the short quiz. If they see different questions each time, questions of similar and related content, students can build their own knowledge schema around the topic being assessed. An expanding schema allows students to transfer knowledge to new contexts and reduces interference when learning new material. The stronger the content is connected, the easier it is to retrieve from memory. Further, the content at these fundamental levels is most conducive to objective-style items (such as multiple choice), which are easily scored by the testing software at the conclusion of the quiz. Scoring ease leads to system-provided feedback at the conclusion of each quiz, identifying areas of mastery and identifying those gaps in knowledge that need more study. (As mentioned above, this is especially helpful to students.) Feedback helps students concentrate their studies on those concepts that have not yet been mastered, resulting in more focused and efficient study time. Students can take quizzes as often as they wish, via computer, tablet, or smartphone, and may attempt each quiz as many times as they like, providing an on-demand tool, with feedback, to help them gauge their progress.
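In practice, pool-based randomization can be as simple as sampling fresh items per objective on every attempt. The sketch below (Python; the item pool, course code, and function are invented for illustration, not taken from the CLT's testing software) shows the basic mechanic:

```python
import random

# Hypothetical pool of item IDs, grouped by the module objective each
# item is aligned to (the course and IDs here are invented).
ITEM_POOL = {
    "PSY-101.M2.OBJ1": ["Q101", "Q102", "Q103", "Q104", "Q105", "Q106"],
    "PSY-101.M2.OBJ2": ["Q201", "Q202", "Q203", "Q204", "Q205"],
}

def draw_quiz(items_per_objective=2):
    """Assemble one quiz attempt: sample fresh items for each objective,
    then shuffle the presentation order."""
    quiz = []
    for pool in ITEM_POOL.values():
        quiz.extend(random.sample(pool, items_per_objective))
    random.shuffle(quiz)
    return quiz

print(draw_quiz())  # a different four-item quiz on nearly every attempt
```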

Quizzes administered through testing software are a way for students to assess their knowledge formatively while receiving immediate targeted feedback. The grades are weighted relatively low, and even these grades will likely increase with targeted study and further practice. Low scores early in their studies can be raised by using the feedback to home in on areas of weakness. Further attempts at the quizzes will increase recall by building out the schema, making cognitive connections within the foundational knowledge of the subject matter. These connections help with future activities, including discussions and written assignments, in which students should then be synthesizing the vocabulary, theories, and basic concepts in ways that help them expand their mastery of the content.

This confidence in content mastery might  even keep students from searching for  “cheating” websites that allow students  to download prepared responses,  undermining the validity of the course  and its assessments. The development  and administration of the quiz itself, even  though it is taken in an unproctored setting,  actually make it more difficult for students  to look up the answers online. The pool  of questions is not available to students  unless they click through each question.  If they take the quiz again, due to item  pools and randomized presentation, a  different set of items will appear. With the  weighting low, even if students cheat their  way through the quizzes, they have used  deception to affect only about 10 percent of  their course grade. Such cheating will have  been counterproductive, since students  will have negated the value of the quizzes,  which is to prepare for the proctored course  exams. And exams do count for a significant portion of the course grade.

Summative Assessments

Now that the SME and AD have spent all of this time collaborating on the course and module objectives and then developing formative-style quizzes, there needs to be a way to ask the student to "put it all together" in a summative-style assessment. These types of assessments provide the opportunity for the student to demonstrate mastery of the course content, using the course and module objectives as a guide and the formative quizzes to reduce the forgetting curve. Summative assessments can take the form of papers, projects, or exams. Since the AD works directly with the SME on the exams, exams will be the focus of this section.

The course exams, built collaboratively by the SME and the AD, are meant to be demanding assessments that assess students' breadth of knowledge of a subject as well as their higher-level intellectual skills as applied to the subject. Since exams are driven by the learning objectives, go through multiple quality reviews, and are proctored, they give us a sense of how well students actually reach the learning objectives. To prepare effectively for an exam, a student must consistently practice retrieving information and knowledge related to the course's subject matter during the weeks leading up to the exam. The SME is vitally important in ensuring that there is strict alignment between the formative and summative assessments so that the formative assessments truly do prepare students for the summative assessments.

A course’s exam strategy helps to break the  content into segments, as each exam is a  summative assessment of those concepts  learned during the course segment  leading up to the exam. The different exam  strategies include: a three-exam model  (breaking content into three equal parts), a  midterm exam (focusing on content from  the first half of the course), a final exam  (which may cover the second half content  or may be cumulative over the entirety of  the course content), or a combined midterm  and final exam strategy.

Exams are high-stakes: they can collectively account for between 25 and 50 percent of the course grade. The University therefore requires that students follow our standardized online or in-person proctoring procedures. Exams do not provide feedback, and students are allowed only one attempt at each exam. While quizzes are composed entirely of objectively scored questions, exams often include (or are composed totally of) essay questions.

Assessment Strategy Wrap-Up

Regardless of the type of assessment,  all quizzes and exams are developed  collaboratively with the SME to align with  course and module objectives as well  as the other in-course activities (written  assignments, discussion forums, etc.).  The course and module objectives  introduce what the students are expected to  learn and to be able to do at the conclusion  of the course.

The quizzes are tools (like assignments and  discussions) to help the students practice  retrieval and reinforce those objectives.  The exams (similar to papers or projects)  are ways for the students to show evidence  that they have mastered course objectives.  For the AD, the most important output from  the collaborative Phase 1 development  process consists of the objectives. With  clear and measurable objectives in place,  the AD has a clear sense of the direction  that the assessments should take, the SME  can tailor the course content to prepare  students for the assessment(s), and the ID  can help the SME generate objective-based,  engaging course activities that will guide  students through the course.

Item Development Principles and Best Practices 

So far we’ve covered the “why” of what we do, with a little bit of the “what.” Item development  principles and best practices entwine the “why” and the “what” to an even greater degree.

Blueprinting

Once the assessment strategy has been agreed upon and finalized, a blueprint (test  specifications document) is collaboratively developed by the SME and the AD. This blueprint  describes what objectives or topics are to be covered on the assessment, how much weight  should be allotted to each within the assessment, what types of items will be used to address  those objectives or topics, and the order in which the items will be presented. Every item  in the assessment must fit into the overall blueprint by aligning with a stated objective or  topic, not only by concept or key term, but also at the appropriate Bloom’s Taxonomy level.  If students are expected to apply an element of knowledge or provide an analysis, the items  must assess at the appropriate level. If, upon further review, it is determined that a question  does not fit into the blueprint (content or level), it is removed from the pool of items. The  blueprint also specifies information such as materials allowed to the student when taking the  assessment, the overall time limit (if applicable), and the construction of sections within the  assessment.
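Because a blueprint is essentially structured data (objectives, weights, allowed levels), even a simple representation makes the alignment check mechanical. Below is a minimal sketch in Python; the field names, objective IDs, and weights are invented for illustration and are not the CLT's actual specifications format.

```python
# Blueprint: each objective gets a target weight on the exam and a set
# of allowed Bloom's levels (objective IDs and weights are invented).
BLUEPRINT = {
    "OBJ-1": {"weight": 0.40, "levels": {"apply", "analyze"}},
    "OBJ-2": {"weight": 0.35, "levels": {"understand", "apply"}},
    "OBJ-3": {"weight": 0.25, "levels": {"evaluate"}},
}

drafted_items = [
    {"id": "Q1", "objective": "OBJ-1", "bloom": "apply"},
    {"id": "Q2", "objective": "OBJ-3", "bloom": "remember"},  # wrong level
    {"id": "Q3", "objective": "OBJ-9", "bloom": "apply"},     # no such objective
]

def fits_blueprint(item):
    """Keep an item only if it aligns with a stated objective at an
    allowed Bloom's level; otherwise it is removed from the pool."""
    spec = BLUEPRINT.get(item["objective"])
    return spec is not None and item["bloom"] in spec["levels"]

pool = [item for item in drafted_items if fits_blueprint(item)]
print([item["id"] for item in pool])  # ['Q1'] -- Q2 and Q3 are rejected
```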

Item Development

Up until this point, the AD and SME have been setting the stage for the main event;  finally, item development—the most granular level of assessment strategy—begins. Item  development involves selecting and/or authoring the actual items that are going to make  up the assessments. The same critical scrutiny that is required for planning types of  assessments now is trained upon creating the items themselves. Our first question (which  the blueprint has already helped to answer) is: “How are we going to assess what we have  deemed to be the important concepts and topics?”


Put simply, we strive to assess what students can do rather than simply what they know.  (Doing in this case may be as simple as restating or explaining or as complex as analyzing,  synthesizing, or creating.) However, items that involve “knowing” do have their place within  the assessment strategy, especially in formative assessments where they strengthen baseline  knowledge (terminology, theories, and concepts) that will be used in subsequent course  activities. Summative assessments, however, focus more on the “doing” types of items.  While there are many different types of items, the most commonly used by the AD team are  multiple choice and essay.

Essay. An essay item has only two parts: the prompt (also known as the stem) and the response area. The prompt, developed by the SME, is the question(s), evaluative statement(s), and/or presented scenario(s) to which the student must respond. The response area is the space after the prompt that is dedicated to the student's open-ended answer. An essay item is used when the concept being assessed cannot be evaluated adequately in an objective, selected-response item (like multiple choice). The depth and complexity that characterize an essay prompt require an open-ended, student-written original essay or a series of calculations as a solution to a problem.

The most common item-development error  made with essay items occurs when stems  are written too broadly. Take a look at the  following example:

Essay prompt: Evaluate the effect of HIV/AIDS on society.

The answer to this could be the subject of  an entire research paper or even of a course  itself; there are certainly many books written  about this broad subject. How can we  expect a student to write a three-paragraph  essay on such a sprawling topic?

SMEs, when suggesting such a question,  sometimes say, “I have found that the  students who really know the content will  come to the correct response.” But our goal  is not to assess a student’s ability to read  the mind of the SME or AD; our goal is to  assess their learning on the topic. Based on  the content coverage of the course, a more  focused stem for an essay item might be  written as:

Essay prompt: Evaluate the present-day economic effects of HIV/AIDS on rural communities in the southern United States.

This stem allows students to focus on economic effects in a particular area. Answering it demonstrates mastery of a manageable area of content and an ability to select appropriate facts and form a cogent response.

Multiple-choice Items. A multiple-choice item consists of two parts: a stem and a response area offering several possible options, most typically four. One of the options is the correct answer, or the key. The remaining incorrect options are called distractors. Because so many multiple-choice items in educational usage are written too simplistically, the question type often receives unfair criticism. For example, most publisher test banks include items such as the following:

In The Flying Car, what is the name of the professor’s dog?

a. Charlie (key)

b. Richie

c. Spot

d. Einstein

This item concerns a trivial nugget of information that is likely to be forgotten among the vast  amount of information the student is responsible for learning in the course. It is unlikely that a  module or course objective specifically mentions the importance of knowing the names of all  the characters (human and animal) from the specified story.

A better item would be the following:

In The Flying Car, why is it significant that the professor’s dog is named Charlie?

a. It creates a scenario that allows the boy to meet the professor. (key)

b. It helps explain the connection between the professor and one of his students.

c. It reveals that the professor’s dog once belonged to someone else.

d. It is the same name as the professor’s Model T flying car.

This item does a much better job of focusing on the student's understanding of the story and of the interplay between characters; it moves the item out of the realm of Trivial Pursuit. In fact, an item like this could serve multiple objectives related to children's literature: story development, character introductions, the role of non-human characters, and so on.

Overall a good item displays the following characteristics, which correspond with the overall  assessment strategy:

  • A clear and concise stem and options
  • A difficulty level that is appropriate for the course
  • Inclusive, people-first language and vocabulary that is free from jargon or overly technical terminology (unless supported by the course)
  • Assumed outside knowledge that is appropriate for the level of the course
  • No extraneous information that is not needed to answer the question
  • No topics that can be upsetting, controversial, or offensive

In addition to these overall guidelines, we follow certain conventions regarding the stem and  distractors. A good stem:

  • Presents one clear problem that leads to one clear answer
  • Allows students to formulate a response without looking at the options
  • Avoids negative words such as not, never, and except

Good options are:

  • Plausible (distractors are believable enough to be attractive to a student who lacks the required knowledge, but they should not be tricky or potentially correct)
  • Parallel in structure
  • Similar in length and specificity
  • Mutually exclusive (no two options can be correct at the same time)
  • Free from phrases such as none of the above, all of the above, and both A and B
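Several of these conventions are mechanical enough to check automatically. The sketch below (Python) illustrates the idea; the item structure and rule set are invented for this example, and no such tool is described in the handbook.

```python
# Illustrative checks for a few of the mechanical conventions above;
# the structure and rules here are hypothetical, not the CLT style guide.
BANNED_OPTION_PHRASES = ("none of the above", "all of the above", "both a and b")
NEGATIVE_STEM_WORDS = {"not", "never", "except"}

def lint_item(stem, options, key):
    """Return a list of warnings for one multiple-choice item."""
    warnings = []
    if key not in options:
        warnings.append("keyed answer is not among the options")
    if NEGATIVE_STEM_WORDS & set(stem.lower().split()):
        warnings.append("stem uses a negative word (not/never/except)")
    for option in options:
        if any(p in option.lower() for p in BANNED_OPTION_PHRASES):
            warnings.append(f"banned phrase in option: {option!r}")
    if max(len(o) for o in options) > 2 * min(len(o) for o in options):
        warnings.append("options differ widely in length")
    return warnings

print(lint_item(
    stem="Which factor does not affect reliability?",
    options=["Random error", "Systematic error", "All of the above",
             "Blueprint weighting"],
    key="Blueprint weighting",
))
# ['stem uses a negative word (not/never/except)',
#  "banned phrase in option: 'All of the above'"]
```

Automated checks like these catch only surface conventions; judging plausibility, mutual exclusivity, and alignment to objectives still requires the AD and SME review described below.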

In addition to these general considerations, the AD team follows an item-writing style guide.  This style guide effectively aligns all CLT-developed and CLT-reviewed items to display the  same characteristics and adhere to the same standards.

We believe that when SMEs are authoring items or selecting items for inclusion in  assessments, the SMEs should be able to focus on what they know best: the content. The  finer points of testing convention and item style are foreign concepts to most subject matter  experts. That’s why the AD who is working with the SME reviews every item to make sure  that each is appropriately constructed. This allows the SME to focus on providing the content  expertise and verifying alignment to the objectives.

Thus the Center for Learning and Technology brings together the combined talents of  assessment experts and content experts. When these individuals work collaboratively  and iteratively, the end products are valid, reliable, and fair sets of items for the course  assessments.

A Few More Safeguards

Writing items in a clear and concise manner, while eliminating all grammatical and spelling errors, offers the simplest way to reduce error. The AD team (by mandating two internal quality reviews), along with the SME, spends significant time reading, reviewing, and editing each item until it is approved by all parties. After all, if any part of an item is confusing or ambiguous, the interpretation of the item is then unfairly left up to the student test-taker. This takes the student's focus away from the content being assessed within the item and may create unnecessary test-day stress. The more confusing the item, the more likely the student is to get the item wrong, or even to get it right for the wrong reasons.


An especially crucial area for SME review is multiple-choice options: the selection of  the keyed response and all of the distractors. The key should be fully correct. (This is  not as easy as it sounds!) Likewise, the distractors must be completely incorrect.

As our assessments are focused on course content, it is imperative that the items developed for the assessments are focused on the course content only. Content-irrelevant information only weakens the validity of both items and assessments. Once the validity of any part of an assessment is compromised, a slippery slope threatens the overall validity of the assessment structure, the course activities, the course as a whole, the program that encompasses the course, the degree program, and even the institution. With today's focus on outcomes, outcomes assessment, program reviews and evaluations, and strict guidelines related to all of the above by the accrediting bodies, the CLT model allows for sound assessment at the course level. This provides a solid place to stand when looking "up the hill" from course to program to degree to institution.

Conclusion

Regardless of the type of questions included on an exam, the AD team utilizes an item and assessment development procedure that borrows many best practices, standards, and quality-review aspects from large-scale test development companies. Our assessments are built according to a number of external standards, including:

  1. The Standards for Educational and Psychological Testing (2014), by the American Educational Research Association, the American  Psychological Association, and the  National Council on Measurement in  Education
  2. The 2014 ETS Standards for Quality and Fairness, by Educational Testing Service

Thus we bring the standards-based best practices of standardized testing philosophy into the local academic environment. This philosophy fits particularly well in Thomas Edison State University's centralized development model, utilizing at least one SME in every course development and then broadcasting this centrally developed course to other mentors and sections.


The AD team within the CLT leads the creation of assessments that are backed by external standards and that incorporate best practices, while pulling into the mix the rich subject matter expertise residing in our mentors. This "best of both worlds" philosophy allows the full scope of course content to be assessed in the most comprehensive way. It's always a vote of confidence when a subject matter expert who has not previously been exposed to our development model finishes by saying something such as, "I hope to incorporate some of the things I learned into my own classroom assessments." This is the highest compliment we can be paid: to do a job right and to guide others to do the same.

 
