6 Assessment Development
The work of the assessment development team is harder to explain with a filmmaking metaphor than the work of some of the other Center for Learning and Technology teams. But when we focus on the purpose of assessments in our courses, it comes down to providing evidence that a course does what it claims to do. For a movie, some of that proof comes from the reviews written by film critics and by everyday people who post on review sites, explaining why they do or do not recommend seeing the movie. Within the studio, filmmakers judge their success by the financials—the box office numbers. Everyone hopes for Titanic or Avatar, but poor planning or execution can result in Heaven’s Gate or Gigli.
The assessment development team oversees the design, development, and maintenance of all course assessments. Each assessment developer (AD) works collaboratively with subject matter experts (SMEs), instructional designers (IDs), instructional technologists (ITs), and instructional services personnel in guiding the creation of customized assessment strategies for each course. The assessment development team follows the principles, standards, and strategies described in the following sections.
Assessment Development Mission and Goals
Within the Center for Learning and Technology (CLT), the assessment development team guides the development of course exams and quizzes by providing expertise in specialized activities related to assessment strategy and assessment development. We bring this expertise into the:
- Development of customized assessment plans for courses
- Implementation of unique technology-based tools to enhance assessment
- Introduction of student-facing support materials on course assessments, test prep, and general testing topics
By collaborating with the other specialists on a course development project (IDs, SMEs, school administrators, and other ADs), we lead in the development of assessment tools and assist in the articulation of course and module objectives as well as in-course assessments, such as discussion and assignment prompts. Our goal is to develop items and prompts—utilizing item-writing and item-development best practices—that assess at the appropriate level of content and Bloom’s Taxonomy (as outlined in the course and module objectives). We strive to assess what students have learned, not merely what they remember. Therefore, as course levels permit, we assess students’ ability to apply, analyze, or evaluate what they have learned. Generally, we try to avoid “Trivial Pursuit” types of facts—items that ask for specific dates, years, or names—unless there is a clear and rational argument for their inclusion.
What is an Assessment at CLT?
Before discussing why the assessment development team is such an integral part of the CLT, let’s first discuss what we mean by an assessment. An assessment is any activity that measures how well a student can demonstrate mastery of a content area. In the CLT, the assessments we create are most often categorized as written assignments, reflections, blogs, discussion forums, projects, quizzes, or exams. These individual assessments can be categorized by style: formative or summative. We will discuss these two concepts later, but in short, a formative assessment primarily serves as a learning and retention tool, while a summative assessment serves primarily as an evaluation tool. Regardless of the assessment type, we strive to assess what students have learned, not just what they remember.
The assessment development team at Thomas Edison State University is made up of assessment professionals who bring a level of expertise not typically found in a university setting. Thus we bring to assessment strategy a perspective grounded in both content and assessment expertise, not exclusively in content (as is often the case with assessments developed only by faculty).
Overall Assessment Strategy
Whether developing new courses or revising existing ones, when incorporating assessment activities into courses, our main goal is to collaborate with the development team to develop valid, fair, and reliable course-specific assessments, based on external industry standards for testing and assessment. In the next sections, we’ll explore each of these three essential attributes and also touch on those external standards. First, it is important to emphasize why we do things the way we do.
Subject matter experts (SMEs) are, by definition, experts in their field of study. An SME fully understands the nuanced differences between concepts, can offer customized feedback based on students’ various levels of mastery of content, can inherently differentiate between those concepts that students must know versus those that are supplementary to the subject matter, and can act in the role of coach, mentor, cheerleader, tutor, enforcer, stickler, or any other role that students require. But the one thing an SME generally is not is an expert assessment developer. Yet scrupulous care in developing assessments is essential to our model because of who we are. Our course generation is centralized, our sections are noncustomizable (standardized), and it still remains important to legitimize online learning. So creating valid, reliable, and fair assessments is a necessity for us, compared to brick-and-mortar institutions, which have more latitude regarding assessments.
When an SME is teaching his or her own section in a brick-and-mortar institution, the assessments can be tailored to lectures and can highlight specific examples discussed in the classroom setting; that is, they can focus on the individuality that instructors are afforded in a face-to-face setting. At Thomas Edison State University, the centralized nature of development and standardized delivery of course content requires a different approach to assessment. The approach adopted by the assessment development team is similar to the development procedure used by professional testing companies, statewide assessment programs, and even licensure/certification exams. The overall course content, across sections and semesters, is the star of the show, not the instructor with his or her unique delivery of the content. Course and module objectives define the breadth and depth of the content on which students are assessed. The textbook offers a standardized presentation of the course content. Therefore, the assessment structure is standardized for content and strictly aligned with the textbook, guided by the collaboratively developed objectives. Items within the assessments must also be standardized for grammar, voice, structure, and clarity.
By bringing the SME and an assessment developer (AD) together as integral parts of the full development team, we are able to create assessments that are the best of both worlds. They are equal parts course-specific content (similar to an in-class, SME-developed assessment) and standardized, best-practice assessment development techniques designed to improve validity, fairness, and reliability. As the SME brings her or his content expertise to the development project, the AD provides expertise in evidence-centered design, item development, test assembly from blueprints, statistical analysis, and implementation experience via learning-management and item-banking software.
A Quick Primer: Validity, Reliability, and Fairness
Before diving too deeply into what assessment developers do, it is important to unpack each of the three main themes mentioned earlier: validity, reliability, and fairness. All of these concepts are interrelated, and it takes an experienced AD, using best practices and standards-based procedures, to develop assessment activities that are strong in all three areas. But what do we mean by each of these concepts? And why are they so important? Since it is easy to place the blame for lower-than-expected exam scores on the assessment itself, the AD team uses multiple external standards and industry best practices as the basis for our operational procedures. But some principles are more prominent than others. Open nearly any assessment-related standards publication, and you’ll find the principles of validity, reliability, and fairness mentioned within the first five, if not the first three, sections of that publication.
Validity
Validity is the fundamental consideration in developing assessments because it refers to the degree to which evidence and theory support the interpretations of test scores for the proposed use of the test. The AD and SME collaboratively develop summative and formative assessments, and these automatically make certain claims about what the scores mean.
By definition, summative and formative assessments should claim very different things. The goal of a midterm exam, for example, which is summative, is to assess the extent of the student’s achievement of requisite knowledge, skills, and abilities and the level of mastery of course and module objectives related to the material presented in the first half of the course. Instead of having to say that mouthful every time, we use the term construct to mean the concepts, content, and characteristics that the assessment is designed to measure. So an assessment with strong construct validity is one that is designed to measure the content addressed in a specific portion of the course. As we will discuss later, development of an assessment specifications document (also known as a blueprint) that highlights the strict alignment to objectives and topics is an indispensable tool that can verify construct validity.
Reliability
Reliability refers to the consistency in the performance of a particular assessment from one testing instance to another. When the content of a course does not change from one semester to another, the assessment must be able to measure consistently from semester to semester. Reliability is closely linked to validity in that the ability to validly interpret scores on an assessment is dependent on how reliably those scores are attained. Below, we will look at an important concept directly connected to reliability: error.
What any assessment is trying to measure is a student’s true score for that assessment. This is a hypothetical average score over an infinite set of replications of the test. One of the easiest ways to understand this aspect of reliability is by using the formula:
True Score = Observed Score ± Error
The two types of errors of which we need to be aware are systematic errors and random errors.
- Systematic errors are those that affect the performance of individuals or groups in a consistent manner. Questions with incorrect answer keys, questions with syntax, grammar, or mechanical errors, or questions that are developed with inconsistencies in difficulty or complexity can contribute to systematic errors.
- Random errors are unpredictable and may be attributed to the student taking the assessment or to an external source. Student-related sources of random errors include variation in motivation, attention, interest, or the inconsistent application of skills/knowledge. Errors external to the test taker are things such as inconsistent testing and scoring procedures.
A perfectly reliable assessment would yield the student’s true score (the exact measure of the student’s mastery of the content). Creating an assessment with zero error, however, is practically impossible. Knowing we cannot reduce the error to zero, how can we at least diminish that variable in the equation? A number of standard procedures can eliminate as much error as possible. We’ll touch on those later on.
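To make these ideas concrete, here is a minimal Python sketch, using entirely hypothetical numbers, that rearranges the formula as observed score = true score + systematic error + random error and simulates many re-administrations of an equivalent assessment.

```python
import random
import statistics

random.seed(42)

TRUE_SCORE = 82.0        # a hypothetical student's true mastery, on a 0-100 scale
SYSTEMATIC_ERROR = -3.0  # e.g., a few miskeyed items consistently costing points
RANDOM_ERROR_SD = 4.0    # unpredictable variation (fatigue, lucky guesses, etc.)

def observed_score() -> float:
    """Observed score = true score + systematic error + random error."""
    return TRUE_SCORE + SYSTEMATIC_ERROR + random.gauss(0, RANDOM_ERROR_SD)

# Simulate re-administering an equivalent assessment many times.
scores = [observed_score() for _ in range(1000)]

print(f"True score:          {TRUE_SCORE:.1f}")
print(f"Mean observed score: {statistics.mean(scores):.1f}")   # pulled down by the systematic error
print(f"Score spread (SD):   {statistics.stdev(scores):.1f}")  # driven by the random error
```

Under these assumed numbers, fixing the miskeyed items removes the systematic shift, while clearer items and consistent procedures shrink the random spread; those are the two levers the procedures described later are designed to pull.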
Fairness
Fairness is the third important principle in assessment development. Fairness, like reliability, is separate from but intimately connected to validity. When we talk about fairness, we mean that during design, development, administration, and scoring of our assessments, the diversity of the student population is taken into account so that all subgroups of students are treated comparably, regardless of differences in characteristics.
Assessments developed within the CLT are designed to eliminate language and content that are sexist, racist, generally offensive, or culturally insensitive. Note that certain specific sensitive content (such as questions related to enslaved people or sexual abuse) may be included in an assessment because these questions are related to course content; in this case all students would have been exposed to the same material in an equivalent way and for academic reasons.
Fairness matters in practice because students in a stressful testing session may easily become distracted by assessment items that are offensive or emotionally charged. This may be a special concern with adult learners, whose test anxiety may be elevated compared to that of traditional college students. In the end, a fair test will not advantage or disadvantage individuals because of differences in characteristics.
The exam must also be free from unclear or poorly written prompts and should not include upsetting or controversial material (unless, as noted above, it is necessitated by the actual course content). Students who are already anxious about taking an exam do not need to be confused by badly written questions, upset by unnecessarily controversial questions, or distracted in any other way by the exam itself. Differences that can be attributed to construct-irrelevant variance—like the issues above—can create unfair disadvantages to specific individual test takers, or even groups of test takers. The AD team guides the SME through the item development process, instinctively keeping validity, reliability, and fairness in mind at all times.
Assessment Strategy and Structure
The first time that the AD and the SME are brought together is during the project kickoff call, also attended by the instructional designer (ID) and the instructional technologist (IT). During that call, the AD not only describes the collaborative nature of assessment development but also brings to the conversation initial suggestions about the types of assessments and the types of items that could potentially work best with the course in development. We welcome from SMEs their own ideas of how best to assess student learning. While subsequent conversations result in a specific assessment strategy for the course, much of the groundwork is laid during this initial call.
The other critical role that the AD performs at the inception of the project is to help finalize the course and module objectives, in collaboration with the ID and SME. These objectives are presented in the Basic Course Outline (BCO), which (once it is approved) forms the basis for the rest of the development project. A set of collaboratively written objectives defines what students are expected to master during the course. Objectives combine verbs indicating the complexity or depth of learning (using Bloom’s Taxonomy) with statements about the content to be learned. Course objectives must be purposeful, clear, and concise so that the assessments designed later in the project can be directly aligned to both the content coverage and the depth/complexity of the learning. A shared focus on aligning assessments back to objectives allows the development team to achieve a high degree of validity, ensuring that the assessments are measuring what we claim they are measuring.
Assessment strategies can be as varied as the courses themselves, but most courses at Thomas Edison State University include at least some of the following course activities: discussion forums, written assignments, blogs, projects, papers, quizzes, and exams. While ADs play a role in the development of many types of course activities, they bring their expertise most directly to the development of quizzes and exams. Quizzes are designed to be formative assessments, while exams are considered to be summative assessments.
Formative Assessments
What is the rationale for including formative assessments, like quizzes, in our courses? For one thing, our students are not sitting in a classroom. Think back to when you were taking classes. You read the assigned textbook sections, highlighted those concepts and terms you thought were important, and jotted down notes along the way. After all of that studying, you completed the assignment that was due for the next class session. In class, your instructor may have reviewed the practice exercises and the written assignment, emphasizing those topics that he or she deemed most important to the coursework. You may have done in-class group work on specific types of problems or concepts in order to gain extra practice on the important content. And then, of course, there was always one classmate who you could count on to ask, “Will we need to know this for the exam?” And while Thomas Edison State University students do the reading, highlighting, note taking, and written assignments, they do not get the extra in-class benefits. How do students know if they are on the right track? In which areas are they strong or weak? Where is the “just-in-time” feedback that an instructor might give if a concept seems to be misunderstood? And…is this going to be on the exam? Formative quizzes can fill in this gap.
Another rationale has to do with the fact that humans just forget things, especially things they are introduced to for the first time. In order to be able to retrieve something, we must first learn it and then not forget it. Therefore, the ability to retrieve information is a function of both learning and non-forgetting. Think of it using this formula:
Retrieval = Learning – Forgetting
If there were a way to eliminate the “forgetting” part of that equation, then we would be able to retrieve everything we have learned. While it is unrealistic to think that we can eliminate forgetting completely, there is one proven way to avoid sliding down the forgetting curve: to practice retrieving through repetition.
Repetition. But what is the best way to practice retrieving through repetition? Should students reread the section of the textbook? That might help, but rereading is a very passive repetition tactic. It is also the easiest to do, so it is likely used often by students. However, an active style of repetition (one in which the student is forced to actively retrieve information) will probably be more effective. In 1890, philosopher, psychologist, and educator William James wrote, “A curious peculiarity of our memory is that things are impressed better by active than by passive repetition.” While James had no empirical evidence to back this statement at the time, studies performed over the next 125 years proved that his assertion was correct.
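A toy model can illustrate why active retrieval matters. The sketch below assumes a simple exponential forgetting curve and further assumes that each successful retrieval restarts the clock and slows subsequent forgetting; the decay rates and boost factor are invented for illustration, not measured values.

```python
import math

def retention(days_since_review: float, stability: float) -> float:
    """Fraction of learned material still retrievable, assuming exponential decay."""
    return math.exp(-days_since_review / stability)

# Passive rereading only: a single exposure on day 0, low stability.
passive = retention(days_since_review=14, stability=3.0)

# Active retrieval practice: quiz attempts on days 3, 7, and 10. Each successful
# retrieval is assumed to restart the clock and increase stability (slower forgetting).
stability, last_review = 3.0, 0
for review_day in (3, 7, 10):
    stability *= 1.8
    last_review = review_day
active = retention(days_since_review=14 - last_review, stability=stability)

print(f"Retention after 14 days, passive rereading only:  {passive:.0%}")
print(f"Retention after 14 days, with retrieval practice: {active:.0%}")
```

Under these assumed numbers the difference is dramatic, which is exactly why regular retrieval practice is built into our courses.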
The AD and SME can bring both of these threads together by adding formative quizzes to the course development project, especially in introductory-level, broad survey courses or in Gen Ed courses that may appeal to students from a wide range of disciplines. Quizzes can be used as a pre-learning activity (taken before studying to see what students will be expected to know and what prior knowledge they may already have) or as a post-learning activity (taken after studying to gauge how well students learned the content and to identify gaps in their knowledge). Ideally, students can use quizzes as both pre- and post-learning activities. Quizzing provides a multitude of benefits for online learners. Besides identifying gaps in knowledge and encouraging studying, frequent quizzing allows learners to:
- Practice retrieval skills to aid in later retention of content
- Better organize their knowledge
- Reduce interference when learning new material
- Improve transfer of knowledge to new contexts
- Facilitate retrieval of material that wasn’t even assessed
Quiz Advantages. Quizzes are planned as low-stakes activities (typically accounting for a maximum of 15% of the course grade) that help students reinforce concepts, vocabulary, theories, and other building-block content that must be mastered before moving on to higher-order skills such as applying, analyzing, synthesizing, evaluating, or creating. This gets to the heart of organization of knowledge. Take, for example, an objective stating that students will be able to critically analyze the strengths and weaknesses of various personality theories. Students have to first know who the theorists are, what theories they are associated with, and the specific strengths and weaknesses of each theory before they can complete the complex objective.
The lower Bloom’s Taxonomy levels (remembering and understanding), along with the fundamental aspects of higher-order levels, are ideal for formative assessments. Large question pools allow for randomized presentation each time a student attempts a quiz, which helps build a framework (or schema) around the content being assessed in the quiz. This also reduces the chance that students are merely memorizing the answers to the questions on a short quiz. If they see different questions each time—questions that cover similar and related content—students can build their own knowledge schema around the topic being assessed. An expanding schema allows students to transfer knowledge to new contexts and reduces interference when learning new material. The more strongly the content is connected, the easier it is to retrieve from memory. Further, the content at these fundamental levels is most conducive to objective-style items (such as multiple choice), which are easily scored by the testing software at the conclusion of the quiz. Scoring ease leads to system-provided feedback at the conclusion of each quiz, identifying areas of mastery and flagging those gaps in knowledge that need more study. (As mentioned above, this is especially helpful to students.) Feedback helps students concentrate their studies on those concepts that have not yet been mastered, resulting in more focused and efficient study time. Students can take quizzes as often as they wish, via computer, tablet, or smartphone, providing an on-demand tool—with feedback—to help them gauge their progress.
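The pooling and randomization just described can be pictured with a short sketch. The pool contents and the sampling function below are hypothetical; in practice this behavior is configured in the testing software within the learning management system.

```python
import random

# Hypothetical item pool for one module quiz, grouped by the topic each item assesses.
item_pool = {
    "terminology":  [f"TERM-{i:03d}" for i in range(1, 31)],
    "theorists":    [f"THEO-{i:03d}" for i in range(1, 21)],
    "applications": [f"APPL-{i:03d}" for i in range(1, 26)],
}

def build_quiz_attempt(pool: dict, per_topic: int = 3) -> list:
    """Draw a fresh random sample from each topic for every attempt."""
    questions = []
    for items in pool.values():
        questions.extend(random.sample(items, per_topic))
    random.shuffle(questions)  # randomize presentation order as well
    return questions

# Two attempts by the same student see mostly different items, so memorizing
# answers is far less useful than actually learning the content.
print(build_quiz_attempt(item_pool))
print(build_quiz_attempt(item_pool))
```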
Quizzes administered through testing software are a way for students to assess their knowledge formatively while receiving immediate, targeted feedback. The quiz grade is weighted relatively low, and even that grade will likely increase with targeted study and further practice. Low scores early in a student’s studies can be raised by using the feedback to home in on areas of weakness. Further attempts at the quizzes will increase recall by building out the schema, making cognitive connections within the foundational knowledge of the subject matter. These connections help with future activities, including discussions and written assignments, in which students should then be synthesizing the vocabulary, theories, and basic concepts in ways that help them expand their mastery of the content.
This confidence in content mastery might even keep students from searching for “cheating” websites that offer prepared responses for download, undermining the validity of the course and its assessments. The development and administration of the quiz itself, even though it is taken in an unproctored setting, actually make it more difficult for students to look up the answers online. The pool of questions is not visible to students unless they click through each question, and if they take the quiz again, item pools and randomized presentation ensure that a different set of items will appear. With the weighting low, even if students cheat their way through the quizzes, they have used deception to affect only about 10 percent of their course grade. Such cheating will have been counterproductive, since students will have negated the value of the quizzes, which is to prepare for the proctored course exams. And exams do count for a significant portion of the course grade.
Summative Assessments
Now that the SME and AD have spent all of this time collaborating on the course and module objectives and then developing formative-style quizzes, there needs to be a way to ask the student to “put it all together” in a summative-style assessment. These assessments provide the opportunity for the student to demonstrate mastery of the course content, using the course and module objectives as a guide and the formative quizzes to counteract the forgetting curve. Summative assessments can take the form of papers, projects, or exams. Since the AD works directly with the SME on the exams, exams will be the focus of this section.
The course exams, built collaboratively by the SME and the AD, are meant to be demanding assessments that measure students’ breadth of knowledge of a subject as well as their higher-level intellectual skills as applied to the subject. Since exams are driven by the learning objectives, go through multiple quality reviews, and are proctored, they give us a sense of how well students actually achieve the learning objectives. To prepare effectively for an exam, a student must consistently practice retrieving information and knowledge related to the course’s subject matter during the weeks leading up to the exam. The SME is vitally important in ensuring that there is strict alignment between the formative and summative assessments so that the formative assessments truly do prepare students for the summative assessments.
A course’s exam strategy helps to break the content into segments, as each exam is a summative assessment of the concepts learned during the course segment leading up to it. Exam strategies include a three-exam model (breaking content into three equal parts), a midterm exam (focusing on content from the first half of the course), a final exam (which may cover second-half content or may be cumulative over the entirety of the course content), or a combined midterm-and-final-exam strategy.
Exams are high-stakes—they can collectively account for between 25 and 50 percent of the course grade. The University therefore requires that students follow our standardized online or in-person proctoring procedures. Exams do not provide feedback, and students are allowed only one attempt at each exam. While quizzes are composed entirely of objectively scored questions, exams often include (or are entirely composed of) essay questions.
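Those weighting guidelines combine in a simple course-grade calculation. The component percentages below are illustrative only, chosen to keep quizzes at or below the 15 percent cap and the exams collectively within the 25 to 50 percent range.

```python
# Hypothetical weighting for a course using a midterm/final exam strategy.
weights = {
    "quizzes":             0.10,  # low-stakes, formative
    "discussion_forums":   0.20,
    "written_assignments": 0.30,
    "midterm_exam":        0.20,  # proctored, summative
    "final_exam":          0.20,  # proctored, summative
}
assert abs(sum(weights.values()) - 1.0) < 1e-9

# A hypothetical student's percentage scores on each component.
scores = {
    "quizzes": 95,
    "discussion_forums": 88,
    "written_assignments": 84,
    "midterm_exam": 78,
    "final_exam": 81,
}

course_grade = sum(weights[k] * scores[k] for k in weights)
print(f"Weighted course grade: {course_grade:.1f}%")  # 84.1% with these numbers
```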
Assessment Strategy Wrap-Up
Regardless of the type of assessment, all quizzes and exams are developed collaboratively with the SME to align with course and module objectives as well as the other in-course activities (written assignments, discussion forums, etc.). The course and module objectives introduce what the students are expected to learn and to be able to do at the conclusion of the course.
The quizzes are tools (like assignments and discussions) to help the students practice retrieval and reinforce those objectives. The exams (similar to papers or projects) are ways for the students to show evidence that they have mastered course objectives. For the AD, the most important output from the collaborative Phase 1 development process is the set of objectives. With clear and measurable objectives in place, the AD has a clear sense of the direction that the assessments should take, the SME can tailor the course content to prepare students for the assessment(s), and the ID can help the SME generate objective-based, engaging course activities that will guide students through the course.
Item Development Principles and Best Practices
So far we’ve covered the “why” of what we do, with a little bit of the “what.” Item development principles and best practices entwine the “why” and the “what” to an even greater degree.
Blueprinting
Once the assessment strategy has been agreed upon and finalized, a blueprint (test specifications document) is collaboratively developed by the SME and the AD. This blueprint describes what objectives or topics are to be covered on the assessment, how much weight should be allotted to each within the assessment, what types of items will be used to address those objectives or topics, and the order in which the items will be presented. Every item in the assessment must fit into the overall blueprint by aligning with a stated objective or topic, not only by concept or key term, but also at the appropriate Bloom’s Taxonomy level. If students are expected to apply an element of knowledge or provide an analysis, the items must assess at the appropriate level. If, upon further review, it is determined that a question does not fit into the blueprint (content or level), it is removed from the pool of items. The blueprint also specifies information such as materials allowed to the student when taking the assessment, the overall time limit (if applicable), and the construction of sections within the assessment.
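One way to picture a blueprint is as a small structured document that every item must map back to. The objectives, weights, and item tags below are hypothetical, but the alignment check mirrors what the blueprint makes possible: an item that does not match a listed objective at the stated Bloom's level is flagged for revision or removal.

```python
# Hypothetical blueprint for a midterm exam (objectives and weights are illustrative).
blueprint = {
    "CO1: Explain key personality theories": {"bloom": "understand", "weight": 0.25},
    "CO2: Apply theories to case scenarios": {"bloom": "apply", "weight": 0.40},
    "CO3: Evaluate strengths and weaknesses of theories": {"bloom": "evaluate", "weight": 0.35},
}
assert abs(sum(spec["weight"] for spec in blueprint.values()) - 1.0) < 1e-9

# Each draft item is tagged with its target objective and Bloom's level.
draft_items = [
    {"id": "MC-014", "objective": "CO1: Explain key personality theories", "bloom": "understand"},
    {"id": "MC-087", "objective": "CO2: Apply theories to case scenarios", "bloom": "apply"},
    {"id": "ES-003", "objective": "CO3: Evaluate strengths and weaknesses of theories", "bloom": "evaluate"},
    {"id": "MC-101", "objective": "CO2: Apply theories to case scenarios", "bloom": "remember"},  # misaligned
]

def misaligned(blueprint: dict, items: list) -> list:
    """Return IDs of items whose objective or Bloom's level does not match the blueprint."""
    flagged = []
    for item in items:
        spec = blueprint.get(item["objective"])
        if spec is None or spec["bloom"] != item["bloom"]:
            flagged.append(item["id"])
    return flagged

print("Items to revise or remove:", misaligned(blueprint, draft_items))  # ['MC-101']
```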
Item Development
Up until this point, the AD and SME have been setting the stage for the main event; finally, item development—the most granular level of assessment strategy—begins. Item development involves selecting and/or authoring the actual items that will make up the assessments. The same critical scrutiny that was applied to planning the types of assessments is now trained on creating the items themselves. Our first question (which the blueprint has already helped to answer) is: “How are we going to assess what we have deemed to be the important concepts and topics?”
Put simply, we strive to assess what students can do rather than simply what they know. (Doing in this case may be as simple as restating or explaining or as complex as analyzing, synthesizing, or creating.) However, items that involve “knowing” do have their place within the assessment strategy, especially in formative assessments where they strengthen baseline knowledge (terminology, theories, and concepts) that will be used in subsequent course activities. Summative assessments, however, focus more on the “doing” types of items. While there are many different types of items, the most commonly used by the AD team are multiple choice and essay.
Essay. An essay item has only two parts: the prompt (also known as the stem) and the response area. The prompt, developed by the SME, is the question(s), evaluative statement(s), and/or presented scenario(s) to which the student must respond. The response area is the space after the prompt that is dedicated to the student’s open-ended answer. An essay item is used when the concept being assessed cannot be evaluated adequately in an objective, selected-response item (like multiple choice). The depth and complexity that characterize an essay prompt require an open-ended, student-written original essay or a series of calculations as a solution to a problem.
The most common item-development error made with essay items occurs when stems are written too broadly. Take a look at the following example:
Essay prompt: Evaluate the effect of HIV/AIDS on society.
The answer to this could be the subject of an entire research paper or even of a course itself; there are certainly many books written about this broad subject. How can we expect a student to write a three-paragraph essay on such a sprawling topic?
SMEs, when suggesting such a question, sometimes say, “I have found that the students who really know the content will come to the correct response.” But our goal is not to assess a student’s ability to read the mind of the SME or AD; our goal is to assess the student’s learning on the topic. Based on the content coverage of the course, a more focused stem for an essay item might be written as:
Essay prompt: Evaluate the present day economic effects of HIV/AIDS on rural communities in the southern United States.
This stem allows students to focus on economic effects in a particular region, demonstrating mastery of a manageable area of content and an ability to select appropriate facts and form a cogent response.
Multiple-choice Items. A multiple-choice item consists of two parts: a stem and a response area offering several possible options—most typically four. One of the options is the correct answer, or the key. The remaining incorrect options are called distractors. Because so many multiple-choice items in educational use are written too simplistically, the question type often receives unfair criticism. For example, most publisher test banks include items such as the following:
In The Flying Car, what is the name of the professor’s dog?
a. Charlie (key)
b. Richie
c. Spot
d. Einstein
This item concerns a trivial nugget of information that is likely to be forgotten among the vast amount of information the student is responsible for learning in the course. It is unlikely that a module or course objective specifically mentions the importance of knowing the names of all the characters (human and animal) from the specified story.
A better item would be the following:
In The Flying Car, why is it significant that the professor’s dog is named Charlie?
a. It creates a scenario that allows the boy to meet the professor. (key)
b. It helps explain the connection between the professor and one of his students.
c. It reveals that the professor’s dog once belonged to someone else.
d. It is the same name as the professor’s Model T flying car.
This item does a much better job of focusing on the student’s understanding of the story and of the interplay between characters; it moves the item out of the realm of Trivial Pursuit. In fact, an item like this could be used to assess multiple objectives related to children’s literature: story development, character introductions, the role of non-human characters, and so on.
Overall a good item displays the following characteristics, which correspond with the overall assessment strategy:
- Clear and concise stem and options
- Difficulty level that is appropriate for the course
- Uses inclusive, people-first language and vocabulary that is free from jargon or overly technical terminology (unless supported by the course)
- Assumes only outside knowledge that is appropriate for the level of the course
- Avoids extraneous information that is not needed to answer the question
- Avoids topics that can be upsetting, controversial, or offensive
In addition to these overall guidelines, we follow certain conventions regarding the stem and distractors. A good stem:
- Presents one clear problem that leads to one clear answer
- Allows students to formulate a response without looking at the options
- Avoids negative words such as not, never, and except
Good options are:
- Plausible (distractors are believable enough to be attractive to the student without the required knowledge, but they should not be tricky or potentially correct)
- Parallel in structure
- Similar in length and specificity
- Mutually exclusive (one option does not overlap with or include another)
- Free from phrases such as none of the above, all of the above, and both A and B
In addition to these general considerations, the AD team follows an item-writing style guide. This style guide effectively aligns all CLT-developed and CLT-reviewed items to display the same characteristics and adhere to the same standards.
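Many of the conventions above lend themselves to a simple automated screen before human review. The checker below is a hypothetical illustration of the kinds of flags an AD watches for; it is not the CLT's actual style-guide tooling, and a real review catches far more than these three rules.

```python
def screen_item(stem: str, options: list) -> list:
    """Return style-guide warnings for a draft multiple-choice item (illustrative rules only)."""
    warnings = []

    negative_words = {"not", "never", "except"}
    if negative_words & set(stem.lower().split()):
        warnings.append("Stem uses a negative word (not/never/except).")

    banned_options = {"none of the above", "all of the above", "both a and b"}
    if any(opt.strip().lower() in banned_options for opt in options):
        warnings.append("Options include an all/none-of-the-above style choice.")

    lengths = [len(opt) for opt in options]
    if max(lengths) > 2 * min(lengths):
        warnings.append("Options differ greatly in length; keep them parallel.")

    return warnings

draft_stem = "Which of the following is NOT a characteristic of a good distractor?"
draft_options = [
    "Plausible",
    "Parallel in structure",
    "All of the above",
    "Wording designed to trick the student into a wrong answer",
]
print(screen_item(draft_stem, draft_options))  # flags all three rules for this draft item
```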
We believe that when SMEs are authoring items or selecting items for inclusion in assessments, the SMEs should be able to focus on what they know best: the content. The finer points of testing convention and item style are foreign concepts to most subject matter experts. That’s why the AD who is working with the SME reviews every item to make sure that each is appropriately constructed. This allows the SME to focus on providing the content expertise and verifying alignment to the objectives.
Thus the Center for Learning and Technology brings together the combined talents of assessment experts and content experts. When these individuals work collaboratively and iteratively, the end products are valid, reliable, and fair sets of items for the course assessments.
A Few More Safeguards
Writing items in a clear and concise manner, while eliminating all grammatical and spelling errors, offers the simplest way to reduce error. The AD team (which mandates two internal quality reviews), along with the SME, spends significant time reading, reviewing, and editing each item until it is approved by all parties. After all, if any part of an item is confusing or ambiguous, the interpretation of the item is then unfairly left up to the student test-taker. This takes the student’s focus away from the content being assessed within the item and may create unnecessary test-day stress. The more confusing the item, the more likely the student is to get the item wrong—or even right for the wrong reasons.
An especially crucial area for SME review is multiple-choice options: the selection of the keyed response and all of the distractors. The key should be fully correct. (This is not as easy as it sounds!) Likewise, the distractors must be completely incorrect.
As our assessments are focused on course content, it is imperative that the items developed for the assessments focus on the course content only. Content-irrelevant information only weakens the validity of both items and assessments. Once the validity of any part of an assessment is compromised, a slippery slope threatens the overall validity of the assessment structure, the course activities, the course as a whole, the program that encompasses the course, the degree program, and even the institution. With today’s focus on outcomes, outcomes assessment, program reviews and evaluations, and strict guidelines related to all of the above by the accrediting bodies, the CLT model allows for sound assessment at the course level. This provides a solid place to stand when looking “up the hill” from course to program to degree to institution.
Conclusion
Regardless of the type of questions included on an exam, the AD team utilizes an item and assessment development procedure that borrows many best practices, standards, and quality-review aspects from large-scale test development companies. Our assessments are built according to a number of external standards, including:
- The Standards for Educational and Psychological Testing (2014), by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education
- The 2014 ETS Standards for Quality and Fairness, by Educational Testing Service
Thus we bring the standards-based best practices of standardized testing philosophy into the local academic environment. This philosophy fits particularly well in Thomas Edison State University’s centralized development model, utilizing at least one SME in every course development and then broadcasting this centrally developed course to other mentors and sections.
The AD team within the CLT leads the creation of assessments that are backed by external standards and that incorporate best practices, while pulling into the mix the rich subject matter expertise residing in our mentors. This “best of both worlds” philosophy allows the full scope of course content to be assessed in the most comprehensive way. It’s always a vote of confidence when a subject matter expert who has not previously been exposed to our development model finishes by saying something such as, “I hope to incorporate some of the things I learned into my own classroom assessments.” This is the highest compliment we can be paid, to do a job right and to guide others to do the same.