Wednesday, June 29, 2016

Test Performance Profile


The Test Performance Profile (see below) was invented to guide item, test, student, teacher and instruction development. Item number, percent difficulty, and discrimination are listed in each column, sorted first by item and then by difficulty, for the Mastery/Easy, Unfinished, and Discriminating categories.

The high scoring RMS nursing test has far more mastery items than the other two categories. The KJS biology test has far more discriminating items than the total of the other two categories. It also has item 9 flagged as BAD. It may need to be dropped and the test re-scored.

Low scoring Unfinished and highly Discriminating items need to be discussed in class. This information is examined in greater detail in the Student Counseling Matrixes.

Standardized tests do not contain Mastery/Easy and Unfinished items. The goal is to obtain the needed distribution of scores with the fewest items. Discriminating items have a far higher average PBR (0.39 and 0.40) than Mastery items (near zero) and Unfinished items (0.12 and 0.15).
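The PBR here is the point-biserial correlation between marking an item right and the rest of the test score. A minimal sketch of that calculation, and of one way items could be sorted into the three categories, follows; the cutoff values (90% difficulty for Mastery/Easy, 0.30 PBR for Discriminating) are assumptions for illustration only, not the values behind the actual profile.

import numpy as np

def item_statistics(marks):
    # marks: 2D array, rows = students, columns = items, 1 = right, 0 = wrong
    scores = marks.sum(axis=1)                 # total score per student
    difficulty = marks.mean(axis=0) * 100      # percent of students marking the item right
    pbr = np.empty(marks.shape[1])
    for j in range(marks.shape[1]):
        rest = scores - marks[:, j]            # total score with the item itself removed
        pbr[j] = np.corrcoef(marks[:, j], rest)[0, 1]
    return difficulty, pbr

def classify(difficulty, pbr, easy_cut=90, pbr_cut=0.30):
    # Assumed cutoffs: very easy items -> Mastery/Easy, strongly
    # correlated items -> Discriminating, the rest -> Unfinished.
    groups = []
    for d, r in zip(difficulty, pbr):
        if d >= easy_cut:
            groups.append("Mastery/Easy")
        elif r >= pbr_cut:
            groups.append("Discriminating")
        else:
            groups.append("Unfinished")
    return groups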

Test reliability or reproducibility is estimated by KR20 and alpha. It increases with the length of the test and with discriminating items.  
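For reference, KR20 comes straight from the 0/1 mark table, and the Spearman-Brown formula projects what the reliability would be if the test were lengthened, for example to 50 or 100 items. This is a generic sketch of the standard formulas, not the routine behind the printouts.

import numpy as np

def kr20(marks):
    # KR20 reliability for a 0/1 mark table (rows = students, columns = items).
    k = marks.shape[1]                          # number of items
    p = marks.mean(axis=0)                      # proportion right per item
    var_total = marks.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

def spearman_brown(r, old_len, new_len):
    # Projected reliability when a test of old_len items is lengthened to new_len items.
    n = new_len / old_len
    return n * r / (1 + (n - 1) * r)

# Example: a reliability of 0.75 on a 25-item test projects to about
# 0.86 at 50 items and 0.92 at 100 items.
# print(spearman_brown(0.75, 25, 50), spearman_brown(0.75, 25, 100))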

The reliability values for the Discriminating items, and for the 50-item and 100-item projections, are surprisingly close for these two tests with very different students (nursing and general biology), student preparation, and scoring methods (RMS and KJS). In summary, teacher skill takes precedence over statistics in selecting questions for a test.

The nursing test measures mastery. The biology test measures the different things that students found of interest in reading assignments and other course activities: lecture and laboratory. The biology students did a good job of picking items to report what they knew and what they had yet to learn (only 9 Unfinished).

The two columns for Unfinished and Discriminating for biology might look similar to those for nursing had the biology students been forced to guess. The practical, useful details are in the Student Counseling Matrixes.


Right Mark Scoring (RMS)

Knowledge and Judgment Scoring (KJS)

Wednesday, June 22, 2016

Test Fitness




The average item difficulty on the right mark scored (RMS) test was 84%. That is how well the students prepared for the test. 

Test fitness, the average of the Minimum and Maximum estimates of the number of answer options marked for each question (2.2), is how well the test fit student preparation. Test fitness is then 1/2.2, or 46%. The test design value of one out of four, or 25%, is lower than the test fitness estimate.

This test functioned close to a true/false test. Test fitness is the average test score expected when students discard the answer options they know are wrong and then guess at a right answer from the remaining options. Knowing and guessing, quality and quantity, are intermingled.
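As an illustration of the arithmetic only: if students can discard some answer options as clearly wrong, the expected score from guessing among what remains is the reciprocal of the number of options left, and test fitness applies that reciprocal to the average count. The values below are the ones quoted above, used as assumptions for the example.

def test_fitness(avg_options_remaining):
    # Expected percent score from guessing among the answer options that
    # remain after the obviously wrong ones have been discarded.
    return 100.0 / avg_options_remaining

print(test_fitness(2.2))   # about 45-46%, the RMS test fitness above
print(test_fitness(4.0))   # 25%, the design value of one right answer out of four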



Right Mark Scoring (RMS)










The average item difficulty on the knowledge and judgment scored (KJS) test was only 73%. Test fitness was 3.1 answer options, or 32%. The test design value of 1 out of 5, or 20%, is again lower than the test fitness estimate.

Multiple-choice tests are easier than their design values. The KJS tally analysis shows the complex makeup of the average test score. Only 16% of the items were answered by just guessing. Some 35% were recording mastery. Some 43% of the items each split the class into two groups, one of which did significantly better than the other. What, who, and why? There was only one misconception item on the test at the end of the semester.

With KJS knowing and guessing are clearly identified. Quality and quantity are assessed independently. This can be summarized on a single page: Test Performance Profile.

Knowledge and Judgment Scoring (KJS)



Wednesday, June 15, 2016

Student Scores

Student scores are sorted by score and by student ID. The percent right (%RT) and the score are the same for right mark scoring (RMS). Knowledge and judgment scoring (KJS) combines good judgment (GJ) and %RT to get the student score. In this case knowledge and judgment are valued equally.
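The printout does not spell out the weighting, so the sketch below only shows one common way of valuing knowledge and judgment equally: a right mark earns full credit, an omitted item earns half credit for the good judgment of not guessing, and a wrong mark earns nothing. The 2/1/0 point scheme and the function names are assumptions for illustration.

def rms_score(marks, key):
    # Right mark scoring: percent of items marked right.
    right = sum(1 for mark, answer in zip(marks, key) if mark == answer)
    return 100 * right / len(key)

def kjs_score(marks, key):
    # Assumed equal-weight KJS: right = 2 points, omit (None) = 1 point
    # for good judgment, wrong = 0 points, out of 2 points per item.
    points = 0
    for mark, answer in zip(marks, key):
        if mark is None:
            points += 1
        elif mark == answer:
            points += 2
    return 100 * points / (2 * len(key))

# Example: 4 right, 1 wrong, 1 omitted on a 6-item test.
key   = ["A", "B", "C", "D", "A", "B"]
marks = ["A", "B", "C", "D", "C", None]
print(rms_score(marks, key))   # about 66.7 (the omit counts as not right)
print(kjs_score(marks, key))   # 75.0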



If grades are set using the standard deviation, then the RMS test yields (range 29 / standard deviation 8) 3.6 grade levels. KJS yields (range 51 / standard deviation 10) 5 grade levels (see below). These results are customary for a test in Nursing and in a freshman general studies Biology class.
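A minimal sketch of that rule of thumb, assuming the scores are held in a plain list (the function name is illustrative):

import statistics

def grade_levels(scores):
    # Rough count of grade levels the test can support:
    # score range divided by the standard deviation of the scores.
    return (max(scores) - min(scores)) / statistics.stdev(scores)

# With the values reported above: RMS 29/8 = 3.6 and KJS 51/10 = 5.1 grade levels.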

Skew and kurtosis capture, in numbers, the shape of the score distribution in relation to the normal bell-shaped curve (the ultimate goal of test makers). These values are of little interest to teachers who can look at the actual score distribution. The normal bell-shaped curve has fascinated institutional education for decades. Tremendous effort is made to select questions that will produce a normal distribution of student scores that can then be compared between years on standardized tests.

In the most extreme classroom example, grades are assigned by matching the list of ranked scores directly to a standard bell-shaped distribution (the same proportion of A, B, C, D, and F grades on each test). Mark off the grades on this printout using both grading methods to see the difference.

The Biology test is, for the most part, measuring learning. The Nursing test is, for the most part, measuring what has been learned; it is confirming mastery (average test score 84%). Had the nursing students been given a KJS test, the result would have made it clearer who knew what and what each one had yet to learn. Both quantity and quality would be assessed.

Knowledge and Judgment Scoring (KJS) gives students the option of doing either one: guessing at right answers, or reporting what they actually know or can do, what is meaningful and useful. The average test score of 73% is examined further in the following four printouts.

Right Mark Scoring

Knowledge and Judgment Scoring

Wednesday, June 8, 2016

Enter, Edit, and Save Data

Sheet 1 collects the actual marks, a static record and backup file. Sheet 2 provides the dynamic features for scoring multiple-choice tests. All marks have an equal value unless something happens to question that assumption.


Here is the place to STANDARDIZE your test, to select the items that performed well. You can also reward the class when a bad item prompts a deep and full discussion in the classroom. Or just drop it from the test (this is a lot easier than writing the next test so the results come out higher or lower to obtain a desired average class score). The analyses of the student scores will point out the items needing to be discussed and/or revised, and the need for further instruction.
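A hedged sketch of what dropping a flagged item and re-scoring could look like when the marks sit in a simple 0/1 right/wrong table; the column index and the function are assumptions for illustration, not the spreadsheet's actual mechanism.

def rescore_without(mark_table, bad_items):
    # mark_table: list of per-student lists of 1 (right) / 0 (wrong) marks.
    # bad_items: 0-based column indexes of the items to drop before re-scoring.
    keep = [j for j in range(len(mark_table[0])) if j not in set(bad_items)]
    return [100 * sum(row[j] for j in keep) / len(keep) for row in mark_table]

# Example: drop item 9 (0-based index 8), the item flagged as BAD above,
# and recompute every student's percent score.
# new_scores = rescore_without(mark_table, bad_items=[8])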



Right Mark Scoring (RMS)
Question 50, on the following test, is set to tally how frequently hour tests should be given: A, weekly; B, biweekly; C, triweekly; and blank, teacher's choice.

Knowledge and Judgment Scoring (KJS)

Wednesday, June 1, 2016

Answer File Data

The answer data from right mark scoring (RMS) and knowledge and judgment scoring (KJS) look very different. One is a solid mass of marks (except for MURTA LA, who missed one). The other is peppered with blank marks (yes, a blank can be a mark that carries the same information as adding another answer option for "do not know" or "something I still need to learn").



Names in the KJS student ID column have been replaced with the order in which the answer sheets were turned in during a 50-minute hour test. The order in which answer sheets are turned in is important information. Students electing RMS are scattered throughout the KJS data set. Students electing to mark the fewest items, reporting only what they actually knew and trusted, were, in general, the first to turn in their answer sheets.


Right Mark Scoring (RMS)

Knowledge and Judgment Scoring (KJS)