Knowledge and Judgment Scoring

Sunday, September 18, 2016

Student-Centered Learning

I have spent over 40 years preaching the need for schools designed for success rather than for failure. Yesterday I happened upon an article by Nicholas Donohue that presents convincing evidence that this is being done by transforming high schools in the New England states. It is called student-centered learning. Also see Andrew Cohen, What School Could Be.

My attempt in 1981-1989 used a campus computer system at NWMSU, textbook, lecture, laboratory, AND voluntary student presentations, research, and projects. This work has been further developed in Multiple-Choice Reborn and summarized in Knowledge and Judgment Scoring - 2016. In 1995, Knowledge Factor patented an online confidence based learning system (now in amplifier). Masters, 1982, developed Rasch partial credit scoring (PCS).

All three put the student in the position of being in charge of learning and reporting; at all levels of thinking. They approached evaluating an apple from the skin, as traditional multiple-choice (guess) testing is done.

Partial Credit Scoring just polished the apple skin. The emphasis was still on the surface, the score, at that time. Knowledge Factor made the transition from the concrete level of thinking to understanding (skin to core), and provided the meat between in amplifier. Nuclear power plant operators and doctors were held to a much higher responsibility (self-judgment) standard (far over 75%, over 90% mastery) than is customary in a traditional high school classroom (60% for passing).

My students voted to give knowledge and judgment equal value (1:1 or 50%:50%). Voluntary activities replaced one letter grade (10% each). The students were then responsible for reporting what they knew or could do. They could mix several ways of learning and reporting.

A student with a knowledge score of 50% and a quality score of 100% would end up with about the same test score as a student who marked every question (guessed) for a quality, quantity, and test score of 75% (with no judgment).
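The arithmetic behind this comparison can be sketched as follows. This is only a sketch: it assumes the 1:1 weighting works out to full credit for a right mark, half credit for an omit (the 50% omit value), and nothing for a wrong mark. The function name and the 40-item test length are made up for illustration.

```python
def kjs_score(right, wrong, omitted, omit_value=0.5):
    """Return (quantity, quality, test score) as percentages.

    Assumed scoring: a right mark earns full credit, an omit earns
    `omit_value` of full credit, and a wrong mark earns nothing.
    """
    total = right + wrong + omitted
    marked = right + wrong
    quantity = 100 * right / total                     # knowledge: % of all items right
    quality = 100 * right / marked if marked else 0.0  # % of marked items right
    score = 100 * (right + omit_value * omitted) / total
    return quantity, quality, score

# Hypothetical 40-item test.
scholar = kjs_score(right=20, wrong=0, omitted=20)   # marks only what is known
guesser = kjs_score(right=30, wrong=10, omitted=0)   # marks every item

print(scholar)  # (50.0, 100.0, 75.0)
print(guesser)  # (75.0, 75.0, 75.0)
```

Both students land on a test score of 75%, but only the quality score shows which one can trust what was marked.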

These two students are very different. One is at the core of being educated (scholar). The other is only viewing the skin (tourist). The first one has a solid basis for self-instruction and further learning; is ready for independent scholarship. The apple seeds germinate (raise new questions) and produce more fruit (without the tree).

We know much less about the second student, and about what must be “re-taught”. The apple may just be left on the tree in what is often a vain effort to ripen it. Such is the fate of students in schools designed for failure (grades A to F).

In extreme cases, courses are classified by difficulty or assigned PASS/FAIL grades. My General Biology students were even “protected” so I could not know which student was in the course for a grade or pass/fail.

Students assess the level of thinking required in a course by asking on the first day, “Are your tests cumulative?” If so, they leave. This is a voluntary choice to stay at the lowest levels of thinking. Memory care residents do not have that choice.

There is a frightening parallel between creating a happy environment for memory care residents here at Provision Living at Columbia, and creating an academic environment (national, state, school, and classroom) that yields a happy student course grade. Both end up at the end of the day pretty much where they started, at the lowest levels of thinking.

Many students made the transition from memorizing nonsense for the next test to questioning, answering, and verifying; learning for themselves and knowing they were “right”. This is self-empowering. They started getting better grades in all of their courses. They had experienced the joy of scholarship, an intrinsic reward. “I do know what I know.” The independent quality score in knowledge and judgment scoring directed their path.

Student-centered learning is not new. The title is. This is important in marketing to institutionalized education. What is new is that, at last, entire high schools are being transformed for the right reason: student development rather than standardized test scores based on lower levels of thinking during instruction and testing.

These students should be ready for college or other post high school programs. They should not be the under-prepared college students we worked with. The General Biology course was to last for only a few years, until the high schools did all of this work. In practice, the course became permanent. Biology did not become a required course in all high schools.

My interest in this project was to find a way to know what each student really knew, believed, could do, and was interested in, when a new science building was constructed in 1980 with 120 seat lecture halls. The unexpected consequence of promoting student development, based on the independent quality and quantity scores, was not only a bonus but appropriately needed for under-prepared college students. Over 90% of students voluntarily switched from guessing right answers to reporting what they actually knew and could do.

In my experience, the multiple-choice test, when administered and scored properly (quantity and quality), yields as good (if not better) an insight into student ability as many overly elaborate and expensive assessments other than actual performance. Student development (becoming comfortable using higher levels of thinking) is an added bonus.

Wednesday, August 3, 2016

Copy Detector - RMS

This copy detector is an auto-pilot version of the original cheat checker that could point to the source person. Here (Sheet 8) a pairing index ranks answer sheets by the degree of pairing (Sheet 9).

An interesting feature of this right marked scoring (RMS) nursing test is that Unique pairings (red) occurred only toward the end of the test.

There is no distinct break in the beginning of the pairing index and at the end of the pairing count plots to indicate cheating.

Wednesday, July 20, 2016

Test Marks with Student ID by Item

A chart showing marks with student ID by item number is a classic printout for multiple-choice tests for use in class discussions. It is simple, but it lacks the analysis results.

Grade book software can import these files.

This post, #20, ends the pages from the nine-patch.com website.

The real magic in using these printouts is in pointing out where students and teachers should spend their time most productively.

Detailed analyses can be found in Multiple-Choice Reborn and Rasch Model Audit. With the end of NCLB and CCSS, fertile soil may yet be found for knowledge and judgment scoring. It is time to do multiple-choice right. Give students the option of Smart Testing along with traditional Dumb Testing.

Nursing Right Mark Score (RMS)

Biology Knowledge and Judgment Scoring (KJS)

Wednesday, July 13, 2016

Guttman with Scores by Item Difficulty

The lowest scoring student is on the bottom line. The most difficult item is on the right side. The lower right corner is as bad as things can get. The upper left corner is as good as things can get.
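The layout described here can be sketched in a few lines. The marks, the student labels, and the function name below are hypothetical, and items are numbered from 0; the sketch only shows the two sorts (students by score, items by percent right), not the actual printout.

```python
# Hypothetical marks: 1 = right, 0 = wrong; rows are students, columns are items.
marks = {
    "S1": [1, 0, 1, 1],
    "S2": [1, 0, 0, 1],
    "S3": [1, 1, 1, 1],
    "S4": [0, 0, 0, 1],
}
n_items = 4

def guttman_table(marks, n_items):
    """Sort students by score (highest first) and items by number right
    (easiest first), so the most difficult item ends up on the right."""
    item_right = [sum(row[i] for row in marks.values()) for i in range(n_items)]
    item_order = sorted(range(n_items), key=lambda i: -item_right[i])
    student_order = sorted(marks, key=lambda s: -sum(marks[s]))
    table = [(s, [marks[s][i] for i in item_order]) for s in student_order]
    return item_order, table

item_order, table = guttman_table(marks, n_items)
for student, row in table:
    print(student, row, sum(row))
```

With this made-up data the right marks cluster toward the upper left and the wrong marks toward the lower right, which is the pattern the table is meant to reveal.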

Again, knowledge and judgment scoring (KJS) has more information to work with (accurate, honest, and fair) than right mark scoring (RMS). The quality score (%RT) is in the 80%s all the way down to a student test score of about 60%.

Most of these students actually know what they know and what they have yet to learn. They have a solid basis for learning more. KJS promotes student quality.

Nursing Right Mark Score (RMS)

Biology Knowledge and Judgment Scoring (KJS)

Wednesday, July 6, 2016

Test Maker Counseling Matrix - RMS

The test maker view of a right mark scored (RMS) test is based on item difficulty (%), discrimination ability (A,B,C,D), and item performance (mastery, unfinished, and discriminating).

There is no right mark score (RMS) test taker student counseling matrix as students have no vote on which items to select for their individualized test.

This is a traditional item analysis plus a ranking based on how the item performed on the test. For example, two items have a difficulty of 50%. One is unfinished (the entire class is having trouble, or there is a problem with the item or instruction). The other ranks at the highest for discrimination ability (one group knows or can do something that the rest in the class do not know or cannot do).

Students of all abilities missed the first item; mainly lower scoring students missed the second.

Right Mark Scoring (RMS)

Test Taker Counseling Matrix - RMS & KJS

The right mark score (RMS) student test taker counseling matrix for biology is the same as for nursing in the prior post. Again there is another example of two items with the same difficulty (58) but classified differently: unfinished and discriminating. Question 50 was a tally item.

The knowledge and judgment scoring (KJS) student test taker counseling matrix for biology presents a student view of the test not possible with just RMSing.

(E)xpected most marked & most right
(G)uessing few marked & few right
(M)isconception most marked & few right
(D)iscriminating few marked & most right
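The four labels above can be sketched as a simple two-way split. The 50% cut between "most" and "few" is an assumption, as is reading "right" as the share of marked answers that were right; neither is stated in the post, and the function name is hypothetical.

```python
def classify_item(n_students, n_marked, n_right, cut=0.5):
    """Label an item from the test taker view.

    `cut` is an assumed threshold separating "most" from "few";
    the value actually used by the software is not given here.
    """
    most_marked = n_marked / n_students >= cut
    most_right = n_marked > 0 and n_right / n_marked >= cut
    if most_marked and most_right:
        return "Expected"
    if most_marked:
        return "Misconception"
    if most_right:
        return "Discriminating"
    return "Guessing"

# Hypothetical class of 40 students.
print(classify_item(40, 36, 30))  # Expected
print(classify_item(40, 30, 10))  # Misconception
print(classify_item(40, 10, 9))   # Discriminating
print(classify_item(40, 8, 2))    # Guessing
```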

As students select items for their individualized tests they are also voting for item performance. Item 50 is an example of a tally that is not scored for a grade.
Item 6 (58%) was labeled unfinished by the test maker view and here is labeled the only misconception by the test taker view. Both sets of data can be sorted (mined) for a variety of relationships. One is to look for copying.

Right Mark Scoring (RMS)

Knowledge and Judgment Scoring (KJS)

Wednesday, June 29, 2016

Test Performance Profile

The Test Performance Profile (see below) was invented to guide item, test, student, teacher and instruction development. Item, Percent Difficulty, and Discriminating are listed in each column sorted first by item and then by difficulty for Mastery/Easy, Unfinished, and Discriminating.

The high scoring RMS nursing test has far more mastery items than the other two categories. The KJS biology test has far more discriminating items than the total of the other two categories. It also has item 9 flagged as BAD. It may need to be dropped and the test re-scored.

Low scoring Unfinished and highly Discriminating items need to be discussed in class. This information is examined in greater detail in the Student Counseling Matrixes.

Standardized tests do not contain Mastery/Easy and Unfinished items. The goal is to obtain the needed distribution of scores with the fewest items. Discriminating items have a far higher Avg PBR (0.39 and 0.40) than Mastery (which has near zero) and Unfinished (0.12 and 0.15).
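For readers who want to see where a PBR comes from, here is a minimal sketch of the point-biserial correlation between one item and the total scores. It assumes the uncorrected form (the item is included in the total) and returns 0.0 by convention when everyone gets the item right or wrong; the data and function name are hypothetical.

```python
from math import sqrt

def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item column and total scores."""
    n = len(item)
    mean_i = sum(item) / n
    mean_t = sum(totals) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, totals)) / n
    sd_i = sqrt(sum((i - mean_i) ** 2 for i in item) / n)
    sd_t = sqrt(sum((t - mean_t) ** 2 for t in totals) / n)
    if sd_i == 0 or sd_t == 0:
        return 0.0  # no spread (e.g. a mastery item): nothing to discriminate
    return cov / (sd_i * sd_t)

totals = [9, 8, 7, 5, 4, 3]            # hypothetical total scores
discriminating = [1, 1, 1, 0, 0, 0]    # only the higher scorers got it right
mastery = [1, 1, 1, 1, 1, 1]           # everyone got it right

print(round(point_biserial(discriminating, totals), 2))
print(point_biserial(mastery, totals))
```

An item that only the higher scoring students answer correctly gets a high value; an item everyone answers correctly carries no information about who scored high.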

Test reliability or reproducibility is estimated by KR20 and alpha. It increases with the length of the test and with discriminating items.
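A minimal sketch of the KR20 calculation, using the standard formula KR20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores). It assumes right/wrong (1/0) item scores and the population variance; other software may use the sample variance, which shifts the value slightly. The data and function name are hypothetical. For 1/0 items, alpha reduces to the same number.

```python
def kr20(rows):
    """KR-20 for a list of student rows of 0/1 item scores.

    Assumes at least two items and some spread in the total scores.
    """
    n = len(rows)
    k = len(rows[0])
    totals = [sum(r) for r in rows]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for i in range(k):
        p = sum(r[i] for r in rows) / n   # proportion right on item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

# Hypothetical 5 students by 4 items in a perfect Guttman pattern.
rows = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(rows), 2))  # 0.8
```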

The Discriminating, 50 item and 100 item values for reliability, are surprisingly close for these two tests with very different students (nursing and general biology), student preparation, and assessment (RMS and KJS). In summary, teacher skill takes precedence over statistics in selecting questions for a test.

The nursing test measures mastery. The biology test measures the different things that students found of interest in reading assignments and other course activities: lecture and laboratory. The biology students did a good job in picking items to report what they knew and what they had yet to learn (only 9 Unfinished).

The two columns for Unfinished and Discriminating for biology may look similar to those for nursing if the biology students were forced to guess. The practical, useful, details are in the Student Counseling Matrixes.

Right Mark Scoring (RMS)

Knowledge and Judgment Scoring (KJS)

Wednesday, June 22, 2016

Test Fitness

The average item difficulty on the right mark scored (RMS) test was 84%. That is how well the students prepared for the test.

The test fitness, the average estimate of the Minimum and Maximum number of answers marked for each question (2.2) is how well the test fit student preparation. Test fitness is then 46%. The test design value of one out of four or 25% is lower than the test fitness estimate.

This test functioned close to a true/false test. The test fitness is the average test score when students discard answers they know are wrong and then guess for a right answer from the remaining options. Knowing and guessing, quality and quantity, are intermingled.
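The relation between the number of options left and the expected guessing score can be sketched as 100 divided by the options remaining. This is an assumption about how the percentage is derived from the 2.2 figure; the function name is hypothetical. Note that 2.2 gives roughly 45%, close to the 46% reported above, which presumably reflects an unrounded average.

```python
def expected_guess_score(options_remaining):
    """Percent score expected from guessing among the options left
    after known-wrong answers are discarded."""
    return 100 / options_remaining

print(round(expected_guess_score(4)))    # design value, 4 options: 25
print(round(expected_guess_score(2.2)))  # average of 2.2 options left: 45
print(round(expected_guess_score(2)))    # true/false: 50
```

The closer the average number of remaining options gets to 2, the more the test behaves like a true/false test.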

Right Mark Scoring (RMS)

The average item difficulty on the knowledge and judgment scored (KJS) test was only 73%. Test fitness was 3.1 marks or 32%. The test design value of 1 out of 5 or 20% is again lower than the test fitness estimate.

Multiple-choice tests are easier than their design values. The KJS tally analysis indicates the complex makeup of the average test score. Only 16% of the items were answered by just guessing. Some 35% were recording mastery. Some 43% of the items each split the class into two groups in which one group did significantly better than the other group. What, who and why? There was only one misconception item on the test at the end of the semester.

With KJS knowing and guessing are clearly identified. Quality and quantity are assessed independently. This can be summarized on a single page: Test Performance Profile.

Knowledge and Judgment Scoring (KJS)

Wednesday, June 15, 2016

Student Scores

Student scores are sorted by score and by student ID. The percent right (%RT) and score are the same for right mark scoring (RMS). Knowledge and judgment scoring (KJS) combines good judgment (GJ) and %RT to get the student score. In this case knowledge and judgment are valued equally.

If grades are set using the standard deviation, then the RMS test yields (range 29/standard deviation 8) 3.6 grade levels. KJS yields (range 51/standard deviation 10) 5 grade levels (See below). These results are customary for a test in Nursing and in a freshman general study Biology class.
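The grade level figures are simply the score range divided by the standard deviation. A quick check of the arithmetic, with a hypothetical function name:

```python
def grade_levels(score_range, std_dev):
    """Number of standard-deviation-wide grade bands that fit in the score range."""
    return score_range / std_dev

print(round(grade_levels(29, 8), 1))   # RMS nursing test: 3.6
print(round(grade_levels(51, 10), 1))  # KJS biology test: 5.1
```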

Skew and kurtosis capture, in numbers, the shape of the score distribution in relation to the normal bell-shaped curve (the ultimate goal of test makers). These values are of little interest to teachers who can look at the actual score distribution. The normal bell-shaped curve has fascinated institutional education for decades. Tremendous effort is made to select questions that will produce a normal distribution of student scores that can then be compared between years on standardized tests.

In the most extreme classroom example, grades are assigned by matching the list of ranked scores directly to a standard bell-shaped distribution (there is the same portion of A, B, C, D, and F grades on each test). Mark off the grades on this printout using both grading methods to see the difference.

The Biology test is measuring learning, for the most part. The Nursing test is, for the most part, measuring what has been learned; it is confirming mastery (average test score 84%). Had the nursing students been given a KJS test, it would have been clearer who knew what and what each one had yet to learn. Both quantity and quality would be assessed.

Knowledge and Judgment Scoring (KJS) gives students the option of doing either one; guessing at right answers or reporting what they actually know or can do, what is meaningful and useful. The average test score of 73% is examined further in the following four printouts.

Right Mark Scoring

Knowledge and Judgment Scoring

Wednesday, June 8, 2016

Enter, Edit, and Save Data

Sheet 1 collects the actual marks, a static record, and backup file. Sheet 2 makes possible the dynamic features for scoring multiple-choice tests. All marks have an equal value unless something happens to question that assumption.

Here is the place to STANDARDIZE your test, to select the items that performed well. You can also reward the class when a bad item prompts a deep and full discussion in the classroom. Or just drop it out of the test (This is a lot easier to do than to write the next test so the results are higher or lower to obtain a desired average class score). The analyses of the student scores will point out the items needing to be discussed and/or revised, and the need for further instruction.
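The two choices described above (give everyone the point, or drop the item) can be sketched for a right mark scored test. This is not the spreadsheet's own logic, only an illustration with hypothetical data, names, and 0-based item positions.

```python
def rescore(rows, dropped=(), credit_all=()):
    """Percent score per student after editing the item list.

    rows: 0/1 item scores per student.
    dropped: item positions removed from the test.
    credit_all: item positions for which every student receives the point.
    """
    kept = [i for i in range(len(rows[0])) if i not in dropped]
    scores = []
    for row in rows:
        points = sum(1 if i in credit_all else row[i] for i in kept)
        scores.append(100 * points / len(kept))
    return scores

rows = [[1, 1, 0, 1, 0], [1, 0, 0, 1, 1]]   # both students missed item 2
print(rescore(rows))                   # [60.0, 60.0]
print(rescore(rows, dropped={2}))      # [75.0, 75.0]
print(rescore(rows, credit_all={2}))   # [80.0, 80.0]
```

Either edit raises the scores without rewriting the test; giving the point rewards the class a little more than dropping the item.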

Right Mark Scoring (RMS)

Question 50, on the following test, is set to tally how frequently hour tests should be given: A, weekly; B, biweekly; C, triweekly; and blank, teacher's choice.

Knowledge and Judgment Scoring (KJS)

Wednesday, June 1, 2016

Answer File Data

The answer data from right mark scoring (RMS) and knowledge and judgment scoring (KJS) look very different. One is a solid mass of marks (except for MURTA LA who missed one). The other is peppered with blank marks (yes, a blank can be a mark that carries the same information as adding another answer option for "do not know" or "something I still need to learn").

Names in the KJSing student ID column have been replaced with the order number in which the answer sheets were turned in during a 50-minute hour test. The order in which answer sheets are turned in is important information. Students electing RMSing are scattered throughout the KJSing data set. Students electing the fewest items to mark to report what they actually knew and trusted were, in general, the first to turn in their answer sheets.

Right Mark Scoring (RMS)

Knowledge and Judgment Scoring (KJS)

Wednesday, May 25, 2016

Right Mark Scoring

Everyone knows traditional multiple-choice or right mark scoring (RMS): students mark, teacher scores, and that is it. Stage One is done.

In Stage Two the teacher may review difficult questions. Then, if an item is thoroughly discussed and of great value, give all students the point. If it turns out to be just a bad item, drop it. Click Score for the "true" test and scores embedded in the full test. You have made a lower level of thinking STANDARDIZED classroom test. Stage Two is done.

In Stage Three, a genuine learning environment is established by students and teachers discussing items classified as Expected, Discriminating, Guessing, and Misconceptions. This is only possible with Knowledge and Judgment Scoring (KJS). Stage Three ends with re-scoring to produce a higher level of thinking STANDARDIZED classroom test.

The test fitness is an estimate of how well the test fits student preparation. This is the average score if all students reject known wrong options and then guess from the remaining options on each item. In operation, multiple-choice tests are easier than their design value, such as one out of four, 25%. Test-wise students know this.

The omit value can still be adjusted in Power Up Plus, built on Break Out and advanced features. The Omit default value for Knowledge and Judgment Scoring is set to 50% in Power Up Plus.

[Copyright dates are all 2006 when the company was created. In 2013 the company copyrights were returned to me.]