Knowledge and Judgment Scoring

Wednesday, December 6, 2017

Quantity and Quality of Mind

Table 1

I had to relate the graphic in the last post to the four ways multiple-choice tests can be scored when students pick the questions to report what they know and can do. This is not an efficient way to do standardized testing.

It is a way to promote student development from passive pupil to self-correcting scholar, which is the basis for high standardized test scores.

Table 1 points out respective pass/fail scores (red box) for the four quantity to quality ratios. The 60% cut point was used in my quantity and quality scoring.

Students voted to value quantity and quality equally. This removed a variable between traditional scoring, where marks are influenced by luck on test day, and quantity and quality scoring, where marks are influenced by students' judgment of what they already know and can do, and of what they have yet to learn (know and understand).

It was also important not to present something that new students would see as a barrier to switching from luck to reason. As is, it took until the third bi-weekly test in each semester for over 90% of students in a class to make the change in scoring and several more weeks to change study habits (bicameral to introspective mind, see previous post).

Chart 1

Chart 1 graphs the above table. It shows the quality vectors for a score of 60%. A traditional test would require a minimum of 30 right marks.

Quantity and quality scoring requires a minimum of 10 right marks added to a perfect 50% value for self-judgment. This rarely happened out of over 3,000 students.
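The two minimums above (30 right marks and 10 right marks) can be checked with a short sketch. It assumes a 50-item test, as the chart implies, and assumes that each right mark adds an equal share of the points between the starting score and 100%. The function name and parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
import math

def min_right_marks(cut_percent, n_items, start_percent):
    """Fewest right marks (with no wrong marks) needed to reach the cut point.

    Assumption: the score starts at start_percent and every right mark adds
    an equal share of the remaining (100 - start_percent) points.
    """
    if cut_percent <= start_percent:
        return 0
    value_per_right = Fraction(100 - start_percent, n_items)
    return math.ceil(Fraction(cut_percent - start_percent) / value_per_right)

# Traditional scoring starts at 0%; quantity and quality scoring starts at 50%.
print(min_right_marks(60, 50, 0))   # 30
print(min_right_marks(60, 50, 50))  # 10
```

The 10-mark case only holds when none of the marks are wrong, which is why it corresponds to perfect self-judgment.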

The standard of 90%, set for high risk exams (nuclear power operators, police, and doctors), starts with a 75% value for judgment. You had better know what you are doing before you act, or ask questions, or do not answer.

With the active start score (75%) set higher than the cut point (60%) there would be no incentive to mark anything on such a test. A game rule is needed: A minimum of 10 questions must be selected [(75%-60%)/(75%/50 counts) = 10 counts] to include the 60% cut point.
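The arithmetic in the game rule can be reproduced as a small sketch. It follows the bracketed formula literally: the starting score is divided evenly over the 50 counts, and the gap between the start and the cut point is measured in those counts. The function name is hypothetical.

```python
from fractions import Fraction

def counts_between(start_percent, cut_percent, n_items):
    """Number of counts separating the start score from the cut point.

    Assumption: each count is worth start_percent / n_items, as in the
    bracketed formula in the text.
    """
    per_count = Fraction(start_percent, n_items)  # 75% / 50 counts = 1.5% per count
    return Fraction(start_percent - cut_percent) / per_count

print(counts_between(75, 60, 50))  # 10
```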

The chart points out that any cut point can be reached from any starting score. Then create a test bank of items with difficulties in the 20s, 40s, 60s and 80s.

Deliver the test over the Internet starting with a question from the 40s or 60s. A test, starting with a few items averaging at 60% difficulty, followed by 10 more items near 60%, may stop and call you a winner at 60%. The test stops when answering more questions is expected to not make a change in the score.

The test is efficient. It takes the least time with the fewest questions. It can dynamically cruise across the four levels, which are static on paper, to deliver only a few questions that match each student’s preparation instead of presenting a test booklet of 100 questions.

We will never know what you actually know. But you know enough. We will know that your performance is a near perfect match to the average scores of a select group of individuals for which you are believed to be a good fit. This works for graduation, job placement and entrance exams.

It works for high quality students, who may only attend class on test days. Low quality students need a teacher to attend to specific problems.

The Internet statistics are based on all the students tested. Statistics from classroom, teacher-created tests are based only on that one class. The paper and pencil methods we found so effective are now in the past. The bicameral/introspective mind theory hints at why they worked.

Sunday, December 3, 2017

Student Minds

My reading of “Two Origins of Consciousness” by Bill Rowe, Chapter 11 in Gods, Voices and the Bicameral Mind: the Theories of Julian Jaynes, edited by Marcel Kuijsten, 2016, brought back to mind how underprepared college students performed when multiple-choice (forced-choice) was not scored by just counting right marks but by scoring both knowledge (quantity) as well as judgment (quality).

The new science building (1980) contained lecture rooms, spanning two floors for 120 students, that replaced the 64-chair lecture room. I wanted to know what these students actually believed, knew, and understood, in order to do the best job teaching. Bluebook essay exams were no longer feasible.

The time-honored solution in mass testing was, of course, multiple-choice. But this time it was different. Now the class voted to score judgment (quality) as well as knowledge (quantity). Quality was valued the same as quantity.

The unexpected result was that passive pupils (with a declared negative interest in the course) generally became active scholars. They started studying for their own enlightenment rather than memorizing nonsense for the next test.

This change in attitude and in how they used their brains in studying (learning by questioning and relating) yielded higher scores in the course and in their other courses. Students voted to use knowledge and judgment scoring in other large general studies courses.

Does knowledge and judgment scoring provide a cheap, and easy to use, window into the bicameral and conscious introspective mind? The scoring reinforces successful judgment even if the test score is low (Scores 1-4).

The judgment score, or feel good score, which corresponds to the notes that teachers make when scoring an essay test, is key to directing student study behavior. Improve a test score with a high judgment score by studying (making sense of) more material. Improve a test score with a low judgment score by changing study habits.

Or is it more a case of practice until the mind changes from bicameral to introspective? We can see and test for the results but we have yet to see what goes on in the brain and in the mind.

One student commented that he had failed the first three exams on which he chose traditional scoring. He elected knowledge and judgment scoring and ended up with a comparable score.

His judgment score (quality) was as bad as his knowledge score (quantity). OH!! “Had I just marked questions I could have used to report what I really knew (100% judgment (quality) score) I would have passed the test.” Yes!

Most students made the transformation within one semester. Over 90% switched from traditional scoring to knowledge and judgment scoring after two tests.

Another student reported that the teacher is to write the questions, score the test, and tell the student how many right marks were counted. “This is a general studies course that I am taking only to satisfy graduation requirements (pass/fail).”

To me this represents the bicameral mind at work. Such students are unable to generate an interest for a semester. Their study habits are grade-school memorize and forget. Their standing at the end of the course is the same as before taking the course (except a general studies course has been checked off).

Knowledge and judgment scoring quickly identifies these students. They elect not to use their judgment. They fear making a wrong mark but feel comfortable gambling (God’s will) for right marks by marking all questions.

They lack the habit of relating information into a web of relationships (making sense) that promotes understanding. They lack the ability to verify what they know.

Test scores can be viewed from both the test maker’s and the test taker’s viewpoints. The test takers can be ranked on knowledge and judgment, quantity and quality. High quality students (scholars) can be expected to do well in the future.

Test makers receive an item analysis that groups questions into four groups based on how students used the questions to report what they believe, know, and understand:

Expected: Most students elect to mark and are right.
Misconception: Most students elect to mark and are wrong.
Discriminating: Few students elect to mark and are right.
Unexpected: Few students elect to mark and are wrong.
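The four groups above can be sketched as a simple classification rule. The source does not give numeric thresholds, so this sketch assumes that "most" means more than half of the class marked the item and that an item is "right" when more than half of its marks are right. The function name, the `majority` parameter, and the "Unmarked" label are hypothetical.

```python
def classify_item(n_students, n_marked, n_right, majority=0.5):
    """Place one item into one of the four item-analysis groups.

    Assumption: "most" means more than `majority` of the class marked the
    item, and "right" means more than `majority` of those marks were right.
    """
    if n_marked == 0:
        return "Unmarked"  # hypothetical label; the source does not name this case
    most_marked = n_marked / n_students > majority
    mostly_right = n_right / n_marked > majority
    if most_marked:
        return "Expected" if mostly_right else "Misconception"
    return "Discriminating" if mostly_right else "Unexpected"

print(classify_item(100, 90, 80))  # Expected
print(classify_item(100, 90, 20))  # Misconception
print(classify_item(100, 20, 18))  # Discriminating
print(classify_item(100, 20, 5))   # Unexpected
```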

I placed the same misconception items on the next test three times in a row after reviewing the rationale for the misconception each time. On each test there were fewer students marking. The class knew they did not know but could not mark the right answer. What we learn by association (growing up with it) is difficult to change by rationalization (introspection).

Discriminating questions provide a powerful insight into what a class understands. In my experience this is better and more reproducible than essay questions in large classes.

These students’ brains did not change during the course, but how they used their minds did.

Struggling students could see a validation of their judgment. They did have judgment. They did know what they knew. They did not need to have someone else tell them. They could now see, by way of the judgment score, what scholars do to be successful. Something that is meaningful is easy to remember.

They did not have to fear courses that were cumulative. The answer to that question on the first day was a trip to the registrar’s office for several students each semester.

Introspection was rewarded. Is this a quantifying model for viewing how students use their minds and change their use for the better? It works, whatever the underlying reasons.

The concept of the bicameral/introspective mind has given knowledge and judgment scoring an extended life, possibly in research projects. There is much more to be learned if one chooses to work outside of safe areas in psychology and education.

Also see http://residentialcarefortwo.blogspot.com for other related details.

Your comments are appreciated.

Sunday, September 18, 2016

Student-Centered Learning

I have spent over 40 years preaching the need for schools designed for success rather than for failure. Yesterday I happened upon an article by Nicholas Donohue that presents convincing evidence that this is being done by transforming high schools in the New England states. It is called student-centered learning. Also see Andrew Cohen, What School Could Be.

My attempt in 1981-1989 used a campus computer system at NWMSU, textbook, lecture, laboratory, AND voluntary student presentations, research, and projects. This work has been further developed in Multiple-Choice Reborn and summarized in Knowledge and Judgment Scoring - 2016. In 1995, Knowledge Factor patented an online confidence based learning system (now in Amplifire). Masters, 1982, developed Rasch partial credit scoring (PCS).

All three put the student in the position of being in charge of learning and reporting, at all levels of thinking. They approached evaluating an apple from the skin, as traditional multiple-choice (guess) testing is done.

Partial Credit Scoring just polished the apple skin. The emphasis was still on the surface, the score, at that time. Knowledge Factor made the transition from the concrete level of thinking to understanding (skin to core), and provided the meat between in Amplifire. Nuclear power plant operators and doctors were held to a much higher responsibility (self-judgment) standard (far over 75%, over 90% mastery) than is customary in a traditional high school classroom (60% for passing).

My students voted to give knowledge and judgment equal value (1:1 or 50%:50%). Voluntary activities replaced one letter grade (10% each). The students were then responsible for reporting what they knew or could do. They could mix several ways of learning and reporting.

A student with a knowledge score of 50% and a quality score of 100% would end up with about the same test score as a student who marked every question (guessed) for a quality, quantity, and test score of 75% (with no judgment).
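A minimal sketch of this comparison follows, assuming a hypothetical 100-item test and the equal 1:1 weighting the students voted for: the test score starts at 50%, each right mark adds an equal share, and each wrong mark subtracts the same share. The function name is hypothetical, and the wrong-mark penalty is inferred from the two scores quoted above rather than stated outright in the text.

```python
def kjs_report(right, wrong, n_items):
    """Return (quantity %, quality %, test score %) for one answer sheet.

    Assumptions: quantity is right marks over all items, quality is right
    marks over marked items, and the test score starts at 50% with right
    and wrong marks weighted equally (1:1).
    """
    marked = right + wrong
    quantity = 100 * right / n_items
    quality = 100 * right / marked if marked else None  # undefined with no marks
    test_score = 50 + 50 * (right - wrong) / n_items
    return quantity, quality, test_score

# Scholar: marks only the 50 items that can be answered with confidence.
print(kjs_report(right=50, wrong=0, n_items=100))   # (50.0, 100.0, 75.0)
# Guesser: marks every item and gets 75 of them right.
print(kjs_report(right=75, wrong=25, n_items=100))  # (75.0, 75.0, 75.0)
```

Both sheets land on the same test score, but only the first one carries a quality score that says the student knows what he or she knows.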

These two students are very different. One is at the core of being educated (scholar). The other is only viewing the skin (tourist). The first one has a solid basis for self-instruction and further learning, and is ready for independent scholarship. The apple seeds germinate (raise new questions) and produce more fruit (without the tree).

We know much less about the second student, and about what must be “re-taught”. The apple may just be left on the tree in what is often a vain effort to ripen it. Such is the fate of students in schools designed for failure (grades A to F).

In extreme cases, courses are classified by difficulty or assigned PASS/FAIL grades. My General Biology students were even “protected” so I could not know which student was in the course for a grade or pass/fail.

Students assess the level of thinking required in a course by asking on the first day, “Are your tests cumulative?” If so, they leave. This is a voluntary choice to stay at the lowest levels of thinking. Memory care residents do not have that choice.

There is a frightening parallel between creating a happy environment for memory care residents here at Provision Living at Columbia, and creating an academic environment (national, state, school, and classroom) that yields a happy student course grade. Both end up at the end of the day pretty much where they started, at the lowest levels of thinking.

Many students made the transition from memorizing nonsense for the next test to questioning, answering, and verifying; learning for themselves and knowing they were “right”. This is self-empowering. They started getting better grades in all of their courses. They had experienced the joy of scholarship, an intrinsic reward. “I do know what I know.” The independent quality score in knowledge and judgment scoring directed their path.

Student-centered learning is not new. The title is. This is important in marketing to institutionalized education. What is new is that at last entire high schools are now being transformed for the right reason: student development rather than standardized test scores based on lower levels of thinking during instruction and testing.

These students should be ready for college or other post high school programs. They should not be the under-prepared college students we worked with. The General Biology course was to last for only a few years, until the high schools did all of this work. In practice, the course became permanent. Biology did not become a required course in all high schools.

My interest in this project was to find a way to know what each student really knew, believed, could do, and was interested in, when a new science building was constructed in 1980 with 120-seat lecture halls. The unexpected consequence of promoting student development, based on the independent quality and quantity scores, was not only a bonus but appropriately needed for under-prepared college students. Over 90% of students voluntarily switched from guessing right answers to reporting what they actually knew and could do.

In my experience, the multiple-choice test, when administered and scored properly (quantity and quality) yields as good (if not better) an insight into student ability as many overly elaborate and expensive assessments other than actual performance. Student development (becoming comfortable using higher levels of thinking) is an added bonus.

Wednesday, August 3, 2016

Copy Detector - RMS

This copy detector is an auto-pilot version of the original cheat checker that could point to the source person. Here (Sheet 8) a pairing index ranks answer sheets by the degree of pairing (Sheet 9).

An interesting feature of this right mark scoring (RMS) nursing test is that unique pairings (red) occurred only toward the end of the test.

There is no distinct break in the beginning of the pairing index and at the end of the pairing count plots to indicate cheating.

Wednesday, July 20, 2016

Test Marks with Student ID by Item

A chart showing marks with student ID by item number is a classic printout for multiple-choice tests for use in class discussions. It is simple, but it lacks the analysis results.

Grade book software can import these files.

This post, #20, ends the pages from the nine-patch.com website.

The real magic in using these printouts is in pointing out where students and teachers should spend their time most productively.

Detailed analyses can be found in Multiple-Choice Reborn and Rasch Model Audit. With the end of NCLB and CCSS, fertile soil may yet be found for knowledge and judgment scoring. It is time to do multiple-choice right. Give students the option of Smart Testing along with traditional Dumb Testing.

Nursing Right Mark Score (RMS)

Biology Knowledge and Judgment Scoring (KJS)

Wednesday, July 13, 2016

Guttman with Scores by Item Difficulty

The lowest scoring student is on the bottom line. The most difficult item is on the right side. The lower right corner is as bad as things can get. The upper left corner is as good as things can get.
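The layout described above can be sketched as a double sort of a right/wrong mark table. This is an illustration only: the function name, the student IDs, and the tiny data set are hypothetical, and the sketch uses plain right-mark counts rather than the quantity and quality scores in the actual printouts.

```python
def guttman_table(marks):
    """Arrange a mark table with the best student on top and easiest item on the left.

    marks: dict mapping student ID to a list of 1 (right) / 0 (not right) per item.
    Returns (item_order, rows), where rows is a list of (student ID, reordered marks).
    """
    n_items = len(next(iter(marks.values())))
    item_right = [sum(row[i] for row in marks.values()) for i in range(n_items)]
    # Easiest item (most right marks) first, so the hardest ends up on the right.
    item_order = sorted(range(n_items), key=lambda i: -item_right[i])
    # Highest scoring student first, so the lowest ends up on the bottom line.
    student_order = sorted(marks, key=lambda s: -sum(marks[s]))
    rows = [(s, [marks[s][i] for i in item_order]) for s in student_order]
    return item_order, rows

marks = {"s1": [1, 0, 1], "s2": [1, 1, 1], "s3": [0, 0, 1]}
item_order, rows = guttman_table(marks)
print(item_order)  # [2, 0, 1]
for student, row in rows:
    print(student, row)
```

In the sorted table the right marks cluster toward the upper left and the misses toward the lower right, which is the pattern the post describes.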

Again, knowledge and judgment scoring (KJS) has more information to work with (accurate, honest, and fair) than right mark scoring (RMS). The quality score (%RT) is in the 80%s all the way down to a student test score of about 60%.

Most of these students actually know what they know and what they have yet to learn. They have a solid basis for learning more. KJS promotes student quality.

Nursing Right Mark Score (RMS)

Biology Knowledge and Judgment Scoring (KJS)

Wednesday, July 6, 2016

Test Maker Counseling Matrix - RMS

The test maker view of a right mark scored (RMS) test is based on item difficulty (%), discrimination ability (A,B,C,D), and item performance (mastery, unfinished, and discriminating).

There is no right mark score (RMS) test taker student counseling matrix as students have no vote on which items to select for their individualized test.

This is a traditional item analysis plus a ranking based on how the item performed on the test. For example, two items have a difficulty of 50%. One is unfinished (the entire class is having trouble, or there is a problem with the item or instruction). The other ranks at the highest for discrimination ability (one group knows or can do something that the rest in the class do not know or cannot do).

Students of all abilities missed the first item; mainly lower scoring students missed the second.
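The contrast between the two 50% items can be sketched with a classical upper-lower discrimination index. This is a swapped-in textbook technique, not necessarily how the A, B, C, D discrimination ratings in the matrix were computed. The function name and the eight-student data are hypothetical.

```python
def item_stats(marks_ranked):
    """Return (difficulty %, discrimination index) for one item.

    marks_ranked: 1 (right) / 0 (wrong) marks for the item, with students
    ordered from highest to lowest test score.
    Assumption: discrimination is the upper-half minus lower-half proportion
    right, a classical index that may differ from the one behind the matrix.
    """
    n = len(marks_ranked)
    half = n // 2
    upper = marks_ranked[:half]
    lower = marks_ranked[n - half:]
    difficulty = 100 * sum(marks_ranked) / n
    discrimination = (sum(upper) - sum(lower)) / half
    return difficulty, discrimination

# Unfinished item: students of all abilities miss it.
print(item_stats([1, 0, 1, 0, 1, 0, 1, 0]))  # (50.0, 0.0)
# Discriminating item: mainly lower scoring students miss it.
print(item_stats([1, 1, 1, 1, 0, 0, 0, 0]))  # (50.0, 1.0)
```

Both items share the same difficulty, so difficulty alone cannot tell them apart. The second number is what separates an unfinished item from a discriminating one.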

Right Mark Scoring (RMS)