The Perils of Having Tests Graded by a Computer

My hands hovered over the keyboard as my brain caught up to what my fingers had just typed. Did I really just make that comment to a student? “The computer won’t know that this fragment works as part of your style; it will just see a sentence fragment and most likely will ding you for it.”

Even before responses were graded by computers, AIR tests were never looking for a piece of writing that demonstrated a unique voice and style, but rather for one that included enough elements on a checklist for the assessor to deem the text “proficient”. Still, with human assessors, there was the opportunity to wow the grader, to stand out from the other essays in some way. Now, I fear that essays that stand out too much might actually lose points for not aligning closely enough with the templates used to program the machine for that particular essay prompt.

In addition to worrying about writing a too-unique response for a computer, a student must also worry about not using enough original language in his response. Third-grade AIR test responses are being given zero points if too much of the wording from the question appears in the student’s answer. Many students are taught to restate the question to help guide their writing, but now, with machines scoring their work, that can result in a score of zero. Curiously, when school districts request that tests be regraded by humans, few scores are changing. I wonder if we are training computers to grade like humans or, sadly, training humans to grade like computers.

Anyone not familiar with the high school AIR ELA tests might be shocked to learn that only 30% of the student’s response is expected to be original. That seems like very little original text. However, students are asked to read a few passages and then cite the passages extensively in their essay response. Indeed, four of the ten points possible on the essay are based on Evidence and Elaboration. Students are expected to include “smoothly integrated, thorough, and relevant evidence, including precise references to sources and an effective use of a variety of elaborative techniques (including but not limited to definitions, quotations, and examples).” Do the machines recognize “precise evidence” and quotations from sources as support for the writer’s argument? Or do they simply register unoriginal (copied) language and give the essay a zero?

The rest of the high school rubric is troublesome, too. To earn the highest scores, students are supposed to use a “variety of transitional strategies” in their response. Can a computer recognize strategies, or does it just count transitions?

Students are expected to include a “satisfying” introduction and conclusion. How can a machine determine satisfaction?

A good essay response will maintain an “objective tone”. How does a computer even begin to recognize tone, let alone determine whether or not it has been maintained?

Students desiring the highest scores need to use “appropriate academic and domain-specific vocabulary” in their response. How can a computer determine if a vocabulary word was used appropriately? Can it even tell if the word was used correctly?

Evidence used in a response must be “smoothly integrated”. A computer can be programmed to look for quotation marks indicating a direct quote from the passage, but can it tell how well that quote has been integrated into the essay?

No, I am simply not convinced that a computer can assess a piece of writing in any fair or meaningful way.

All that aside, there’s another, more important concern I have with our students writing for a computer audience. Writing is used to communicate in myriad situations, but at its core, writing is an art form. One of the late Robin Williams’ greatest performances was as the English teacher John Keating in the movie Dead Poets Society. In the film, Mr. Keating challenges his teenage students to see the beauty and power of the written word: “We don’t read and write poetry because it’s cute. We read and write poetry because we are members of the human race. And the human race is filled with passion. And medicine, law, business, engineering, these are noble pursuits and necessary to sustain life. But poetry, beauty, romance, love, these are what we stay alive for.” Don’t we want our students to find a creative outlet that allows them to express their true selves? To find some art form, perhaps writing, that gives their lives meaning? I doubt that learning to write a standardized test essay, especially one written for a computer, will encourage any student to explore the beauty of the written word. And if today’s young writers aren’t being encouraged to create pieces that express their unique view of the world, will there be any engaging texts to read in the future? Or will we lose the beauty of a Fitzgerald metaphor, the power of a Maya Angelou poem, the lasting impression of a Dickens first line?

The idea of a computer assessing any art form is ludicrous. Could a computer assess a painting? Perhaps it could be programmed to look for certain colors or shapes, but the overall feeling of the painting would not be well-represented by that analysis. A computer could be programmed to analyze certain chords or rhythms or key changes in a song, I suppose, but none of that would adequately measure the power of the music, the way the song makes the listener feel upon hearing it. To extend an old cliche, expecting computers to meaningfully assess any artistic endeavor would be like trying to comprehend the beauty of the forest by analyzing individual tree branches and leaves.

John Keating also told his students that “No matter what anybody tells you, words and ideas can change the world.” There is a monumental difference between teaching our students to use language in a way that will change the world and teaching our students to earn a good essay score from a computer. I shudder to think of how the testing generation we are producing will view the world and the role of language in it. When they write, will they imagine a lover’s heart being moved by the beauty of their poem? Will they envision a mind changing, a society evolving because of the power of their impassioned arguments? Or will they simply see yet another screen on the receiving end of their writing?

I used to encourage my students to use the introduction of their AIR test essays to “wake up” the human assessor who had probably already read dozens of essays about the same topic. Now, I must teach them to consider how a programmed computer might view their words. And that, I’m afraid, could have a devastating impact on how my students might view the world.
