In Defense of Teaching: Qualitative Assessment

September 2010
Joe Safdie, SLO and Accreditation Committee, San Diego Mesa College

We sometimes forget that the classroom, whether virtual or face-to-face, is a mysterious place: lots of different things are always going on, and good teachers have to hustle to stay aware of even a small portion of them. It helps, for example, to have a sense of whether or not the students are understanding or engaging with the subject matter and the extent to which they’re exhibiting thoughtfulness in responding to it. But other elements, some of which aren’t strictly academic, are also important: their emotional and social development; whether they can apply what they’ve learned to future enterprises; even their relative enjoyment of the class. Lately, we’ve been trying to measure such things by assessing student learning outcomes—which, among other things, might give teachers more information about students who aren’t doing well and the opportunity to help them—but the jury is still out. Because there’s still more to consider: how many students in the class work full-time or recently had their financial aid withdrawn? Is the class an elective or required for their major? For how many is English their second or third language? Did they test into the class, take the preceding class in the sequence, or self-select? Are students texting each other in class? Etc., etc.

We can find out the answers to some of these questions, and often they might inform some of what we do. But whatever the answers are, they contribute to the alchemical beaker of a class as much as, or more than, the data yielded through student learning outcomes (SLO) assessment. In fact, one problem with SLOs is that breaking the class down into certain measurable elements and not others can give us too narrow a picture of the essentially mysterious conversion of course material into what’s called, in another context, “actionable intelligence.” Michael Pollan, in his In Defense of Food, has a similar complaint against “nutritionism”:

Another potentially serious weakness of nutritionist ideology is that, focused so relentlessly as it is on the nutrients it can measure, it has trouble discerning qualitative distinctions among foods…Milk through this lens is reduced to a suspension of protein, lactose, fats, and calcium in water, when it is entirely possible that the benefits, or for that matter the hazards, of drinking milk owe to entirely other factors … that have been overlooked. Milk remains a food of humbling complexity… (p. 31)

As, I would submit, are classes, on every day and in every way. And yet, there are qualitative distinctions between them: everyone knows when they’re in a good one, or teaching a good one, over a day or over a semester. In fact, professors who constantly teach good classes might be considered “better” than others because the students in those classes are energized, enthusiastic, and succeed at higher rates in subsequent classes. On the other hand, there are good students and poor students in every class: is there a way to increase the ratio of good ones to poor and thus “improve student learning”? SLOs, as well as the collection of “best practices” and programs like the Basic Skills Initiative and Race to the Top, would seem to depend on affirmative answers to this seemingly simple question, but … I’m not so sure. Has any best practice resulted from factoids like “80% of the students demonstrated a facility with creative thinking”? Can we measure the actual learning that takes place in a classroom without losing the classroom?

I think we can, but it depends on giving qualitative evaluation—rather than just quantitative assessment—a bigger place at the table. Here’s an example of what I mean, using a rubric from the SLO software system “Taskstream” for assessing a poem in a creative writing class:

Metaphor
  1. There are no metaphors in the poem, thus indicating a lack of understanding about metaphors.
  2. Some language appears as if it was intended to be a metaphor, thus indicating understanding of definition. However, the metaphors are either not fully developed, or appear as similes.
  3. Metaphors are present, but there is obvious confusion between metaphors and similes.
  4. Full understanding and application of metaphorical language is evident.

Imagery
  1. There is no evidence of imagery in the poem; no evidence of the understanding of imagery.
  2. There is evidence of imagery, but it is randomly applied, thus indicating lack of clear understanding.
  3. The imagery is applied to standard words or phrases, but it is not very vivid.
  4. The imagery is so vivid that it paints a picture with words, thus indicating a clear understanding of imagery.

Tone
  1. The tone or mood of the poem is unclear.
  2. The tone or mood of the poem is somewhat clear, but inconsistent.
  3. The tone or mood of the poem is clear, but inconsistent.
  4. The tone or mood of the poem is completely clear and consistent with the context of the poem.

Rhythm
  1. There is no recognizable rhythm to the poem.
  2. There appears to be rhythm to the poem, but it is somewhat inconsistent.
  3. The rhythm of the poem is easily recognized, but does not make sense with the content.
  4. The rhythm of the poem is easily recognized and clearly consistent with the context.

Line/Word Spacing
  1. The poem looks like prose; no clear sense of the significance of spacing.
  2. The text is formatted in poem "form", but the spacing does not seem purposeful.
  3. Some spacing seems purposeful and creative, other spacing seems accidental or random.
  4. The poem's line and word spacing is creative, purposeful, and enhances the meaning of the poem.

And here’s what I would submit to you about this rubric: it’s no better or worse than almost any other rubric I’ve seen. But somewhere in this elegant construction, the thing we’re trying to measure has disappeared. Where’s the poem? For that matter, where is the student? Put aside that the list of criteria on the left might not be ideal: whatever the criteria were, the work under consideration would be lost in a sea of aggregated numbers. What would it tell me if, say, of the 23 students left in this creative writing class (after seven had withdrawn or just stopped coming, another thing that isn’t covered by SLO assessment), 16 of them had scores of 3.2 on their recent poems? How could I use that information to improve the class, let alone share it in any sort of meaningful way with my colleagues?

Mind you, I was SLO Coordinator at my college for 2½ years until the position got eliminated because of the budget crisis. And I’m someone who believes that the idea behind SLOs is a good one: that there are some general, overarching skills and abilities that students should pick up while they’re in college that might be different than or unrelated to the mastery of course material. But the Great Chain of Being seemed like a good idea at one point as well; like everything else, it’s all in the details. I know some people have used the results of their SLO assessments to make changes in a class or practice or mode of instruction that have helped students learn more, or better, and that’s great: that is, after all, part of our job responsibilities. But in such cases, I feel sure of two things: 1) the numerical scores have been discussed, and interpreted, and argued over, by colleagues who are teaching the same course, and 2) the measurement retained essential contact with the phenomenon it was measuring, which isn’t always easy to manage when dealing with the abstraction of numbers.

For example, the latest report on SLOs, conducted by REL West researchers and agreed to by the Academic Senate, compares the SLOs from one class (English 1A, or transfer-level English composition) in different California community colleges, but to my eyes it doesn’t really tell us much at all. That’s not because some of the colleges didn’t participate: it wouldn’t have mattered if all 112 had. It’s because the SLOs here are operating at a level that’s thrice removed from anything real: they’re removed from the context of the class, the class is removed from the context of the department and school, and the outcomes are then compared to other such outcomes and to an external set of descriptors, the value of which is assumed and never argued. Such surveys tell us nothing about a particular class and the students in it, and, I think, have the further potential to lead to standardization and a loss of aesthetic diversity in our teaching practices.

It might be time for a few qualifications. I don’t believe that numbers are evil (something my own college researcher mentioned after reading a draft of this piece), and I also follow her in believing that quantitative and qualitative assessment shouldn’t be mutually exclusive. But words and numbers “measure” different things in different ways: words can’t be reduced to captions for graphs, and we should be wary of any system that translates our practice into abstractions. It’s also possible that many faculty who have resisted SLOs have done so because of the perceived “injustice” of exclusively quantitative results. The nature of good teaching is—and must remain—an art: not just a science.

Interestingly enough, as I was composing this piece, this e-mail came through:

The experience of performing is very similar to channeling. The more open I am, the more these ideas come into mind ahead of time. I’m performing but I can see these options in the future and can continue performing… . I’m performing live, and I get a preview of a potential idea. I can use it however I want. I can rotate the shape. I can put it over here or put it over there and create a strategy in real time. When I’m open, I see more pieces ahead of time. (Watts)

This seems familiar to me, because it’s what happens with good lecture/discussions in class. But I don’t think it can be quantified, any more than I think a student’s ability to think critically can be: simply, a class is more than the sum of its parts. But I can describe it, qualitatively, and Reggie Watts, in this article in Artforum, just did.

What would qualitative assessment look like? Wouldn’t it take more time? The answer to the second question is probably “yes, a little,” and I’ll come back to it before I close. But including narrative description as part of the assessment is only part of the story; the other part is the assignments for which it might profitably be used. When I was teaching (briefly) in Europe, we gave students a sort of “exit interview” called a colloquium at the end of the term instead of a final exam, in which we asked them open-ended questions about the course material: similar things could happen in individual conferences at any point of the semester. Similarly, oral reports or presentations, perhaps organized in groups, would provide ideal prospects for qualitative evaluation, as would, of course, written documents. In all of this, direct observation is as important as measurement; the assignments we’re asking the students to do should give them lots of chances to exercise critical and creative thinking, and our evaluations of those assignments would naturally follow suit. Just as, in our own evaluations, we usually pay more attention to student comments than the mathematical scores, the notes we jot down using qualitative assessment could be shared with our colleagues more profitably than an abstract statistical average.

There are even ways to make qualitative assessment of SLOs not so time-consuming. For example, we can still use rubrics, but they could be “empty,” keeping the criteria we’re looking for on the left, but leaving spaces for snatches of narrative description and observation in the middle. Below is an example from Mesa’s English 49 class (a developmental English class one level below transfer-level):

Rubric for Assessing Critical Thinking in English 49:
“Upon completion of English 49, students will be able to …”

Criteria                  Sophisticated    Competent    Emergent    Beginning

Analyze and evaluate
a reading assignment's
argument

Define the purpose
and audience for their
own writing

Construct effective
arguments in
response to
assigned reading

A few comments: here, there are words across the top for the various levels of attainment, not numbers: this was obviously constructed by English professors! But, of course, you could use the numbers 1-4, or you could use three columns instead of four, labeling them, perhaps, “Exceeds Standards,” “Meets Standards,” and “Doesn’t Meet Standards.” The important point is that you have space to write down a few comments while the student activity is happening. Otherwise, we run the risk that this enterprise will eventually become another version of No Child Left Behind, an outcome that many of us have worked many hours to avoid. As Pollan, again, reminds us:

Scientists study variables they can isolate; if they can’t isolate a variable, they won’t be able to tell whether its presence or absence is meaningful. Yet even the simplest food [class] is a hopelessly complicated thing to analyze, a virtual wilderness of chemical compounds, many of which exist in intricate and dynamic relation to one another, and all of which together are in the process of changing from one state to another. So if you’re a nutrition scientist [faculty member doing assessment] you do the only thing you can do, given the tools at your disposal: Break the thing down into its component parts and study those one by one, even if that means ignoring subtle interactions and contexts and the fact that the whole may well be more than, or maybe just different from, the sum of its parts. This is what we mean by reductionist science. (p. 62)

Whatever SLOs are or aren’t, I don’t think many people would feel thrilled by the description of their assessment as reductionist science. Still, given the necessity to measure the extent of our students’ learning, the question becomes: how do we do that in a way that isn’t reductionist? I submit that using qualitative assessment as an equal partner to the quantitative targets we set, and increasing the number of different activities and assignments we use for such measurement, would be a start. By so doing, we’d also better recognize the astonishing balancing act that teachers perform each and every day.

Works Cited

Pollan, Michael. In Defense of Food: An Eater’s Manifesto. New York: Penguin, 2008. Print.

Watts, Reggie. “500 Words.” Artforum. Artforum International Magazine. 6-29-10. Web. On Al Filreis’ blog. “where is this leading me?—on improvisation.” 7-3-10.

The articles published in the Rostrum do not necessarily represent the adopted positions of the academic senate.