ChatGPT & Assessing Student Learning
How Do We Assess Learning in the Age of Ubiquitous AI?
I’m not sure how to assess student learning for generative tasks that involve the production of text or images. Models like ChatGPT and Stable Diffusion are so easy to use, and so maddeningly good (and bad!), that it doesn’t seem plausible that we can stop students from using these tools through surveil-and-punish efforts. Despite the hype over so-called “AI detection tools,” so far these tools are too untrustworthy to use. They are essentially simple perplexity (word variation) models, and evading them is trivial. You can simply add “use a perplexity level of around 600” to your prompt, and even OpenAI’s own detection tool is baffled.
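To make the perplexity idea concrete, here is a toy sketch of how such a score can be computed. It is not how any particular detection tool is implemented: the function name `unigram_perplexity`, the add-one-smoothed word-frequency model, and the sample sentences are all assumptions chosen for illustration. A real detector would presumably score text with a large language model's token probabilities rather than simple word counts, but the principle is the same: predictable text scores low, surprising text scores high.

```python
import math
from collections import Counter


def unigram_perplexity(text, reference):
    """Perplexity of `text` under an add-one-smoothed unigram model fit on `reference`.

    Toy illustration only: words are split on whitespace and lowercased.
    """
    ref_tokens = reference.lower().split()
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")

    counts = Counter(ref_tokens)
    # The vocabulary includes unseen words from `text` so they get nonzero probability.
    vocab = set(ref_tokens) | set(tokens)
    total = len(ref_tokens) + len(vocab)  # add-one smoothing denominator

    # Average negative log-probability per word, then exponentiate.
    log_prob = sum(math.log((counts[t] + 1) / total) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical sample data, invented for this sketch.
reference = "the cat sat on the mat and the dog sat on the rug"
predictable = "the cat sat on the rug"
surprising = "quantum marmalade negotiates turquoise treaties"

predictable_ppl = unigram_perplexity(predictable, reference)
surprising_ppl = unigram_perplexity(surprising, reference)
print(f"predictable: {predictable_ppl:.2f}, surprising: {surprising_ppl:.2f}")
```

A detector built on this idea flags low-perplexity text as machine-written, which is why a prompt that pushes the model toward more varied wording can slip past it.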
What Are We Teaching? What Are We Assessing?
Instead of a fruitless effort to catch my students cheating, my instinct is to clarify what I am teaching, and to adapt my assessments so they align with human learning outcomes. For example, in my undergraduate class on “Social Media, Technology, and Conflict,” two important learning outcomes are analysis and synthesis. I want students to be able to analyze a claim (What assumptions does this require? What does this imply? What are objections? Are there analogies that shed light on this?). I also want them to synthesize existing knowledge: to combine, typologize, and taxonomize it. In both cases, the goal is to get students to move from knowledge repetition to knowledge creation. I was already trying to assess this, but given the easy glibness of ChatGPT, I think assessments can be revised to center more effectively on these complex human skills.
What Does a Knowledge Creation Assessment Look Like?
The following is a first-draft assessment that requires students to conduct a literature review (an inherently synthetic task) within an analytical framework (a warfighting concept). There is also a design memo requirement, which asks students to articulate rhetorical choices, a self-analytical exercise.
This first effort is provisional: for one thing, ChatGPT can do literature reviews (sort of, and if you trick it). The problem is that ChatGPT can produce glib, genre-conforming reviews that range from “pretty good, actually” to “complete nonsense.” My empirical analysis suggests that when prompts include often-used terms of art, those terms provide semantic hooks: they point the model at the right region of embedding space, where next-word prediction works fairly well. But if a prompt includes classes of things or uses non-disciplinary terms, there’s no semantic hook. If I ask ChatGPT:
Suggest four scholarly sources that treat war as a boxing match between two people who seek to dominate the other. Include a 2–4 sentence summary of each source.
It refuses to answer, and gives a blurb about the complexity of modern war theory. But if I swap out some terms:
Suggest four scholarly sources that treat war as a zweikampf: two wills seeking to dominate each other. Include a 2–4 sentence summary of each source.
It cheerfully complies, and does a pretty good job.
I’m sure time and experience will show the need to revise this, but for now, here’s a first stab:
Mid-Term: Literature Review & Design Memo
You will conduct a literature review so I can assess how well you use analytical and synthetic thinking to produce new knowledge, as well as your ability to craft a clear written argument. Write a 4–6 page review of research on how social media has affected armed conflict in the 21st century. Organize your lit review using one of the major concepts for war from MCDP 1 Warfighting, Ch 2 “The Theory of War.”
The purpose of a literature review is to walk the reader through the current research field on a topic. Don’t list and summarize research literature: that’s knowledge repetition. Instead, analyze and synthesize the literature in a way that will further inform your audience with your insights and take-aways (knowledge creation). For both analysis and synthesis, feel free to use the heuristics and strategies we learned and practiced in class, or your own strategies. At the highest level, there are two explicit places for you to synthesize distinct sources in this assignment:
- The prompt (“How has social media affected armed conflict in the 21st century?”)
- The framing (Warfighting Ch. 2).
In addition to your lit review, you will also draft a design memo, explaining your writing choices — you have two deliverables.
Instructions:
First, write a literature review on how social media has affected armed conflict in the 21st century, covering at least 12 research studies, for a total length of four to six pages. Please use APA style (Times New Roman 12, double-spacing), and please cite sources, including page numbers, to support your claims. For this assignment, please do not use LikeWar as a source.
- Write an introductory section that introduces the broader issue, zeroes in on the specific topic, articulates why it is important, and sets up the audience to make sense of your review (both the content of the review and your take on it).
- By “studies,” I mean peer reviewed, scholarly sources. Don’t use LikeWar or journalistic sources. Do feel free to use RAND reports or other scholarly sources we’ve read this semester.
- Group studies together meaningfully in the review section. For example, if some studies focus on causes and others on solutions, that could be a way to synthesize them. Or imagine a resilience review where some studies use an agentic model of human behavior while others use a more deterministic one, and grouping them that way.
- When appropriate, consider higher level synthesis, e.g. imagine gathering resilience studies that are medical in nature (genetics, epigenetics, neurology, mental health) but then showing how those can be re-grouped into finer-grained categories (mechanistic & disease/etiology approaches).
- As part of your review, analyze the field/parts of the field:
- Are there gaps in what is addressed?
- Are there potential problems in research, for example only using quantitative approaches to a fundamentally qualitative issue?
- Can we trust these studies/approaches?
- Do you see similarities/complements that suggest new possibilities?
- Can you articulate affordances/constraints or relationships between approaches/schools/traditions?
- Have a conclusion that ties together your review. Good things to include are a summary of your outline and analysis of the field, the main take-aways for the audience, and some kind of “so what,” such as new directions, implications for policy, support for a position, etc. The goal is new knowledge about what experts are saying about X.
Second, draft a 1–2 page written design memo explaining your writing choices. You should answer these questions:
- What was your organizing concept, and why did you choose it?
- Who is your intended audience?
- What new knowledge have you created?
- How did you use AI writing assistance (if you did)? If you did, what process did you use to verify accuracy and credit others’ work?
- What did you learn from this assessment?
Additional requirements:
- Chunk ideas by paragraphs. Also consider labeling paragraphs via topic sentences.
- Label parts of your argument, e.g. sections and subsections. Hint: “Introduction” is good, but “Introduction: How Does the Army Choose a Model for Soldier Resilience?” is way more useful to your audience.
- Understand what the topic of any given sentence or clause is, and make sure you focus attention on that topic.
- No block quotes. Also, don’t quote when a paraphrase with citation would work.
- Avoid lists: “Tony said this, Mary said that,” etc. Good literature reviews synthesize prior research. For example, you might read 5 studies on social media privacy and synthesize them into three main lines of inquiry, e.g. potential harm to users, platform motivations, and mechanisms for control.
- Unless there is a reason to topicalize author “X,” topicalize the research findings, supported by citations, e.g., “Prior research in military resilience has shown that protective factors exist at multiple levels, including personal, unit, family, and broader culture (Meredith et al., 2013).”
Rubric:
- Does the student fulfill the assessment requirements? Is the review accurate? (Learning Outcome: Completeness/Accuracy)
- Does the student analyze cited research? (Learning Outcome: Analysis)
- Does the student synthesize cited research? (Learning Outcome: Synthesis)
- Does the student chunk distinct ideas? (Learning Outcome: Chunking/Whitespace)
- Does the student usefully label chunks? (Learning Outcome: Labelling)
- Does the student topicalize ideas & concepts? (Learning Outcome: Topicalization)
- Does the student write to a specific audience? (Learning Outcome: Audience Analysis)
- Does the student show evidence of productive AI use? (Learning Outcome: Productive AI use)