Reflections on AI’s growing role in evaluation, and what’s missing

Nov 8

At the 2023 American Evaluation Association conference in Indianapolis, I sat in a large, packed ballroom and watched two presenters demonstrate their latest uses of ChatGPT. They wowed the crowd with live translations. Attendees yelled out languages (“Arabic!” “Spanish!” “Hindi!”), the accommodating presenters complied, and ChatGPT produced translations that, according to those who could check, were quite good.

That was just a year ago, and it already feels quaint. At this year’s AEA gathering in Portland, Oregon, many more presentations were dedicated to how evaluators can use AI, and AI likely came up in dozens more presentations, demonstrations, roundtables, and happy hours. The use cases also went far beyond translation. Here are three takeaways I brought home from Portland that are shaping how I think about using AI in our work at Impact Architects, and that I hope will be useful for our partners.

The power of AI is about more than efficiency

There are plenty of use cases in which AI makes processes more efficient, such as summarizing large bodies of text or structuring data for analysis. We’ve even explored some options for using AI in computational text analysis. But AI can do much more. I attended one presentation about how AI can serve as a “stakeholder at the table” in systems change evaluations. By priming an AI tool, in this case ChatGPT, with as much detail as possible about the system, the challenges of the problem to be solved, and the efforts taken so far, evaluators can create an AI participant with its own perspective and an awareness of bias. It can be used to move conversations forward and untangle tricky situations. This is AI as a thought partner.

AI can see the future

Okay, it can’t really see the future, but one effective use of AI is to harness its wild imagination to explore what might happen. I attended one presentation about how evaluators can use AI to work with program theory: a model, often expressed as a logic model, that lays out how an initiative is expected to progress through activities, outputs, outcomes, and ultimate impact, while considering context and assumptions. Evaluators often partner with organizations as their work unfolds and help them make adjustments along the way based on new information or shifting context. With enough parameters set, evaluators can upload a program theory into an AI platform and ask it to role-play a hypothetical member of the program’s audience, whether a person or an organization, over a set period of time. This can add nuance to the theory, refine assumptions, or unearth new ones. This use case turns one of AI’s pitfalls, “hallucinations,” into an asset: we ask it to look ahead and hallucinate on purpose, for the benefit of the program and its real constituents.

Using AI effectively is a skill to be built

AI is a tool, and a tool is only as good as the skill of the person wielding it. In some cases, the skill needed is fine-tuning prompts to get useful responses. In other cases, it takes basic coding or data analysis skills to quality-control the outputs. Either way, effective use will require practice, patience, and experimentation. I learned about dozens of AI platforms for different purposes over the course of the conference, and it will be important to choose the right tool for the right job.

It’s also a skill to know when not to use AI. The takeaway message was this: Evaluators should not use AI to make judgments, such as crafting recommendations or making a determination of success. That seems like good advice for any field. AI can be a thought partner and future-oriented thinker, but we should not rely on it for decision making.

It’s exciting and intimidating to consider what sessions might be held at the 2025 conference. One thing was clearly missing this year, and it would be welcome: evaluation of AI use cases in programs. AEA is a valuable conference because it’s about practice and practitioners, and this year I learned a lot about how AI could be useful for Impact Architects. But what are the most effective evaluation methodologies and approaches for AI-based experiments? What new assumptions and risk tolerances do these experiments carry? How should their success be measured? And most importantly, how can evaluators amplify what is learned so that others can benefit?

In the field where Impact Architects works, journalism and media, active experiments are underway. The Lenfest Institute, OpenAI, and Microsoft just announced a $10 million collaborative focused on AI experiments, and in July 2023 the American Journalism Project and OpenAI announced a partnership to experiment with AI among AJP’s newsrooms. I’m eager to learn what comes out of these experiments and how the fields of journalism and evaluation can learn from them.

Eric Garcia McKinley
