Will Artificial Intelligence Force Us to be Less Dumb about How We Evaluate Humans? | McDonnell Boehnen Hulbert & Berghoff LLP
Years ago, I was a proud parent when my children were invited to participate in an honors math program at their grade school. But this initial delight turned to confusion, and eventually frustration.
As just one example of why I was less than pleased with our school’s pedagogy, one heavily emphasized part of the curriculum required that the kids memorize as many digits of pi as they could, with the minimum being 25. Sure, they also learned how pi defines the ratio of a circle’s circumference to its diameter and how to use it in simple algebra, but this memorization task was the focus of the unit, with the child who memorized the most digits (130 one year) winning special accolades.
To me, this assignment missed the point. Pi is a critical value in many aspects of science and engineering, and can be taught directly or indirectly in a number of compelling and fun ways involving wheels, pizza, spirographs, and so on. And its importance in aviation and communications can at least be mentioned.
But the focus was on committing those 25-plus digits to memory and being able to recite them on demand. When I pointed out to the teachers that maybe — just maybe — this was not the best way to prepare children to have an appreciation for STEM fields, they looked at me like I was from another planet. The curriculum was designed around what was easy to test (can the kid produce the 25 digits when asked?) rather than the harder-to-evaluate skills (does the kid know how and when to use pi to solve problems?) that are actually important when using math in the real world.[1]
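To make the contrast concrete, here is the kind of exercise that tests understanding rather than recall: estimating pi from its geometric meaning instead of reciting its digits. This is a minimal sketch (the Monte Carlo "dartboard" method, not anything from the school's actual curriculum), in which random points are thrown at a unit square and the fraction landing inside the quarter circle approximates pi/4.

```python
import random

def estimate_pi(samples=100_000, seed=42):
    """Estimate pi by sampling random points in the unit square and
    counting how many fall inside the quarter circle of radius 1.
    The fraction inside approximates the area ratio pi/4."""
    rng = random.Random(seed)
    inside = sum(
        1
        for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples

# With 100,000 samples the estimate lands close to 3.14
print(estimate_pi())
```

A student who works through why this converges has learned far more about pi than one who can recite 130 digits of it.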
Thus, when the news broke that OpenAI’s GPT-4 large language model passed the uniform bar exam at the 90th percentile, I was less than impressed. In fact, this outcome is completely unremarkable given that the model was trained on billions of words of human-written text.
The bar exam is a memorization exam. Aspiring lawyers typically spend 10-12 weeks taking a bar exam review course, which involves committing massive amounts of legal rules and principles to memory, as well as learning how to write essays in a formulaic fashion (IRAC: issue, rule, application, conclusion). Then you sit for two days of testing in which you regurgitate as much as you can. If you manage to score highly enough, you pass and become a licensed attorney.
During the summer that I spent preparing, I remember at one point mentioning in frustration to my study partner that what the bar exam is actually testing is how much pain one is willing to accept to be a lawyer, and that rapping us across the knuckles a few times with a ruler would probably have the same effect. Indeed, I know of individuals who graduated law school in the top ten percent of their class (in terms of GPA), failed the exam on their first try, later passed, and went on to be excellent attorneys. Clearly, these folks were bright, but when speaking to them they attributed their failure (which was quite the source of shame) to not studying hard enough during bar review. Let that sink in: top law students can fail to be licensed because they do not learn the mechanics of one specific exam.
A recent paper from Professor Daniel Katz evaluates GPT-4’s bar exam performance and states that “These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society.”[2] The key word in this sentence is “potential,” but even so, the statement is misleading.
GPT-4 scores well on the bar exam not because AI is achieving human levels of intelligence, but because the bar exam tests a human’s ability to perform like a robot. Missing from the bar exam are tests of executive function (e.g., staying organized, keeping to deadlines), soft skills (e.g., client interaction and counseling, interpersonal competencies), and law firm operation (e.g., finance, marketing, managing groups, how to be a good employer), all of which are more relevant to a lawyer’s success than their ability to stuff facts into their brains.
Indeed, it is now widely accepted that GPA is much more predictive of a student’s ultimate success than standardized test scores. This is because maintaining a high GPA requires more than the raw cognitive ability to do well on memorization-based exams — the aforementioned executive functioning and soft skills play a significant role. Intellectual ability is important, but so is emotional intelligence.
Turning to patent law, in a typical year there might be one multiple-choice question out of 200 on the bar exam addressing intellectual property. So for us patent attorneys, the bar exam is measuring our ability to regurgitate law that we are unlikely to ever apply in practice. To that point, the USPTO requires that we pass a separate patent bar exam. Admittedly, it is also memorization-based, but at least it is open book.
So, to the extent that Professor Katz is implying that GPT-4 or any other of the current generation of large language models can perform significant legal tasks, I have to disagree. Large language models are tools that lawyers can employ, not unlike search engines or Wikipedia. They may be able to carry out certain first-level research functions in place of a junior associate. But when it comes to crafting creative legal strategies that guide clients through complex transactions, they are still far from the mark.
Nonetheless, the strong performance of GPT-4 on memorization-based exams provides us with a golden opportunity to re-evaluate how we teach both children and law students. If the goal is to turn out humans with skills that can be easily replaced by automation, then maintaining the status quo will get us there. But we would be much better off by recognizing and embracing large language models, while remaining cognizant of their strengths and weaknesses. Integrating these tools into a broad-spectrum education system with a flexible curriculum is much more likely to produce graduates who can adapt to the changing needs of the legal profession, or any other field for that matter.
The modern education system is still based too much on a paradigm established in the 1800s, one in which an instructor lectures and the students passively receive their lessons. Given that large language models can outperform most humans in these scenarios, we need to seriously consider changing the system to meet the demands of 21st century life.
And for anyone who absolutely needs to know the first 25 digits of pi, don’t worry because GPT has you covered: “The first 25 digits of pi (π) are: 3.14159265358979323846264. Note that pi is an irrational number, meaning that its decimal representation goes on infinitely without repeating.” Or, it almost has you covered, as the 25th digit is missing from its output.
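The missing digit is easy to verify. As a quick sanity check (a throwaway script, not anything from the Katz paper), the following compares GPT’s quoted string against the first 25 digits of pi from standard references, counting the leading 3 as the first digit:

```python
# First 25 digits of pi, counting the leading 3, from standard references.
PI_25 = "3.141592653589793238462643"

# The string quoted by GPT above.
GPT_OUTPUT = "3.14159265358979323846264"

def digit_count(s):
    """Count numeric digits, ignoring the decimal point."""
    return sum(ch.isdigit() for ch in s)

print(digit_count(GPT_OUTPUT))       # 24 -- one digit short
print(digit_count(PI_25))            # 25
print(PI_25 == GPT_OUTPUT + "3")     # True: only the final 3 is missing
```

So the model confidently announces 25 digits while delivering 24, which rather neatly makes the point about the difference between reciting and understanding.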
[1] To get a sense of how prevalent issues like this are in education, 60 years ago Nobel laureate physicist Richard Feynman was asked to help the state of California select math textbooks for its schools. His account of the process (“Judging Books by Their Covers,” later collected in Surely You’re Joking, Mr. Feynman!) is both humorous and disheartening. From what I have seen, today’s textbooks are better than they were back then but still leave plenty of room for improvement . . . such as justifying why one needs a textbook, period.
[2] Katz, Daniel Martin, Bommarito, Michael James, Gao, Shang, and Arredondo, Pablo, GPT-4 Passes the Bar Exam (March 15, 2023). Available at SSRN.