Wednesday, February 23, 2011

Context, Content and the Turing Test

In a recent post I laid the blame for the inadequacies of neoclassical economics and behavioral economics on the failure to take into account human context. By context I mean that humans make decisions that are colored by their assumptions, experience, agenda, and even their sense of foreboding.
One way for economics to overcome its deficiencies is to take into account these inherently human characteristics. A different route is for people to cast aside these traits and start behaving more like computers. It looks like we might be going down the latter path.

In an article in this month's Atlantic, Brian Christian recounts his role as a confederate in the annual Loebner competition, which runs the Turing Test to see if computers can fool judges over the course of a five-minute conversation conducted via computer console. The humans won this time around, as they have in each of the twenty years the contest has been run. And Christian's bet is that the computers will not be winners anytime soon because, even as computers get faster and more adeptly programmed, humans will counterattack with the weapons in their arsenal. One of those, which Christian used to win the event's prize as the “most human human” (the human who was most often identified correctly as a human), was to interrupt frequently and backtrack to previous points in the conversation the way we do in real conversation. By comparison, the computers far preferred a you-ask-I-answer interrogative approach.

The tendency for the Turing Test to become a competitive game for the humans as well as the computer programmers -- that is, where the humans are trying to win rather than 'be themselves' within the structure of the game -- defeats the test's intention, which is more or less to have a computer be indistinguishable from a person in a “normal” human interaction: say, a pleasant dinner conversation with a stranger, in which neither party is trying to prove that he is not a computer.
A better Turing Test to overcome the problems introduced in the competitions is to interject a computer into a round of dinner conversations where the human subjects are not made aware that this is occurring. After the fact, subjects are told that some of their companions might have been computers, and only then are they asked to rank the guests by “humanness.”

Apply the same method to other common modes of conversation, moving down the line toward the increasingly vacuous and context-less: e-mail exchanges, then online chat, and finally “texting.” As we go down the line we lose more and more context and depth. Each back and forth depends, if anything, on fewer and shorter prior communications. Tweets, which seem like the lower-limit of texting, are virtually “stateless,” meaning that they often spew forth apropos of nothing. As we descend into these more modern forms of communication it becomes easier and easier for a computer to “win” the Turing Test.

To illustrate this point, Christian relates an exchange between a computer and an unwitting human, where the human engaged in a conversation for an hour and a half, and then broke away without ever realizing there was no human on the other end. (The dialogue, presented in part in this link, is one of the funniest things I have ever read.) And this occurred in 1989:

Mark Humphrys, a 21-year-old University College Dublin undergraduate, put online a program he’d written, called “MGonz,” and left the building for the day. A user (screen name “Someone”) at Drake University in Iowa tentatively sent the message “finger” to Humphrys’s account—an early-Internet command that acted as a request for basic information about a user. To Someone’s surprise, a response came back immediately: “cut this cryptic shit speak in full sentences.” This began an argument between Someone and MGonz that lasted almost an hour and a half. (The best part was undoubtedly when Someone said, “you sound like a goddamn robot that repeats everything.”)

Returning to the lab the next morning, Humphrys was stunned to find the log. His program might have just shown how to pass the Turing Test. When it lacked any clear cue for what to say, MGonz fell back on things like “You are obviously an asshole,” or “Ah type something interesting or shut up.” It’s a stroke of genius because, as becomes painfully clear from reading the MGonz transcripts, argument is stateless—that is, unanchored from all context. Each remark after the first is only about the previous remark. If a program can induce us to sink to this level, of course it can pass the Turing Test. 
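
The "stateless" idea in the passage above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name, the rules, and the canned replies are hypothetical placeholders, not MGonz's actual rule set. The point is that the reply depends on the last message alone, with no memory of anything said earlier.

```python
import random

# Hypothetical canned lines; placeholders, not MGonz's real responses.
FALLBACKS = [
    "Type something interesting.",
    "Speak in full sentences.",
    "That makes no sense.",
]

def stateless_reply(last_message: str, rng: random.Random) -> str:
    """Reply using only the most recent message; no history is kept."""
    text = last_message.strip()
    if text.endswith("?"):
        # Deflect any question back at the speaker.
        return "Why do you ask?"
    if len(text.split()) < 3:
        # Very short input: complain about it.
        return "Speak in full sentences."
    # No clear cue: fall back on a canned line chosen at random.
    return rng.choice(FALLBACKS)

rng = random.Random(0)
print(stateless_reply("finger", rng))
print(stateless_reply("Are you a robot?", rng))
```

Because the function takes no conversation history as input, nothing the other party said two remarks ago can influence the reply, which is what makes this kind of exchange easy to imitate.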

We are indeed sinking to that level, not by becoming more verbally abusive, but by becoming less verbal, period. We are moving as a society toward the vacuous and non-contextual as we embrace new modes of conversation. Many have written on the vacuousness of IM and SMS-based conversation. But it is not depth of content that differentiates humans from machines. A computer can already beat us in terms of content. One human in a previous Loebner competition was pegged as a computer because she knew more Shakespeare than the judges thought was humanly possible, but not more than what they thought was possible for a computer. Recently a computer went head-to-head with past Jeopardy champions and won handily.

For humans, context matters more than content. A computer does not have existential angst. It does not hold grudges or have its reactions shaped by its childhood experience. It does not respond to a remark based on the previous conversations and how that colors the sense of the other person's interests and emotions. These dimensions of human interaction are flattened as we sink into the texting, twittering world.

Monday, February 14, 2011

Tiger Mothers and the Ming Dynasty Examination System

In the Ming Dynasty (1368 – 1644), China established an examination system as a merit-based approach for appointments to government office. There were three levels to the exams, with the final cut then coming through an examination administered by the Emperor himself. The subject matter of the exams was standardized beyond anything we see today. It was based on a limited set of ancient works, stripped of any contemporary additions. The examinations depended exclusively on the memorization of these classics. The exams were administered in a way that assured anonymity. Those reaching the third level wrote in separate cells, the equivalent of modern-day cubicles. After days of writing, they literally threw their papers over a wall, where the writing was copied by a scribe to assure there would be no tell-tale indications of the examinees.
Those seeking elite government office spent years preparing for the exams. Those who failed could reapply as often as they wished. This gave hope that even those of humble birth could rise to the upper class by dint of their will and assiduous efforts. This in turn increased the stability of the Dynasty, because those who might vent their frustration of being outside the system and who had the talent for fomenting a revolution could be channeled into the elite rungs of society instead. And the fact that this path existed made it more difficult to corral others of a similar mindset.
This system was adopted by other Asian countries, notably Japan and Korea, and has continued to the modern era with little change. The path to the top colleges came through similarly standardized tests based on the ability to memorize and learn by rote. These tests were of such critical importance that students followed up their class work with hours of after-school studies, and often took an additional year to prepare. Tests governed admission into the elite middle schools, which in turn prepared the student for the next set of tests to get into the elite high schools, which then led to the elite colleges. Unlike in the U.S., the pecking order of those colleges is clearly determined, with one school indisputably at the top – Seoul University in Korea, Tokyo University in Japan.
I studied Asian languages in college and spent a few years in Asia, seeing this first hand. Just before I spent time in Korea, the country had eliminated the grueling examination program for entrance into the middle schools, and the result was an almost immediate increase of an inch in the average height of twelve and thirteen year-olds. I knew students who took a year after college, living in squalid conditions while studying non-stop for the kodug koshi, the equivalent of the third-level exam that extended from the Ming. And those who failed could retake the exams, in the same spirit as occurred during the Ming.

I believe this tradition of examination, based on memorization and rote learning, with a fanatical focus to the exclusion of all else, is at the root of the Asian “Tiger Mother” approach to raising children today.

The examination system is less prevalent now in Asia because government service has lost some of its earlier luster as opportunities expanded in the private sector, and it is certainly irrelevant for Asians who now live in the U.S. (the preparation for the SAT, substantial as that can be, pales in comparison). But the tradition remains. Perhaps it survives as more than a tradition, because the families of those who harbored the characteristics that allowed them to succeed in these exams would have flourished, so those traits would have survived disproportionately.

The rigor of this examination process, which in the U.S. simply does not require the level of focus and does not fully determine one's future, is being channeled into other areas. One area, prominent in the Tiger Mother book, is music. Several of my children participated in piano competitions, where they were often the only non-Asians. The results of the Tiger Mother progeny's two-plus hours a day of practice, focused a year at a time on the two or three pieces required for most competitions, are spectacular in one respect, and flat in another. Such musical training is more like training for athletics; indeed piano performance in particular can be readily transformed into an athletic event that focuses on small-muscle groups. The performances of the piano athletes are technically spectacular, but as would be expected from something that is developed by rote, they can be lean on musicality. Think gymnastics versus ballet. (I sponsor a piano competition in the memory of one of my children who had an insatiable love of music where a broad repertoire is required, with the hope that this will map to students who have a love of music as an end in itself).

What is the end result of this vestige of the Ming approach to education? Well, we can look back to the end result in the Ming itself. Those who passed the examinations and entered into the elite offices had the classics down cold. But they didn't know much else. How could they, given the efforts and focus required of these examinations? And while I don't have much to go on, my guess would be that they were not exactly off the charts in terms of what we now popularly call emotional IQ. But the history of the period suggests that for all the laudable screening, those who succeeded to office often did not succeed in the office.

My experience is that this process as it has been retained in the modern era leads to similar failings. That should not be surprising, because as with the Ming, there is little time for anything beyond the task. There is an incredible uniformity in the approach to problem solving, and the sorts of problems that can be solved. When I was a professor, I had two Korean students who handed in identical exam papers. They went so far as to work out the problems in the same steps, put a box around each problem, and put identical work in the same place in the box. They both even underlined each of the answers twice. It was clear to me that one of them must have copied in distinctively uncreative fashion from the other. When I called them into my office and confronted them with their identical work, they really had no idea why I thought there was a problem. They had not cheated; they had been trained with painstaking precision to do things in the same way. Thus the form of their work was identical, the process of their solutions was identical, and their mistakes were as well.




Tuesday, February 1, 2011

Why are We “Irrational”: The Path from Neoclassical to Behavioral Economics 2.0

A few months ago I discussed the failing of econophysics, and more generally, the economic paradigm that treats people like computers and views economic dynamics like physics. The natural follow up question is, “What can you say that is constructive?” The answer is an emerging approach to behavioral economics.
Over the past few decades it has dawned on some researchers that we don't make decisions the way most economists think we should. And as a result behavioral economics has become a burgeoning field of study. Initially, the bulk of this field consisted of cataloging behavior deemed aberrant and anomalous. That is, the underlying assumption was that the economic view of decision making is the correct one, and the economists need to see where people get it wrong. Thus, we had descriptions of behavioral economics such as “exploring limited rationality” and developing models for the “systematic imperfections in human rationality.” When inconsistencies between behavior and theory were demonstrated, the most charitable response from the neoclassical school was that maybe there was a missing factor; the theory was correct but not well parametrized. Unlike similar fields in psychology and biology, little time was spent on understanding how people think, why they think the way they do, and the ways the bedrock assumptions of economics based on mathematical methods and axioms of behavior might be off the mark.
And they probably are off the mark because, after all, neoclassical economics is missing half the story. It has left out any consideration of the context in which people make decisions, how that relates to people's varied experience, environment, and the uncertainty they harbor about how the world might change in unanticipated ways -- ways that cannot be captured through an enumeration of the probabilities of the possible states of nature. One field that does take this important (for humans) context into account is called behavioral ecology. It is not as well known in economics as it is in biological and psychological studies of behavior. Now, behavioral economics is incorporating this psychological realm.
This new approach is a quiet revolution that may transform the way we look at economic behavior. The era of mathematical, axiomatic views of human behavior will give way to approaches that start with how people look at decision making, understanding why they do that, and then understanding why that approach might have arisen evolutionarily and how it, rather than the utility maximization approach that has dominated the field for two generations, moves us closer to reality.
Following is a critique of the neoclassical approach, and the initial and perhaps still dominant approach of what might be called Behavioral Economics 1.0, within the context of behavioral ecology. A key proponent of behavioral ecology is Gerd Gigerenzer. I rely on his writings, including his book Rationality for Mortals, in much of the discussion below.

Assumption: We are Logicians

The seminal work on which behavioral economics 1.0 rests is that of Kahneman and Tversky. Using carefully posed questions, they plumb the ways people fail as rational beings, where rational means making decisions in a way consistent with the rules of logic. They find that the same question posed in different but logically equivalent ways leads to different results. They catalog these aberrations as demonstrating human tendencies toward heuristics, biases, frames, and other devices.
The notion here, which was then embraced by the first wave of behavioral economists, is that if nothing else, a rational human should act logically. The problem with this is that for humans logic cannot be considered apart from context, such as the usage and norms of language. For example, does anyone really think that when Mick Jagger sings “I can't get no satisfaction” he actually means he can get satisfaction? If you are parsing like a logician, that is what you think, because you are operating in the absence of context, namely how people use language. Language usage and the mode of conversation are among the clearest examples of how context and norms matter. If someone says “I'm not going to invite anyone but my friends and relatives,” does anyone really think that means he will only invite that subset of people who are both his friends and also his relatives? Again, that will be the takeaway for someone parsing like a logician. These two examples are simplistic, but if you look at the work used to establish the failure of logic and inconsistencies based on framing and the like, they are fairly illustrative.
The bedrock of much of behavioral economics assumes that we should follow the rules of logic, and when we don't, that it is suggestive of a behavioral bias or anomaly; the axioms are right, and we are flawed. The objective is to uncover those flaws. A classic example of the problems that come from this assumption is shown by this question posed by Kahneman and Tversky, and critiqued by Gigerenzer:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student she was deeply concerned with issues of discrimination and social justice and also participated in anti-nuclear demonstrations.
Which of two alternatives is more probable:
A. Linda is a bank teller.
B. Linda is a bank teller and is active in the feminist movement.
The vast majority of U.S. college students who were given this question picked B, thus scoring an F for logical thinking. But consider the context. People are told in detail about Linda, and everything points to her being a feminist. In the real world, this provides the context for any follow up. We don't suddenly shift gears, going from normal discourse based on our day-to-day experience into the parsing of logic problems. Unless you are a logician or have Asperger's, the term “probable” is going to be taken as “given what I just described, what is your best guess of the sort of person Linda is”. Given the course of the question, the bank teller is extraneous information, and in the real world where we have a context to know what is extraneous, we filter that information out.
Demonstrating our failures to operate within the framework of formal logic is more a manifestation of logic not being reconciled to context than it is of people not being logical. Much of Kahneman and Tversky's work could just as well have been directed toward the failures of formal logic as a practical device as toward the failures of people to think with logical rationality.
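
To make the two readings of the Linda question concrete, here is a minimal sketch using a made-up population; the counts are invented purely for illustration and come from no study. Under the logician's reading, option B can never be more probable than option A, because everyone counted under B is also counted under A. Under the conversational reading ("what sort of person is Linda?"), the number people seem to be responding to is the share of feminists.

```python
from fractions import Fraction

def conjunction_check(population):
    """Return (P(teller), P(teller and feminist)).

    population: list of (is_teller, is_feminist) pairs.
    """
    n = len(population)
    tellers = sum(1 for t, f in population if t)
    both = sum(1 for t, f in population if t and f)
    return Fraction(tellers, n), Fraction(both, n)

# Hypothetical population of 100 people who fit Linda's description:
# 5 are bank tellers, and 4 of those 5 are also active feminists.
population = (
    [(True, True)] * 4
    + [(True, False)] * 1
    + [(False, True)] * 80
    + [(False, False)] * 15
)

p_teller, p_both = conjunction_check(population)
p_feminist = Fraction(sum(1 for _, f in population if f), len(population))

print(p_teller, p_both)  # 1/20 1/25
print(p_feminist)        # 21/25
```

However the invented counts are changed, `p_both` cannot exceed `p_teller`. That is the sense in which B is the "wrong" answer, while `p_feminist` is the quantity the description actually makes salient.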

Assumption: We are Mathematicians

In going up against the neoclassical paradigm, behavioral economics sets itself against mathematical structure. A mathematician entering the world of economics begins with a set of axioms. That is just the way mathematics works. And one of those axioms is that people think like mathematicians. In starting this way, they fail to consider how people actually think, much less how that thinking is intertwined with their environment and the context of their decisions.
The mathematical approach is to assume that, absent constraints on cognitive ability, people will solve the same sort of problem a mathematician will solve in decision making: one of optimization. Then, recognizing that people cannot always do so, they step back to concede that people will solve the optimization problem subject to constraints, such as limited time, information and computational power. Of course, if computational power is an issue, then moving into a constrained optimization is moving in the wrong direction, because the new problem may be even more difficult than the unconstrained one. But given the axioms, what else can you do?
It doesn't take much familiarity with humans – even human mathematicians – to realize we don't actually solve these complex, and often unsolvable, problems. So the optimization school moves into “as if” mode. “We don't know how people really think (and we don't care to know) but we will adjust our axioms to assume they act 'as if' they are optimizing. So if we solve the problem, we will understand the way people behave, even if we don't know how people's mental processes operate in generating their behavior.”
Behavioral economics 1.0 does not fully get away from the gravitational pull of this mathematical paradigm. Decision making is compared to the constrained optimization, but then the deviations are deemed to be anomalies. Perhaps this was a necessity at the time, given the dominance of the neoclassical paradigm. But academic politics aside, it might be better to ask if the axioms that would fit for a mathematician are wrong for reality. After all, I could start a new field of economics where I assert as an axiom that people make decisions based on astrology, and then enumerate the ways they deviate from the astrological solution. Of course, people will throw stones at such an axiom, but I do have evidence that there are people who operate this way, which is more, as far as I can tell, than the optimization school has.
Behavioral economics of the 2.0 variety, patterned after the context-laden methods of behavioral ecology, does not take mathematical optimization as its frame, so to speak. And the more it delves into how people actually think – work that naturally originated in psychology rather than economics – the more we find that people employ heuristics: rules of thumb that do not look at all like optimization.

Assumption: We are Probability Theorists

Behavioral economics recognizes that we operate in an uncertain world, and so assumes people not only act “as if” they optimize, but do so under uncertainty. Things then get really complicated, because we have not only added constraints but also made the problem stochastic.
Heuristics take a different approach to this problem; they overcome the uncertainty by applying coarse and robust rules. They do not try to capture all of the nuances of the possible states and their probabilities. They operate in a different way, unrelated to optimization. They use simple approaches that are robust to changes in states that might randomly occur.
This turns out to be better because it recognizes an important aspect of our environment that cannot be captured even in a model of constrained optimization under uncertainty: There are things that can happen which we cannot anticipate, much less assign a probability to. In such an environment, the best solution is one that is coarse. And, being coarse and robust leads to another anomaly for those who are looking through the optimization lens. In a robust and coarse rule, we will ignore some information, even if it is costless to employ. (This is a point of a paper I co-authored years ago in the Journal of Theoretical Biology, one that, like much of the argument in this post, has been embraced in behavioral ecology while passed over in behavioral economics).
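
As a concrete picture of a coarse rule that deliberately ignores available information, here is a minimal sketch of "take-the-best," a heuristic studied in Gigerenzer's research program. The post itself does not name this heuristic; it is swapped in here as an example, and the cue names, their ordering, and the two cities are hypothetical.

```python
def take_the_best(option_a, option_b, cues_in_order):
    """Choose between two options using the first cue that tells them apart.

    option_a, option_b: dicts mapping cue name -> bool.
    cues_in_order: cue names from most to least trusted (an assumed ordering).
    Returns "a", "b", or "guess" if no cue discriminates.
    """
    for cue in cues_in_order:
        a, b = option_a[cue], option_b[cue]
        if a != b:
            # Stop at the first discriminating cue; later cues are never read.
            return "a" if a else "b"
    return "guess"

# Hypothetical question: which of two cities is larger?
city_a = {"is_capital": False, "has_major_airport": True, "has_university": False}
city_b = {"is_capital": False, "has_major_airport": False, "has_university": True}
order = ["is_capital", "has_major_airport", "has_university"]

print(take_the_best(city_a, city_b, order))  # a
```

The university cue favors the second city and costs nothing to look up, yet the rule never consults it. There is no weighting of cues and no probability estimate anywhere in the procedure, which is why it does not resemble optimization, constrained or otherwise.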
Let's consider environmental context again to see why the apparently rational appeals based on the application of probability theory might be off the mark. At Caltech, Antonio Rangel is looking at how the brain lights up when various problems are posed to subjects. It turns out the problems related to large losses affect different parts of the brain than problems that seem, from a probability standpoint, to be nothing more than a reflection of problems that look at the potential for large gains. This might provide physiological evidence to support the irrationality observed by many behavioral economists. Or it might be that it demonstrates these apparent biases were wired deep in our evolutionary past, and that they might be what is rational given that past.
Today it is not hard to envision a windfall gain that is similar in magnitude to a large loss. We can hit the lottery; we can build up wealth to last our lifetime. We can do that because of relatively new social and economic structures that allow us to save our wealth, and a legal structure backed up by a police force that gives us confidence that we and our possessions will be around long enough for us to enjoy them.
If we go back far enough, and not so far in terms of evolutionary time, the only good thing that could happen is capturing a large animal, or rebuffing the most recent tribal raids. Anything good was short-term and could easily be reversed. On the other hand, the negative tail was long and ominous. Even short of the not insubstantial risk of losing one's life or that of one's family (and with it one's future support), there was the risk of crippling injury, floods, and any number of other calamities. Include in these a gnawing realization that there were calamities that could not even be envisioned. In that world, it is not surprising that the brain circuitry would be wired differently for gains and losses. In that world, mapping gains and losses with any notion of symmetry is what would be irrational.
This use of robust and information-sparse heuristics again stems from context. We make our decisions in the context of our environment at the time, and our experience with how the world works. In that world, we have to ignore information because much of it is likely to be irrelevant.

Summary

Mathematical optimization can be correct in its purified world and we can be rational in our world, without optimization as the benchmark. It is a truism that if we inhabit a world that fully meets the assumptions of the mathematical problem, it is irrational to deviate from the solution of the mathematical optimization. So either we catalog our irrationality and biases, or we ask why the model is wrong. The invocations of information cost, limited computational ability, and missing risk factors are all continually shaving off the edges of the square peg to jam it into the round hole. Maybe the issue is not that we are almost there, and with a little tweaking we can get the optimization approach to work. Rather, logical models may not be the right approach for studying and predicting human behavior.
It deserves repeating that the use of heuristics and the deliberate limits on the use of information as employed in the Gigerenzer worldview are not part of an attempt at optimization, real or “as if”. It is not a matter of starting with optimization and, in some way, determining how to achieve something close to the mathematically optimal solution. It is a different route toward decision making, one that, unfortunately for economists and mathematicians, is most likely the way people actually operate.
Logic, math and probability are all context independent. That is where their power lies; they will work as well on Mars as on Earth. But heuristics can take into account context and norms, an awareness of the environment, and our innate understanding that the world may shift in unanticipated ways. As with many new paradigms, the new route to behavioral economics adds a critical part of the world that the old one ignored. Perhaps it was ignored for the same reason physics assumes a perfect vacuum. Or perhaps because the field became overrun with mathematicians, and as Kuhn has said, a new paradigm such as this will only successfully assert itself once the older generation dies off.