VideoPanel Session 3 Transcription: METACOGNITION



KAMILLA JÓHANNSDÓTTIR: Welcome to the Third Session of VideoPanels. The topic today is Metacognition, and I am very pleased to introduce our panelists. We have Aaron Sloman, Honorary Professor of Artificial Intelligence and Cognitive Science at Birmingham. We have Ashok Goel, Associate Professor of Computer Science and Cognitive Science at Georgia Tech. We have Michael Anderson, Assistant Professor of Cognitive Science at Franklin and Marshall College; he is also a Visiting Professor at the Institute for Advanced Computer Studies and the Program in Neuroscience and Cognitive Science at the University of Maryland. And Simon Levy, Associate Professor and Head of the Computer Science Department at Washington and Lee University. Very briefly, about the format of today's panel: I will read three questions that address various issues about today's topic. Each panelist will then get a chance to address some of those issues, and panelists will also get a chance to respond to each other's statements. Following this we will open the panel to the audience.

I will read the first question now: People talk a lot about metacognition in AI, yet there is little conclusive proof of its leverage in cognition and learning. How and why should cognitive architectures benefit from metacognition? The second one: What is metacognition in AI? Perhaps the biggest controversy here is in the understanding of this concept by different researchers. For example, what separates metacognitive architectures from merely cognitive architectures? Also, how is metacognition related to creativity, self-regulation, self-awareness, theory of mind, higher-order thoughts, complex emotions, social cognition, and finally, the ability to learn like a human? And the third one: Is it possible to have a universal metacognitive assistant that can improve the performance of virtually any merely cognitive system connected to it? Or are metacognition and its implementation always situation-specific?

AARON SLOMAN: OK, well, I'll start with three because that's easiest. I think the answer is no: you can't have a universal anything in intelligent systems. But I know that's controversial. As far as one is concerned, "little conclusive proof" surprises me as a view, because anyone who has been a teacher, or who has thought about the experience of making discoveries about thinking, or about how various kinds of human activities like philosophy develop, must have lots of evidence that metacognition plays important roles, at least in humans. But if you mean in AI as just the machines that exist so far, I have nothing to say about that; I don't know what that machinery can do because I haven't studied it. Do you want to give everyone a chance to respond to that?

KAMILLA JÓHANNSDÓTTIR: I guess we’ll just get a statement from everyone first and then respond.

AARON SLOMAN: As for what metacognition is: I agree that, as with many distinctions that start off looking clear, if you start looking at many cases you find that you generally need lots of subdivisions, and if you're trying to make one big division it becomes arbitrary where you put things. I don't think there's a continuum of cases, because I don't think it's possible to have a continuum of information-processing systems. They differ in ways that involve discontinuous additions, and that's true also of biology, which is ultimately just chemistry, which is discontinuous. A project that I think will take us another hundred years is to understand all those distinctions that can be seen in evolution. That may include transitions where we can see what the problems are and what the transitions do to help solve the problems, and then we may find that, rather than one cognitive/metacognitive distinction, we have a lot of subdivisions which are a lot more interesting than just one big one.

ASHOK GOEL: It is a pleasure and an honor to join these panelists, my colleagues. Let me begin with number one. I would not entirely agree that there is no conclusive proof of the benefits of metacognition. I'll take an example from my own work. We do a lot of work on how metacognition can help an intelligent agent adapt its own knowledge and reasoning. Imagine an intelligent agent that you and I design for assembling a bike from its components, so the agent generates plans for assembling a bike. Tomorrow you give it a different problem: you tell it to disassemble the bike, not to assemble it. Now, we know that most AI systems are brittle, and an agent designed to assemble a bike cannot by itself disassemble a bike; we'd have to generate a new plan completely. But if the agent has a model of its own reasoning, of how it goes about assembling the bike, and it knows that disassembly is the reverse of assembly, it can go about changing the way it reasons and adapt its strategy for assembly into a strategy for disassembly. Now, does this work? Yes, it does. Is there an advantage over generative planning or reinforcement learning? Yes, there is. If the problem is very small, if we are dealing with a bike or any object with just two components, it is cheaper to use reinforcement learning. If you have a more complex object, with four, five, six components, it is cheaper to use generative planning. As the size of the problem increases further, metacognition for self-adaptation has clear benefits. These are empirical results, and we are not the only ones working on the problem: AI has clearly shown that there are situations in which metacognition has conditional benefits. What is metacognition in AI? That is a very good question, but I am going to skip to number three and come back to number two. (Audio cuts out.) Going to number two for a second. Now that we have done some work on how metacognition enables self-adaptation, we are very, very keen on trying to understand how these same kinds of architectures can help humans do self-regulated learning and interact with learning environments. So, assuming that AI theories of metacognition are sound, and this is an assumption, we don't have an answer to that, I use it as a hypothesis to move forward, then they should inform us on how we can help people do self-regulated learning. For example, what kinds of models does an agent need to have about people's abilities so that the agent can guide people in self-regulated learning? What kinds of models should we help people develop about their own abilities so that they might be able to do self-regulated learning? Our answer to these questions is that agents and people need to have functional models that capture goals. I'll leave it here and we can come back to it later.
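
A minimal sketch of the kind of adaptation Goel describes, with an invented representation (Step, INVERSE_ACTION, AssemblyAgent) rather than his actual system: because the agent's meta-model records that disassembly is the reverse of assembly, the existing plan can be transformed rather than regenerated from scratch.

```python
# Hypothetical illustration of metacognitive plan adaptation, not Goel's system.
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    action: str   # e.g. "attach"
    part: str     # e.g. "front wheel"

# Meta-knowledge about the agent's own operators: how each action is undone.
INVERSE_ACTION = {"attach": "detach", "insert": "remove", "tighten": "loosen"}

class AssemblyAgent:
    def __init__(self, assembly_plan: List[Step]):
        self.assembly_plan = assembly_plan

    def adapt_for_disassembly(self) -> List[Step]:
        """Exploit the self-model: disassembly is the reverse of assembly,
        with each primitive action replaced by its inverse."""
        return [Step(INVERSE_ACTION[s.action], s.part)
                for s in reversed(self.assembly_plan)]

bike_plan = [
    Step("attach", "frame to stand"),
    Step("insert", "seat post"),
    Step("attach", "front wheel"),
    Step("tighten", "handlebar bolts"),
]

agent = AssemblyAgent(bike_plan)
for step in agent.adapt_for_disassembly():
    print(step.action, step.part)  # loosen handlebar bolts, detach front wheel, ...
```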

MICHAEL ANDERSON: Starting with one as well. Let me give you two examples from the human literature, which I think is instructive. One is the feeling of knowing. If I ask you, for example, what President Obama's phone number is, you won't waste any time retrieving that information, because you already know that you don't know the answer. Whereas if I ask what his address is, many of you will know that, maybe come up with it immediately, but you will spend time searching your memory and eventually come up with 1600 Pennsylvania Avenue. So there is a layer in our memory system that allows us to, and we in fact do, use it to allocate search time. And the better you are at the meta-level understanding of your memory, the more efficiently you allocate search time. Now, that's not something that's yet implemented in computational systems, but I think it's something that could be. The other example is what is called the judgment of learning. Suppose you are learning a new language or any other material: this is your ability to know when you have learned it and when you haven't. Students do this all the time, and the main finding is that the accuracy of your metacognitive judgment is highly correlated with grade point average. This is not surprising, because again, you will be allocating your study time in a very efficient way. This last thing is something that has been implemented, at least in my lab, in AI, where we have systems that know what they don't know and can take directed actions to learn things that they think might be useful to them. So that addresses the question in two parts. There are a number of systems that we've built, based essentially on this notion, and systems that monitor their own performance, notice when they are performing below where they would like to perform, and take targeted actions to improve that performance do in fact show performance enhancements over systems that don't have that capacity. In the learning systems, the leverage comes from the ability to target learning: learning the things that seem to be most relevant. As far as a universal metacognitive assistant goes, I agree with Aaron: if you emphasize universal, then no. But one of the things we've been doing in my lab over the past several years is using the very same metacognitive assistant, whatever you want to call it, attaching it to very different systems, and we do show improvements across reinforcement learners, natural-language human-computer interfaces, and autonomous robots, all with the exact same system that you just plug in to different things. That's not universal, but it's certainly pretty flexible.
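
A minimal sketch of the monitor-and-repair idea Anderson describes, under invented names and a hypothetical learner interface (the `retrain` method is assumed, not his lab's actual system): a wrapper watches the learner's recent performance and, when it falls below expectation, triggers retraining targeted at the items that actually failed.

```python
# Hypothetical sketch of a performance-monitoring metacognitive wrapper.
from collections import deque

class MetacognitiveMonitor:
    def __init__(self, learner, target=0.8, window=50):
        self.learner = learner               # assumed to expose retrain(cases)
        self.target = target                 # expected success rate
        self.recent = deque(maxlen=window)   # rolling record of outcomes
        self.failures = []                   # cases to target for relearning

    def observe(self, item, prediction, truth):
        correct = prediction == truth
        self.recent.append(correct)
        if not correct:
            self.failures.append((item, truth))
        # Notice performance below expectation, then guide a targeted response.
        if len(self.recent) == self.recent.maxlen and \
           sum(self.recent) / len(self.recent) < self.target:
            self.respond()

    def respond(self):
        # Targeted action: retrain only on the cases that actually failed,
        # rather than relearning everything from scratch.
        self.learner.retrain(self.failures)
        self.failures.clear()
        self.recent.clear()
```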

KAMILLA JÓHANNSDÓTTIR: Simon?

SIMON LEVY: It's difficult for me to add very much to the excellent points that have already been made. What I might add, though, is that I think for the term metacognition to be meaningful it has to be grounded in a rather different approach to cognitive architectures than we've traditionally seen; if you make any sort of addition to your existing architecture within the frame of a traditional symbolic computing language, or a traditional computer that uses pointers and things like that, there is nothing meta about it. I think that if you try to ground cognition in something that at least tries to address how it might actually relate to brain structures and neural architectures, you get a little bit farther toward answering a meta-question, if you like, one that cuts across all three of these questions. That was the approach that Ross Gayler and Rick Berger and I took in our paper last year for BICA 2010, which, besides Aaron's work, specifically inspired this discussion. The reason I'm biased in that direction: I recently had a paper in a collection, Recursion and Human Language. This was a hot topic three or four years ago, and I think there was even a big New Yorker article on the controversy. Dan Everett had made the claim that recursion isn't universal, and I think that debate suffered even more from this issue, which is that if you don't have a new approach to the overall problem, you end up just debating terminology. I don't see that going on here in this particular group; I'm especially impressed by the amount of empirical results that people have actually had, Ashok and Michael. But coming from sort of an outsider's perspective, I think it's really important, to make progress on this issue at least philosophically and conceptually, to take the kind of approach where you at least address brain inspiration in a meaningful way.

KAMILLA JÓHANNSDÓTTIR: Aaron, would you like to respond to any of the statements that have been made?

AARON SLOMAN: OK, a few quick comments. One is that I agree that there are interesting things being done; we also have a project where we have agents that become aware of gaps in what they know and limitations in their performance and adjust what they're doing to correct that, and I'm sure that will happen more and more for all sorts of systems, not just AI. Someone at MIT several years ago was talking about the need for the internet to have a knowledge plane, which you can see as a sort of metacognitive addition to the internet so that it could detect when things are being attacked inside it, and that sort of thing is happening more and more in software engineering, with people not thinking that they're doing AI, but that's what they're doing. That's one little point. Another is that this is a very old theme: in my email message I mentioned, for instance, Sussman's HACKER program, which did some very interesting things, although it did them very slowly and incompletely. It might, for instance, find out that it didn't solve a problem and then repeat the procedure it went through in what the system called "careful mode", documenting the various things that happened, comparing them to what was supposed to happen, and using that to propose changes to the program. And even more interestingly, it detected patterns that could be used at a much earlier stage to create what he called critics, which would detect features, syntactic features, in parts of the program at the time it is being constructed, which seems to me exactly what most of us learn to do. A related point is that I think we need to distinguish metacognition and metacognitive capabilities from meta-semantic capabilities. I think the distinction between semantic and meta-semantic is the deeper one. You have semantic capabilities if you can refer to things, which can be physical objects or numbers; you have meta-semantic capabilities if you can refer to things that refer, to the relation of referring, and to the processes that control them. That raises lots of old philosophical problems, for instance how you deal with referential opacity, which I can explain if people aren't familiar with it. And there have been different approaches to that in AI. For instance, John McCarthy tries to deal with all the meta-semantics by having extra operators added to the logic that he uses, and it's good that people should explore that, but I don't think that's the right answer. I think you need architectural capabilities to deal with meta-semantics and meta-semantic competences: ways of using the same representations as before, but in encapsulated ways in an architecture, where their use doesn't have the same consequences. And I think that's relevant whether you're referring to your own cognition or somebody else's. I think it was Michael, or maybe someone else, who talked about the need to understand what other people are doing as a form of metacognition, and it needs meta-semantics in order to do this. Lastly, the point I thought I would make is that the developmental psychologist Annette Karmiloff-Smith has a book called Beyond Modularity, published in 1992, which I only recently read even though I've known about it for a long time. Has anyone else read it?

MICHAEL ANDERSON: Yes, I have.

AARON SLOMAN: What I highly recommend is reading it, even if the details are wrong and even if she doesn't get to the point of producing mechanisms, because I think it provides a number of indicators of what's needed. She claims, and I think she's right, that there is a pattern of development that you can observe in human maturation, and in other animals, in connection with different domains. First you see a process of increasing behavioral competence: people can avoid bumping into things, they can grasp things without them slipping, or they can communicate in English or some language. But that tends to be followed by what she calls a process of representational redescription, which I think has largely gone unnoticed except in connection with human language, where there seems to be a change from exemplar-based linguistic production and understanding to syntax-based, or more generally rule-based, linguistic use. That transition produces much more power, because it can cover more cases than the exemplar-based competences, and in fact it covers cases that were previously learnt, sometimes getting them wrong. This is the phenomenon of U-shaped learning, where kids first get things right and then get them wrong, and they have to adjust their architectures to cope with counter-examples, like: don't say "he sitted on the chair", you say "he sat on the chair", and that requires a significant change in the architecture. Anyway, what I'm getting at, to make a long story short, is that there are several stages in exploiting what has already been learned, reorganizing it and extending its power in various ways, and perhaps combining it with other kinds of things. I think we don't really know very much about these processes. They're hardly visible in AI, to my knowledge, and developmental psychologists have hardly looked at them because they mostly don't think about mechanisms. I think if we can get together and work on that, that would be a great achievement.

KAMILLA JÓHANNSDÓTTIR: OK, Ashok.

ASHOK GOEL: I would like to respond to something that Michael and Simon said. Michael was talking about a metacognitive assistant that can be used for a large number of tasks and domains. One of the things we have found is that one can have adaptation of many kinds. One can have proactive adaptation, where an agent is given a new goal; one can have retrospective adaptation, where an agent fails to achieve a goal; one can adapt in real time or over a long time span; one can adapt one's own reasoning; one can adapt one's own knowledge. So there is a whole variety of adaptations that can be done, and metacognition plays a role in all of them, I think. One way of looking at this issue of universality versus domain-specificity is to think in terms of task-specificity rather than domain. So maybe there is one set of metacognitive adaptation strategies that works for proactive adaptation of domain knowledge, and another set of strategies that works for retrospective adaptation of reasoning, so that it is neither completely universal nor domain-specific, but depending on the functionality that the task requires, one has a different set of reasoning operators. That's one way in which we are trying to carve it out. How many different sets there are, I don't know. If the number is just two, it is uninteresting; if the number is two hundred, it is impossible. So hopefully the number is some small number like ten or twenty, so that one can in fact start studying them. In connection with the point Simon made about taking biology and cognition very seriously, using them to inspire research on metacognition: I really like that, because there is a flip side to it, and I think we all look at metacognition from different perspectives. The flip side is to build AI theories and take them to cognition and biology sometimes. So there are these two parallel paths, and I like that Simon is looking at the pathway from cognition and biology to AI while I'm looking at the pathway from AI to biology and cognition. But they clearly are supporting each other. Thank you.

KAMILLA JÓHANNSDÓTTIR: Michael

MICHAEL ANDERSON: Yes, these are some very deep, interesting issues, of course. Ashok's suggestion that perhaps there is a kind of task-specificity or task-targetedness to metacognition is an interesting one. One of the ways we try to deal with this trade-off between generality and specificity is that one of the central components of the metacognitive assistant is a set of Bayes nets that essentially work under the assumption that there is only a limited number of ways in which systems can fail. Maybe not five, but not a million either. So if there is only a limited number of ways in which a system can fail, there is a limited number of ways in which it can be fixed. Then you ought to be able to take the symptoms of the failure and run them through a Bayes net (the choice of architecture isn't so important) and get some sort of generality of diagnosis, and once you have a diagnosis you can figure out, or choose from among your options for fixing, whether that's going back and doing things again or learning something new. And I think that's right; metacognition, I think Aaron said this, is involved in all of these things. You have to know what you don't know, or that you don't know. You have to decide what to learn, and most importantly in a way, and this is something that a lot of AI learning systems don't do well at all, you have to know when to stop learning. These stopping rules are actually some of the harder things to get right, and that's a very important metacognitive judgment. The broader point that Aaron raises is really important, and it's really hard to know how to get a handle on it, but a nice place to start is this notion that we need to stop thinking in terms of encapsulated modules if we're ever going to get at machine intelligence. A different way to put the same point, I think, is to say that AI is still largely stuck in 19th-century engineering practices, where you take a system and break it down into its component parts, which can all be separately isolated and implemented. We can understand why we try to do this, especially when working with large groups of engineers trying to build a complicated system, but it doesn't look very much like that's the way the brain does it. The brain does it by putting together parts in lots of different combinations, by changing the way they are put together, in both developmental and evolutionary time and in real time. And that suggests that there has to be not just a different way of thinking about what the architecture of an intelligent system looks like, but about how we should go about designing that kind of architecture. So it's not just getting beyond the concept of modularity, it's getting beyond the engineering practices that always have modularity as their output. That, of course, is a bigger point than just metacognition, except that it relates in the following way: one of the reasons it's possible for systems to, as it were, self-assemble in real time is because they're constantly making metacognitive judgments, even if tacit ones, as to which parts need to go together in which order to get the right kinds of behaviors out.
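
A toy version of the diagnosis idea, assuming a small fixed set of failure modes with invented probabilities and repairs (none of this is the lab's actual model): compute P(failure mode | observed symptoms) in a simple Bayesian way, then map the most probable mode to a repair option.

```python
# Hypothetical illustration of Bayesian failure diagnosis; all numbers invented.
PRIOR = {"sensor_noise": 0.4, "stale_model": 0.4, "missing_knowledge": 0.2}

# P(symptom present | failure mode)
LIKELIHOOD = {
    "sensor_noise":      {"erratic_input": 0.9, "gradual_decline": 0.1},
    "stale_model":       {"erratic_input": 0.2, "gradual_decline": 0.8},
    "missing_knowledge": {"erratic_input": 0.1, "gradual_decline": 0.3},
}

REPAIR = {
    "sensor_noise": "filter or re-calibrate inputs",
    "stale_model": "retrain on recent data",
    "missing_knowledge": "acquire or ask for new knowledge",
}

def diagnose(observed):
    """Posterior over failure modes given which symptoms were observed."""
    posterior = {}
    for mode, prior in PRIOR.items():
        p = prior
        for symptom, present in observed.items():
            ps = LIKELIHOOD[mode][symptom]
            p *= ps if present else (1.0 - ps)
        posterior[mode] = p
    z = sum(posterior.values())
    return {m: p / z for m, p in posterior.items()}

post = diagnose({"erratic_input": False, "gradual_decline": True})
best = max(post, key=post.get)
print(best, round(post[best], 2), "->", REPAIR[best])  # stale_model 0.75 -> retrain
```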

KAMILLA JÓHANNSDÓTTIR: Thank you. Simon?

SIMON LEVY: As the conversation progresses it becomes more difficult for me to add something of substance. But let me add one thing that came out of Aaron's comments, which is that, perhaps, metacognition as a distinction falls under the more general category of something I referred to, or alluded to, as rules and exceptions. I would make the strong conjecture that it's only when cognitive science and AI get beyond that distinction that true progress will be made. Not to belittle the actual progress people have made, in the research that Michael and Ashok have worked on; I'm pretty impressed by that work. But in terms of having a philosophical framework, we are actually still stuck in the 19th century. There's a tendency in linguistics to discard Chomskyan rule-based universalism in favor of something called construction grammar; that's kind of the big new alternative. And it's promising in that it does try to reject these rule-based universals, but I always think there's a danger of swinging back in the direction of pure statistical solutions and, frankly, hacks, which are enabled by much faster computers with more memory and storage and much more data available for free on the internet in the form of text. And I feel like the big picture, over perhaps centuries, is this endless swing back and forth between those two extremes of rules versus exceptions. The early promise of connectionism was to get beyond that, and I think connectionism got saddled with a bunch of unfair associations that are really hard to get rid of twenty years later, but I see some promise in newer forms of connectionism to overcome that: specifically, as I said, ones that are more realistically related to brain structures, things that are more biologically plausible or biologically inspired. In conclusion, again, I think metacognition is just a special case of what we consider as rules versus exceptions, and if we can get beyond that in some way, then we can make some real progress, and these kinds of terms will simply dissolve in the way that prescientific terms in physics dissolved in the modern era.

KAMILLA JÓHANNSDÓTTIR: Thank you. I want to get a question from the audience.

SCOTT FAHLMAN: I found this discussion very interesting, and we're talking about a bunch of different things here; to me they feel qualitatively rather different. In problem solving, for example, I see a role for metacognition that is very important. You have some sort of conscious access to the plans you form; you're able to mentally simulate them or execute them. Either way, you're able to look back at the plan and say, gee, I wish I had the screwdriver closer to me before I started to screw things together, so let's go back and insert that into any future plan. So you're stepping back and examining plans. I think there is a different thing having to do with knowledge representation. I don't remember who said it, but Michael or possibly Aaron seemed to be indicating that any sort of higher-order logic, statements about statements, might be thought of as metacognition, and the same for statements in your mental store about what you know and don't know. Finally, in communication I think there is a lot of other stuff going on. As I'm talking to you all, I'm constantly thinking: what does the audience know, what can I refer to, and what do I have to explain? You're mostly computer scientists out there, or psychologists who have talked to computer scientists, so I can probably get away with using a word like compiler without having to explain it. Constantly we are massaging our models of who we're talking to and making judgments, and that feels to me like a very meta thing. Those three all feel very different to me, and I don't know if there is anything unifying there; maybe they're just different ways of looking at it. Not really a question, more of a comment, but I would be interested in any reactions.

KAMILLA JÓHANNSDÓTTIR: Would any of the panelists like to go ahead and respond?

AARON SLOMAN: I would like to go ahead and make a couple of comments; Aaron speaking. I think something that hasn't come out clearly is that we can make a fairly sharp distinction involving systems which have at least two processes, possibly more, going on in parallel, where one of them is monitoring, perhaps modulating, keeping records of what the other is doing. There's a lot of work in metacognition that doesn't have that feature; rather, it assumes just one CPU that switches between different modes, working on a problem and then thinking about how it should work on the problem. The book by Russell and Wefald, Do the Right Thing, is a good example of that: the whole assumption is that it's all one CPU and you have to get all the trade-offs right. I think biology took a different tack and discovered that it's quite useful to have some dedicated subsystems looking at other dedicated subsystems, and I think that's also the assumption of Minsky in The Society of Mind and The Emotion Machine. And I think it doesn't necessarily map onto our distinction between cognitive and metacognitive, because you can find lots of computing examples that you wouldn't necessarily want to describe that way. For example, early paging systems, or perhaps all paging systems, where you have some piece of hardware monitoring what another is doing, detecting illegal accesses and taking the appropriate action. That might be more efficient than having the one being observed constantly test whether it is doing the right thing; it's better to let something else test whether it's doing the right thing and, if necessary, deal with it. So there may be lots of ideas from computation, from information-systems engineering, that give us a much deeper and wider variety of concepts and distinctions than if we just focus on these issues from a logical or introspective viewpoint. And I think that cross-fertilization just happens anyway, because people use things they learn: if people learn things in engineering, they will then apply them in other fields, and if they learn things in other fields, they will apply them in engineering.
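
A minimal illustration of the architectural distinction Sloman draws: instead of one process alternating between working and reflecting, a dedicated watchdog runs in parallel with the worker and intervenes when it notices no progress. The "work" here is a stand-in; the two-process structure is the point.

```python
# Sketch of a dedicated monitoring subsystem running alongside an object-level task.
import threading
import time

progress = {"count": 0}
stop = threading.Event()

def worker():
    # Object-level task; deliberately stalls after 1 second so the
    # watchdog has something to notice in this demo.
    start = time.time()
    while not stop.is_set():
        if time.time() - start < 1.0:
            progress["count"] += 1
        time.sleep(0.01)

def watchdog():
    # Meta-level process: it does no object-level work itself, it only
    # observes the worker's progress record and reacts to stalls.
    last = -1
    while not stop.is_set():
        time.sleep(0.5)
        if progress["count"] == last:
            print("watchdog: no progress since last check, intervening")
        last = progress["count"]

threading.Thread(target=worker, daemon=True).start()
threading.Thread(target=watchdog, daemon=True).start()
time.sleep(3)
stop.set()
```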

SCOTT FAHLMAN: Can I ask a point of clarification there?

AARON SLOMAN: Yes?

SCOTT FAHLMAN: So when you are talking about low-level systems watching each other concurrently, is there any difference between feedback and a metacognitive system? I sort of want to reserve that term for the higher, more conscious level, and I feel like, to me, there is only one, maybe one and a half, of those here.

AARON SLOMAN: I started right at the beginning by saying that I think there are lots of distinctions, not just one, and if you try to survey them all you'll find there is no one big division between the cognitive and the metacognitive. But you might want to say that some simple cases are not what we're talking about, like the simplest feedback, the Watt governor for instance, or something like that. Other cases, as they start getting more complicated, might become slightly more convincing, and you have lots of small steps, and then you find you're dealing with something very different from what you started with, and I think that's what evolution did. So I don't want to claim that there is a clear division where, as soon as you've got one thing watching another, you've got metacognition. I'm just saying that if you have everything running on one system, then the issues are different, and if you have dedicated systems, some of which monitor others, then you can do different things that way.

KAMILLA JÓHANNSDÓTTIR: Would anybody else like to comment?

MICHAEL ANDERSON: The point is really well taken. There are lots of relationships, metalinguistic, metaknowledge, those sorts of kinds of control, and these may or may not be similar except at some, unfortunately, too general, too high a level. A couple of comments on practical things. Mike Hopps has a simple definition he uses for what constitutes something as metacognitive; I think he got it from John Dunlosky, a psychologist who works on these issues. First of all you have to have a notion of what the cognitive component is, and of course that's a whole other debate to be had. But if you have a cognitive component of some kind that shares state information with another component, and that component sends control information back to the first, then those things are in a metacognitive relationship. So it's all about understanding the flow of state information and the flow of control, and when you can identify components that have that relationship, for him, that's the way of anchoring the definition of metacognition. I think many of the examples that you raise are of exactly that sort. When you are monitoring your own vocabulary use based on the audience, you're sharing state information with a monitor and then controlling your word choice as a result of how that state information gets processed. So that's a sort of practical use, even though it's a general definition. In my group we tend to talk more about self-monitoring and self-control than metacognition per se, partly in response to these kinds of concerns. What we're mostly interested in is capturing the value of self-monitoring and self-control, both in the human case, where we're training learners to be better learners, and in the technology case, where you are trying to improve intelligent systems or have them improve themselves.
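
The state-up, control-down definition made concrete, using Fahlman's vocabulary-monitoring example with invented class and method names: component A stands in a metacognitive relation to component B when B shares state information with A and A sends control information back to B.

```python
# Hypothetical sketch of the state/control definition of a metacognitive relation.
class Speaker:                          # object-level component
    def __init__(self):
        self.vocabulary_level = "technical"

    def report_state(self):
        # State information flows up to the meta-level.
        return {"vocabulary_level": self.vocabulary_level}

    def apply_control(self, command):
        # Control information flows back down from the meta-level.
        self.vocabulary_level = command["set_vocabulary"]

class AudienceMonitor:                  # meta-level component
    def regulate(self, speaker, audience_is_expert: bool):
        state = speaker.report_state()
        if not audience_is_expert and state["vocabulary_level"] == "technical":
            speaker.apply_control({"set_vocabulary": "plain"})

speaker, monitor = Speaker(), AudienceMonitor()
monitor.regulate(speaker, audience_is_expert=False)
print(speaker.vocabulary_level)   # -> plain
```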

KAMILLA JÓHANNSDÓTTIR: I’m wondering if there is another question maybe from the audience.

ASHOK GOEL: Here is a comment, if it's okay for me to share it. I think one thing that perhaps all of us would agree with is that we are very, very early in our understanding of metacognition. It's been very useful to make the distinctions we are making, but I don't know if these distinctions will necessarily stand the test of time. So I really like Simon's and Michael's and Aaron's perspectives, because they are in some sense coming from a distance; they have a philosophical aspect to them. Here's what I have in mind: we have recently started some work on autism, and I was surprised to find that some of the new theories of autism have to do with metacognition. I did not go into this area thinking that metacognition would play an important role in autism. For instance, we all know about the theory-of-mind theory of autism: it says that most neurotypical people have a theory of other people's minds, goals, desires, and so on, and that some people with autism may not necessarily have a theory of other people's minds. That theory really has a metacognitive framework, because if I do not understand goals, beliefs, and desires, and I don't have a representation for them, then I cannot ascribe them to you or to myself. Another theory of autism, which has a really strong metacognitive framework, is that perhaps some people with autism are unable to access what they themselves are doing, so that perhaps they do not have the same type of metacognitive abilities that neurotypical people do. This was a big surprise to me; I did not know any of these theories until two or three years back. So I think metacognition is going to be critical in many areas, and I do not know how to bring it all together, how to use theories that are being proposed in one area to inform others. So I am really excited to see people from different communities coming together like this, and hopefully we can inform each other.

SCOTT FAHLMAN: Well it probably goes back to Freud. You’ve got the id, the ego, the super ego…

ASHOK GOEL: Absolutely.

SCOTT FAHLMAN: watching each other warily and trying to trip each other up in some cases.

SIMON LEVY: Minsky very readily acknowledged Freud, especially in his public lectures, as a strong influence on The Society of Mind; in my experience he readily credits Freud with the origin of that idea.

KAMILLA JÓHANNSDÓTTIR: I’m wondering if…

AARON SLOMAN: Sorry, go ahead Kamilla.

KAMILLA JÓHANNSDÓTTIR: Alexei, I was wondering how we were doing in terms of time.

ALEXEI SAMSONOVICH: We are fine, I have plenty of disk space available.

AARON SLOMAN: In that case, this is Aaron. If I could go back to the statistical versus rule-based issue, I think there is something going on there that is quite interesting, and I might be repeating what was said before, but in a slightly different way. You've probably all heard the slogan: if you want to make progress in natural language, fire all your linguists and hire statisticians. What's interesting is that up to a point that works, and I actually think it reflects something that goes on in our brains too. I personally have never believed it's all statistics, but I do believe that we use what we've learnt about what works and store it for future use instead of having constantly to recompute it. So there will be, in any language user, a growing memory of the special cases that are generated by their more general rule-based language understanding capability, which enables them to cope with novelty, come up with creative sentences, and understand them. I once heard a child say something you've probably never heard, but you'll instantly understand it. He said, "Today will be much more hotter than it usually bes," and that makes perfectly good sense, but it violates a lot of the exceptions of English although it fits in with a lot of the rules of English. I think what happens is that the corpus-based systems are able to use the products of language use to build up a kind of memory that humans can also build up, but human languages have something else that enables people to deal with much more complex novel constructions that just can't be coped with by statistical systems, which, as far as I understand it, can only deal with what's in the envelope defined by the data they are trained with. We don't have those kinds of restrictions, so I think we have to go to a hybrid system, and I think that's a very general point, not just about language. It can be brought in to (audio incoherent) the point that once we acquire a behavioral domain, we can go beyond it through representational redescription, which means you have something like a rule-based system which generates those cases and more, and I think that's actually the origin of human mathematical competences, but that's another long story.
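
A toy hybrid of the kind under discussion, using the panel's own "sitted"/"sat" example (the vocabulary is invented): a productive rule handles novel cases, while a stored memory of exceptions overrides it for the irregular forms. A purely rule-based learner would say "sitted"; the exception memory corrects it.

```python
# Toy rule-plus-exceptions hybrid for English past tense.
EXCEPTIONS = {"sit": "sat", "go": "went", "be": "was", "eat": "ate"}

def past_tense(verb: str) -> str:
    if verb in EXCEPTIONS:            # exemplar/statistical memory wins
        return EXCEPTIONS[verb]
    return verb + "ed"                # productive rule copes with novelty

for v in ["sit", "walk", "blork"]:    # note: handles the novel verb "blork"
    print(v, "->", past_tense(v))     # sit -> sat, walk -> walked, blork -> blorked
```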

SIMON LEVY: If I could comment briefly on that, this is Simon. With regard to hybrid systems: if you're interested in doing good AI, hybrid systems are certainly the way to go, or at least a way to go. I think the most modern example of a hybrid system is something like a Bayesian system or a probabilistic generative grammar, if you want to go back to language, which I look at as taking generative grammar rules and just assigning probabilities to them. But if you want to get a deeper understanding of the way cognition works, independent of designing artifacts, I think you need to move away from hybrid systems, because you want something where both the rule-based and the exceptional, statistical nature of the system emerges from the data to which it is exposed. That's the kind of thing that people like Chris Eliasmith at Waterloo, and to a lesser extent Ross Gayler and I when we have the time to work on these things, are pushing for: a system whose behavior can be readily modeled by posterior probabilities, but where there is no actual encoding of those things in the system. And again, I feel like that's always been the big promise of connectionism: to avoid that sort of hybridizing.

AARON SLOMAN: Can I give you a counter example?

SIMON LEVY: OK.

AARON SLOMAN: I wonder how many people know about the Nicaraguan deaf children. For anyone who doesn't, I'll summarize very briefly. It's well known that if deaf children are not taught a sign language, they tend to be seriously cognitively held back, if they're expected to lip-read for instance. The Nicaraguan government, some years ago, decided to provide the deaf children in their country with sign-language teaching, and there weren't enough teachers to go around, so they brought the children to the teachers. So there was a community of deaf children and a small number of teachers who were trying to teach them sign language. And then something happened that nobody had predicted: the kids started signing to each other very rapidly, obviously communicating, and doing things that the teachers couldn't understand. They brought in anthropologists and linguists from other places, who worked out that what had happened was that the kids had developed their own language, which went far beyond what the teachers had, because the teachers didn't have the benefit of growing up from a very young age with signing. Now, that seems to me to suggest that the normal view of language learning as being based on deriving information from the data is entirely wrong. Instead it's a creative problem-solving process, trying to find ways to communicate. It just so happens that most young language learners are in a tiny minority, so their problem solving is guided by solutions that people have already developed. But if there aren't those solutions to guide them, they will create their own, and that means that what they're doing cannot be construed as finding patterns in data and being driven by that, if that's what you were suggesting.

SIMON LEVY: No, I think that's a wonderful example, and I think I probably misstated my case. Just to backtrack a little: I think there is a large community of people who would have predicted that; that would be the Chomskyans, it's exactly what they would have predicted. It's another example of the poverty-of-the-stimulus argument. So I don't want to come across as someone who argues in favor of the idea that everything needed is in the data. That's kind of the modern approach, the Google approach; Peter Norvig has a piece on this recently, I can't remember where, maybe it came out of the TED conference. My fear is that the pendulum is swinging too far in that direction, the direction of deriving everything from the data. I don't want to present myself as arguing for the purely statistical view, but what I also want to argue against, from a pure philosophical standpoint, is the idea of a hybrid system shedding light on it. I think for building engineering artifacts it's the right way to go, but I think you need a system that has some sort of built-in learning biases: a system that is biased towards giving you something like the Nicaraguan sign language, but that will also take seriously into account the nature of what's in the data and what can be derived from it.

AARON SLOMAN: Yes, I agree that there is both going on. I think John McCarthy summed it up quite nicely in his little essay, "The Well-Designed Child", in a statement I probably can't quote exactly. It goes something like: evolution solved a different problem than making a baby that knows nothing about the environment. I was talking to biologists about what kinds of trade-offs there might be between various kinds of prejudices, various kinds of meta-assumptions, and how they can drive the process of exploring the environment to see what's there. And this links up with old philosophical problems, like Immanuel Kant's view that you need some a priori concepts and knowledge in order to be able to learn anything from experience. How exactly that works out in detail he wasn't able to say, but I think (audio incoherent).

MICHAEL ANDERSON: It might be worth adding that whether there is really a conflict between these views depends a lot on understanding what the limitations of our learning systems actually are, and I think they're a lot better than most people think they are. We're incredibly sensitive to contingencies in our environment, and it doesn't take us long to notice them. So I'm not sure there is really a conflict between the deaf children learning from each other, as it were, developing language with one another, and the idea that statistical learning is at the core of our language competence. That was the environment they were in, and they were all responding to the changes in the contingencies of that environment, which, of course, is not to say that everything is in the data, because you have to have something to extract the data with. So I think these are kind of false dichotomies, in a way. Charles Gallistel in particular is a nice place to look for some good information about the kinds of learning systems we have and share with other animals, for instance. So, for what it's worth.

AARON SLOMAN: Just a tiny observation there. From what I understand of those deaf children, it was very important that they created, experimented with, and selected novel ways of doing things that nobody else had done. They weren't learning from something that was already there; they were creating what they learned. Although you're right to say they learned from each other, there's something much deeper going on as well, which is that they were creating new solutions.

MICHAEL ANDERSON: Of course you're right. I just wanted to point out that our sensitivity to environmental contingencies is extremely acute. We're very quick; it doesn't take long to learn new contingencies. I just don't think there is as big a conflict as there might appear to be on the surface.

ASHOK GOEL: So, if I could connect this to one of the questions Kamilla raised earlier: how does metacognition connect with creativity? To pull back to the earlier discussion, it seems to me that metacognition is one of the fundamental processes of creativity. One of the difficulties in dealing with novelty, and one of the major cornerstones of human cognition, is that we must always begin with either what we know or what is available in the world; there is nothing else, no other place to begin. And yet every novel situation is new. So how do people deal with novelty, if we must always begin with only what we already know? Metacognition plays a fundamentally important role because it tells us what we know, what we don't know, and how we can acquire information from the world around us to fill in what we don't know, so that we can in fact create the kinds of solutions that deal with novel situations. This, perhaps, is not the only role of metacognition, but it's one role.

SIMON LEVY: I think it might also be worth pointing out that there are two different kinds of AI, in a sense. There's the Minsky AI, and Minsky very specifically rejects appeals to the way things actually work in biology, in a way, because he basically thinks that people are irrational and stupid and that we can build artifacts that avoid that kind of thing. One imagines that those kinds of artifacts would have much more metacognition and self-reflection than ordinary people. One of the things I quickly realized as a linguist, in fact in the graduate program I was in at Connecticut: my roommate was in the program as well, and his fiancée visited the lab and said, "If I hadn't visited this lab, I wouldn't have thought that anybody cared about these questions or asked them." So I think we may have an exaggerated sense of how much ordinary people, at least with respect to language, which is always my bias, reflect on their own thinking and language. Most people, if you were to point out to them the way they had just produced a certain utterance, the order of the words or something, would look at you with complete bafflement. There is not a lot of self-reflection, that I'm aware of, going on in ordinary language users, as opposed to very highly educated linguists and artificial intelligence researchers, who by the nature of their work have to reflect on these things. (Audio incoherent.) I think one's experience, especially if you do field work, is astonishment at the lack of reflection that people have about the language they speak.

MICHAEL ANDERSON: Explicit reflection…

SIMON LEVY: Right, right…

MICHAEL ANDERSON: But there’s a lot of implicit. I did a corpus study of the British National Corpus some years ago and that it turns out that about 10 or 12 percent of all utterances are metacognitive in character, metalinguistic in character. People are constantly talking about their own language, they’re adjusting in light of that and these things tend to happen at transitions in the conversation, so people, well the British National Corpus is just ordinary people talking to each other, or at least largely, and so there is a lot of actual metalinguistic usage, metalinguistic activity in ordinary language users, but you’re right that they don’t notice it, its nevertheless there and plays an extremely important role in managing conversations, so its sort of interesting.

SCOTT FAHLMAN: There’s an old joke in AI that each of loses the capability to skillfully do what we study. People who work on manipulation are always stopping in the middle of dinner looking at their hands. Linguists are always stopping mid-sentence because they just did something interesting and I think its important for the progress of the field to bubble up and to be looked at in a metacognitive way rather than just a highly compiled way. It worries me a little bit because I study common sense, I hope that doesn’t’ completely go by the way side. So I think sometimes, introspection is always suspect, but I think we can sometimes really tell the shift of just doing something to the metacognitive role.

ASHOK GOEL: There’s the opposite joke in AI, that we all end up studying what we are not really good at, so because I am not very good at metacognition I study metacognition and so forth. In that sense, the robotist often make this joke because they are not very good athletes because they build systems with good motor perception and coordination. But I hope we study metacognition because we like metacognition.


AARON SLOMAN: Can I make a little distinction between episodic metacognition and something else which I don't have a good label for? I think people are really good at noticing what is happening at the time, like when someone interrupts, or seems not to understand, or is repeating himself, or is looking bored, but not at all good at noticing deep regularities in their own language. An example that a psychologist, I think his name was Roger Brown, used to use in his lectures was tag questions. If you raise a question about whether someone has arrived or not, you might make an assertion and then question it at the end with a tag question. You'd say, "Johnny's come, hasn't he?" Or, if you think he hasn't arrived, you'd say, "Johnny hasn't come, has he?" In other words, you negate the assertion in the tag question. Now, I doubt that anyone other than professionals notices that, and yet people follow that kind of rule, except that occasionally you get a special emphasis by repeating the same modality in the tag question: "You're going to do that, are you?" says someone expressing great doubt about the claim being made. Anyway, there are different kinds of things that are done with tag questions; whether I've got the rules right is neither here nor there. The main point is that most people learn those things and use them and never notice them, whereas they may notice particular episodes and be able to adjust what they do from that point of view. And I think this is connected with Karmiloff-Smith's point about going beyond behavioral mastery and reorganizing your knowledge: when that happens, the learner, unless relatively sophisticated, doesn't notice it. It happens automatically, and that uses deep mechanisms, presumably a product of evolution, that we need to understand.
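
A toy rendering of the tag-question rule as Sloman states it, grossly simplified and with invented function names: the default tag reverses the polarity of the assertion, while a same-polarity tag on a positive assertion marks doubt.

```python
# Toy sketch of the polarity-reversal rule for English tag questions.
def tag_question(subject: str, aux: str, positive: bool = True,
                 doubting: bool = False) -> str:
    neg = aux + "n't"
    assertion = f"{subject} {aux if positive else neg} come"
    # Default rule: the tag reverses the assertion's polarity; repeating the
    # same polarity on a positive assertion signals doubt.
    tag = neg if (positive and not doubting) else aux
    return f"{assertion}, {tag} he?"

print(tag_question("Johnny", "has", positive=True))    # Johnny has come, hasn't he?
print(tag_question("Johnny", "has", positive=False))   # Johnny hasn't come, has he?
print(tag_question("Johnny", "has", doubting=True))    # Johnny has come, has he?
```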

SIMON LEVY: Just quickly, based on the comments of Michael and Aaron, and Ashok as well: I guess what I am arguing is that we want to make a distinction between metacognition and conscious manipulation or self-reflection. A lot of the philosophy of mind and even neuroscience has been overtaken, even people's individual careers have been overtaken, by this issue of studying consciousness. And I think, especially with regard to what Michael said about the British National Corpus, if I understood it correctly, there's a lot of metacognition at the level of symbol manipulation, but conscious awareness of the sort that linguists and philosophers and cognitive scientists engage in probably doesn't play a big role in it.

ASHOK GOEL: I agree with that.

KAMILLA JÓHANNSDÓTTIR: I am wondering Alexei if we should move to closing statements.

ALEXEI SAMSONOVICH: How do you feel?

KAMILLA JÓHANNSDÓTTIR: How do the panelists feel? I’m now going to ask each of you to make a brief closing statement, starting with you Aaron.

AARON SLOMAN: Brief closing statement. Well, I think the various kinds of things that we might call metacognition or meta-semantics are very important, that we still have a lot to learn about them, and that we need to bring together the approach of designers of working systems, mainly people in AI and engineering, and the people who study natural systems, both in human development and performance and in other animals, because there are lots of things going on in the study of non-human animals that are quite surprising. And we might get a better idea of what the space of design options is if we look at the space of possibilities already realized in biology.

ASHOK GOEL: Thank you, Kamilla, and what a pleasure it has been to talk with Aaron, Michael, Simon, and Scott. I hope to see you at some conference. Briefly: to me, creativity is a fundamentally important characteristic of intelligence, and one of the criticisms of AI for a very long time has been that AI has not really shown creativity. Going further, I think metacognition is a fundamental process of creativity. I really am excited that we all are looking at metacognition, because I think it needs to be looked at very carefully if we are ever going to get competent creativity, and if we are ever going to understand what human creativity is, why humans can understand stories and crack jokes and machines cannot. I think metacognition has something to say about it. Thank you.

KAMILLA JÓHANNSDÓTTIR: Michael?

MICHAEL ANDERSON: Yes, sure. I think the conversation has been a bit too wide-ranging to offer up a closing summary, but I do want to say how much I enjoyed speaking with everyone. This is clearly an interesting ongoing topic, and I think we'll see a lot more on it in the future, or at least I hope so.

KAMILLA JÓHANNSDÓTTIR: Thank you, Simon?

SIMON LEVY: Yes, I would also like to thank Alexei and Kamilla for giving me the opportunity to talk, even remotely, with people whose work I've been reading for quite some time now. I'm very happy to have been involved in this. Maybe I'll just close by paraphrasing something that I think Dennis Gabor said, the man who invented Gabor transforms, wavelets: that to really understand the cognitive system we need to understand its limitations. So if we can see what the limitations of metacognition are as it scales up, how much can we know about how much someone else knows about what we know, that sort of thing, that will take us farther than the approach of trying to make a system that gets the most out of the data it is given. That's kind of the direction I'm trying to take in some preliminary work with Ross Gayler, looking at what the limitations are on the depth of embedding of our own self-awareness and of modeling somebody else's awareness. If we only try to do what AI has done historically, solving bigger and harder problems, I don't think we're going to make much progress in that direction.

KAMILLA JÓHANNSDÓTTIR: Thank you. I would like to thank you all for participating. It's been very interesting, and I'll now turn it over to Alexei.

ALEXEI SAMSONOVICH: Maybe a short comment from the audience?

SCOTT FAHLMAN: Well, on behalf of the audience I'd like to thank the organizers and the panelists. It's been fun and very interesting, with lots of different perspectives here.

KAMILLA JÓHANNSDÓTTIR: Bye; hopefully we'll see everyone soon in another panel and can continue the discussion.

ALEXEI SAMSONOVICH: I would also like to make a metacognitive comment: I think today's panel was a great success, and I'm very happy with it. I would like to thank everyone for their participation, to wish everyone a happy Father's Day, and to say: see you soon at other panels.


Recorded by Michael Kalish




