Evolution of AGI

March 18th, 2010

Watching David Suzuki tonight, i had a thought. It’s unusual in the sense that it seems simple enough, but i don’t recall having heard it from any other source. What if AGI is evolving? Current attempts at AI are narrowly focused, finding relatively interesting ways to solve local problems. But as those solutions become ubiquitous, more complexity will be required to solve larger and larger problems. This, in a sense, is what AI has been doing.

I don’t recommend David Suzuki’s recent shows. Originally a fruit fly specialist become science champion for the masses, he has evolved (no pun intended) into a pontificating self-righteous prick who is never at a loss to criticize anyone who doesn’t share his eco-philosophies. (Witness his recent idea of jailing politicians who resist GHG reduction policies, for which his university student audience gave him a standing ovation.) The Nature of Things was once a fascinating show where you could be dazzled with the latest biological, geological, *-ological findings. It’s still that – and the nature footage has only become more astounding, i must admit – but the formula of every show is now a double facepalm of hook-em, reel-em-in, and pound their faces in – using his trademark lulling, gentle intonations – with how the human race is raping Mother Gaia.

Anyway, what can i say, i was inspired. Not so much by David or the show, but by how life has evolved to thrive near a 200 degree sulphur-belching sea floor volcano. I won’t get into the details of the show (go and watch it, if you can stand the preaching). But over the years i’ve developed an annoying habit of thinking about everything i perceive as what-does-this-tell-me-about-AGI.

Orange Roughy can live over 100 years. Longer than most humans. But just from the one hour show i can pretty much conclude that they’re about as intelligent as any other fish. (No, they don’t live near the sulphur volcanoes… forget about that for now.) But as the show pointed out, life in the deep sea is pretty stable. One day is a lot like the next. That’s assuming they even recognize that such a thing as a day exists, because if they don’t, one 3 seconds is much like the next 3 seconds. In an environment like that, for what would you need intelligence? Here on the surface we’re confronted with extreme light, temperature, and humidity variations. Maybe our 2 dimensional existence makes things easier, but then maybe it makes things harder?

But back to the point. Watching the little crabs and squid and see-through shrimp i became amazed at how they had adapted so perfectly to their specific environments. Kind of like narrow AI. Max Harms wrote a narrow AI-style script that demolished every bio-inspired script to date in the GoID donut task, but that’s the way it is: narrowly focused approaches will be faster and more efficient than AGI. But the same AGI will survive in more contexts. Thus, what we need is a problem with many unrelated contexts for which only a single solution will do. Then we might finally start down the AGI road.

The AGI maze

February 19th, 2010

Jeff Hawkins, in his book On Intelligence, likened the study of AGI to a massive jigsaw puzzle. I may be making up some of this, but imagine a billion piece puzzle where the pieces are black on both sides and there is no clear border. Furthermore, you only start with a few hundred pieces, and every month a few more pieces – some of which are duplicates and others are for completely different puzzles – are mailed to you. Something like that. It’s a good analogy.

But, like a definition of intelligence, i have to contribute my own. Imagine a maze like the ones on kids’ place mats in family restaurants. You need to show the monkey the way through the maze to get to the french fries on the other side. If you’re like me, the first line you drew was around the outside of the maze. I mean, sheesh, if you’re that hungry, why bother going through?

But, after you ate your fries and decided to solve the maze, the easiest way to do it is not to start at the beginning and find the end, or vice versa. The easiest way is to start at both the beginning and the end, and use the information about where each leads to link them up.
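The start-at-both-ends idea can be sketched as a bidirectional breadth-first search. This is only an illustration under assumptions that are not in the post: the maze is a small grid of strings with '#' for walls, and the function name is made up.

```python
from collections import deque


def bidirectional_search(grid, start, goal):
    """Find a path by growing one search from the start and one from the goal.

    grid is a list of strings; '#' is a wall, anything else is open floor.
    Returns a list of (row, col) cells, or None if the two ends never meet.
    """
    if start == goal:
        return [start]
    # parents[side] maps each visited cell to the cell it was reached from.
    parents = ({start: None}, {goal: None})
    frontiers = (deque([start]), deque([goal]))

    def neighbours(cell):
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[nr]) and grid[nr][nc] != '#':
                yield (nr, nc)

    def trace(cell, side):
        # Walk back from cell to the end that this side started from.
        path = []
        while cell is not None:
            path.append(cell)
            cell = parents[side][cell]
        return path

    while frontiers[0] and frontiers[1]:
        for side in (0, 1):
            # Expand one full layer of this side before switching to the other.
            for _ in range(len(frontiers[side])):
                cell = frontiers[side].popleft()
                for nxt in neighbours(cell):
                    if nxt in parents[side]:
                        continue
                    parents[side][nxt] = cell
                    if nxt in parents[1 - side]:
                        # The two searches have met: link them up.
                        from_start = trace(nxt, 0)[::-1]
                        from_goal = trace(nxt, 1)
                        return from_start + from_goal[1:]
                    frontiers[side].append(nxt)
    return None


maze = ["..#.",
        ".##.",
        "...."]
print(bidirectional_search(maze, (0, 0), (0, 3)))
```

Each side only has to explore about half the distance before the two frontiers touch, which is the point of the place-mat trick.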

Now imagine a maze like that the size of a province (or state – a big one – for those unfamiliar with the term). There are only a few ways in/out, and all you get to decide on is where exactly you are going to be helicopter-dropped into the middle of it. You can choose memory (somewhere central), speech (north west), perception (north), vision (south), or cognition (the location of which is a secret, jealously guarded). No one knows where the ways out are. There are plenty of paths that quickly lead nowhere, and most of the rest lead nowhere after a lengthy exploration. In 60 years, no one has found a way out.

My thinking – and i’m not alone, it should be noted – is to find a way in, a way out, and try to connect the two. Returning to reality, the way in, in my opinion, is sensory data. If you can inspect the signals that the brain receives (including the spinal cord or not, but preferably including), you can begin to understand what different constructs in the brain are trying to do.

The way out is two-fold. Obviously there is movement, the physical manifestation of brain output. But there is also unprovoked thought, such as imagination. It may be debatable, but i think it’s clear that the simpler an animal is, the less unprovoked thought there is going on. So, for a basic understanding of what the brain is doing, which i contend is what we all desperately need, we should first focus on movement.

I’d like to write a longer post, but paying work beckons, and so i’ll need to leave this for now. Stay tuned for more!

RALA and The Minsky Test

February 9th, 2010

I’d like to be able to say that some household names in the AI field are finally coming around to my way of thinking, but the ridicule would probably be overwhelming. If there’s one thing i’ve learned to expect from talking with other people working on this stuff, it’s that they will tolerate about 3 minutes of my theories before they launch into a lengthy description of everything that is wrong with them, and sum up by suggesting i read all about their theories. Naturally, i respond in kind.

So it was with excitement that i read about Peter Voss’s approach, since i felt that it coincided pretty strongly with my own. I was disappointed when he didn’t immediately offer me a signing bonus and an exotic car to go work with him, but maybe he needs to save up for that. You never know what will happen.

My latest source of excitement is this article. I haven’t talked much about the approach of combining huge quantities of tiny processing agents in ways that produce interesting emergent behaviour, mostly because i would never do it that way. It’s just way too hard; ask the Artificial Life and Neural Net people. Even trying to genetically evolve such things is about a billion times harder than you might think, as this book will tell you. But inherent in the RALA ideas is situation in time, which is something i certainly have talked about. Of course, RALA at the time of writing is all vapour-ware, but i might find some time to implement a simulation of it in a traditional computer to explore its potential. That would be right after i create a new AGI programming language, 4 more GoID tasks, and run for Prime Minister of Canada. But, hey, it’s on the list.

Also interesting in the article is the Minsky Test, a desperately needed replacement for the Turing Test. Basically, the Minsky Test says that if an AGI implementation can read a children’s book (presumably an arbitrary one) and then somehow summarize what happened (obviously, the details need to be worked out a bit), then the implementation is likely at least approaching intelligence. Hopefully Marvin can fill in a few more blanks.

Anticipation

February 1st, 2010

Ok, i’m back. Between reviewing the Ben Goertzel / Cassio Pennachin edited book Artificial General Intelligence, and learning Objective C to write an iPhone app for a client, things have been pretty busy. (Although i must admit that playing the remarkably original games for my new iPod has munched through quite a few hours too, but let’s call that “research”.)

My favourite new topic of fascination these days has been anticipation. In fact, this one goes back over half a year now, but it hasn’t been until lately that i’ve been able to give it the attention i’ve wanted. Last summer i was working on a robotics control system that became the precursor to the Robotic Arm task in GoID. (The intention was that the control system would become a commercial project, but alas, well… you know how it goes.) I was reading about motor systems in Eric Kandel et al’s Principles of Neural Science (4th ed), and got caught up in the diagram on page 656, captioned, “Catching a ball requires feed-forward and feed-back controls”. My head was into how the non-linear systems of shoulder and elbow joints can still produce linear movement of the hand, but i was for a moment completely distracted with how the subject in the diagram learned to anticipate the impact of the ball against her hand. The system i was building was entirely reaction-based: it would assess current conditions – mostly proprioception – and determine the correct output to achieve the known goal. The idea of looking ahead based upon past experience – although i wasn’t so blind as to never have considered it before – made my motor systems approach seem trite. This is why the system ended up as a GoID task instead of a booming business.

I was inspired to write this post today after watching my almost two year old daughter turn the pages of a book. (To anyone interested in AGI, i sincerely declare that there is nothing more educational in the field than raising children.) Only months ago she would flip the page of a cardboard-leafed book only to squeeze her other hand between that page and the remainder of the book. It would take a moment for her to realize the situation and surmise that she needed to get her other hand out of the way. Tonight though, she would initiate the turn of the page and then at precisely the last opportunity let go of the book with her other hand to allow the page to turn completely, and finally grasp the book again. There was no concentration in her eyes besides intently studying what was on the pages. The movement of her hand had successfully transitioned from consciously reasoned reaction to subconscious anticipation. And precise! Out of maybe 20 page turns her hand brushed a page once. Moreover, i’m certain that that single brush resulted in some manner of learning that will make her movements more accurate in the future.

I had already watched some of the videos of robots being produced at places like Willow Garage. Having had big hopes when i started down the robotics road, it was at once depressing and inspiring to see how much others had already achieved. I recall one (although i can’t be certain that it was a Willow Garage project anymore, and for once i’m too apathetic to get Google to bail me out) where the researcher tossed a ball to the robot, and i think once out of 3 or 4 tries the robot caught it. Now, make no mistake, anything even close to a catch is remarkable; actually making a catch is grounds for a prolonged golf clap. But what i couldn’t get over was the clumsiness of the machine. The action of moving its arm to put its hand into the trajectory of the ball caused the rest of the robot to convulse so much i thought it would bust a rivet for sure. The only reason it remained standing was because it was held up by suspension wires.

I assumed that this had not gotten past the researchers unnoticed, and so i gave a few minutes to thinking about how to fix it. The answer was anticipation, but the implementation was not nearly as simple. Consider yourself in a snowy field in the middle of a snowball fight. You glance to your right and see an icy yellowish orb headed for your frontal lobe. (For some reason i couldn’t help throwing in a completely gratuitous scene. Oh well. Carrying on…) Let’s say that your response is to block the projectile with your hand, if only because it’s a more interesting anticipation problem than ducking. Interestingly, your first movement is not the activation of your triceps to extend your arm. Assuming that your arm’s center of balance was originally somewhere in front of your body, your first movement is rather to twist your hips to the left in order to compensate for the counter force that moving your arm to the right will generate. (You will likely also activate your deltoids to raise your arm, but since this primarily generates a downward force that is absorbed by your skeletal structure, it’s not quite as interesting.) But, twisting your hips itself generates an unbalancing force, so a split second prior to that muscles in your legs activated to counter it. There is a pattern here of counter-actions preceding activation forces that likely has many steps. Remember, too, that stopping your arm at the proper place implies undoing all of the momentum that it created. Upon spying the incoming danger your brain started calculating the exact muscle forces to apply and the precise timings of each application in order for your arm to effectively deflect the threat, but without thwarting your balance. Actually, your brain probably didn’t calculate much. It’s probably done more of a look-up based upon the decades of motion experience with your particular body configuration that it has accumulated. In fact, it probably did some kind of a look-up on the size and shape of that yellowish orb too in order to know how much to unbalance the body toward its trajectory to absorb the impact, however slight. Doing the calculations from scratch would take way too long.

Once again, we see that time is critical to the entire endeavour, not just in saving your frontal lobe, but in coordinating the various muscle activations. It’s true that this could be a case of “fancy programming”, but what is undeniable – i believe – is that the whole behaviour was learned. Two year olds don’t know how to deflect an incoming projectile (as i now intimately know), but 12 year olds do (as i affectionately recall). So, how does this anticipation come about?

It’s difficult to know where to start the study. I would define anticipation as a reaction to a predicted event with the intention of maximizing the outcome in the favour of the agent. Fair enough, you might say, but, 1) how is the event recognized, 2) upon what contextual data is it predicted, and 3) most profoundly, how does it choose an action (which as we see from the example above can only be considered distinct in a highly abstracted way)? The abilities that must already exist for something like anticipation to work are extensive. One is forced to assume that, for it to work in a biological context, it must be simplistic in its base form and achieve complexity in a relational way, say hierarchically.

So that is what i’m going with, at least for the moment. And, as any rational researcher would do, instead of building spatial-temporal pattern recognition and classification, relational memory, and motor action optimization systems, i’m going to assume they all exist in a sufficient way such that anticipation can work.

Multidimensional Lotka-Volterra models

December 29th, 2009

I think it was grade 11 math when i first saw the predator-prey model. I recall that the teacher had called it the “Richardson” equation, which i had associated at the time with the last name of a friend. But when i look it up now, Lord Google insists that it is in fact called the Lotka-Volterra equation, and i’m inclined to concede.

I’ve always had a particular fascination with systems of non-linear equations. These are particularly useful for modeling in Artificial Life projects. But since they are not very practical in the development of AGI, i have to restrain myself from toying with them too much.

About a decade ago, though, i indulged. I remembered that math class, but not the details of the equations. Back then search engines did not so effortlessly correct my mistakes as they do now, and so my searches for “Richardson” kept coming up dry. Overall, it didn’t bother me too much to try and recreate the concepts on my own; i have a long history of working out mathematics from first principles rather than memorizing equations, even during university exams. And it gave me the opportunity to expand the original ideas a bit: instead of just predator-prey, i used predator-prey-plant. Plants would grow at some exponential rate, prey would eat the plants and reproduce, and predators would eat the prey. As in the original, expansion of the “eaters” is constrained by the availability of food, so a cyclical relationship between the species would emerge.

That was the thinking anyway. Something was wrong with my version. I no longer have the original work, but i remember doing it all pretty much in a spreadsheet. The population of each species was simply represented by a number, and the numbers were repeatedly plugged into the equations. In my version no balance was ever achieved, even though i tweaked the constants and coefficients endlessly. Sometimes the numbers would fly off to infinity, sometimes they would dwindle to zero, and other times they would become cyclic but only by regularly dipping into negative values, which was unsightly.
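For illustration, here is a minimal sketch of what a spreadsheet-style, zero-dimensional update could look like. The original work is lost, so everything here is an assumption: the three-level Lotka-Volterra form, the coefficient values, and the names step and simulate are all hypothetical. It shows one plausible way that repeatedly plugging numbers into the equations with a coarse time step can overshoot into negative populations.

```python
def step(plant, prey, pred, dt=1.0,
         a=1.0, b=0.1, c=0.05, d=0.5, e=0.1, f=0.05, g=0.5):
    """One forward-Euler update of a plant/prey/predator chain.

    All coefficients are made-up illustrative values, not the author's.
    """
    d_plant = plant * (a - b * prey)            # growth minus grazing
    d_prey = prey * (c * plant - d - e * pred)  # feeding minus death and predation
    d_pred = pred * (f * prey - g)              # feeding minus death
    return (plant + dt * d_plant,
            prey + dt * d_prey,
            pred + dt * d_pred)


def simulate(state, steps, dt=1.0):
    """Repeatedly plug the numbers back into the equations, spreadsheet-style."""
    history = [state]
    for _ in range(steps):
        state = step(*state, dt=dt)
        history.append(state)
    return history


# With dt=1.0 and 30 prey, the plants lose more than they have in one update.
print(step(10.0, 30.0, 5.0))
```

Nothing in the update stops a population from crossing zero, so a large enough step can push it negative. A smaller dt usually reduces that kind of overshoot, although whether this was the actual problem in the original spreadsheet cannot be known from the post.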

I played with it for an hour or so, and never did go back and correct it. Instead, i got caught up in an idea. What if the problem was not necessarily just the equations? Put another way, could the equations be made to work in a different environment?

Up until that point the equations were effectively zero-dimensional: there was a single integer that represented each species. I wondered what would happen if i expanded the environment to be two-dimensional. The plants would still grow according to the x(a – by) equation, but i would also introduce a process by which some of the growth within a cell would diffuse to neighbouring cells. The predators and prey would similarly “roam” the flat world. (The “world” was represented by a three-dimensional array: two dimensions were the spatial coordinates, and the third was the populations of the species at that cell. A continuous-space model was not attempted.)

Spreadsheets would no longer cut it, especially due to the complexity of the diffusion. Also, i couldn’t visualize the results that way. So i ended up creating a little Java program to experiment with, recreated here as an Applet. The plants are represented as the colour green, prey as blue, and predators as red. Remember that this code uses only a spatial variant of equations that failed utterly in a non-dimensional setting. Now, a distinct and interesting pattern emerges that quickly becomes stable.
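A rough sketch of the spatial variant described above, using plain Python lists rather than the original Java (which is not reproduced here). The local equations, the coefficients, the diffusion rates, the wrap-around edges and the clamping at zero are all assumptions made for the example, and the function names are hypothetical.

```python
def react(plant, prey, pred, dt):
    """Local update for one cell, using made-up coefficients."""
    d_plant = plant * (1.0 - 0.1 * prey)               # x(a - by) with a=1.0, b=0.1
    d_prey = prey * (0.05 * plant - 0.5 - 0.1 * pred)
    d_pred = pred * (0.05 * prey - 0.5)
    # Clamping at zero is an added assumption, not something the post describes.
    return [max(0.0, plant + dt * d_plant),
            max(0.0, prey + dt * d_prey),
            max(0.0, pred + dt * d_pred)]


def diffuse(field, rate):
    """Move a fraction `rate` of each cell's population to its four neighbours.

    Edges wrap around, so the total population of the field is conserved.
    """
    rows, cols = len(field), len(field[0])
    out = [[(1.0 - rate) * v for v in row] for row in field]
    share = rate / 4.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                out[(r + dr) % rows][(c + dc) % cols] += share * field[r][c]
    return out


def step_world(world, dt=0.05, rates=(0.2, 0.1, 0.1)):
    """world[r][c] is [plant, prey, predator]; react in each cell, then diffuse."""
    rows, cols = len(world), len(world[0])
    reacted = [[react(*cell, dt) for cell in row] for row in world]
    layers = [diffuse([[cell[k] for cell in row] for row in reacted], rates[k])
              for k in range(3)]
    return [[[layers[k][r][c] for k in range(3)] for c in range(cols)]
            for r in range(rows)]


# A small world with plants everywhere and animals seeded in one cell.
world = [[[10.0, 0.0, 0.0] for _ in range(8)] for _ in range(8)]
world[4][4] = [10.0, 5.0, 1.0]
for _ in range(100):
    world = step_world(world)
```

Mapping the three values in each cell to green, blue and red gives the kind of picture the post describes. Whether this particular sketch settles into a stable pattern depends on the assumed constants.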

I remember this little experiment when i do tests of AGI concepts. The human brain is complex at a galactic level, and so it is necessary to simplify concepts and test them in a reductionist setting. For years and years these tests have succeeded in the limited setting, but have quickly and absolutely failed when tried in any more interesting environment. The point is that we don’t always need to throw the baby out with the bath water. The idea and the approach may not be wrong.

This isn’t to say that Deep Blue would make a good AGI model if only it were refactored into a two-dimensional environment. It wouldn’t. But there are likely a lot of decent AGI approaches that would benefit hugely from the introduction of space and time into their models. Otherwise, we will end up with the problem that i had, but in reverse. We’ll imagine a world in which predators, prey, and plants are wandering Flatland, but then in an effort to simplify we’ll reduce it to zero-dimensions and watch it fail.

So, what is intelligence anyway?

December 22nd, 2009

Each AGI project that has ever existed seems to have (or have had) its own definition of what intelligence exactly is, although some are more exact than others. Some are similar, some are starkly original, many fall into conceptual groups. And even when researchers write approvingly of another project’s definition, they still deign to conjure one of their own. Of course, i’m no exception. And why not? The system i’m writing is different. Why shouldn’t my working definition be different too?

But let’s look at some existing definitions:

Intelligence is the capacity of a system to adapt to its environment while operating with insufficient knowledge and resources. – Pei Wang

General Intelligence is the ability to achieve complex goals in complex environments. – Ben Goertzel

Intelligence deals with all the things which should be known in advance of initiating a course of action. – The Clark Task Force of the Hoover Commission

Great. What are we waiting for? Let’s build it.

Oh, but… we just took the amorphous word “intelligence” and “defined” it using a bunch of equally amorphous terms: environment, insufficient, knowledge, goals, complex, things, action. (More definitions can be found here.) Naturally, the authors of these definitions continue on in their texts going to great pains to resolve this, but we still end up with something unsatisfying, or perhaps even worse: a concrete definition. Something, say, like this: “Intelligence is the sum of all properties where the first derivative of the definite function of the extensional intersection of confident statements is zero, thus maximizing the current goal predicate.” (Note: this just came out of my cerebral blender. No one actually made this statement. But, could you tell?)

Others take a different approach. Rather than even attempt to reduce intelligence to 20 words or less, they merely provide a list of requirements: autonomous, goal-directed, learning, adaptive, capacity for reason and abstract thought, capacity for knowledge and the ability to acquire it, ability to solve problems. More amorphous words. (No, i did not just learn the word “amorphous” today. But i admit i haven’t had much of a chance to use it in the past.) Clearly, we’re no better off.

Some commentators say that the word “intelligence” itself is steeped in human experience, and so could only hope to describe human intelligence, and subjectively at that. This is a valid argument, especially since it seems widely accepted that there is a substantial difference between human intelligence and general intelligence. But now our question becomes even harder. Now we need to create a definition of a concept that we don’t even understand in ourselves, much less in some general sense.

The answer still eludes, but the method of finding it is sound. Let everyone submit their own hypotheses, build systems based upon them, and see what happens. After all, the soil in which human knowledge grows is fertilized by ideas that made sense at the time. And there’s no doubt this soil is fertile, having decomposed ideas for roughly 60 years now.

And so without further ado, i present the latest version of my own working definition:

Intelligence is the system that allows an agent to discover predictable patterns in its environment, which it can thereafter use to maximize its goals. – Matthew Lohbihler

And now, let me try to nail down the Jello™ words.

  • System: a software application running on a computer, or a biological brain, or anything else someone may want to try.
  • Pattern: a set of spatial-temporal events detectable by the agent’s senses. E.g. a stationary or moving scene, a sound, a tactile sensation, etc. Patterns can be built up in hierarchies to produce more abstract patterns, i.e. sounds into phonemes into words into sentences into ideas.
  • Environment: the universe of possible sensory input the agent can experience.
  • Goals: motivations that carry benefits for the agent. First order examples are food/nourishment and sex/reproduction corresponding to ontogenetic and phylogenetic survival respectively. Higher order examples include social cooperation and social domination. Pathological examples include drug addictions and gambling. Come to think of it, there could be “zero-th order” motivations that they all fold into: the release of dopamine and other neurotransmitters that act as the most basic reward/punishment mechanisms of the brain.

(To me the rest of the words are self-explanatory, but i’ve been ruminating on this definition for a long time. Add a comment if you would like more explanation.)

This definition allows a continuum of intelligence. It doesn’t need to be there or not. Obviously, non-humans have some degree of intelligence; it’s accepted that humans are simply more intelligent than apes, not that humans are intelligent and apes are not. Thus, the capacity for intelligence exists in all animals, just to different degrees. Also, intelligence can take many forms depending upon the senses and the capacity for discovery, storage, and utility of pattern unique to the animal. On this scale, insects often seem to be void of intelligence since they appear to lack the capacity to learn patterns: they just recognize and react to the patterns that are hard-coded into their genomes. But bees may have short-term intelligence in that they can remember the locations of food sources and communicate this information to comrades. Humans take it to whole new levels.

But intelligence levels do not necessarily climb as species evolve. Evolutionary neurology seems to be a struggle between the flexibility of intelligence, and the efficiency of hard-coding. In survival terms, intellectual capacity is very expensive. (About 20% of all the calories you consume go to keeping your relatively massive brain running.) And so in practice we regularly see that animals have as little intelligence as they can get away with.

This also provides a nearly plausible explanation for why we still haven’t built an AGI in a computer: it’s too easy not to build it. Narrow systems built for specialized purposes (spreadsheets, e-commerce web sites, global weather simulations, data mining) are multiple orders of magnitude more efficient in CPU, memory, and human developer utilization than the most efficient AGI that we can currently imagine.

The order of the places

December 21st, 2009

The order of the places will preserve the order of the things – Cicero

Simonides of Ceos was the lone survivor of a tragic building collapse, having luckily left the feast of grandees to step outside just as the walls fell in, crushing everyone else. The only way he could identify the bodies was by association with where he recalled they had been sitting. This technique came to be called Method of Loci or the “Memory Palace”, a powerful way to remember potentially large amounts of information.

Used to remember speeches, as Joseph Brean says, “the idea is to imagine the speech as a journey through the various rooms of the Memory Palace”. Daniel Tammet, an autistic savant who famously recited pi to 22,514 decimal places, described his memory of the number (in his book Embracing the Wide Sky) as a landscape through which he traveled.

I’ve written before how the one thing that all living things with a nervous system have in common is that they move. I’ve claimed that movement is universally the motivation for having a nervous system in the first place, and as such the areas of the brain not directly to do with movement will—because of evolutionary conservation—have their origins in those areas that do. (The ability to feel as a warning of danger is a good reason for a nervous system too, but only if you plan to use that sensory information to cause movement. Otherwise, armour, needles, or poisons make much better deterrents.)

A great deal of the human experience is based upon movement and our situation in the spatial world, and locomotive concepts infuse our day to day lives. Why does the engine “run” even when the car isn’t moving? Where is it exactly that we “take” a bath? Where do we find ourselves after we “go there” in a conversation? What map describes the various “crossroads” in our lives? Analogies to space and movement abound in even our more abstract ideas. Indeed, such analogies help us understand these abstractions, putting them into frames of reference that we all can understand.

As such, when trying to understand how intelligence arises, it makes sense to understand how movement is actuated. It is possible that areas of the brain dealing with higher level cognition may have evolved significantly away from motor areas, but both social and neurophysiological evidence suggests otherwise: it’s all the same.

Emergent behaviours appear intelligent

December 18th, 2009

I bet the human brain is a kluge – Marvin Minsky

Alan Turing

During the early 1940s, Alan Turing was busy helping the war effort with his continued work on cryptanalysis. In addition he was also thinking hard on issues of computability, artificial intelligence, and his famous Turing Test. He is now recognized as having played a critical role in the WWII allied victory.

At the same time, his colleagues in ballistics were stumped. They couldn’t figure out how the German V-1 flying bomb could hit London after having flown potentially hundreds of kilometers through unknowable weather conditions. How had the Germans done it? Recall that this was a time long before transistors and electronic control, much less autopilot, and yet the bombs regularly and uncannily found their targets.

In typical big-brained fashion, the scientists had approached the unraveling of the mystery by starting with the assumption of perfection. Perhaps the bombs were guided by radio signals somehow, or there was some sort of sensitivity to the earth’s electromagnetic waves. That might serve to keep it going in the right direction, but how did it “know” to fall out of the sky at just the right moment? It may have been in the air for 20-30 minutes, but yet it could fall on London accurately when there was less than a 30 second window in which to do so. Did it navigate by the stars or the sun? Did it have some kind of “eye” that recognized the city? The scientists all scratched their big heads.

The answer arrived literally on their doorsteps when one of the bombs failed to explode, providing them the opportunity to open it up and take a look. I can only imagine there being a chuckle or two by the good-natured fellows in the group, and guffaws from the rest. At the tip of the bomb was an anemometer attached to a counter. As the device flew the anemometer turned, and each 30 turns the counter decremented by one. When the counter reached zero a combination of internal devices (explosive bolts and wire cutters) caused spoilers to activate and the rudder to be set to neutral, putting the V-1 into a dive. That was as fancy as it got. When launching, the German engineers merely set the counter based upon their distance from London (accommodating for wind speed), pointed the bomb in the right direction, and sent it on its way. Good enough. Probability did the rest.
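The counter mechanism is simple enough to sketch in a few lines. The 30 turns per count comes from the description above; the distance covered per anemometer turn is a made-up illustrative number, wind is ignored, and the function names are hypothetical.

```python
TURNS_PER_COUNT = 30  # from the description: every 30 turns the counter drops by one


def preset_counter(distance_m, metres_per_turn):
    """Counter value the launch crew would dial in for a given range.

    metres_per_turn is a hypothetical calibration constant.
    """
    return round(distance_m / (metres_per_turn * TURNS_PER_COUNT))


def fly(counter, metres_per_turn):
    """Fly until the counter reaches zero; return the distance covered."""
    distance = 0.0
    turns = 0
    while counter > 0:
        distance += metres_per_turn      # one more turn of the anemometer
        turns += 1
        if turns % TURNS_PER_COUNT == 0:
            counter -= 1                 # every 30th turn decrements the counter
    return distance                      # counter is zero: the dive begins here


count = preset_counter(3000, 10)
print(count, fly(count, 10))
```

There is no sensing of the target anywhere in this loop. The only "knowledge" of London is the number dialled in before launch.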

It’s always amazing and intriguing to see how simple things that creatively exploit statistics can appear to behave intelligently. I’ve mentioned Boids, by Craig Reynolds, before. The video to the left shows how as few as 3 simple rules can cause otherwise independent agents to fly in seemingly organized flocks. It starts off random, but the flocking behaviour quickly emerges.
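The three rules usually given for Boids are separation, alignment and cohesion. A minimal sketch of one update step follows; the weights, radii and the tuple layout are assumptions chosen for the example and are not taken from Reynolds' implementation.

```python
import math


def step_boids(boids, radius=10.0, sep_dist=2.0,
               w_sep=0.05, w_ali=0.05, w_coh=0.01, dt=1.0):
    """Advance every boid one step. Each boid is a tuple (x, y, vx, vy).

    All weights and distances are illustrative values.
    """
    new = []
    for i, (x, y, vx, vy) in enumerate(boids):
        near = [b for j, b in enumerate(boids)
                if j != i and math.hypot(b[0] - x, b[1] - y) < radius]
        ax = ay = 0.0
        if near:
            n = len(near)
            # Cohesion: steer toward the average position of nearby boids.
            ax += w_coh * (sum(b[0] for b in near) / n - x)
            ay += w_coh * (sum(b[1] for b in near) / n - y)
            # Alignment: steer toward the average velocity of nearby boids.
            ax += w_ali * (sum(b[2] for b in near) / n - vx)
            ay += w_ali * (sum(b[3] for b in near) / n - vy)
            # Separation: steer away from any boid that is too close.
            for b in near:
                if math.hypot(b[0] - x, b[1] - y) < sep_dist:
                    ax += w_sep * (x - b[0])
                    ay += w_sep * (y - b[1])
        vx, vy = vx + ax * dt, vy + ay * dt
        new.append((x + vx * dt, y + vy * dt, vx, vy))
    return new


flock = [(0.0, 0.0, 1.0, 0.0), (6.0, 0.0, 0.0, 1.0), (3.0, 4.0, -1.0, 0.0)]
for _ in range(50):
    flock = step_boids(flock)
```

Each boid looks only at its neighbours within radius, and no rule mentions a flock.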

Sometime, maybe 20 years from now or so, MRI technology may let us watch brains operate in real time. We’ll be able to see thoughts move from one area of the brain to others and back, and watch as beliefs are established and settled. And we’ll be able to play back the tapes and analyze exactly how it all happened. Will a few of us chuckle and say, “is that it? is that all?” It’s been good enough so far.

Robotics and kinetics

December 17th, 2009

The field of kinetics in robotics, it must be said, is probably the most successful failure in all of the extended AI world. Robots that use the principles of kinetics are nearly ubiquitous in the manufacturing world. This blog is not intended to be a forum for political or economic ideas, but i’ll hazard to say that these robots are likely one of the most important reasons for the wide availability of cheap goods throughout the modern world. The industrial productivity that they make possible is enormous.

So, why are they failures? It’s a cheap shot to be sure, since almost none of these robots were ever intended to solve AI problems, much less AGI problems; they were meant to solve manufacturing problems. To the AI world though, their prevalence eclipses the work that is being done with non-kinetics-based robots.

In general, kinetics-based robots are like computer programs. They run a scripted set of operations according to strict rules. The code never learns or changes. The equipment is unaware of goals beyond its immediate movements. In fact, most of these robots are unaware of themselves: they have few or no sensors that indicate such things as position in space, proprioception, pressure, temperature, etc. To compensate for this lack of sensing, the field relies on heuristics such as requiring the robot to weigh 10 times as much as anything it manipulates, so that the robot’s calibration is not disturbed. Imagine if an organism’s survival were threatened by moving things more than one tenth its weight. It would be considered fragile indeed.

Yet kinetics-based robots have reached levels of truly impressive sophistication. For example, Honda’s iconic humanoid, Asimo, has now hit 9 years old. In the video, groups of them can be seen dancing together following a fluid and very natural looking choreography. But if, say, the left side of the stage were to be lifted a few feet, would the robots fall over and continue their dance on their sides, compensate and complete the dance on their feet, or walk off stage in a huff? I suspect the first, but i would be thrilled to be shown otherwise (assuming the programmers hadn’t planned for this specific challenge in advance). Especially walking off stage like angry prima donnas. That would really be great.

The Robotic Arm task in GoiD takes an entirely different, although certainly not new, approach. The arm can sense its environment in that it knows its angles of extension and the rate at which the angles are changing. This alone allows very realistic movement with only a very simple script. To be sure, the task that the arm has to solve is simplistic as well, but after review it should be clear that the arm’s behaviour can be extended in modular ways. It differs from classical kinetics in that, for example, to get the manipulator (the end of the arm) to a particular spot, kinetics would (putting it very simply) calculate how much each joint would need to move to achieve the goal, and then move each joint, assuming once done that the manipulator is where it needs to be. The GoiD arm also changes joint positions based upon calculations, but in real time rather than by formulating a plan in advance. And, the GoiD arm knows it’s done because its sensors tell it the manipulator is in the correct spot.
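The difference between the two approaches can be shown with a single joint. The sketch below is Python, not GoiD script, and everything in it is assumed for illustration: the Joint class, its 90% "efficiency" (a stand-in for any unmodelled slip or load), the gain, and the tolerance. The planned move trusts its calculation, while the feedback loop keeps reading the angle sensor until the sensor says the job is done.

```python
class Joint:
    """A joint whose motor delivers only part of each commanded move (hypothetical slip)."""

    def __init__(self, angle=0.0, efficiency=0.9):
        self.angle = angle
        self.efficiency = efficiency

    def move(self, delta):
        # The actuator under-delivers; the controller does not know by how much.
        self.angle += delta * self.efficiency

    def sense(self):
        # The angle sensor reports where the joint really is.
        return self.angle


def planned_move(joint, target, assumed_start=0.0):
    """Plan once, move once, and assume the joint ended up at the target."""
    joint.move(target - assumed_start)
    return joint.angle


def feedback_move(joint, target, gain=0.5, tolerance=0.01, max_steps=1000):
    """Move a fraction of the sensed error each step; stop when the sensor agrees."""
    for steps in range(max_steps):
        error = target - joint.sense()
        if abs(error) <= tolerance:
            return steps
        joint.move(gain * error)
    return max_steps


if __name__ == "__main__":
    print(planned_move(Joint(), 90.0))   # ends short of 90 without knowing it
    arm = Joint()
    feedback_move(arm, 90.0)
    print(arm.sense())                   # within the tolerance of 90
```

The sketch uses only the angle. The arm described above also senses how fast its angles are changing, which a fuller version could use to damp the motion.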

It is likely the same principles are used by the Ishikawa Komuro Laboratory in their “sensory fusion” robots. Their use of sensory feedback to direct movement at this speed and accuracy is, to my knowledge, unique. My hope is that their techniques can be modularized, such that they can be directed by higher and higher level abstractions.

Scripting subsumption architecture

December 15th, 2009

One of the most useful architectural approaches in scripting GoiD agents is a concept originally developed (or at least first coined) by Rodney Brooks called Subsumption Architecture. In its GoiD incarnation it takes the form of a kind of vetoing of control from low level behaviours up to the higher levels. My first exposure to the concept was in Brooks’s book Cambrian Intelligence, although i assume he published the idea in papers or such before then. By the time i was sketching out the code for the Collector task i had largely forgotten about it. It wasn’t until i had organized about three or four behaviour modules in a dead-straight vetoing line that i recalled the diagrams from the book. So much for being original.

But as usual it wasn’t until i wrote the code for myself—and while doing so had considered and sometimes attempted the alternatives—that i grasped the power of the concept. As implemented in the collector sample script, it looks like this (see the Collector task to review the full sample script):

// Remember where we are, relatively speaking.
displacementTracking.execute();

// Check if there is a target we can collect.
if (!collectTarget()) {
    // If not, check if we recently collected a target.
    if (!postTargetCollection.execute()) {
        // If not, check if we need to avoid an obstacle.
        if (!obstacleAvoidance.execute()) {
            // If not, search around randomly.
            randomSearching.execute();
        }
    }
}

The displacement tracking is separated from the subsumption structure so that it occurs at every time step. This is important, since for this agent knowing where it is located is key to its success. If the behaviour were put anywhere where it could be vetoed, it would be akin to a person walking absentmindedly in the woods, and later looking up and realizing he was entirely lost.
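To make the "lost in the woods" point concrete, here is a hedged Python sketch of what a displacement tracker might do. It assumes simple dead reckoning from the agent's own turns and steps. The class name, the method signatures, and the bearing_home helper are invented for the example and are not taken from the Collector script.

```python
import math


class DisplacementTracker:
    """Keeps a running estimate of position relative to the starting point."""

    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.heading = 0.0  # radians, relative to the starting orientation

    def execute(self, turn, distance):
        """Fold one time step's turn (radians) and forward distance into the estimate."""
        self.heading = (self.heading + turn) % (2 * math.pi)
        self.x += distance * math.cos(self.heading)
        self.y += distance * math.sin(self.heading)

    def bearing_home(self):
        """Direction (radians) from the current estimate back to the start."""
        return math.atan2(-self.y, -self.x)


if __name__ == "__main__":
    # Walk a square: four quarter turns, one unit forward each time.
    tracker = DisplacementTracker()
    for _ in range(4):
        tracker.execute(math.pi / 2, 1.0)
    print(tracker.x, tracker.y)  # approximately back at the origin

    # Skip one update (as a veto would) and the estimate no longer closes the loop.
    lost = DisplacementTracker()
    for i in range(4):
        if i != 2:
            lost.execute(math.pi / 2, 1.0)
    print(lost.x, lost.y)
```

A single missed update corrupts every later estimate, because each step builds on the one before it.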

As i mentioned before, behaviours are typically arranged with the lowest levels having veto over the higher levels, but “level” is an ambiguous term. What is happening above is as follows.

  • The collectTarget behaviour checks the visual field for a target. If one can be seen, the agent is moved towards it with a declining speed (to make sure that the target is not missed due to floating point rounding errors). In this case, true is returned, which vetoes the execution of the other behaviours. If no targets are visible, false is returned allowing the next behaviour to execute.
  • The postTargetCollection is a behaviour that is relevant only immediately after a target has been collected. The idea is that if a target was found, there is a good chance that another target is nearby. If another were visible the collectTarget behaviour would have seen it and vetoed further execution, and so the only time this behaviour can run is when no targets are visible. But because the visual field of the agent is limited there still could be other targets nearby, and just by turning around they would be exposed. So, this behaviour executes over a period of time such that immediately after a target is found the agent is turned around until the immediate vicinity is scanned. While the turn is being executed the behaviour vetoes downstream behaviours, but of course if a target is found this behaviour itself will get vetoed. Note that the implementation of this behaviour includes a maxLookAroundCounter variable. This is a very simple and somewhat abstract implementation of learning within the agent. Simply put, if the agent finds a target when it was about to give up looking, it extends how long it will look in the future. This can also be considered a form of automatic calibration: rather than have the developer manually figure out how many turns it takes to completely scan the neighbourhood, let the agent determine that from experience.
  • The obstacleAvoidance behaviour only executes if there is no target in sight, and the agent is not currently executing a post-target collection turn. Quite simply, it allows the agent to avoid banging into walls. When an obstacle is in the visual field, the agent turns away from it at a radius proportional to the distance to the obstacle; the closer to the wall, the tighter the turn. Some edge cases are handled, but as anyone who has run the sample script knows, there are a couple of places in the environment in which the agent can get stuck. Fixes for these cases are simple to implement.
  • Finally, the randomSearching behaviour is allowed to run only if it has not been vetoed by any of the others. In general it alternates between making relatively gentle turns in either direction. By probability it will eventually search an entire space, but there are likely more efficient ways that this can be done.
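The veto chain and the look-around learning described in the list above can be sketched together. This is a Python approximation under stated assumptions, not the Collector script: run_step, the target_collected hook, and the rule that "about to give up" means one turn remaining are all invented for illustration. Only the maxLookAroundCounter idea and the ordering of the four behaviours come from the description.

```python
class PostTargetCollection:
    """Turns to scan the neighbourhood after a collection, vetoing lower behaviours."""

    def __init__(self, max_look_around=4):
        # Plays the role of maxLookAroundCounter: how many turn steps to spend looking.
        self.max_look_around = max_look_around
        self.remaining = 0

    def target_collected(self):
        """Hypothetical hook, called whenever the agent collects a target."""
        # A find made on the last turn of a scan suggests scanning longer pays off.
        if self.remaining == 1:
            self.max_look_around += 1
        self.remaining = self.max_look_around

    def execute(self):
        """Return True (veto) while a look-around turn is still in progress."""
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True


def run_step(behaviours):
    """Run behaviours in priority order; the first to return True vetoes the rest."""
    for name, behaviour in behaviours:
        if behaviour():
            return name
    return None


if __name__ == "__main__":
    post = PostTargetCollection(max_look_around=2)
    target_visible = [False]  # stand-in for the agent's visual field
    behaviours = [
        ("collectTarget", lambda: target_visible[0]),
        ("postTargetCollection", post.execute),
        ("obstacleAvoidance", lambda: False),   # stub: no walls in sight
        ("randomSearching", lambda: True),      # always willing to run
    ]
    print(run_step(behaviours))   # nothing else to do, so the agent searches
    post.target_collected()
    print(run_step(behaviours))   # now the look-around vetoes searching
```

Written as a loop, the nested ifs become a priority list. Adding a behaviour means inserting one entry at the right position rather than adding another level of nesting.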

The details of the implementations aside (as interesting as they are, right?), it is intriguing to note how even this trivial agent implementation has analogies with real-world organisms. While watching the agent run in GoiD, it is sometimes uncanny how similar it can look to flying insects (which is why the task was originally called “bees”). This sort of phenomenon has been noted before, of course, as far back as 1986 with Craig Reynolds’ Boids. In essence, a simple set of rules can result in remarkably realistic behaviour. The film industry has also used this to great graphic effect in creating armies of Orcs, herds of dinosaurs, stampedes of mammoths, squadrons of Cylons, etc.

But even more so, the code structure can be imagined to resemble brain structure. The mammalian brain is largely hierarchical in organization. Sensory input is distributed to multiple systems, but “winner take all” scenarios are common in determining which behaviour the organism eventually enacts. Mammals are many orders of magnitude more complicated, but it’s possible that code structures like that above, if replicated and organized hierarchically, might begin to act like the real thing.

Also, more abstractly, subsumption might resemble emotions. For example, certain behaviours are nearly impossible while a person is consumed with anger. It could be said that emotion can veto behaviour in the same time-dependent way that post target collection can veto searching.

More concretely, we know that multiple areas of the brain receive the same sensory input and process it in different ways. This is similar to how displacement tracking receives all the same input that the subsumption structure receives. The two operate independently, but are able to share information in order to improve their own effectiveness.