# RoboCup Synthetic Agent Challenge 97

## Overview

RoboCup Challenge offers a set of challenges for intelligent agent researchers using a friendly competition in a dynamic, real-time, multi-agent domain: synthetic soccer. While RoboCup in general envisions longer-range challenges over the next few decades, RoboCup Challenge presents three specific challenges for the next two years: (i) learning of individual agents and teams; (ii) multi-agent team planning and plan execution; and (iii) opponent modeling. RoboCup Challenge provides a novel opportunity for researchers in planning and multi-agent arenas: it not only supplies them with a concrete domain in which to evaluate their techniques, but also challenges them to evolve these techniques to face key constraints fundamental to this domain, namely real-time operation and teamwork.

## Overview of The RoboCup Synthetic Agent Challenge

For the RoboCup Synthetic Agent Challenge 97, we offer three specific targets, critical not only for RoboCup but also for general AI research. These challenges will specifically deal with the software agent league, rather than the real robot league. (Challenges for physical robots will be described elsewhere.)

The fundamental issue for researchers who wish to build a team for RoboCup is to design a multiagent system that behaves in real time, performing reasonable goal-directed behaviors. Goals and situations change dynamically and in real time. Because the state space of the soccer game is prohibitively large for anyone to hand-code all possible situations and agent behaviors, it is essential that agents learn to play the game strategically. Research issues in this aspect of the challenge involve: (1) machine learning in a multiagent, collaborative, and adversarial environment; (2) multiagent architectures enabling real-time multiagent planning and plan execution in service of teamwork; and (3) opponent modeling.

Therefore, we propose the following three challenges as areas of concentration for the RoboCup Synthetic Agent Challenge 97:

• Learning challenge
• Teamwork challenge
• Opponent modeling challenge

Evaluating how well competing teams meet these challenges in RoboCup is clearly difficult. If the task is to provide the fastest optimization algorithm for a certain problem, or to prove a certain theorem, the criteria are evident. However, in RoboCup, while there may be a simple test set to examine basic skills, it is not generally possible to evaluate the quality of a team until it actually plays a game. Furthermore, a standard, highly skilled team of opponents is useful to set an absolute basis for such evaluation. We hope to use hand-coded teams, possibly with highly domain-specific coordination, to provide such a team of opponents. Indeed, in a series of preliminary competitions, such as PreRoboCup-96 held at the IROS-96 conference and several other local competitions, teams with well-designed hand-coded behaviors, but without learning and planning capabilities, have performed better than teams with learning and planning schemes. Of course, these hand-coded teams enjoyed the advantage of very low game complexities in the initial stages of RoboCup; increasingly complex team behaviors, tactics, and strategies will require agents to face up to the challenges of learning, teamwork, and opponent modeling.

Therefore, responses to this challenge will be evaluated based on (1) their performance against some standard hand-coded teams as well as other teams submitted as part of the competition; (2) behaviors where task-specific constraints are imposed, such as the probabilistic occurrence of unexpected events; (3) a set of task-specific sequences; and (4) the novelty and technical soundness of the approach.

## The RoboCup Learning Challenge

### Objectives

The objective of the RoboCup Learning Challenge is to solicit comprehensive learning schemes applicable to multiagent systems that need to adapt to their situation, and to evaluate the merits and demerits of the proposed approaches using standard tasks.

Learning is an essential aspect of intelligent systems. In the RoboCup learning challenge, the task is to create a learning and training method for a group of agents. The learning opportunities in this domain can be broken down into several types:

• Off-line skill learning by individual agents;
• Off-line collaborative learning by teams of agents;
• On-line skill and collaborative learning.

The distinction between off-line and on-line learning is particularly important in this domain since games last for only 20 minutes. Thus on-line techniques, particularly if they are to learn concepts that are specific to an individual game, must generalize very quickly. For example, if a team is to learn to alter its behavior to an individual opponent, the team had better be able to improve its performance before the game is over and a new opponent appears. Such distinctions in learning can be applied to a broad range of multi-agent systems that involve learning capabilities.

### Technical Issues

The technical issue anticipated in meeting this challenge is the development of novel learning schemes that can effectively train individual agents and their teamwork, both off-line and on-line. One example of a possible learning scheme for meeting this challenge is as follows:

Off-line skill learning by individual agents:
learning to intercept the ball or learning to kick the ball with the appropriate power when passing.
Since such skills are challenging to hand-code, learning can be useful during a skill development phase. However, since the skills are invariant from game to game, there is no need to relearn them at the beginning of each new game.

Off-line collaborative learning by teams of agents:
learning to pass and receive the ball.
This type of skill is qualitatively different from the individual skills in that the behaviors of multiple agents must be coordinated. A "good" pass is only good if it is appropriate for the receiver's receiving action, and vice versa. For example, if the passer passes the ball to the receiver's left, then the receiver must at the same time move to the left in order to successfully complete the pass. As above, such coordination can carry over from game to game, thus allowing off-line learning techniques to be used.

On-line skill and collaborative learning:
learning to play positions.
Although off-line learning methods can be useful in the above cases, there may also be advantages to learning incrementally. For example, particular aspects of an opposing team's behavior may render a fixed passing or shooting behavior ineffective. In that case, the ability to adaptively change collaborative or individual behaviors during the course of a game could contribute to a team's success.
At a higher level, team issues such as role (position) playing on the field might be best handled with adaptive techniques. Against one opponent it might be best to use 3 defenders and 8 forwards; whereas another opponent might warrant a different configuration of players on the field. The best teams should have the ability to change configurations in response to events that occur during the course of a game.

learning to react to predicted opponent actions.
If a player can identify patterns in the opponents' behaviors, it should be able to proactively counteract them. For example, if the opponent's player number 4 always passes to its teammate number 6, then player 6 should always be guarded when player 4 gets the ball.
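The pass-pattern example above can be sketched as a simple frequency model. This is only an illustrative sketch, not a method prescribed by the challenge: the class name `PassModel`, the thresholds `min_observations` and `min_share`, and the assumption that an agent can reliably observe who passed to whom are all hypothetical.

```python
from collections import Counter, defaultdict


class PassModel:
    """Counts observed opponent passes and predicts the likeliest receiver."""

    def __init__(self, min_observations=3, min_share=0.6):
        # passer id -> Counter of receiver ids (thresholds are assumptions)
        self.counts = defaultdict(Counter)
        self.min_observations = min_observations
        self.min_share = min_share

    def observe(self, passer, receiver):
        """Record one observed pass from `passer` to `receiver`."""
        self.counts[passer][receiver] += 1

    def predict(self, passer):
        """Return the receiver to guard, or None if no clear pattern exists."""
        seen = self.counts.get(passer)
        if not seen:
            return None
        total = sum(seen.values())
        receiver, n = seen.most_common(1)[0]
        if total >= self.min_observations and n / total >= self.min_share:
            return receiver
        return None


model = PassModel()
for receiver in (6, 6, 9, 6, 6):
    model.observe(passer=4, receiver=receiver)

print(model.predict(4))  # 6: player 6 received 4 of player 4's 5 passes
print(model.predict(7))  # None: no passes by player 7 observed yet
```

Because a game lasts only 20 minutes, the thresholds trade off reacting early against reacting to noise; a real agent would also have to cope with ambiguous or missed observations.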

### Evaluation

For challenge responses that address the machine learning issue (particularly the on-line learning issue), evaluation should be both against the publicly available teams and against at least one previously unseen team.
First, teams will play games against other teams and publicly available teams under normal circumstances. This evaluates the team's general performance, and involves both AI-based and non-AI-based teams.
Next, teams will play a set of defined benchmarks. For example, after fixing their programs, challengers must play a part of the game, starting from defined player positions, with the movement of the opponents predefined but not disclosed to the challengers. After several sequences of the game, the performance will be evaluated to see whether the team was able to improve with experience. The movement of the opponents is not coded using absolute coordinate positions, but as a set of algorithms that generate motion sequences. The opponent algorithms will be provided by the organizers of the challenge by withholding at least one successful team from being publicly accessible.
Other benchmarks that clearly evaluate learning performance will be announced after discussion with challenge participants.
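One way to quantify "improvement with experience" over repeated benchmark sequences is to compare early and late scores. The text does not fix a metric, so the function `improvement`, the `window` parameter, and the sample scores below are assumptions made purely for illustration.

```python
from statistics import mean


def improvement(scores, window=3):
    """Mean of the last `window` scores minus the mean of the first `window`.

    `scores` holds one number per benchmark sequence, in playing order
    (for example, goal difference in that sequence).
    """
    if len(scores) < 2 * window:
        raise ValueError("need at least 2 * window benchmark sequences")
    return mean(scores[-window:]) - mean(scores[:window])


# Hypothetical goal differences over eight repetitions of one benchmark.
scores = [-2, -1, -2, 0, 0, 1, 1, 2]
gain = improvement(scores)
print("improved" if gain > 0 else "no improvement", round(gain, 2))
```

A positive value indicates that later sequences went better than earlier ones; with so few sequences, such a difference is only suggestive and says nothing about statistical significance.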

## The RoboCup Teamwork Challenge

### Objectives

The RoboCup Teamwork Challenge addresses issues of real-time planning, re-planning, and execution of multi-agent teamwork in a dynamic adversarial environment. Major issues of interest in this specific challenge for the 97-99 period are architectures for real-time planning and plan execution in a team context (essential for teamwork in RoboCup). In addition, generality of the architecture for non-RoboCup applications will be an important factor.
Teamwork in complex, dynamic multi-agent domains such as soccer mandates highly flexible coordination and communication to surmount uncertainties, e.g., dynamic changes in the team's goals or team members' unexpected inability to fulfil responsibilities. Unfortunately, implemented multi-agent systems often rely on preplanned, domain-specific coordination that fails to provide such flexibility. First, it is difficult to anticipate and preplan for all possible coordination failures, particularly in scaling up to complex situations. Thus, such coordination is not robust enough for dynamic tasks such as soccer games. Second, given domain specificity, reusability suffers. Furthermore, planning coordination on the fly is difficult, particularly in domains with so many possible actions and such large state spaces. Indeed, typical planners need significantly longer than real-time play allows to find even a single valid plan. The dynamics of the domain caused by unpredictable opponent actions make the situation considerably more difficult.
A fundamental reason for these teamwork limitations is the current agent architectures. Architectures such as Soar, RAP, IRMA, and BB1 facilitate an individual agent's flexible behaviors via mechanisms such as commitments and reactive plans. However, flexible individual behaviors, even if simultaneous and coordinated, do not sum up to teamwork. A common example is ordinary traffic, which, even though simultaneous and coordinated, is not teamwork. Indeed, theories of collaboration point to fundamentally novel mental constructs as underlying teamwork, such as team goals/plans and joint commitments, which are lacking in current agent architectures. In particular, team goals and plans are not explicitly represented; furthermore, concepts of team commitments are absent. Thus, agents cannot explicitly reason about their dynamic team goals and plans, nor flexibly communicate and coordinate when unanticipated events occur. For instance, an agent cannot itself reason about its coordination responsibilities when it suddenly realizes that the team's current plan is unachievable (e.g., that in the best interest of the team, it should inform its teammates). Instead, agents must rely on domain-specific coordination plans that address such contingencies on a case-by-case basis.
The basic architectural issue in the teamwork challenge is then to construct architectures that can support planning of team activities and, more importantly, execution of generated team plans. Such planning and plan execution may be accomplished via a two-tiered architecture, but the entire system must operate in real time. In the RoboCup Soccer Server, sensing is done every 300 to 500 milliseconds, and an action command can be dispatched every 100 milliseconds. The situation changes on the order of milliseconds; thus planning, re-planning, and execution of plans must be done in real time.
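The timing figures above imply that an agent must issue several commands between consecutive percepts. The short calculation below makes this concrete; the constant and function names are illustrative, and it assumes that exactly one command per action slot can use the fresh percept while the rest rely on prediction.

```python
ACTION_PERIOD_MS = 100        # from the text: one command every 100 ms
SENSE_PERIODS_MS = (300, 500)  # from the text: sensing every 300 to 500 ms
GAME_MINUTES = 20              # from the text: a game lasts 20 minutes


def actions_per_percept(sense_period_ms, action_period_ms=ACTION_PERIOD_MS):
    """Number of action slots that elapse during one sensing interval."""
    return sense_period_ms // action_period_ms


def blind_actions(sense_period_ms, action_period_ms=ACTION_PERIOD_MS):
    """Action slots per sensing interval that have no fresh percept."""
    return actions_per_percept(sense_period_ms, action_period_ms) - 1


for period in SENSE_PERIODS_MS:
    print(period, actions_per_percept(period), blind_actions(period))
# 300 3 2
# 500 5 4

total_action_slots = GAME_MINUTES * 60 * 1000 // ACTION_PERIOD_MS
print(total_action_slots)  # 12000
```

Under these assumptions, between two and four commands per sensing interval must be chosen from a predicted rather than an observed world state, which is one reason plan execution and monitoring cannot simply wait for new sensor data.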

### Technical Issues

We present a key set of issues that arise assuming our particular two-tiered planning and plan-execution approach to teamwork. Of course, those who approach the problem from a different perspective may have different issues, and the issues may change depending on the type of architecture employed.
The following is the envisioned teamwork challenge in this domain: (i) a team deliberatively accumulates a series of plans to apply to games with different adversarial teams; (ii) each game plan is organized as a graph-structured network of different plan segments labeled with specific contingencies that should trigger a shift in plan traversal; (iii) game plans are defined at an abstract level that needs to be refined for real execution; (iv) real-time execution takes place in a team-plan execution framework/architecture that is capable of addressing key contingencies. Such an architecture also alleviates the planning concerns by providing some "commonsense" teamwork behaviors; as a result, not all of the coordination actions need to be planned in detail. The key research tasks here are:
Contingency planning for multiagent adversarial game playing: Before a game starts, one would expect the team to generate a strategic plan for the game that includes contingency plan segments that are to be recognized and eventually slightly adapted in real-time. Two main challenges can be identified in this task:
• Definition of strategic task actions with probabilistic applicability conditions and effects. Uncertainty in the action specification is directly related to the identification of possible probabilistic disruptive or favorable external events.
• Definition of objectives to achieve. In this domain, the goal of winning and scoring should be decomposed in a variety of more concrete goals that serve the ultimate final scoring goal. Examples are actions and goals to achieve specific attacking or defending positioning.
Plan decomposition and merge: A correspondence between team actions and goals and individual actions and goals must be set. The team plan decomposition may create individual goals that are not necessarily known to all the team players. Furthermore, within the contingency team plan, it is expected that there may be a variety of adversary-independent and adversary-dependent goals. The decomposition, coordination, and appropriate merging of individual plans in the service of the main team plan remain open, challenging research tasks. RoboCup provides an excellent framework in which to study these issues.

Executing Team Plans: Team plan execution during the game is the determining factor in the performance of the team. It addresses the coordination contingencies that arise during execution, without the need for detailed, domain-specific coordination plans. Execution also monitors the contingency conditions that are part of the global contingency team plan. Selection of the appropriate course of action is driven by the state information gathered during execution.
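A game plan organized as a graph of plan segments, with contingency labels that trigger shifts in plan traversal, can be sketched as follows. This is a minimal illustration of the data structure only: the names `PlanSegment` and `GamePlan`, the segment names, and the contingency labels are hypothetical, and recognizing contingencies from sensor data is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class PlanSegment:
    name: str
    # Maps a contingency label to the name of the segment to switch to.
    transitions: dict = field(default_factory=dict)


class GamePlan:
    """Graph of plan segments traversed as contingencies are observed."""

    def __init__(self, segments, start):
        self.segments = {s.name: s for s in segments}
        self.current = start

    def step(self, observed):
        """Follow the first listed transition whose contingency was observed."""
        transitions = self.segments[self.current].transitions
        for contingency, target in transitions.items():
            if contingency in observed:
                self.current = target
                break
        return self.current


plan = GamePlan(
    [
        PlanSegment("build_up", {"ball_lost": "defend",
                                 "space_on_wing": "wing_attack"}),
        PlanSegment("wing_attack", {"ball_lost": "defend"}),
        PlanSegment("defend", {"ball_won": "build_up"}),
    ],
    start="build_up",
)

print(plan.step({"space_on_wing"}))  # wing_attack
print(plan.step(set()))              # wing_attack (no contingency observed)
print(plan.step({"ball_lost"}))      # defend
```

Each abstract segment would still need to be refined into individual player actions at execution time, and the transition order here stands in for whatever contingency-priority scheme a real architecture uses.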

### Evaluations

The Teamwork Challenge scenario described above has been idealized by several AI researchers, at least in the planning and multiagent communities. RoboCup, both in its simulated and real leagues, provides a synergistic framework to develop and/or test dynamic planning multiagent algorithms. Specifically, we are planning to evaluate the architecture and teams in the following evaluation scheme:
• Basic Performance:
The team must be able to play reasonably well against both the best hand-coded teams, which have no planning, and other planning-based systems. Relative performance of the team can be measured by actually playing a series of games against other unknown teams. Thus, basic performance will be measured by:
1. Performance against hand-coded teams.
2. Performance against other teams.
• Robustness: Robustness in teamwork means that the team, as a whole, can continue to carry out the mission even if unexpected changes occur, such as accidental removal of players from the team, a sudden change of team composition, or changes in the operating environment. For example, if one of the players in the team is disabled, the team should be able to cope with such an accident by taking over the role of the disabled player or by reformulating its team strategy. Thus, this evaluation presents a set of unexpected incidents during the game, such as:
1. Some players will be disabled, or their capability will be significantly undermined by these accidents. Also, some disabled players may be enabled later in the game.
2. Opponents switch their strategy, and the team must cope with the new strategy in real time.
3. Some of the opponent's players will be disabled, or their performance will be significantly undermined. These disabled players may come back to the game later.
4. Teammate changes during the game.
5. Weather factor changes.
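As an illustration of coping with a disabled player by taking over its role, the sketch below reassigns roles by priority. It is one simple policy among many and is not prescribed by the challenge; the function `reassign_roles`, the role names, and the priority ordering are assumptions.

```python
def reassign_roles(assignment, disabled, priority):
    """Return a new role assignment after some players are disabled.

    assignment: dict mapping role -> player id
    disabled:   set of player ids that can no longer act
    priority:   list of roles ordered from most to least important
    """
    # Active players keep their current roles to begin with.
    new = {role: p for role, p in assignment.items() if p not in disabled}
    vacant = [role for role in priority if role not in new]
    # Candidate donors: occupied roles, least important first.
    donors = [role for role in reversed(priority) if role in new]
    for role in vacant:
        if not donors:
            break
        donor = donors.pop(0)
        if priority.index(donor) <= priority.index(role):
            break  # remaining players already hold more important roles
        new[role] = new.pop(donor)  # move the player to the vacated role
    return new


priority = ["goalie", "defender", "midfielder", "forward"]
assignment = {"goalie": 1, "defender": 2, "midfielder": 3, "forward": 4}

print(reassign_roles(assignment, disabled={1}, priority=priority))
# {'defender': 2, 'midfielder': 3, 'goalie': 4}
```

In this example the forward takes over as goalie and the forward role is left unfilled; if the disabled player later returns, the same function can be rerun on the updated assignment.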
The RoboCup Teamwork Challenge therefore is to define a general set of teamwork capabilities to be integrated with agent architectures to facilitate flexible, reusable teamwork. The following then establish the general evaluation criteria:
• General Performance:
General performance of the team, and thus of the underlying algorithms, can be measured by a series of games against various teams. These can be divided into two classes: (1) normal competitions, where no accidental factors are involved, and (2) contingency evaluation, where accidental factors are introduced.
• Real-Time Operations:
The real-time execution, monitoring, and replanning of the contingency plan is an important factor in the evaluation. For any team to be successful in the RoboCup server, it must be able to react in real time: sensory information arrives between 2 and 8 times a second and agents can act up to 10 times a second.
• Generality:
Reuse of architecture in other applications: Illustrate the reuse of teamwork capabilities in other applications, including applications for information integration on the internet, entertainment, training, etc.
• Conformity with Learning:
Finally, given the premises above and the complexity of the issues, we argue and challenge that a real-time multiagent planning system needs to be well integrated with a learning approach, i.e., it needs to dynamically adapt and refine its complete behavior (individual and team) based on its past experience.
Other issues will be considered too, such as reuse of the teamwork architecture within the RoboCup community, and planning for team players that are not yet active in order to increase their probability of being useful in future moves, for example role playing and positioning of the team players that *do not* have the ball.

## RoboCup Opponent Modeling Challenge

Agent modeling, that is, modeling and reasoning about other agents' goals, plans, knowledge, capabilities, or emotions, is a key issue in multi-agent interaction. The RoboCup opponent modeling challenge calls for research on modeling a team of opponents in a dynamic, multi-agent domain. The modeling issues in RoboCup can be broken down into three parts:
• On-line tracking:
Involves individual players' real-time, dynamic tracking of opponents' goals and intentions based on observations of actions. A player may use such tracking to predict the opponents' play and react appropriately. Thus, if a player predicts that player-5 is going to pass the ball to player-4, then it may try to cover player-4. Such on-line tracking may also be used in service of deception. The challenges here are (i) real-time tracking despite the presence of ambiguity; (ii) addressing the dynamism in the world; and (iii) tracking teams rather than only individuals, which requires an understanding of the concepts involved in teamwork. On-line tracking may feed input to the on-line planner or the on-line learning algorithm.
• On-line strategy recognition:
"Coach" agents for teams may observe a game from the sidelines and understand the high-level strategies employed by the opposing team. This contrasts with on-line tracking because the coach can perform a much higher-level, abstract analysis, and in the absence of real-time pressures, its analysis can be more detailed. The coach agents may then provide input to their players to change the team formation or play strategy.
• Off-line review:
"Expert" agents may observe the teams playing in an after-action review, to recognize the strengths and weaknesses of the teams, and provide an expert commentary. These experts may be trained on databases of human soccer play.
These issues pose some fundamental challenges that will significantly advance the state of the art in agent modeling. In particular, previous work has mostly focused on plan recognition in static, single-agent domains, without real-time constraints. Only recently has attention shifted to dynamic, real-time environments and modeling of multi-agent teamwork. A realistic challenge for IJCAI-99 will be to aim for on-line tracking. Optimistically, we expect some progress towards on-line strategy recognition; off-line review will likely require further research beyond IJCAI-99. For evaluation, we propose that at least the following evaluations be carried out to measure progress:
• Game Playing:
A team of agents plays against two types of teams:
1. One or two unseen RoboCup teams from IJCAI-97, shielded from public view.
2. The same unseen RoboCup teams from IJCAI-97 as above, but modified with some new behaviors. These teams will now deliberately try out new adventurous strategies, or new defensive strategies.
• Disabled Tracking:
Tracking functionality of the agents will be turned off, and compared with normal performance.
• Deceptive Sequences:
Fake teams will be created that generate deceptive moves. The challenger's agent must be able to recognize the opponent's deceptive moves to beat this team.
For each type of team, we will study the performance of the agent-modelers. Of particular interest are variations seen in the agent-modelers' behaviors given the modifications in the opponents' behaviors. For each type of team, we will also study the advice offered by the coach agent and the reviews offered by the expert agents, as well as the changes in them given the changes in the opponents' behaviors.