RoboCup Synthetic Agent Challenge 97
Overview
The RoboCup Challenge offers a set of challenges for intelligent-agent
researchers through a friendly competition
in a dynamic, real-time, multi-agent domain: synthetic soccer.
While RoboCup in general envisions longer range challenges over
the next few decades, RoboCup Challenge presents three specific
challenges for the next two years: (i) learning of individual agents and teams;
(ii) multi-agent team planning and plan-execution; and (iii) opponent
modeling.
The RoboCup Challenge provides a novel opportunity for researchers in
the planning and multi-agent arenas --- it not only supplies them with
a concrete domain in which to evaluate their techniques, but also challenges
them to evolve these techniques to face two key constraints
fundamental to this domain: real-time operation and teamwork.
Overview of the RoboCup Synthetic Agent Challenge
For the RoboCup Synthetic Agent Challenge 97, we offer three
specific targets, critical not only for RoboCup but also
for general AI research. These challenges will specifically
deal with the software agent league, rather than the real robot
league. (Challenges for physical robots will be described elsewhere.)
The fundamental issue for researchers who wish to build a team for
RoboCup is to design a multiagent system that behaves in real time,
performing reasonable goal-directed behaviors. Goals and situations
change dynamically and in real time. Because the state-space of the
soccer game is prohibitively large for anyone to hand-code all
possible situations and agent behaviors, it is essential that agents
learn to play the game strategically. Research issues in this
aspect of the challenge include:
(1) machine learning in a multiagent, collaborative and
adversarial environment,
(2) multiagent architectures, enabling real-time multiagent planning and
plan execution in service of teamwork, and (3) opponent modeling.
Therefore, we propose the following three challenges as
areas of concentration for the RoboCup Synthetic Agent Challenge 97:
- Learning challenge
- Teamwork challenge
- Opponent modeling challenge
Evaluating how well competing teams meet these challenges in RoboCup is
clearly difficult. If the task is to provide the fastest optimization
algorithm for a certain problem, or to prove a certain theorem, the
criteria are evident. However, in RoboCup, while
there may be a simple test set to examine basic skills,
it is not generally possible to evaluate
how good a team is until it actually plays a game.
Furthermore, a standard, highly skilled team of opponents
is useful to set an absolute basis for such evaluation.
We hope to use hand-coded teams, possibly with highly domain-specific
coordination, to provide such a team of opponents.
Indeed, in a series of preliminary competitions such
as PreRoboCup-96 held at the IROS-96 conference, and several other
local competitions, teams with well-designed hand-coded
behaviors, but without learning and planning capabilities, have
performed better than teams with learning and planning schemes.
Of course, these hand-coded teams enjoyed the advantage of the very low
game complexity of the initial stages of RoboCup --- increasingly
complex team behaviors, tactics, and strategies will force
agents to face up to the challenges of learning, teamwork, and
opponent modeling.
Therefore, responses to this challenge will be evaluated based on
(1) their performance against some standard hand-coded teams as well as
other teams submitted as part of the competition;
(2) behaviors where task-specific constraints are imposed, such as the
probabilistic occurrence of unexpected events, (3) a set of task-specific
sequences, and (4) the novelty and technical soundness of the approach.
The RoboCup Learning Challenge
Objectives
The objective of the RoboCup Learning Challenge is to solicit
comprehensive learning schemes that can be applied to
multiagent systems which need to adapt to changing situations, and to
evaluate the merits and demerits of proposed approaches on standard
tasks.
Learning is an essential aspect of intelligent systems. In the RoboCup
learning challenge, the task is to create a learning and training
method for a group of agents. The learning opportunities in this
domain can be broken down into several types:
- Off-line skill learning by individual agents;
- Off-line collaborative learning by teams of agents;
- On-line skill and collaborative learning;
- On-line adversarial learning.
The distinction between off-line and on-line learning is particularly
important in this domain since games last for only 20 minutes. Thus
on-line techniques, particularly if they are to learn concepts that
are specific to an individual game, must generalize very quickly. For
example, if a team is to learn to alter its behavior to an individual
opponent, the team had better be able to improve its performance
before the game is over and a new opponent appears.
Such distinctions in learning apply
to a broad range of multi-agent systems that involve learning.
Technical Issues
The central technical issue anticipated in meeting this challenge is
the development of novel learning schemes that can effectively train
individual agents and their teamwork, both off-line and on-line.
One possible decomposition of the learning tasks for meeting this challenge is
as follows:
Off-line skill learning by individual agents:
learning to
intercept the ball or learning to kick the ball with the appropriate
power when passing.
Since such skills are challenging to hand-code, learning can be useful
during a skill development phase. However, since the skills are
invariant from game to game, there is no need to relearn them at the
beginning of each new game.
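To make the off-line skill phase concrete, the following is a minimal
sketch in Python of how pass power might be learned from logged practice
trials; the trial data, the linear model, and all names are illustrative
assumptions, not part of the Soccer Server interface.

    import numpy as np

    # Each logged practice trial: (pass distance in meters, kick power
    # used, 1.0 if the pass reached the receiver cleanly, else 0.0).
    trials = np.array([
        [ 5.0,  30.0, 1.0],
        [10.0,  45.0, 1.0],
        [10.0,  90.0, 0.0],   # overhit: ball flew past the receiver
        [20.0,  70.0, 1.0],
        [30.0, 100.0, 1.0],
        [30.0,  40.0, 0.0],   # underhit: ball intercepted en route
    ])

    # Fit kick power as a linear function of distance from the
    # successful trials only (ordinary least squares).
    ok = trials[trials[:, 2] == 1.0]
    A = np.c_[ok[:, 0], np.ones(len(ok))]        # columns: [distance, 1]
    (slope, intercept), *_ = np.linalg.lstsq(A, ok[:, 1], rcond=None)

    def kick_power(distance):
        """Learned kick power for a pass over the given distance."""
        return slope * distance + intercept

    print(kick_power(15.0))   # interpolated power for a 15 m pass

Once such a model is learned during skill development, it can simply be
loaded at the start of each game rather than relearned.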
Off-line collaborative learning by teams of agents:
learning to pass and receive the ball.
This type of skill is qualitatively different from the individual
skills in that the behaviors of multiple agents must be coordinated.
A "good" pass is only good if it is appropriate for the receiver's
receiving action, and vice versa. For example, if the passer passes
the ball to the receiver's left, then the receiver must at the same
time move to the left in order to successfully complete a pass. As
above, such coordination can carry over from game to game, thus
allowing off-line learning techniques to be used.
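The coordination requirement can be illustrated with a small sketch:
whatever is learned, passer and receiver must end up computing the same
pass target from the shared state. The convention below (offsetting the
target away from the nearest defender) is a hypothetical example, not a
prescribed method.

    # Both agents observe the same state; coordination succeeds because
    # both derive the pass target from one shared convention. The
    # convention itself (offset away from the nearest defender) is a
    # hypothetical example.
    def shared_target(receiver, defender):
        """Pass target: a point 3 m beyond the receiver, on the side
        away from the defender."""
        dx = receiver[0] - defender[0]
        dy = receiver[1] - defender[1]
        norm = max((dx * dx + dy * dy) ** 0.5, 1e-6)
        return (receiver[0] + 3.0 * dx / norm,
                receiver[1] + 3.0 * dy / norm)

    receiver, defender = (20.0, 5.0), (18.0, 8.0)
    target = shared_target(receiver, defender)
    # The passer kicks toward target; the receiver moves toward the same
    # target. The pass completes because both used the same convention.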
On-line skill and collaborative learning:
learning to play positions.
Although off-line learning methods can be useful in the above cases,
there may also be advantages to learning incrementally. For
example, particular aspects of an opposing team's behavior may render
a fixed passing or shooting behavior ineffective. In that case, the
ability to adaptively change collaborative or individual behaviors
during the course of a game could contribute to a team's success.
At a higher level, team issues such as role (position) playing on the
field might be best handled with adaptive techniques. Against one
opponent it might be best to use 3 defenders and 8 forwards; whereas
another opponent might warrant a different configuration of players on
the field. The best teams should have the ability to change
configurations in response to events that occur during the course of a
game.
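One simple way to frame such on-line configuration changes is as a
bandit-style choice among a few fixed formations, rewarded by the goal
difference observed while each formation is in use. The formations and
the reward signal below are illustrative assumptions.

    import random

    # Candidate configurations; "3-8" echoes the example above. The
    # reward (goal difference per interval) is an assumption.
    FORMATIONS = ["4-4-2", "3-5-2", "3-8"]

    class FormationSelector:
        """Epsilon-greedy choice among fixed formations."""

        def __init__(self, epsilon=0.2):
            self.value = {f: 0.0 for f in FORMATIONS}   # mean reward so far
            self.count = {f: 0 for f in FORMATIONS}
            self.epsilon = epsilon

        def choose(self):
            if random.random() < self.epsilon:           # explore occasionally
                return random.choice(FORMATIONS)
            return max(FORMATIONS, key=self.value.get)   # else exploit

        def update(self, formation, goal_diff):
            """goal_diff: goals scored minus conceded while in use."""
            self.count[formation] += 1
            n = self.count[formation]
            self.value[formation] += (goal_diff - self.value[formation]) / n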
On-line adversarial learning:
learning to react to
predicted opponent actions.
If a player can identify patterns in the opponents' behaviors, it
should be able to proactively counteract them. For example, if the
opponent's player number 4 always passes to its teammate number 6,
then player 6 should always be guarded when player 4 gets the ball.
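A minimal sketch of this idea: count observed passes and mark the most
frequent receiver whenever the passer gains the ball. The player numbers
are taken from the example above; the bookkeeping itself is an
illustrative assumption.

    from collections import Counter, defaultdict

    pass_counts = defaultdict(Counter)   # passer -> Counter of receivers

    def observe_pass(passer, receiver):
        """Record one observed opponent pass."""
        pass_counts[passer][receiver] += 1

    def player_to_mark(ball_holder):
        """Predicted receiver to guard when ball_holder gets the ball."""
        if pass_counts[ball_holder]:
            return pass_counts[ball_holder].most_common(1)[0][0]
        return None                       # no pattern observed yet

    observe_pass(4, 6)
    observe_pass(4, 6)
    observe_pass(4, 9)
    print(player_to_mark(4))   # -> 6: guard player 6 when 4 has the ball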
Evaluation
For challenge responses that address the machine learning issue
(particularly the on-line learning issue), evaluation should be both
against the publicly available teams and against at least one
previously unseen team.
First, teams will play games against other teams and publicly
available teams under normal circumstances. This evaluates the team's
general performance. This involves both AI-based and non-AI based teams.
Next, teams will play a set of defined benchmarks. For example, after
their programs are frozen, challengers must play a part of the game,
starting from defined player positions, with the movement of the
opponents pre-defined but not disclosed to the challengers. After
several sequences of the game, performance will be evaluated to
see whether it improved with experience. The movement of the
opponents is not coded using absolute coordinate positions, but as a
set of algorithms that generate motion sequences. The opponent
algorithms will be provided by the organizers of the challenge, who will
withhold at least one successful team from public
release.
Other benchmarks that clearly evaluate learning performance will
be announced after discussion with the challenge participants.
The RoboCup Teamwork Challenge
Objectives
The RoboCup Teamwork Challenge addresses issues of real-time
planning, re-planning, and execution of multi-agent teamwork in a
dynamic adversarial environment. Major issues of interest
in this specific challenge for the 97-99 period are
architectures for real-time planning and plan execution in a team context
(essential for teamwork in RoboCup).
In addition, generality of the architecture for non-RoboCup
applications will be an important factor.
Teamwork in complex, dynamic multi-agent domains such as
soccer mandates highly flexible coordination and communication to
surmount uncertainties, e.g., dynamic changes in the team's goals or
team members' unexpected inability to fulfill responsibilities.
Unfortunately, implemented multi-agent systems
often rely on preplanned, domain-specific coordination that fails
to provide such flexibility. First, it is difficult
to anticipate and preplan for all possible coordination failures,
particularly in scaling up to complex situations; thus, preplanned
coordination is not robust enough for dynamic tasks such as soccer games. Second,
given domain specificity, reusability suffers. Furthermore,
planning coordination on the fly is difficult, particularly in domains
with so many possible actions and such large state spaces. Indeed, typical
planners need significantly longer than the time available to find even a single valid plan.
The dynamics of the domain caused by the unpredictable opponent
actions make the situation considerably more difficult.
A fundamental reason for these teamwork limitations lies in
current agent architectures. Architectures such as
Soar, RAP, IRMA, and BB1
facilitate an individual agent's flexible behaviors via
mechanisms such as commitments and reactive plans. However, flexible
individual behaviors, even if simultaneous and coordinated, do not sum
up to teamwork. A common example is ordinary traffic,
which, even though simultaneous and coordinated, is not teamwork.
Indeed, theories of collaboration point to fundamentally novel
mental constructs as underlying teamwork, such as team goals/plans,
and joint commitments, lacking in
current agent architectures. In particular,
team goals and plans are not explicitly represented; furthermore,
concepts of team commitments are absent. Thus,
agents cannot explicitly reason about their dynamic
team goals and plans; nor flexibly communicate/coordinate
when unanticipated events occur. For instance, an agent
cannot itself reason about its coordination responsibilities
when it suddenly realizes that the team's current plan
is unachievable --- e.g., that in the best interest
of the team, it should inform its teammates.
Instead, agents must rely on domain-specific coordination plans
that address such contingencies on a case-by-case basis.
The basic architectural issue in the teamwork challenge is then to construct
architectures that can support planning of
team activities and, more importantly, execution of
generated team plans. Such planning and plan execution may be
accomplished via a two tiered architecture, but the entire
system must operate in real-time.
In the RoboCup Soccer Server, sensing occurs every 300 to 500
milliseconds, and an action command can be dispatched every 100
milliseconds. The situation changes on the order of milliseconds, so planning,
re-planning, and execution of plans must be done in real time.
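The timing constraint can be made concrete with a sketch of an agent
loop that must emit a command each 100-millisecond cycle, planning only
within whatever budget remains; this is an illustration of the real-time
requirement, not actual Soccer Server client code.

    import time

    CYCLE = 0.100   # one action command per 100 ms cycle

    def sense():
        """Placeholder: read the latest world state."""
        return {}

    def plan(state, deadline):
        """Anytime planning: refine until the deadline, then return the
        best action found so far."""
        best_action = "default_behavior"
        while time.monotonic() < deadline:
            pass        # iterative plan refinement would go here
        return best_action

    def act(action):
        """Placeholder: dispatch the command to the server."""

    def agent_loop():
        while True:
            start = time.monotonic()
            state = sense()
            # Reserve the tail of the cycle for dispatching the action.
            action = plan(state, deadline=start + 0.9 * CYCLE)
            act(action)
            time.sleep(max(0.0, start + CYCLE - time.monotonic()))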
Technical Issues
We present a key set of issues that arise assuming our particular
two tiered planning and plan-execution approach to teamwork.
Of course, those who approach the problem from a different
perspective may have different issues, and the issues may
change depending on the type of architecture employed.
The following is the envisioned teamwork challenge in this domain: (i) a team
deliberatively accumulates a series of plans to apply to games with
different adversarial teams; (ii) each game plan is organized as a
graph-structured network of different plan segments labeled with
specific contingencies that should trigger a shift in plan
traversal; (iii) game plans are defined at an
abstract level that needs to be refined for real execution;
(iv) real-time execution in a team-plan execution framework/architecture
that is capable of addressing key contingencies. Such an architecture
also alleviates the planning burden by providing some ``commonsense''
teamwork behaviors --- as a result, not all of the coordination actions need
to be planned in detail. The key research tasks here are:
Contingency planning for multiagent adversarial game
playing: Before a game starts, one would expect the team to generate
a strategic plan for the game that includes contingency plan segments
that are to be recognized and, if necessary, adapted in real time.
Two main challenges can be identified in this task:
- Definition of strategic task actions with probabilistic
applicability conditions and effects. Uncertainty in the action
specification is directly related to the identification of possible
probabilistic disruptive or favorable external events.
- Definition of objectives to achieve. In this domain, the goals of
winning and scoring should be decomposed into a variety of more concrete
goals that serve the ultimate goal of scoring. Examples are
actions and goals to achieve specific attacking or defensive positioning.
Plan decomposition and merge:
A correspondence between team actions and goals and individual actions
and goals must be established. The team plan decomposition may create
individual goals that are not necessarily known to all the team
players. Furthermore, within the contingency team plan, it is expected
that there may be a variety of adversary-independent and
adversary-dependent goals. The decomposition, coordination, and
appropriate merge of individual plans to the service of the main team
plan remain open and challenging research tasks. RoboCup provides an
excellent framework to study these issues.
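As a small illustration of the decomposition step, a team action might
expand into per-role individual goals, with each player extracting only
its own goal; the plan library, roles, and goals below are hypothetical.

    # Hypothetical plan library: one team action expands into per-role
    # individual goals; some goals are adversary-dependent.
    TEAM_PLAN_LIBRARY = {
        "build_up_left": {
            "left_defender":  "advance with the ball along the left wing",
            "left_midfield":  "offer a passing lane ahead of the ball",
            "center_forward": "pull the marker right",  # adversary-dependent
        },
    }

    def individual_goal(team_action, my_role):
        """Each player extracts only its own goal; teammates' goals need
        not be known to it."""
        return TEAM_PLAN_LIBRARY[team_action].get(my_role)

    print(individual_goal("build_up_left", "left_midfield"))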
Executing Team Plans:
Team plan execution during the game is the determining factor in the
performance of the team. It addresses the coordination contingencies
that arise during execution, without the need for detailed,
domain-specific coordination plans. Execution also monitors the contingency
conditions that are part of the global contingency team plan. Selection
of the appropriate course of action is driven by the state
information gathered by execution.
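Points (ii) and (iv) above can be sketched together: a game plan as a
graph of plan segments whose edges carry contingency predicates, and an
execution monitor that runs the current segment and shifts traversal
when a predicate fires. Segment names and predicates are illustrative
assumptions.

    class PlanSegment:
        """A node in the graph-structured game plan."""

        def __init__(self, name, behavior):
            self.name = name
            self.behavior = behavior   # abstract team behavior to refine
            self.edges = []            # (contingency predicate, next segment)

        def link(self, predicate, segment):
            self.edges.append((predicate, segment))

    def execute(segment, get_state):
        """Execution monitor: run the current segment and shift plan
        traversal when a contingency predicate fires."""
        while segment is not None:
            state = get_state()
            if state.get("game_over"):
                return
            for predicate, nxt in segment.edges:
                if predicate(state):
                    segment = nxt               # contingency triggers a shift
                    break
            else:
                segment.behavior(state)         # no contingency: keep going

    attack = PlanSegment("attack", lambda s: None)
    defend = PlanSegment("defend", lambda s: None)
    attack.link(lambda s: s.get("lost_possession", False), defend)
    defend.link(lambda s: s.get("regained_possession", False), attack)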
Evaluation
The Teamwork Challenge scenario described above has been
envisioned by several AI researchers, at least in the planning and
multiagent communities. RoboCup, both in its simulated and real
leagues, provides a synergistic framework to develop and test
dynamic multiagent planning algorithms.
Specifically, we are planning to evaluate the architecture and teams
in the following evaluation scheme:
- Basic Performance:
The team must be able to play reasonably well both against the best
hand-coded teams, which have no planning, and against other
planning-based systems. Relative performance of the team can be
measured by actually playing a series of games against other unknown
teams.
Thus, basic performance will be measured by:
- Performance against hand-coded teams.
- Performance against other teams.
- Robustness:
Robustness in teamwork means that the team, as a whole, can
continue to carry out its mission even if unexpected changes occur, such as
the accidental removal of players from the team, a sudden change of team
composition, or changes in the operating environment. For example, if one
of the players on the team is disabled, the team should be able to cope
with such accidents by taking over the role of the disabled player
or reformulating its team strategy. Thus, this evaluation
represents a set of unexpected incidents during the game, such as:
- Some players will be disabled, or their
capability will be significantly undermined by such
accidents. Also, some disabled players may be re-enabled later in the game.
- Opponents switch their strategy, and the team
must cope with the new strategy in real time.
- Some of the opponent's players will be disabled, or
their performance will be significantly undermined. These disabled
players may come back to the game later.
- Teammate changes during the game.
- Weather factor changes.
The RoboCup Teamwork Challenge is therefore to define
a general set of teamwork capabilities to be integrated with
agent architectures to facilitate flexible, reusable teamwork.
The following then establish the general evaluation criteria:
- General Performance:
The general performance of the team, and thus of the underlying algorithms, can
be measured by a series of games against various teams.
This can be divided into two classes: (1) normal competitions where no
accidental factors are involved, and (2) contingency evaluations where
accidental factors are introduced.
- Real-Time Operations:
The real-time execution, monitoring, and replanning of the
contingency plan is an important factor in the evaluation.
For any team to be successful in the RoboCup server,
it must be able to react in real time: sensory information arrives
between 2 and 8 times a second and agents can act up to 10 times a
second.
- Generality:
Reuse of the architecture in other applications: illustrate
the reuse of teamwork capabilities in other applications,
including applications for information integration on the
internet, entertainment, training, etc.
- Conformity with Learning:
Finally, given the premises above and the complexity of the issues, we
argue and challenge that a real-time multiagent planning system needs
to have the ability to be well integrated with a learning approach,
i.e., it needs to dynamically adapt and refine its complete
behavior (individual and team) based on its past experience.
Other issues will also be considered, such as reuse of the teamwork
architecture within the RoboCup community, and planning for
team players that are not yet active in order to increase their
probability of being useful in future moves, for example the role playing
and positioning of the team players that do not have the ball.
RoboCup Opponent Modeling Challenge
Agent modeling --- modeling and reasoning
about other agents' goals, plans, knowledge,
capabilities, or emotions --- is a key issue in
multi-agent interaction. The RoboCup opponent
modeling challenge calls for research on modeling a
team of opponents in a dynamic, multi-agent domain.
The modeling issues in RoboCup can be broken down
into three parts:
- On-line tracking:
Involves individual players' real-time,
dynamic tracking of opponents' goals and intentions based on
observations of actions. A player may use such tracking
to predict the opponents' play and react appropriately.
Thus, if a player predicts that player-5 is going to pass the ball
to player-4, then it may try to cover player-4.
Such on-line tracking may also be used in service of deception.
The challenges here are (i) real-time tracking despite
the presence of ambiguity; (ii) addressing the dynamism
in the world; (iii) tracking teams rather than only
individuals -- this requires an understanding of concepts
involved in teamwork.
On-line tracking may feed input to the on-line planner or
the on-line learning algorithm (see the sketch after this list).
- On-line strategy recognition:
"Coach" agents for teams
may observe a game from the sidelines, and understand
the high-level strategies employed by the opposing team.
This contrasts with on-line tracking because the coach
can perform a much higher-level, abstract analysis, and
in the absence of real-time pressures, its analysis can
be more detailed.
A coach agent may then provide input to its players
to change the team formation or play strategy.
- Off-line review:
"Expert" agents may observe the teams
playing in an after-action review, to recognize the
strengths and weaknesses of the teams, and provide an
expert commentary. These experts may be trained on
databases of human soccer play.
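As an illustration of on-line tracking under ambiguity (the sketch
referred to in the first item above), a player might maintain a belief
over candidate opponent intentions and apply a Bayes update after each
observed action; the candidate intentions and likelihood numbers are
purely illustrative.

    CANDIDATES = ["pass_to_4", "shoot", "dribble"]

    # P(observed action | intention): illustrative numbers only.
    LIKELIHOOD = {
        "turns_toward_4": {"pass_to_4": 0.7, "shoot": 0.1, "dribble": 0.2},
        "faces_goal":     {"pass_to_4": 0.1, "shoot": 0.7, "dribble": 0.2},
    }

    def update_belief(belief, observation):
        """One Bayes step: multiply prior by likelihood, renormalize."""
        posterior = {h: belief[h] * LIKELIHOOD[observation][h]
                     for h in belief}
        total = sum(posterior.values())
        return {h: p / total for h, p in posterior.items()}

    belief = {h: 1.0 / len(CANDIDATES) for h in CANDIDATES}
    belief = update_belief(belief, "turns_toward_4")
    print(max(belief, key=belief.get))   # most likely intention so far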
These issues pose some fundamental challenges that
will significantly advance the state of the art in agent
modeling. In particular, previous work has mostly focused on
plan recognition in static, single-agent domains, without real-time
constraints. Only recently has attention shifted
to dynamic, real-time environments, and modeling
of multi-agent teamwork.
A realistic challenge for IJCAI-99 will be to aim for
on-line tracking. Optimistically, we expect some progress
towards on-line strategy recognition; off-line review will
likely require further research beyond IJCAI-99.
For evaluation, we propose that at least the following evaluations
be carried out to measure progress:
- Game Playing:
A team of agents plays against two types of teams:
- One or two unseen RoboCup teams from IJCAI-97, shielded
from public view.
- The same unseen RoboCup teams from IJCAI-97 as above, but
modified with some new behaviors. These teams will now deliberately
try out new adventurous strategies, or new defensive strategies.
- Disabled Tracking:
The tracking functionality of the agents will be
turned off, and the resulting performance compared with normal performance.
- Deceptive Sequences:
Fake teams will be created that generate
deceptive moves. The challenger's agents must be able to recognize the
opponent's deceptive moves to beat these teams.
For each type of team, we will study the performance
of the agent-modelers. Of particular interest are the variations
in the agent-modelers' behaviors given the modifications
in the opponents' behaviors. For each type of team,
we will also study the advice offered by the coach agent and the reviews
offered by the expert agents, and the changes in them given
the changes in the opponents' behaviors.