December 18, 2011

2011 Google AI Challenge (Ants)

With just less than 24 hours to go before the end of the 2011 Google AI Challenge, xathis is way out in front of the rest of the field (as has been the case for most of the competition). I certainly can’t wait to see how he did it.

This post will be a pretty thorough overview of my algorithm, which has kept my entry (delineate) in the top 10 for the past couple months.



The core idea is quite simple: use lots of breadth-first searches (BFS) in order to evaluate all the squares we care about.

It’s somewhat similar to the collaborative diffusion approach that has been discussed on the forums, but I look at everything from an ant’s perspective which allows for easier local improvements. Of course, many numbers are pre-computed by starting a BFS from features on the map to speed up computation.

For each ant, we’ll calculate a “score” for the 5 possible moves it can make (N, S, E, W, or staying put) and then move the ant to the highest scoring square.

In general, squares are more highly scored if they’re close to:

  1. ground
  2. food
  3. an enemy hill
  4. our own hill
  5. an enemy ant
The value for most of these features should depend on the local conditions. For example, an enemy hill is much more valuable when we “locally outnumber” the enemy. Food should only be valuable to the ant closest to it. I list all the refinements later on in this post. The pseudo-code (without any local improvements) for figuring out the value of a square (x,y) is as follows.
do a BFS up to depth D starting at (x,y)
for each square s encountered in the BFS:
   d = current depth (i.e. distance to (x,y))
   decay = 1/(1+d*d)
   if s is ground:
       ground_score = kGroundValue*decay
   if s is food:
       food_score = kFoodValue*decay
   if s is an enemy_hill:
       enemy_hill_score = kEnemyHillValue*decay
   etc.
return ground_score + food_score + enemy_hill_score + ...

Now we need to refine this behavior locally:

  • The food value is only given to the ant if all the following conditions are met:
    1. This ant is “closest” to the food (ties are broken randomly).
    2. If by moving to (x,y), the ant is moving “closer” to the food. (Note: there are also some special cases, like on turn 1, that need to be dealt with here)
    3. There is not an enemy ant much closer to the food. (Specifically, if distance_from_food_to_this_ant < distance_from_food_to_closest_enemy + 2).
  • The enemy hill value is only given when the ant is moving towards the hill. If the number of ants “nearby” is greater than the number of enemies “nearby”, then give a bonus enemy hill value (applicable only to ants “nearby”). Furthermore, if one of our ants has a direct path to the enemy hill where no enemy ant can stop it, then give an even bigger bonus.

  • Only give points to our own hill when an enemy is nearby and it needs defending. Also, only give these points to our “nearby” ants.

  • Dealing with the enemy ants case is a bit more complicated. We will give a penalty to the square for each enemy ant that is able to “attack” it next turn. Also, we will give a bonus to this square in some cases when we have more ants surrounding the enemy than the enemy has. Specifically, I count ants that are within the attack radius or at most a distance 2 away from the attack radius. The way the bonus is calculated is much more easily understood in code form:
    if (moving_towards) {
      if (self_count > enemy_count) {
        // encourage moving in directions that "cover" more of the
        // enemy's escape routes
        int count = 0;
        if (state.distance2(row, col, r+1, c) <= state.attackradius2() &&
            !state.square(r+1, c).is_water()) {
          ++count;
        }
        if (state.distance2(row, col, r, c+1) <= state.attackradius2() &&
            !state.square(r, c+1).is_water()) {
          ++count;
        }
        if (state.distance2(row, col, r-1, c) <= state.attackradius2() &&
            !state.square(r-1, c).is_water()) {
          ++count;
        }
        if (state.distance2(row, col, r, c-1) <= state.attackradius2() &&
            !state.square(r, c-1).is_water()) {
          ++count;
        }
        enemy_ant_score += count*kEnemyKillValue;
      }
    }
    
    (row,col) is the square we’re evaluating (i.e. the start of the BFS). (r,c) is the current square in the BFS.

  • Refining the ground value is actually one of the most important pieces of the algorithm. Ground is less valuable if we have lot of ants “nearby”. The analogy is that each ant is going to “paint” the ground around it. Then we’re going to scale down the ground value depending on how heavily it has been painted. The amount each ant paints will scale linearly depending on how close an ant is to that square. Specifically, paint += 1/(1+d) (where d is the distance to our ant). The ground score will then be scaled by 1/(1+paint). Also, I found that the bots performance is improved when we value ground more highly if it is closer to our hills. Thus, the ground value is also linearly scaled up depending on how close it is to a hill of ours.

  • The search depth dynamically changes throughout the game, depending on how close we get to hitting the time limit. If a search has to stop prematurely, then the search depth is lowered a little bit for the next turn. If we finished with time to spare, then the search depth is increased a little bit for the next turn. I initialized the search depth at 60.

  • The final adjustment is a weird one that I stumbled upon accidentally. This change interestingly launched me from the top 50 to the top 3. The adjustment is that the paint is not reset between turns. Yes, the paint just accumulates over time. This has the effect that ants will avoid areas that have been heavily traveled in the past, even if we do not have many ants there at the moment. The exploration behavior was greatly improved (even though it also results in some really silly behavior at times, such as an ant going all the way down a dead end). I’m sure that some refinements can be made where the paint can gradually fade over time, but I haven’t been able to do these yet.

  • The main areas that need improvement include the following (sadly there isn’t enough time to do them):
    1. Stopping ants from running into each other! I do a very crude filter where ants don’t score squares which are already “taken” by the moves of other ants. However, this approach sometimes results in an ant having nowhere to go!
    2. The exploration logic is still not the strongest, especially on some of the cell maze maps.
    3. Doing some basic enemy pattern recognition. The “combat” case when “self_count == enemy_count” could go either way depending on how aggressive or passive the enemy is. Right now the logic is conservative, as is the case for most of the top bots (even with xathis). There certainly is room to be more aggressive in the 1 ant vs 1 ant case when you recognize a passive opponent.

    August 1, 2011

    Flipping Squares

    Here’s an interesting puzzle/game that I recently learned. Flip any of the squares below any number of times. Every time a square is flipped, all squares directly beside it are also flipped. The goal is to make all 9 squares blue. (Note, the animation may not work in some older browsers)

    If you want even more of a challenge, try the 5 x 5 version:

    In my next post, I’ll share some of my thoughts on this puzzle. It actually leads to some pretty interesting mathematical questions as well as some nifty looking fractals.

    May 25, 2011

    Artificial Intelligence in Poker (2010)

    For the 2010 Computer Poker Competition, my entry (Rockhopper) ended up winning the head’s up limit equilibrium contest! It was undefeated against all other head’s up limit entries:

    Complete 2-player limit results (from the competition website)

    At least for a little while, I can safely say that I have the strongest head’s up limit program in the world!

    After the 2009 competition, I made numerous improvements to Rockhopper:

    • The core game tree size increased by an order of magnitude to 4,465,630,607 nodes. Storing just 3 floats per node (necessary during training) required about 54 GB of space. After the tree had been trained, it could be compressed down to around 13 GB of space (storing only 3 chars per node).
    • Much better utilization of nodes (reallocated resources away from situations where the opponent was almost certainly playing sub-optimally).

    Starting from scratch on a quad-core machine, the algorithm took several weeks to learn the entire tree.

    May 11, 2011

    Artificial Intelligence in Poker (2009)

    In early July of 2009, the University of Alberta ran their annual Computer Poker Competition. There are lots similarities between this contest and the Netflix Competition. Both require highly efficient algorithms (usually using some online-learning approach) and both require compact data structures. One variant of poker, limit head’s up, is quickly becoming a game where computers can beat even the best human players.

    My bot, Rockhopper, had a pretty good showing finishing 4th out of 13 competitors in the equilibrium competition (where you’re scored based on how many other bots you can beat). It beat one of the two Hyberborean bots (by University of Alberta) and lost in its match against the second place bot by an almost insignificant margin.

    The University of Bucharest’s bot (GGValuta) was definitely the strongest one overall and their description (using k-means to cluster nodes in later stages of the hand) has given me a lot of ideas for next year’s competition.

    You can see the full match results of the competition here.

    May 2, 2011

    The Netflix Prize Competition

    It was the fall of 2006, beginning of junior year, when two of my college friends (Lester Mackey and David Weiss) and I decided to enter the Netflix Prize Competition. We had no idea what we were getting ourselves into. This was a machine-learning competition aimed at improving Netflix’s movie recommender system with the winner getting a cool one million dollars!

    One of the first things our team had to do was decide on a name. After a couple hours of brainstorming, we settled on “Dinosaur Planet” (the first movie in the competition dataset). A close second choice was “Alien Hunter” (the last movie in the dataset). After a couple more hours of debate, Dinosaur Planet was formed!

    We set small goals for ourselves in the beginning: implement the movie average algorithm, beat the movie average algorithm, match Netflix’s baseline score, etc.

    The first time we made it onto the leaderboard was cause for celebration! As the weeks went on, we crawled slowly up the ranks: 20th..15th..10th…5th..2nd. We even momentarily made it to first place before BellKor (AT&T Labs) leapt forward to regain the lead (clearly they had been holding back some of their best results).

    With about 48 hours before the first year’s progress prize (of $50k) was to be awarded, here was what the leaderboard looked like: 

    At this point, we had already been in discussions with team Gravity about joining forces to try to overtake BellKor. Gravity shared with us the ingenious idea of adding a controlled amount of noise to our predictions. This allowed our joint team to see how well we could do without revealing our true strength on the leaderboard. This technique would become very popular among all teams over the next couple years of the competition.

    We knew we could beat BellKor’s current score, but we had no idea how much they were holding back. With about 25 hours before the first year’s deadline, we submitted our team’s joint result. Amazingly, just a minute later, BellKor themselves submitted a new result (presumably the best of theirs at the time). The leaderboard (with 24 hours to go) stood as follows:

    It was a race to see which team could improve the most in the next 24 hours. With most of these algorithms needing days to run, it was a frantic search to think of fast and new ways to eke out any improvement. BellKor amazingly managed to get another 0.05% before the deadline and thus, deservingly took the first year’s progress prize.

    Fast-forward another two years and once again, we found ourselves essentially tied for first place with 24 hours to battle it out.

    By this point, we had become part of “The Ensemble”. (The short version of how we got there: Dinosaur Planet merged with team Gravity to form “When Gravity and Dinosaurs Unite”. This team founded the Grand Prize Team (GPT), to try to amass a larger coalition. Soon after, GPT would merge with Vandelay Industries (another big coalition) and Opera Solution to form The Ensemble). Also, the prize this time around was for the entire $1 million!

    Over the next 24 hours, both teams (or more accurately, both coalitions) were able to get another 0.01% improvement. We finished first on the public (visible) leaderboard:

    However, it was the score on the hidden leaderboard that really mattered. On that leaderboard, the two teams were in fact..tied! By contest rules, the winner would be the team that first submitted their solution. Since BellKor’s Pragmatic Chaos submitted 20 minutes ahead of the Ensemble, they were declared the winner. Who would have thought that a three year long machine-learning competition would come down to the final 20 minutes!

    Other links about the contest:

    Netflix Competitors Learn the Power of Teamwork - A NY Times article about Dinosaur Planet and the contest as a whole.

    Statistics Can Find You a Movie - A fascinating recap of the contest written by Team Bellkor.