Skip to content

Battleship Heuristics

Published:

Re­cently, I stum­bled upon a very in­ter­est­ing blog post and a Red­dit thread about an al­go­rithm to play bat­tle­ship in the best pos­si­ble way. The blog post, how­ever, does not ex­plain how prob­a­bil­i­ties are cal­cu­lated. So, I de­cided to give it a try and write my own code for this prob­lem.

Dis­claimer: this is my first ever Python script. Thus, it’s not a mas­ter­piece, and I’m sure there’s room for im­prove­ment.

Basic Code

I started by writ­ing a code that sets the boards and pre­pares the game of human ver­sus the com­puter. I won’t cover this sec­tion in much de­tail, as the code can be seen below. In a nut­shell, the code places 5 ships of sizes 5, 4, 3, 3 and 2 squares on the board, mak­ing sure that they do not over­lap (but they can touch each other). There’s also an input op­tion for the dif­fi­culty, in which the easy mode cor­re­sponds to ran­dom guess­ing by the com­puter, while the hard mode uses the al­go­rithm that we’ll de­velop.

board = []

for x in range(size):
    board.append(["O"] * size)

ships = [5, 4, 3, 3, 2]

def create_ships(board, ships):
    for ship in range(0, len(ships)):
    valid = False
        while not valid:
            row = randint(1, 10 - ships[ship])-1
            col = randint(1, 10 - ships[ship])-1
            orient = randint(0, 1)
                if orient == 0:
                    orientation = "v"
                else:
                    orientation = "h"
    valid = validate(board, ships[ship], row, col, orientation)
    board = place_ship(board, ships[ship], orientation, row, col)
    return board

def place_ship(board, ship, orientation, x, y):
    if orientation == "v":
        for i in range(0, ship):
            board[x+i][y] = 1
        elif orientation == "h":
            for i in range(ship):
                board[x][y+i] = 1
    return board

def validate(board, ship, row, col, orientation):
    if orientation == "v" and row + ship > 10:
        return False
    elif orientation == "h" and col + ship > 10:
        return False
    else:
        if orientation == "v":
            for i in range(0, ship):
                if board[row + i][col] != 0:
                    return False
        elif orientation == "h":
            for i in range(0, ship):
                if board[row][col + i] != 0:
                    return False
    return True

A Sim­ple Al­go­rithm

So far so good! Now the com­puter will make its first guess. Fol­low­ing the idea of the blog post, the al­go­rithm en­ters a hunt mode until it hits a ship. The idea is to re­cur­sively place the ships (both hor­i­zon­tally and ver­ti­cally) on the op­po­nent’s board and see if there is a miss (in­di­cated by an X) in the range of squares cov­ered by that ship in that hy­po­thet­i­cal po­si­tion. If there is no X, then the al­go­rithm sums 1 to all these squares. In the end, we get a ma­trix of num­bers that are stan­dard­ized to a max­i­mum of 1 (and min­i­mum of 0) in which the high­est val­ues in­di­cate a higher prob­a­bil­ity to en­counter a ship in that spot.

import numpy as np

def probability_hunt(board, ships, size, hit):
    prob = np.zeros((size, size))
    for ship in ships:
        verify = []
        verify.append(['O'] * ship)
        for row in range(0, len(board[0])):
            for k in range(0, len(board[0]) - ship + 1):
                if 'X' not in board[row][k:k + ship]:
                    prob[row, k:k + ship] += 1
        for col in range(0, len(board[0])):
            column = []
            for row in range(0, len(board[0])):
                column.append(board[row][col])
            for j in range(0, len(board[0]) - ship + 1):
                if 'X' not in column[j:j + ship]:
                    prob[j:j + ship, col] += 1
    prob = np.divide(prob, np.amax(prob))
    for i in hit:
        prob[i[0], i[1]] = 0.1
    for row in range(0, len(board[0])):
        for i in range(0, len(board[0])):
            if board[row][i] == 'B':
                prob[row, i] = 0
    return prob

The so­lu­tion of work­ing with a board as a list of lists is far from el­e­gant, and numpy ar­rays might have been a bet­ter op­tion. How­ever, to be hon­est, I had al­ready writ­ten this piece of code by the time I thought about this, and it works!

Con­sid­er­ing a stan­dard board of 10x10 squares, let’s see a few ex­am­ples to un­der­stand how the al­go­rithm is work­ing so far. In the be­gin­ning of the game, the com­puter will al­ways guess a square close to the cen­ter of the board, as there’s a higher prob­a­bil­ity that a ship is there (com­pared to the edges). It’s very im­por­tant to con­sider that the ships are placed ran­domly, and a human plac­ing ships might even pre­fer the edges of the board. The func­tion re­turns the fol­low­ing ma­trix:

Matrix showing higher probability of finding a ship near the center of the board

After each in­cor­rect guess, we re­cal­cu­late the ma­trix con­sid­er­ing the new in­for­ma­tion. Below are some ex­am­ples:

Examples of probability matrices after some guesses

When the com­puter fi­nally hits a ship, the al­go­rithm en­ters an at­tack mode. This time, how­ever, I de­cided to spice things up a lit­tle: in the blog post, the op­po­nent should in­form the length of the hit ship and when it sunk. In this ver­sion of the game, how­ever, nei­ther is re­quired. I thought this would just save some cod­ing, but it turned out to make the game much harder.

In the first ver­sion of the al­go­rithm, when the com­puter hits a ship, it en­ters into at­tack mode. This func­tion cal­cu­lates all the pos­si­ble ship con­fig­u­ra­tions in which the hit(s) square(s) is(are) cov­ered by a ship. In the orig­i­nal blog post, the com­puter would know when to switch back to hunt mode — it sim­ply did it when the ship being at­tacked sunk. How­ever, in the ab­sence of this in­for­ma­tion, I adopted the (com­pletely ar­bi­trary yet rea­son­able) heuris­tic of re­turn­ing to hunt mode after 3 con­sec­u­tive misses. The code for this is:

def probability_attack(board, hit, ships, size):
    prob = np.zeros((size, size))
    for ship in ships:
        for row in range(0, len(board[0])):
            for i in hit:
                if i[0] == row:
                    for k in range(i[1] - ship + 1, i[1] + 1):
                        if k >= 0:
                            if 'X' not in board[row][k:k + ship]:
                                    if (k + ship) < len(board[0]):
                                        prob[row, k:k + ship] += 1
        for col in range(0, len(board[0])):
            column = []
            for i in hit:
                if i[1] == col:
                    for k in range(i[0] - ship + 1, i[0] + 1):
                        if k >= 0:
                            for row in range(0, len(board[0])):
                                column.append(board[row][col])
                            if 'X' not in column[k:k + ship]:
                                    if (k + ship) < len(board[0]):
                                        prob[k:k + ship, col] += 1
    for i in hit:
        prob[i[0], i[1]] = 0
    for row in range(0, len(board[0])):
        for i in range(0, len(board[0])):
            if board[row][i] == 'B':
                prob[row, i] = 0
    return prob

This code pro­duces some beau­ti­ful ma­tri­ces, as shown in ex­am­ples below. Again, the com­puter chooses the high­est value in the ma­trix and if there’s a tie the choice is ran­domly sam­pled from the tied high­est val­ues. In order to un­der­stand how this works, let’s go through an ex­am­ple:

Examples of attack mode matrices showing areas surrounding a hit with high probability

Icons made by Freepik from Flati­con are li­censed by Cre­ative Com­mons BY 3.0

While still in hunt mode, the com­puter hit one ship. In the first ma­trix, we can see the es­ti­mated prob­a­bil­i­ties to whether the ad­ja­cent squares also con­tain a ship. The com­puter picks the high­est value and hit the ship again! But in its next move, the com­puter misses. This changes the look of the ma­tri­ces, and the com­puter be­gins to ex­plore down­wards. Again, it’s a hit! How­ever, as we don’t know the size of the ship being hit, the com­puter it­er­a­tively con­sider all ships to es­ti­mate these val­ues; thus, al­though the in­tu­itive ap­proach would be to keep ex­plor­ing down­wards, the com­puter de­cides to go left, as big­ger ships could fit bet­ter in that di­rec­tion com­pared to the small space of 2 squares down­wards. It is, nonethe­less, a miss. This process keeps going until 3 con­sec­u­tive misses. Then, the com­puter goes back to hunt mode, as seen in the last ma­trix.

Fine-​Tuning

The first issue with this ap­proach is al­ready vis­i­ble: the al­go­rithm avoids the edges of the table, mak­ing coun­ter­in­tu­itive shifts in the ori­en­ta­tion of its pre­dic­tions. To tackle this issue, I’ve cre­ated a list vari­able hit, in which the com­puter ap­pends its cor­rect guesses re­cur­sively. The list is emp­tied when re­turn­ing to hunt mode. The code checks the ori­en­ta­tion of the collinear streak of hits and the val­ues in this row or col­umn are mul­ti­plied by 2. The mul­ti­plier is ar­bi­trary and was cho­sen by trial and error, but I’m sure bet­ter ap­proaches could es­ti­mate a more rel­e­vant value.

hit.append([guess_row, guess_col])

if len(hit) > 1 and hit[-2][0] == hit[-1][0]:
    prob[hit[-1][0], :] = np.prod([prob[hit[-1][0], :], 3])
elif len(hit) > 1 and hit[-2][1] == hit[-1][1]:
    prob[:, hit[-1][1]] = np.prod([prob[:, hit[-1][1]], 3])

An­other issue is that the com­puter makes un­nec­es­sary misses when a sit­u­a­tion sim­i­lar to the sixth ma­trix above hap­pens: when a streak of collinear hits has misses in both its ends, it’s very likely that the ship has sunk. Still, ships can­not go over one an­other, but they can (and do) touch each other. There­fore, cre­at­ing a con­straint that does not let the al­go­rithm change the ori­en­ta­tion of its guesses could (and in fact does) have a neg­a­tive im­pact over its per­for­mance.

The pro­posed so­lu­tion is to cre­ate yet an­other prob­a­bil­ity es­ti­mat­ing func­tion. Using again the hit vari­able, the code checks again the ori­en­ta­tion of the collinear streak of hits and if the sum of prob­a­bil­i­ties for the cor­re­spond­ing row or col­umn is 0. This can only hap­pen in a sit­u­a­tion very sim­i­lar to the one posed in the sixth ma­trix of the fig­ure above: there’s a streak of 3 ver­ti­cal hits, but they are fol­lowed by misses in both of its ends. There­fore, the sum of all the val­ues of that col­umn is 0, strongly sug­gest­ing that the ship has sunk or that there’s an­other ship touch­ing the al­ready found one. If this con­di­tion is true, a vari­able named count is set to 1 and the prob­a­bil­ity ma­trix is cal­cu­lated as the av­er­age of the ma­tri­ces re­turned by the probability_hunt and the probability_attack func­tions. In this way, the al­go­rithm re­turns sooner to hunt mode — avoid­ing un­nec­es­sary misses — while keep­ing in part the val­ues cal­cu­lated by the at­tack func­tion.

def probability_mixed(board, hit, count, ships, size):
    prob = probability_attack(board, hit, ships, size)
    prob2 = probability_hunt(board, ships, size, hit)
    count = 0

    if (
        len(hit) > 1 and hit[-2][0] == hit[-1][0]
        and sum(prob[hit[-1][0], :]) == 0
    ):
        count = 1
    elif (
        len(hit) > 1 and hit[-2][1] == hit[-1][1]
        and sum(prob[:, hit[-1][ 1]]) == 0
    ):
        count = 1

    if len(hit) > 1 and hit[-2][0] == hit[-1][0]:
        prob[hit[-1][0], :] = np.prod([prob[hit[-1][0], :], 3])
    elif len(hit) > 1 and hit[-2][1] == hit[-1][1]:
        prob[:, hit[-1][1]] = np.prod([prob[:, hit[-1][1]], 3])

    if np.amax(prob) > 0:
        prob = np.divide(prob, np.amax(prob))

    prob = prob * (1 - (1/2) * count)
    prob2 = prob2 * (1/2) * count
    prob3 = np.add(prob, prob2)
    return prob3

This seems like a rea­son­able heuris­tic that al­lows find­ing touch­ing ships while avoid­ing extra misses. How­ever, it also ex­ac­er­bates an­other prob­lem: while the probability_mixed func­tion is below the 3 misses thresh­old, it can hit an­other ship, un­re­lated to the pre­vi­ous hits, spe­cially when we use the mixed ma­trix. This cre­ates a weird sit­u­a­tion: the com­puter may not leave the at­tack mode for a very long time, mak­ing the hit vari­able ab­surdly long. So, you may have guessed it, we are going to cre­ate yet an­other heuris­tic. It seems rea­son­able that the most re­cent hits should have a big­ger weight when es­ti­mat­ing the ma­trix. How­ever, this can­not be over­done, or a shift in di­rec­tion of guesses would be­come highly un­likely — jeop­ar­diz­ing the al­go­rithm’s per­for­mance. So, an ad­di­tional term which came to my mind is the Eu­clid­ean dis­tance of one hit to the next one. As the dis­tance in­creases, there’s a much higher prob­a­bil­ity that we’re not hit­ting one sin­gle ship, but rather two or more. Thus, join­ing these two ideas — with yet an­other ar­bi­trary value of 1.5 rel­a­tive im­por­tance to the dis­tance com­pared to the index (newer ver­sus older hits), a new em­pir­i­cal ver­sion of the probability_attack func­tion arises:

def probability_attack(board, hit, ships, size):
    prob = np.zeros((size, size))
    for ship in ships:
        for row in range(0, len(board[0])):
            for i in hit:
                if i[0] == row:
                    for k in range(i[1] - ship + 1, i[1] + 1):
                        if k >= 0:
                            if 'X' not in board[row][k:k + ship]:
                                    if (k + ship) < len(board[0]):
                                        prob[row, k:k + ship] += (1\
                                        - 0.1 * (1.5 * distance(hit, i) - hit.index(i)))
        for col in range(0, len(board[0])):
            column = []
            for i in hit:
                if i[1] == col:
                    for k in range(i[0] - ship + 1, i[0] + 1):
                        if k >= 0:
                            for row in range(0, len(board[0])):
                                column.append(board[row][col])
                            if 'X' not in column[k:k + ship]:
                                    if (k + ship) < len(board[0]):
                                        prob[k:k + ship, col] += (1\
                                        - 0.1 * (1.5 * distance(hit, i) - hit.index(i)))
    for i in hit:
        prob[i[0], i[1]] = 0
    for row in range(0, len(board[0])):
        for i in range(0, len(board[0])):
            if board[row][i] == 'B':
                prob[row, i] = 0
    return prob

def distance(hit, i):
    if hit.index(i) == (len(hit) - 1):
        dist = 0.1
        return dist
    else:
        horizontal = i[0] - hit[hit.index(i) + 1][0]
        vertical = i[1] - hit[hit.index(i) + 1][1]
        dist = sqrt(horizontal ** 2 + vertical ** 2)
            return dist

I think there are still two un­solved prob­lems with this code: it does not take into ac­count that ships can­not over­lap. I tried to make two func­tions to de­tect ad­ja­cent ships and take into ac­count this rule, which be­came the ridicu­lously long if state­ments below:

from copy import deepcopy as dc
pboard = dc(board)

def adj_rows(k, row, board, pboard, ship):
    if k > 0:
        if (row > 0 and row < 9 and board[row][k - 1] == 'B'
        and board[row - 1][k - 1] == 'B' and board[row + 1][k - 1] == 'B')
        or (row == 0 and board[row][k - 1] == 'B' and board[row + 1][k - 1] == 'B')
        or (row == 9 and board[row - 1][k - 1] == 'B' and board[row][k - 1] == 'B'):
            pboard[row][k - 1] == 'X'
            pboard[row - 1][k - 1] == 'X'
            pboard[row + 1][k - 1] == 'X'
    if (k + ship) < len(board[0]):
        if (row > 0 and row < 9 and board[row][k + ship] == 'B'
        and board[row - 1][k + ship] == 'B' and board[row + 1][k + ship] == 'B')
        or (row == 0 and board[row][k + ship] == 'B' and board[row + 1][k + ship] == 'B')
        or (row == 9 and board[row - 1][k + ship] == 'B' and board[row][k + ship] == 'B'):
            pboard[row][k + ship] == 'X'
            pboard[row - 1][k + ship] == 'X'
            pboard[row + 1][k + ship] == 'X'
    return pboard

def adj_cols(k, col, board, pboard, ship):
    if k > 0:
        if (col > 0 and col < 9 and board[k - 1][col - 1] == 'B'
        and board[k - 1][col] == 'B' and board[k - 1][col + 1] == 'B')
        or (col == 0 and board[k - 1][col] == 'B' and board[k - 1][col + 1] == 'B')
        or (col == 9 and board[k - 1][col - 1] == 'B' and board[k - 1][col] == 'B'):
            pboard[k - 1][col - 1] = 'X'
            pboard[k - 1][col] = 'X'
            pboard[k - 1][col] = 'X'
    if (k + ship) < len(board[0]):
        if (col > 0 and col < 9 and board[k + ship][col - 1] == 'B'
        and board[k + ship][col] == 'B' and board[k + ship][col + 1] == 'B')
        or (col == 0 and board[k + ship][col] == 'B' and board[k + ship][col + 1] == 'B')
        or (col == 9 and board[k + ship][col - 1] == 'B' and board[k + ship][col] == 'B'):
            pboard[k + ship][col - 1] = 'X'
            pboard[k + ship][col] = 'X'
            pboard[k + ship][col] = 'X'
    return pboard

This hideous code, how­ever, wors­ened the al­go­rithm’s per­for­mance, so the prob­lem re­main un­solved.

The other one is that the al­go­rithm does not take into ac­count the prob­a­bil­ity that a cer­tain ship has al­ready sunk and in­cludes all ships equally in all its cal­cu­la­tions. Some at­tempts to cre­ate an es­ti­ma­tor to weight each ship con­tri­bu­tion did not suc­ceed as well.

Eval­u­at­ing Per­for­mance

The num­ber of turns re­quired to com­plete a game (sink all ships) was recorded for 5000 sim­u­la­tions. I’ve es­ti­mated the em­pir­i­cal cu­mu­la­tive dis­tri­b­u­tion with a Gauss­ian ker­nel for each ver­sion of the code. Ver­sion 1 cor­re­sponds to ran­dom guess­ing. Ver­sion 2 only guesses through the hunt mode; ver­sion 3 in­cludes the first ver­sion of the at­tack mode; ver­sion 4 in­cludes the heuris­tics of the mixed ma­trix; and ver­sion 5 is the final one with the dis­tance heuris­tics.

Performance plot with kernel cumulative probability of turns to complete the game

The im­prove­ment was sub­stan­tial until the third ver­sion, which in­cluded the basic hunt/at­tack al­ter­nat­ing al­go­rithm. How­ever, fur­ther im­prove­ment was very sub­tle. The Hedge’s g ef­fect size es­ti­ma­tor was used to com­pare each ver­sion with the next one, yield­ing the fol­low­ing val­ues: ver­sions 1 vs 2: g=5.062g = 5.062; ver­sions 2 vs 3: g=0.867g = 0.867; ver­sions 3 vs 4: g=0.176g = 0.176; ver­sions 4 vs 5: g=0.051g = 0.051. Thus, it seems that until ver­sions 4 there was sig­nif­i­cant im­prove­ment; how­ever, the change be­tween ver­sions 4 and 5 is very sub­tle, sug­gest­ing that maybe the dis­tance heuris­tic is not as good as it may look.

The Python script for an in­ter­ac­tive ver­sion of Bat­tle­ship im­ple­ment­ing this al­go­rithm is avail­able through this GitHub Repos­i­tory.

This al­go­rithm is not per­fect, and I’m sure there’s room for im­prove­ment. If you have any ideas, please leave a com­ment!


Previous Post
Exploring Fractals With Pytorch