Thursday, 10 May 2007

Using Hostip Berkeley DB Files From Python

I needed to process some IP addresses the other day to associate them with their country of origin. I implemented the code in Python (as usual) and decided to try out Hostip for the IP-address-to-country mappings. Hostip provides freely downloadable bdb data files (along with a MySQL database and a CSV file). Using the bdb files isn't straightforward, so here is a general recipe for consuming them:

  • Download the bdb files from Hostip. There are three files: hip_ip4_country.db (maps IP to country id), hip_countries.db (maps country id to country name), and hip_ip4_city_lat_lng.db (maps IP to city location).

  • Open the files with Python. They're bdb btree format files, so I found it easiest to open them with bsddb.btopen(FILENAME, 'r').

  • Convert your IP address to the correct format. Hostip bdb entries are keyed on class C block addresses, in integer format. The following code should do it:

    import socket
    import struct

    def dottedQuadToNum(ip):
        "convert decimal dotted quad string to long integer"
        return struct.unpack('!L', socket.inet_aton(ip))[0]

    ip = "255.255.255.255"
    # Zero the last octet to get the class C block address
    ipparts = ip.split('.')
    ipparts = ipparts[:3]
    ipparts.append('0')
    cclass = '.'.join(ipparts)
    # Integer form of the class C block address
    intcclass = dottedQuadToNum(cclass)

    dottedQuadToNum is from ASPN.

  • Now you can use the string representation of intcclass to get a country id from hip_ip4_country.db, and use that id to get a country name from hip_countries.db.
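
Putting the steps together, the whole lookup might look roughly like the sketch below. The helper names class_c_key and lookup_country are my own, not part of Hostip or bsddb, and the functions accept any dict-like object, so plain dicts with made-up values stand in for the two databases here. I'm assuming the value stored in hip_ip4_country.db can be used directly as the key into hip_countries.db; check that against your copy of the files.

```python
import socket
import struct


def class_c_key(ip):
    """Return the class C block of `ip` as a decimal string.

    Masking off the low 8 bits is equivalent to replacing the last
    octet with 0 before converting to an integer.
    """
    num = struct.unpack('!L', socket.inet_aton(ip))[0]
    return str(num & 0xFFFFFF00)


def lookup_country(ip, ip_to_country, countries):
    """Look up a country name for `ip`, or return None if there is no entry.

    `ip_to_country` and `countries` are dict-like objects, for example
    the objects returned by bsddb.btopen(FILENAME, 'r').
    Assumption: the country id value is used as-is as the second key.
    """
    try:
        country_id = ip_to_country[class_c_key(ip)]
        return countries[country_id]
    except KeyError:
        return None


# Stand-in data with made-up ids and names, only to show the call shape.
fake_ip_db = {class_c_key("192.0.2.10"): "1"}
fake_countries = {"1": "EXAMPLELAND"}
print(lookup_country("192.0.2.99", fake_ip_db, fake_countries))
```

With the real files you would pass the two opened databases in place of the stand-in dicts.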

1 comment:

  1. Cool, I didn't know about this.

    It doesn't seem too accurate though. I tried 69.31.32.151, 69.31.32.152 and 69.31.32.153, and it gave me a different (and incorrect) state for each... although admittedly it almost got my office address right.

    Pretty hit-or-miss though.
