Ronie Martinez

May 09, 2017

The Free And Simple Way To Know Who Visits Your Site

How does IP address to country lookup works?

UPDATE: (May 20, 2017) IPToCC has been submitted to PyPi. You can also find it on Github.

Regional Internet Registry

This Wikipedia article explains everything about the Regional Internet Registry.

TLDR: These are organizations that controls the allocation of IP addresses and decides which will be assigned to which country. These organizations are:

RIR Statistics Exchange Format

Each RIR stores a list of IP addresses they allocate to every country they were in-charge with and these lists are accessible to anyone... for FREE! These are stored on a delimiter-separated values format that follows a standard called RIR Statistics Exhange Format and can be found on these FTP sites:


Building a database

Why build a database? You can read delimiter-separated values directly - SURE! But you cannot use these files efficiently - no filtering, and are bloated with additional information that are not useful to you. Basically, the most important columns are:

  • country code
  • type
  • start
  • value

I used Python csv module to parse these files and SQLAlchemy to store the data on a SQLite database rir_statistics_exchange.db. Filter and store only the entries that have the allocated and assigned status. This is how my model look like:

#!/usr/bin/env python
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = 'record'

    id = Column(Integer, nullable=False, primary_key=True)
    country_code = Column(String(2), nullable=False)
    type = Column(String, nullable=False)
    start = Column(String, nullable=False)
    value = Column(Integer, nullable=False)

    def __repr__(self):
        return '<Record {} {} {} {} {}>'.format(self.registry, self.country_code, self.type, self.start, self.value)

IPv4 and IPv6 Lookup

If you have read the RIR Statistics Exhange Format carefully, you will see that the value column for IPv4 and IPv6 addresses stands for different values. For IPv4, it is the number of hosts from start value, while for IPv6, it is the CIDR prefix length.

#!/usr/bin/env python
import ipaddress
import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Record

dir_path = os.path.dirname(os.path.realpath(__file__))
engine = create_engine("sqlite:///{}".format(os.path.join(dir_path, 'rir_statistics_exchange.db'))) 
Session = sessionmaker(bind=engine)
session = Session()

def get_ipv4_country_code(ip_address):
    for record in session.query(Record).filter_by(type='ipv4'):
        start_address = ipaddress.IPv4Address(record.start)
        if start_address <= ip_address < start_address + record.value:
            return record.country_code
    return None

def get_ipv6_country_code(ip_address):
    for record in session.query(Record).filter_by(type='ipv6'):
        network = ipaddress.IPv6Network('{}/{}'.format(record.start, record.value))
        if ip_address in network:
            return record.country_code
    return None

def get_country_code(ip_address):
    if type(ip_address) is str:
        ip_address = ipaddress.ip_address(ip_address)
    if type(ip_address) is ipaddress.IPv4Address:  # IPv4
        return get_ipv4_country_code(ip_address)
    else:  # IPv6
        return get_ipv6_country_code(ip_address)

Other failed experiments

I had this idea that what if I expand the IP addresses (generate all addresses given start and value columns) before storing them to the database so that the lookup will be faster?. To satisfy my curiosity, I actually wrote the code to only find out that building the database is taking a lot longer and the database file size is increasing. Yep, I might have overestimated what my machine can do. :|