InterLap does fast interval overlap testing with a simple python data structure.
It works well on the types of querying done in genomic datasets where we have 10’s of thousands of intervals and we check for overlap millions of times. It is very simple and has no dependencies.
It takes tuples or lists where the first 2 elements are start, end and the remaining elements can be anything.
>>> from interlap import InterLap
>>> inter = InterLap()
Here create 10K random intervals:
>>> import random
>>> sites = random.sample(range(22, 100000000, 12), 50000)
>>> ranges = [(i, i + random.randint(2000, 20000)) for i in sites]
Add them to the interval tree (this takes < 0.5 seconds):
>>> inter.update(ranges)
We can also add one at a time:
>>> inter.add((20, 22, {'info': 'hi'}))
Now do overlap testing:
>>> [20, 21] in inter
True
>>> next(inter.find((21, 21)))
(20, 22, {'info': 'hi'})
>>> inter.add((2, 3, {'info': 'hello'}))
NOTE: below shows how edge-cases are handled.
>>> list(inter.find((2, 2)))
[(2, 3, {'info': 'hello'})]
>>> list(inter.find((3, 3)))
[(2, 3, {'info': 'hello'})]
Test every item in the InterLap. These 50K queries take < 0.5 seconds:
>>> for s, e in ranges:
... assert (s, e) in inter