Find all intervals that overlap with a given interval

Question Detail: 

Note: I moved this question from stackoverflow.com

I have an algorithmic problem where I would like to see if it can be solved in better than $O(n)$:

I have given a table $T$ of $n$ elements where each element is a tuple $(s_i, e_i)$ with $s_i, e_i \in \mathbb{N}$ and $s_i < e_i$, i.e. each tuple is some kind of an interval. I have to find all intervals that overlap with a given interval $[t_0, t_1]$ with $t_0, t_1 \in \mathbb{N}$ and $t_0 < t_1$. Further, I have available two sorted lists $S$ and $E$, containing the $s$ values, or $e$ values respectively, together with the index $i$ pointing to the respective entry in $T$. The lists are sorted by $s$ values, or $e$ values respectively. (Let's assume both, $s$ and $e$ values, are unique.)

Problem:

We have to find each interval/tuple $(s_i, e_i) \in T$ where $s_i \leqslant t_1$ and $e_i \geqslant t_0$.

My thoughts so far:

We can exclude some elements by either applying one of the interval boundaries, i.e. searching $t_1$ in $S$ or $t_0$ in $E$. This gives us a list $L$ of remaining elements: $$ L \leftarrow \{e \in E \mid e \geqslant t_0\} \text{ or } L \leftarrow \{s \in S \mid s \leqslant t_1\} $$ However, there is no lower bound on the number of elements in $L$, no matter which search we perform. Further, we have to check every element in $L$ if $s \leqslant t_1$, or $e \geqslant t_0$ respectively depending on which search we performed before.

The complexity for this solution is $O(n)$.

However, let's say that $k$ is the maximum number of elements overlapping with interval $[t_0, t_1]$. If we assume $k \ll n$, then the complexity is $O(n/2)$ since we can exclude at least $n/2$ elements by choosing the appropriate search for $L$. Still $O(n/2)$ is in $O(n)$.

Can you think of a better approach to solve this problem?

For record:

The complexity for finding all intervals overlapping with a certain given interval using an interval tree is $O(\log n + k)$ where $k$ is the number of overlapping intervals. However, in my practical case I am using a MySQL database which provides index trees for each value, i.e. $s$ and $e$, separately. This way I can not find overlapping intervals in less than $O(n)$. I would need to create an interval tree which is a search tree that stores both interval boundaries, i.e. $s$ and $e$, in a single data structure. The complexity for constructing the interval tree is $O(n \log n)$. [http://www.dgp.utoronto.ca/people/JamesStewart/378notes/22intervals/]

Asked By : sema
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/14311

Answered By : D.W.

I believe that interval trees offer a solution to your problem. Basically, you store your intervals in the interval tree data structure; then, to find all intervals that overlap with $[t_0,t_1]$, you do a query into the interval tree. This should solve your problem and run faster than $O(n)$ time.

No comments

Powered by Blogger.