Find all intervals that overlap with a given interval
Note: I moved this question from stackoverflow.com
I have an algorithmic problem where I would like to see if it can be solved in better than $O(n)$:
I have given a table $T$ of $n$ elements where each element is a tuple $(s_i, e_i)$ with $s_i, e_i \in \mathbb{N}$ and $s_i < e_i$, i.e. each tuple is some kind of an interval. I have to find all intervals that overlap with a given interval $[t_0, t_1]$ with $t_0, t_1 \in \mathbb{N}$ and $t_0 < t_1$. Further, I have available two sorted lists $S$ and $E$, containing the $s$ values, or $e$ values respectively, together with the index $i$ pointing to the respective entry in $T$. The lists are sorted by $s$ values, or $e$ values respectively. (Let's assume both, $s$ and $e$ values, are unique.)
Problem:
We have to find each interval/tuple $(s_i, e_i) \in T$ where $s_i \leqslant t_1$ and $e_i \geqslant t_0$.
My thoughts so far:
We can exclude some elements by either applying one of the interval boundaries, i.e. searching $t_1$ in $S$ or $t_0$ in $E$. This gives us a list $L$ of remaining elements: $$ L \leftarrow \{e \in E \mid e \geqslant t_0\} \text{ or } L \leftarrow \{s \in S \mid s \leqslant t_1\} $$ However, there is no lower bound on the number of elements in $L$, no matter which search we perform. Further, we have to check every element in $L$ if $s \leqslant t_1$, or $e \geqslant t_0$ respectively depending on which search we performed before.
The complexity for this solution is $O(n)$.
However, let's say that $k$ is the maximum number of elements overlapping with interval $[t_0, t_1]$. If we assume $k \ll n$, then the complexity is $O(n/2)$ since we can exclude at least $n/2$ elements by choosing the appropriate search for $L$. Still $O(n/2)$ is in $O(n)$.
Can you think of a better approach to solve this problem?
For record:
The complexity for finding all intervals overlapping with a certain given interval using an interval tree is $O(\log n + k)$ where $k$ is the number of overlapping intervals. However, in my practical case I am using a MySQL database which provides index trees for each value, i.e. $s$ and $e$, separately. This way I can not find overlapping intervals in less than $O(n)$. I would need to create an interval tree which is a search tree that stores both interval boundaries, i.e. $s$ and $e$, in a single data structure. The complexity for constructing the interval tree is $O(n \log n)$. [http://www.dgp.utoronto.ca/people/JamesStewart/378notes/22intervals/]
Asked By : sema
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/14311
Answered By : D.W.
I believe that interval trees offer a solution to your problem. Basically, you store your intervals in the interval tree data structure; then, to find all intervals that overlap with $[t_0,t_1]$, you do a query into the interval tree. This should solve your problem and run faster than $O(n)$ time.
Post a Comment