In this post, we review the technique of $\epsilon$-approximation, a very useful tool in computational geometry for designing efficient approximation algorithms. Consider the following problem:
Problem 1 (Range Counting). Given a set $X$ of $n$ points, we want to preprocess $X$ into a structure so that for any query range $R \in \mathcal{R}$, for some collection of ranges $\mathcal{R}$, we can efficiently count $|X \cap R|$.
It turns out that Problem 1 is very difficult. Efficient (linear-space and polylogarithmic-query-time) data structures are only known when $\mathcal{R}$ is a collection of orthogonal ranges. Even for the case of halfspaces in $\mathbb{R}^d$, the current best near-linear-space data structures require roughly $n^{1-1/d}$ query time. So instead, researchers turn to the approximate version of the problem.
Problem 2 (Approximate Range Counting). Given a set $X$ of $n$ points and an error parameter $\epsilon > 0$, we want to preprocess $X$ into a structure so that for any query range $R \in \mathcal{R}$, for some collection of ranges $\mathcal{R}$, we can efficiently count $|X \cap R|$ up to an additive error of $\epsilon n$.
The natural idea for solving Problem 2 is to take a random sample. It seems intuitive that the sample size should depend on the error $\epsilon$, the complexity of $\mathcal{R}$, and the size of $X$. (Surprisingly, as we will see, this intuition is somewhat incorrect: the sample size we need does not depend on the set size $|X|$ at all!) We will model the complexity of $\mathcal{R}$ using the notion of VC-dimensions. More specifically, we consider the set system $(X, \mathcal{R})$ in the problem. We now define the notion of $\epsilon$-approximations for set systems, which is a synonym for a ''good random sample'' in our setting.
Definition 1 ($\epsilon$-approximation). Given a set system $(X, \mathcal{R})$, an $\epsilon$-approximation for $(X, \mathcal{R})$ is a set $A \subseteq X$ such that
$$\left|\frac{|A \cap R|}{|A|} - \frac{|X \cap R|}{|X|}\right| \le \epsilon$$
for all $R \in \mathcal{R}$.
Note that $(X, \mathcal{R})$ is exactly the set system of a range counting problem. To approximate $|X \cap R|$ for any $R \in \mathcal{R}$, it suffices to count $|A \cap R|$ and return $\frac{|A \cap R|}{|A|} \cdot |X|$, and the error will be bounded by $\epsilon |X| = \epsilon n$.
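To make the counting step concrete, here is a minimal Python sketch for the 1-D case, where the ground set is a set of numbers and the ranges are closed intervals (a set system of constant VC-dimension). The sample here is just a uniform random sample; the sample size and seed in the usage below are illustrative assumptions, not values from the post.

```python
import random

def uniform_sample(points, size, seed=0):
    """Draw a uniform random sample A (with replacement) from the ground set X."""
    rng = random.Random(seed)
    return [rng.choice(points) for _ in range(size)]

def exact_count(points, lo, hi):
    """|X ∩ [lo, hi]|, computed by brute force."""
    return sum(1 for x in points if lo <= x <= hi)

def approx_count(points, sample, lo, hi):
    """Estimate |X ∩ [lo, hi]| as (|A ∩ [lo, hi]| / |A|) * |X|."""
    hits = sum(1 for x in sample if lo <= x <= hi)
    return hits / len(sample) * len(points)
```

If $A$ is an $\epsilon$-approximation, the estimate returned by `approx_count` is off by at most $\epsilon |X|$ for every interval simultaneously.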
But how large must $A$ be? If we want efficient approximate range counting structures, $|A|$ should not be too large. We now bound the size of $A$ for set systems with bounded VC-dimension. We will need the following form of the Chernoff bound.
Lemma 1 (Chernoff Bound). Let $Y = \sum_{i=1}^{s} Y_i$ where the $Y_i$ are independent indicator random variables such that $\Pr[Y_i = 1] = p_i$ and $\Pr[Y_i = 0] = 1 - p_i$, and let $\mu = \mathbb{E}[Y] = \sum_{i=1}^{s} p_i$. Then for any $\delta > 0$,
$$\Pr[|Y - \mu| \ge \delta\mu] \le 2e^{-\min(\delta, \delta^2)\mu/3}.$$
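As a quick sanity check, the following Python snippet compares the empirical two-sided tail of a sum of independent indicators against the bound of Lemma 1. The parameters ($n$, $p$, $\delta$, and the trial count) are arbitrary choices for illustration.

```python
import math
import random

def chernoff_bound(mu, delta):
    """Two-sided Chernoff bound on Pr[|Y - mu| >= delta * mu]."""
    return 2 * math.exp(-min(delta, delta**2) * mu / 3)

def empirical_tail(n, p, delta, trials=2000, seed=0):
    """Empirical estimate of Pr[|Y - mu| >= delta * mu] for Y ~ Bin(n, p)."""
    rng = random.Random(seed)
    mu = n * p
    bad = 0
    for _ in range(trials):
        y = sum(1 for _ in range(n) if rng.random() < p)
        if abs(y - mu) >= delta * mu:
            bad += 1
    return bad / trials
```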
We first show a weaker bound. The main idea is to take a random sample of $X$.
Theorem 1. For any set system $(X, \mathcal{R})$ with VC-dimension $d$ for some constant $d$, there is an $\epsilon$-approximation for $(X, \mathcal{R})$ of size $O\!\left(\frac{1}{\epsilon^2}\log n\right)$, where $n = |X|$.
Proof. We take a uniform random sample $A$ of size $s = \frac{c}{\epsilon^2}\ln n$ from $X$ (with replacement) for a big enough constant $c$. Consider any $R \in \mathcal{R}$ and let $p = \frac{|X \cap R|}{|X|}$. Let $Y_i$ be an indicator random variable such that
$$Y_i = \begin{cases}1, & \text{if the $i$-th sample point lies in $R$,}\\ 0, & \text{otherwise.}\end{cases}$$
Note that $\Pr[Y_i = 1] = p$ and so $\mu = \mathbb{E}[Y] = ps$, where $Y = \sum_{i=1}^{s} Y_i = |A \cap R|$. Note that
$$\Pr\!\left[\left|\frac{|A \cap R|}{|A|} - \frac{|X \cap R|}{|X|}\right| \ge \epsilon\right] = \Pr[|Y - \mu| \ge \epsilon s] \le 2e^{-\epsilon^2 s/3} = 2n^{-c/3} \le n^{-(d+2)},$$
where the first inequality follows from Lemma 1 with $\delta = \epsilon/p$ (note that $\min(\delta, \delta^2)\mu \ge \epsilon^2 s$), and the last inequality follows by picking $c$ to be a big enough constant.
Since $(X, \mathcal{R})$ has VC-dimension $d$, the Sauer–Shelah lemma gives $|\{X \cap R : R \in \mathcal{R}\}| = O(n^d)$. Then by the union bound, with high probability (at least $1 - O(n^{-2})$), $A$ is an $\epsilon$-approximation for $(X, \mathcal{R})$.
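The construction of Theorem 1 is straightforward to implement. The sketch below again uses 1-D points with interval ranges; the constant $c = 8$ and the fixed seed are illustrative assumptions rather than the constants of the proof.

```python
import math
import random

def theorem1_approximation(points, eps, c=8.0, seed=0):
    """Uniform random sample of size (c / eps^2) * ln n, as in Theorem 1."""
    n = len(points)
    size = math.ceil(c / eps**2 * math.log(n))
    rng = random.Random(seed)
    return [rng.choice(points) for _ in range(size)]

def max_interval_error(points, sample, queries):
    """Worst |  |A∩R|/|A| - |X∩R|/|X|  | over the given interval queries."""
    worst = 0.0
    for lo, hi in queries:
        frac_a = sum(1 for x in sample if lo <= x <= hi) / len(sample)
        frac_x = sum(1 for x in points if lo <= x <= hi) / len(points)
        worst = max(worst, abs(frac_a - frac_x))
    return worst
```

Note that the sample size depends on $\log n$; the point of Theorem 2 below is to remove that dependence.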
We now show that we can improve the bound in Theorem 1 to $O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right)$. To prove this, we need the following lemma.
Lemma 2. Let $A$ be an $\epsilon_1$-approximation of a set system $(X, \mathcal{R})$ and let $B$ be an $\epsilon_2$-approximation of $(A, \mathcal{R}|_A)$, where $(A, \mathcal{R}|_A)$ is the set system induced by $\mathcal{R}$ on $A$. Then $B$ is an $(\epsilon_1 + \epsilon_2)$-approximation of $(X, \mathcal{R})$.
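Lemma 2 is easy to check empirically: draw $B$ from $A$, which was itself drawn from $X$, and compare $B$'s interval frequencies directly against $X$. The sizes and error budgets in this Python sketch are arbitrary choices for illustration.

```python
import random

def uniform_sample(ground, size, seed):
    """Uniform sample with replacement from any sequence."""
    rng = random.Random(seed)
    return [rng.choice(ground) for _ in range(size)]

def interval_frac(s, lo, hi):
    """Fraction of the (multi)set s that falls in [lo, hi]."""
    return sum(1 for x in s if lo <= x <= hi) / len(s)

def max_error(s, x, queries):
    """Worst frequency discrepancy of s relative to x over the queries."""
    return max(abs(interval_frac(s, lo, hi) - interval_frac(x, lo, hi))
               for lo, hi in queries)
```

By the triangle inequality, $B$'s discrepancy against $X$ is at most its discrepancy against $A$ plus $A$'s discrepancy against $X$, which is exactly the statement of the lemma.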
Assuming Lemma 2, we now prove the improved bound for $\epsilon$-approximations.
Theorem 2. For any set system $(X, \mathcal{R})$ with VC-dimension $d$ for some constant $d$, there is an $\epsilon$-approximation for $(X, \mathcal{R})$ of size $O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right)$.
Proof. The main idea behind Theorem 2 is to take samples of samples repeatedly, with increasing error parameters $\epsilon_i$. Let $A_0 = X$. For $i = 1, \ldots, k$, we take an $\epsilon_i$-approximation $A_i$ of $(A_{i-1}, \mathcal{R}|_{A_{i-1}})$, where $k$ is a parameter we set later.
By Theorem 1, the size of each $A_i$ is
$$|A_i| = O\!\left(\frac{1}{\epsilon_i^2}\log|A_{i-1}|\right).$$
Thus each round shrinks the current set drastically: the size of $A_i$ depends on $|A_{i-1}|$ only through a logarithm. Now we bound the error.
By applying Lemma 2 repeatedly, the total error $\epsilon'$ we accumulate satisfies
$$\epsilon' \le \sum_{i=1}^{k} \epsilon_i.$$
Set $\epsilon_i = \epsilon/2^{k-i+1}$; we get
$$\sum_{i=1}^{k} \epsilon_i = \epsilon \sum_{j=1}^{k} \frac{1}{2^j} < \epsilon,$$
so $A_k$ is an $\epsilon$-approximation for $(X, \mathcal{R})$. For the size, we choose $k$ as the number of rounds needed to reach $|A_{k-1}| \le \epsilon^{-4}$ (say); since each round essentially replaces $\log|A_{i-1}|$ by $\log\log|A_{i-1}|$, $k = O(\log^* n)$ rounds suffice. The last round uses $\epsilon_k = \epsilon/2$, so
$$|A_k| = O\!\left(\frac{1}{\epsilon^2}\log|A_{k-1}|\right) = O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\epsilon}\right).$$
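The iterated construction in the proof of Theorem 2 can be sketched as follows, again for 1-D points with interval ranges. The constant `c`, the number of rounds `k`, and the seed are illustrative assumptions (a full implementation would choose `k` based on $\log^* n$ as in the proof).

```python
import math
import random

def iterated_approximation(points, eps, k=3, c=4.0, seed=0):
    """Repeated subsampling with error parameters eps_i = eps / 2**(k-i+1),
    which sum to less than eps (Lemma 2 bounds the accumulated error)."""
    rng = random.Random(seed)
    a = list(points)
    for i in range(1, k + 1):
        eps_i = eps / 2**(k - i + 1)
        size = math.ceil(c / eps_i**2 * math.log(len(a)))
        if size >= len(a):   # subsampling would not shrink the set; skip
            continue
        a = [rng.choice(a) for _ in range(size)]
    return a
```

Each round only sees the previous (much smaller) sample, which is how the final size escapes any dependence on $|X|$.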
Remark 1. The independence between the $\epsilon$-approximation size and the set size $|X|$ is very interesting: we need only a very small sample even when the ground set is very big, yet the sample remains accurate for approximate range counting purposes.
Remark 2. The size bound of $\epsilon$-approximations can be improved for specific set systems. Specifically, for orthogonal ranges, the size can be improved to $O\!\left(\frac{1}{\epsilon}\,\mathrm{polylog}\frac{1}{\epsilon}\right)$, and for halfspaces in $\mathbb{R}^d$, the size can be improved to $O\!\left(\epsilon^{-2d/(d+1)}\right)$ [MK'17].
References:
[MK'17] Mustafa, Nabil, and Kasturi Varadarajan. "Epsilon-approximations and epsilon-nets." Handbook of Discrete and Computational Geometry. Chapman and Hall/CRC, 2017.