Epsilon-approximations

In this post, we review the technique of ε-approximation, a very useful tool in computational geometry for designing efficient approximation algorithms. Consider the following problem:

Problem 1 (Range Counting). Given a set $P$ of $n$ points, we want to preprocess $P$ into a structure so that for any query range $\gamma \in \Gamma$, for some collection of ranges $\Gamma$, we can efficiently count $|P \cap \gamma|$.

It turns out that Problem 1 is very difficult. Efficient (linear-space and polylogarithmic-query-time) data structures are only possible when $\Gamma$ is a collection of orthogonal ranges. Even for the case of halfspaces, the current best near-linear-space data structure requires roughly $O(n^{1-1/d})$ query time. So instead, researchers turn to the approximate version of the problem.

Problem 2 (Approximate Range Counting). Given a set $P$ of $n$ points and an error parameter $\varepsilon$, we want to preprocess $P$ into a structure so that for any query range $\gamma \in \Gamma$, for some collection of ranges $\Gamma$, we can efficiently output a count within a factor of $(1+\varepsilon)$ of $|P \cap \gamma|$.

The natural idea for solving Problem 2 is to take random samples. It seems natural that the sample size should depend on the error $\varepsilon$, the complexity of $\Gamma$, and the size of $P$. (Surprisingly, as we will see, this intuition is somewhat incorrect: the sample size we need does not depend on the set size $|P|$!) We will model the complexity of $\Gamma$ using the notion of VC-dimension. More specifically, we consider the set system $(P, \Gamma)$ underlying the problem. We now define the notion of ε-approximations for set systems, which formalizes what a ''good random sample'' means in our setting.

Definition 1 (ε-approximation). Given a set system $\mathcal{S} = (S, \mathcal{R})$, an ε-approximation for $\mathcal{S}$ is a set $A \subseteq S$ such that

$$\left| \frac{|R \cap A|}{|A|} - \frac{|R \cap S|}{|S|} \right| \le \varepsilon,$$

for all $R \in \mathcal{R}$.
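As a quick sanity check of the definition, here is a small Python sketch (my own illustration, not from the original argument; the names `approx_error` and `is_eps_approximation` are made up) that computes the worst-case discrepancy of a candidate set $A$ over all ranges:

```python
# Sketch of Definition 1: the discrepancy of A w.r.t. the set system (S, R)
# is the largest gap between the fraction of A and the fraction of S that a
# range captures; A is an eps-approximation iff this gap is at most eps.

def approx_error(S, ranges, A):
    S, A = set(S), set(A)
    return max(abs(len(R & A) / len(A) - len(R & S) / len(S)) for R in ranges)

def is_eps_approximation(S, ranges, A, eps):
    return approx_error(S, ranges, A) <= eps

# Tiny example: ground set {0,...,7} with prefix ranges (1-D halflines).
S = list(range(8))
ranges = [set(range(k)) for k in range(1, 9)]
A = [1, 3, 5, 7]  # every other point

print(approx_error(S, ranges, A))                 # 0.125
print(is_eps_approximation(S, ranges, A, 0.125))  # True
```

Taking every other point of an evenly spread ground set gives a $1/8$-approximation here; a sparser or more lopsided choice of $A$ would drive the discrepancy up.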

Note that for the set system $(P, \Gamma)$ of a range counting problem, to approximate $|\gamma \cap P|$ for any $\gamma \in \Gamma$, it suffices to count $|\gamma \cap A|$ and return the scaled estimate $\frac{|P|}{|A|} \cdot |\gamma \cap A|$; by the definition above, the additive error is bounded by $\varepsilon |P|$.
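Concretely, the scaled estimate can be sketched as follows (my own illustration; `approx_count` is a hypothetical name, not from the post):

```python
# Answering an approximate range count from an eps-approximation A of P:
# scale the sample count by |P|/|A|; the additive error is at most eps*|P|.

def approx_count(P, A, gamma):
    A = set(A)
    return len(P) / len(A) * len(A & set(gamma))

P = list(range(100))
A = list(range(0, 100, 10))   # every 10th point: a 0.09-approximation for prefixes
gamma = set(range(37))        # the halfline "x < 37"

print(approx_count(P, A, gamma))  # 40.0 (true count is 37; error 3 <= 0.09 * 100)
```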

But how large must $A$ be? If we want efficient approximate range counting structures, $A$ should not be too large. We now bound the size of $A$ for set systems $\mathcal{S} = (S, \mathcal{R})$ with bounded VC-dimension. We will need the following form of the Chernoff bound.

Lemma 1 (Chernoff Bound). Let $X = \sum_{i=1}^{n} X_i$, where the $X_i$ are independent indicator random variables with $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1 - p$, and let $\mu = \mathbb{E}[X]$. Then

$$\Pr[|X - \mu| \ge \varepsilon] \le e^{-\Theta(\varepsilon^2/\mu)}.$$
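A quick empirical illustration of this concentration (a sketch of my own, not part of the argument; `deviation_frequency` is a made-up name): simulate sums of indicator variables and watch how quickly the deviation frequency drops as $\varepsilon$ grows.

```python
import math
import random

random.seed(1)

# Empirical look at the Chernoff bound: X is a sum of n independent
# Bernoulli(p) indicators with mean mu = n*p; deviations |X - mu| >= eps
# should occur with probability at most about exp(-Theta(eps^2 / mu)).

def deviation_frequency(n, p, eps, trials=1000):
    mu = n * p
    hits = sum(abs(sum(random.random() < p for _ in range(n)) - mu) >= eps
               for _ in range(trials))
    return hits / trials

n, p = 1000, 0.5                       # mu = 500, standard deviation ~ 15.8
small = deviation_frequency(n, p, 15)  # ~ 1 standard deviation: common
large = deviation_frequency(n, p, 50)  # ~ 3 standard deviations: rare
print(small, large)
```

With these parameters, the deviation $\varepsilon = 50$ has $\varepsilon^2/\mu = 5$, so the bound already predicts it should be seen in well under half the trials; empirically it is far rarer still.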

We first show a weaker bound. The main idea is to take a random sample of $S$.

Theorem 1. For any set system $\mathcal{S} = (S, \mathcal{R})$ with VC-dimension $c$ for some constant $c$, there is an ε-approximation $A$ for $\mathcal{S}$ of size $O(\frac{1}{\varepsilon^2} \ln n)$, where $n = |S|$.

Proof. We take a random sample $A = \{A_1, \ldots, A_r\}$ of size $r = \frac{c'}{\varepsilon^2} \ln n$ from $S$, for a big enough constant $c'$. Consider any $R \in \mathcal{R}$. Let $X_i$ be an indicator random variable such that

$$X_i = \begin{cases} 1, & \text{if } A_i \in R, \\ 0, & \text{otherwise.} \end{cases}$$

Note that $\Pr[X_i = 1] = \frac{|R \cap S|}{|S|}$, and so $\mu = \mathbb{E}[X] = |A| \cdot \frac{|R \cap S|}{|S|}$, where $X = \sum_{i=1}^{r} X_i = |R \cap A|$. Note that

$$\Pr\left[\left|\frac{|R \cap A|}{|A|} - \frac{|R \cap S|}{|S|}\right| \ge \varepsilon\right] = \Pr\left[\left||R \cap A| - |A| \cdot \frac{|R \cap S|}{|S|}\right| \ge |A|\varepsilon\right] = \Pr\big[|X - \mu| \ge |A|\varepsilon\big] \le e^{-\Theta((|A|\varepsilon)^2/\mu)} = e^{-\Theta\left(|A|\varepsilon^2 \cdot \frac{|S|}{|R \cap S|}\right)} \le e^{-\Theta(|A|\varepsilon^2)} \le n^{-2c},$$

where the last inequality follows from $|A| = \frac{c'}{\varepsilon^2} \ln n$ by picking $c'$ to be a big enough constant.

Since $\mathcal{S}$ has VC-dimension $c$, the number of distinct ranges (up to their intersections with $S$) satisfies $|\mathcal{R}| < n^c$ by the Sauer–Shelah lemma. Then by the union bound, with high probability (at least $1 - n^{-c}$), $A$ is an ε-approximation for $\mathcal{S}$. $\square$
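The construction in the proof is just uniform sampling. The sketch below (my own toy instantiation, with made-up names) applies it to the concrete set system of 1-D halfline (prefix) ranges and measures the resulting discrepancy:

```python
import math
import random

random.seed(0)

# Theorem 1's construction for the set system of halflines over S = {0,...,n-1}:
# a uniform random sample of size ~ (c/eps^2) ln n is an eps-approximation w.h.p.

def random_sample_approximation(n, eps, c=1):
    r = min(n, int(c / eps**2 * math.log(n)) + 1)
    return random.sample(range(n), r)

def prefix_discrepancy(n, A):
    """Max over prefix ranges {0,...,k-1} of | |R cap A|/|A| - k/n |."""
    A = sorted(A)
    worst, j = 0.0, 0
    for k in range(n + 1):
        while j < len(A) and A[j] < k:
            j += 1
        worst = max(worst, abs(j / len(A) - k / n))
    return worst

n, eps = 20000, 0.1
A = random_sample_approximation(n, eps)  # ~990 of the 20000 points
print(prefix_discrepancy(n, A))          # should come out below eps = 0.1
```

With these parameters the sample has under 5% of the points, yet its discrepancy is, with overwhelming probability, below $\varepsilon$; for this particular 1-D system even stronger guarantees (e.g. the DKW inequality) apply.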

We now show that we can improve the bound in Theorem 1 to $O(\frac{1}{\varepsilon^2} \ln \frac{1}{\varepsilon})$. To prove this, we need the following lemma.

Lemma 2. Suppose $A_1$ is an ε-approximation of a set system $\mathcal{S}$ and $A_2$ is an ε′-approximation of $\mathcal{A}_1 = (A_1, \{R \cap A_1 : R \in \mathcal{R}\})$, the set system induced by $\mathcal{S}$ on $A_1$. Then $A_2$ is an (ε+ε′)-approximation of $\mathcal{S}$.

Assuming Lemma 2, we now prove the following bound for ε-approximations.

Theorem 2. For any set system $\mathcal{S} = (S, \mathcal{R})$ with VC-dimension $c$ for some constant $c$, there is an ε-approximation $A$ for $\mathcal{S}$ of size $O(\frac{1}{\varepsilon^2} \log \frac{1}{\varepsilon})$.

Proof. The main idea behind Theorem 2 is to take samples of samples repeatedly, with increasing error parameters. Let $A_0 = S$. For $i = 1, 2, \ldots, t$, where $t$ is a parameter we set later, we take an $\varepsilon_i$-approximation $A_i$ of $A_{i-1}$ with $\varepsilon_i = \sqrt{\frac{\ln(|A_{i-1}|/2)}{|A_{i-1}|/2}}$.

By Theorem 1, the size of each Ai is

$$|A_i| = \frac{1}{\varepsilon_i^2} \ln |A_{i-1}| = \frac{|A_{i-1}|/2}{\ln(|A_{i-1}|/2)} \cdot \ln |A_{i-1}| \approx \frac{|A_{i-1}|}{2}.$$

Thus $|A_t| = O(n/2^t)$. Now we bound the error.

By applying Lemma 2 repeatedly, the total error $\varepsilon_{\mathrm{total}}$ we accumulate satisfies

$$\varepsilon_{\mathrm{total}} = \sum_{i=1}^{t} \varepsilon_i = O\left(\sum_{i=1}^{t} \sqrt{\frac{\ln(n/2^i)}{n/2^i}}\right) = O\left(\sqrt{\frac{\ln(n/2^t)}{n/2^t}}\right),$$

since the terms grow geometrically and the sum is dominated by the last one.

Setting $\sqrt{\frac{\ln(n/2^t)}{n/2^t}} = O(\varepsilon)$, i.e., $n/2^t = \Theta(\frac{1}{\varepsilon^2} \ln \frac{1}{\varepsilon})$, we get

$$t = \log \frac{\varepsilon^2 n}{\ln(1/\varepsilon)}.$$

Then $|A_t| = O(n/2^t) = O(\frac{1}{\varepsilon^2} \ln \frac{1}{\varepsilon})$. $\square$
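The repeated-sampling scheme in the proof can be sketched as follows (a toy version of my own: it uses plain random sampling for each round and only tracks the sizes and the accumulated error bound, rather than verifying the approximation property):

```python
import math
import random

random.seed(0)

# Theorem 2's scheme: starting from A_0 = S, repeatedly replace A_{i-1} by a
# random sample of half its size, with per-round error eps_i = sqrt(ln(m/2)/(m/2)).
# Sizes halve each round, while the total error is dominated by the last eps_i.

def iterated_approximation(S, t):
    A, eps_total = list(S), 0.0
    for _ in range(t):
        m = len(A)
        eps_total += math.sqrt(math.log(m / 2) / (m / 2))
        A = random.sample(A, m // 2)
    return A, eps_total

A, eps_total = iterated_approximation(range(2**14), t=5)
print(len(A), round(eps_total, 3))  # 512 points, total error around 0.33
```

Note how, after five halvings of a 16384-point ground set, the accumulated error bound is still roughly the last round's $\varepsilon_i$, matching the geometric-sum argument above.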

Remark 1. The independence of the ε-approximation size from the ground set size is very interesting. We only need a very small sample even if the ground set is very big, yet the sample remains accurate for approximate range counting purposes.

Remark 2. The size bound of ε-approximations can be improved for specific set systems. Specifically, for orthogonal ranges the size can be improved to $\tilde{O}(\frac{1}{\varepsilon})$, and for halfspaces it can be improved to $\tilde{O}(\varepsilon^{-2d/(d+1)})$ [MK'17].


References:

[MK'17] Mustafa, Nabil, and Kasturi Varadarajan. "Epsilon-approximations and epsilon-nets." Handbook of Discrete and Computational Geometry. Chapman and Hall/CRC, 2017.
