您的当前位置：首页 MIT麻省理工学院算法导开课Problem Set 2 solution

MIT麻省理工学院算法导开课Problem Set 2 solution

来源：微智科技网

Handout 12: Problem Set 2 Solutions From this, we calculate that:

ln 1 − �− .

We then conclude the following:

ln (1/3)ln

�

⎛

k � �

1 − n

ln (1/3) −2/n n ln 3 � .

�

We conclude that Snape’s algorithm is correct only if k = �(n).

(b)󰀃Imagine you are given a bag of n balls. You are told that at least 10% of the balls are󰀂

blue, and no more than 90% of the balls are red. Asymptotically (for large n) how󰀂many balls do you have to draw from the bag to see a blue ball with probability at󰀂least 2/3? (You can assume that the balls are drawn with replacement.)󰀂

Solution: Since the question only asked the asymptotic number of balls drawn, �(1)󰀂(plus some justiﬁcation) is a sufﬁcient answer. Below we present a more complete󰀂answer.󰀂

Assume you draw k balls from the bag (replacing each ball after examining it).󰀂Lemma 3 For some constant k sufﬁciently large, at least one ball is blue with probability 2/3. Proof.

Deﬁne indicator random variables as follows:

Xi =

�

1 if ball i is blue 0 if ball i is red

Notice, then, that Pr Xi = 1 = 1/10 and Pr Xi = 0 = 9/10. We then calculate the probability that at least one ball is blue:

=1 −

k �

Pr (Xi = 0)

⎛k

=󰀃1 −

10 2 � .

�

i=1

Therefore, if k = lg(1/3)/ lg 0.9, the probability of drawing at least one blue ball is at least 2/3.

4 Handout 12: Problem Set 2 Solutions (c) Consider performing a “binary search” on an unsorted list:

BINARY-SEARCH(A,key,left,right) � Search for key in A[left ..right].󰀂1 if left = right󰀂2 then return left󰀂3 else mid ��(left + right)/2�󰀂4 if key < A[mid]󰀂5 then return BINARY-SEARCH(A,key,left,mid −1)󰀂6 else return BINARY-SEARCH(A,key,mid,right)󰀂Assume that a binary search for key1 in A (even though Ais not sorted) returns slot i. Similarly, a binary search for key2 in A returns slot j. Explain why the following fact is true: if i < j, then key1 �key2. Draw a picture. Hint: First think about why this is obviously true if list Ais sorted. Solution:

15 10 25 7 15 20229 7 10 15 16 20 25 22Figure 1: An example of a binary-search decision tree of an unordered array. For the purpose of understanding this problem, think of the decision-tree version of the binary search algorithm. (Notice that unlike sorting algorithms, the decision tree for binary search is relatively small, i.e., O(n) nodes.)

Consider the example in Figure 1, in which a binary search is performed on the unsorted array [9 7 10 15 16 20 25 22]. Assume key1 = 20 and key2 = 25. Both 20 and 25 are � 25, and choose the right branch from the root. At this point, the two binary searches diverge: 20 < 25 and 25 � 25. Therefore key1 takes the left branch and key2 takes the right branch. This ensures that eventually key1 and key2 are ordered correctly.

We now generalize this argument. For key1, let x1,x2,...,xk be the k elements that are compared against key1 in line 4 of the binary search (where k = O(lg n)). (In the example, x1 = 15, x2 = 25, and x3 = 20.) Let y1,y2,...,yt be the t elements compared against key2. We know that x1 = y1. Let �be the smallest number such that x� = y�. (In particular, � > 1.) Since i < j, by assumption, we know that key1 cannot

Handout 12: Problem Set 2 Solutions󰀂branch right while key2 simultaneously branches left. Hence we conclude that

key1 < x�−1 = y�−1 � key2 .

(Note that a relatively informal solution was acceptable for the problem, as we simply asked that you “explain why” this is true. The above argument can be more carefully formalized.)

(d)󰀃Professor Snape proposes a randomized algorithm to determine whether a list is 90%

sorted. The algorithm uses the function RANDOM(1,n) to choose an integer independently and uniformly at random in the closed interval [1,n]. The algorithm is presented below.

IS-ALMOST-SORTED(A,n,k) � Determine if A[1 ..n] is almost sorted.󰀂1 for r � 1 to k󰀂2 do i � RANDOM(1,n) � Pick iuniformly and independently.󰀂3 j � BINARY-SEARCH(A,A[i],1,n)󰀂4 if i =󰀂 j 5 then return false 6 return true

Show that the algorithm is correct if k is a sufﬁciently large constant. That is, with k chosen appropriately, the algorithm always outputs true if a list is correctly sorted and outputs false with probability at least 2/3 if the list is not 90% sorted.

Solution: Overview: In order to show the algorithm correct, there are two main lemmas that have to be proved: (1) If the list Ais sorted, the algorithm always returns true; (2) If the list A is not 90% sorted, the algorithm returns false with probability at least 2/3. We begin with the more straightforward lemma, which essentially argues that binary search is correct. We then show that if the list is not 90% sorted, then at least 10% of the elements fail the “binary search test.” Finally, we conclude that for a sufﬁciently large constant k, if the list is not 90% sorted, then the algorithm will output false with probability at least 2/3.

Lemma 4 If the list Ais sorted, the algorithm always returns true.

Proof. This lemma follows from the correctness of binary search on a sorted list, which was shown in recitation one. The invariant is that key is in array Abetween left and right.

For the rest of this problem, we label the elements as “good” and “bad” based on whether they pass the binary sort test.

label(i) =

�

good if i = BINARY-SEARCH(A,A[i],1,n)bad if i = BINARY-SEARCH(A,A[i],1,n)

6 Handout 12: Problem Set 2 Solutions Notice that it is not immediately obvious which elements are good and which elements are bad. In particular, some elements may appear to be sorted correctly, but be bad because of other elements being missorted. Similarly, some elements may appear entirely out of place, but be good because of other misplaced elements. A key element of the proof is showing that a badly sorted list has a lot of bad elements.

Lemma 5 If the list Ais not 90% sorted, then at least 10% of the elements are bad. Proof. Assume, by contradiction, that fewer than 10% of the elements are bad. Then, at least 90% of the elements are good. Recall the deﬁnition of a 90% sorted list: if 10% of the elements are removed, then the remaining elements are in sorted order. Therefore, remove all the bad elements from the array. We now argue that the remaining elements are in sorted order. Consider any two of the remaining good elements, key1 and key2, where key1 is at index i and key2 is at index j. If i < j, then Part(c) shows that key1 � key2. Similarly, if j < i, then Part(c) shows that key2 � key1. That is, the two elements are in correctly sorted order. Since all pairs of elements are in sorted order, the array of good elements is in sorted order.

Once we have shown that there are a lot of bad elements, it remains to show that we ﬁnd a bad element through random sampling.

Lemma 6 If the list Ais not 90% sorted, the algorithm returns false with probability at least 2/3.

Proof. From Lemma 5, we know that at least 10% of the elements are bad. From Part(b), we know that if we choose k > lg(1/3)/lg 0.9, then with probability 2/3 we ﬁnd a bad element. Therefore, we conclude that the algorithm returns false with probability at least 2/3.

(e)󰀃Imagine instead that Professor Snape would like to determine whether a list is 1 − �󰀂

sorted for some 0 <�<1. (In the previous parts �= 0.10.) For large n, determine the󰀂appropriate value of k, asymptotically, and show that the algorithm is correct. What is󰀂the overall running time?󰀂

Solution: Lemma 4 is the same as in Part(d). A simple modiﬁcation of Lemma 5󰀂shows that if the array is not (1−�)-sorted, then there must be at least �nbad elements;󰀂otherwise, the remaining (1 − �)nelements would form a (1 − �)-sorted list. Finally,󰀂it remains to determine the appropriate value of k󰀂In this case, we want to choose ksuch that󰀂

(1 − �)k �

3󰀂

We choose k= c/�, then we can conclude (using CLRS 3.11) that:󰀂

(1 − �)

� �

�

�⎛c

(1 − �)1 e

�1/� c

Handout 12: Problem Set 2 Solutions We therefore conclude that if k = �(1/�), the algorithm will ﬁnd a bad element with probability at least 2/3. The running time of the algorithm is O(lg n/�). Problem 2-2. Sorting an almost sorted list.

On his way back from detention, Harry runs into his friend Hermione. He is upset because Professor Snape discovered that his sorting spell failed. Instead of sorting the papers correctly, each paper was within k slots of the proper position. Hermione immediately suggests that insertion sort would have easily ﬁxed the problem. In this problem, we show that Hermione is correct (as usual). As before, A[1 ..n] in an array of ndistinct elements.

(a)󰀃First, we deﬁne an “inversion.” If i < j and A[i] > A[j], then the pair (i,j) is called an󰀂

inversion of A. What permutation of the array {1,2,...,n} has the most inversions?󰀂How many does it have?󰀂

Solution: The permuation {n,n− 1,...,2,1} has the largest number of inversions. � � It has n= n(n− 1)/2 inversions. 2 (b)󰀃Show that, if every paper is initially within k slots of its proper position, insertion󰀂sort runs in time O(nk). Hint: First, show that INSERTION-SORT(A) runs in time󰀂O(n+ I), where I is the number of inversions in A.󰀂

Solution: Overview: First we show that INSERTION-SORT(A) runs in time O(n+ I), where I is the number of inversions, by examining the insertion sort algorithm. Then we count the number of possible inversions in an array in which every element is within k slots of its proper position. We show that there are at most O(nk) inversions. Lemma 7 INSERTION-SORT(A) runs in time O(n + I), where I is the number of inversions in A.

Proof. Consider an execution of INSERTION-SORT on an array A. In the outer loop, there is O(n) work. Each iteration of the inner loop ﬁxes exactly one inversion. When the algorithm terminates, there are no inversions left. Hence, there must be I iterations of the inner loop, resulting in O(I) work. Therefore the running time of the algorithm is O(n+ I).

We next count the number of inversions in an array in which every element is within k slots of its proper position.

Lemma 8 If every element is within k slots of its proper position, then there are at most O(nk) inversions.

Proof. We provide an upper bound on the number of inversions. Consider some particular element, A[i]. There are at most 4k elements that can be inverted with A[i],

Handout 12: Problem Set 2 Solutions (The last step follows because of our assumption that k � n/2.)

(d)󰀃Devise an algorithm that matches the lower bound, i.e., sorts a list in which each paper󰀂

is within k slots of its proper position in �(nlg k) time. Hint: See Exercise 6.5-8 on󰀂page 142 of CLRS.󰀂

Solution: For the solution to this problem, we are going to use a heap. We assume that we have a heap with the following subroutines:

•󰀃MAKE-HEAP() returns a new empty heap.

•󰀃INSERT(H,key,value) inserts the key/value pair into the heap.

•󰀃EXTRACT-MIN(H) removes the key/value pair with the smallest key from the heap, returning the value.

First, consider the problem of merging tsorted lists. Assume we have lists A1,...,At, each of which is sorted. We use the following strategy (pseudocode below): 1. Make a new heap, H.

2. For each of the t lists, insert the ﬁrst element into the list. For list i, perform INSERT(H,At[1],i). 3. Repeat ntimes:

(a) Remove the smallest element from the heap using EXTRACT-MIN(H). Let v

be the value returned, which is the identity of the list.

(b) Put the extracted element from list v in order in a new array. (c) Insert the next element from the list v. The key invariant is to show that after every iteration of the loop, the heap contains the smallest element in every list. (We omit a formal induction proof, as the question only asked you to devise an algorithm.) Notice that each EXTRACT-MIN and INSERT operation requires O(lg k) time, since there are never more than 2k elements in the heap. The loop requires only a constant amount of other work, and is repeated n times, resulting in O(nlg k) running time.

In order to apply this to our problem, we consider A as a set of sorted lists. In particular, notice that A[i] < A[i+ 2k+ 1], for all i � n− 2k− 1: the element at slot ican at most move forwards during sorting to slot i+ k and the element at slot i+ 2k + 1 can at most move backwards during sorting to slot i+ k + 1.

For the moment, assume that n is divisble by 2k. We consider the the t =2k lists deﬁned as follows:

A1 = A[1],A[2k+ 1],A[4k+ 1],...A[n− (2k− 1)]

�󰀃�󰀃

�

�󰀃

� �

A2 = A[2],A[2k+ 2],A[4k+ 2],...A[n− (2k− 2)] ...At = A[2k],A[4k],A[6k],...,A[n]

10 Handout 12: Problem Set 2 Solutions Each of these lists is sorted, and each is of size � n. Therefore, we can sort these lists󰀂in O(nlg k) time using the procedure above.󰀂We now present the more precise pseudocode:󰀂

SORT-ALMOST-SORTED(A,n,k) � Sort Aif every element is within k slots of its proper position.󰀂1 H � MAKE-HEAP()󰀂2 for i � 1 to 2k󰀂3 do INSERT(H,A[i],i)󰀂4 for i � 1 to n󰀂5 do j � EXTRACT-MIN(H)󰀂6 B[i] � A[j]󰀂7 if j + 2k � n󰀂8 then INSERT(H,A[j + 2k],j)󰀂9 return B

Recall that a heap is generally used to store a key and its associated value, even though󰀂we often ignore the value when describing the heap operations. In this case, the value󰀂is an index j, while the key is the element A[j]. As a result, the heap returns the index󰀂of the next smallest element in the array.󰀂

Correctness and performance follow from the argument above.󰀂

Notice that there is a second way of solving this problem. Recall that we already󰀂know how to merge two sorted lists that (jointly) contain n elements in O(n) time. It󰀂is possible, then to merge the lists in a tournament. We give an example for k =8,󰀂where A+ B means to merge lists Aand B:󰀂

Round 1: (A1 + A2), (A3 + A4), (A5 + A6), (A7 + A8) Round 2: (A1 + A2 + A3 + A4), (A5 + A6 + A7 + A8) Round 3: (A1 + A2 + A3 + A4 + A5 + A6 + A7 + A8)

Notice that there are lg k merge steps, each of which merges n elements (dispersed󰀂through up to k lists) and hence has a cost of O(n). This leads to the desired running󰀂time of O(nlg k).󰀂

Problem 2-3. Weighted Median.

For ndistinct elements x1,x2,...,xn with positive weights w1,w2,...,wn such that the weighted (lower) median is the element xk satisfying

� 1

wi <

2xi1 wi � .

2xi>xk

�

�n

i=1 wi = 1,

Handout 12: Problem Set 2 Solutions󰀂(a) Argue that the median of x1,x2,...,xn is the weighted median of x1,x2,...,xn with

weights wi =1/nfor i =1,2,...,n.

Solution: Let xk be ⎠the of x1,x2,...,xn. By the deﬁnition of median, xk 󰀃 median⎝

is larger than exactly n+1− 1 other elements xi. Then the sum of the weights of 2 elements less than xk is

��

�

⎛

xi�

wi󰀃=

= � < <

1 n+1 ·− 1 n 2� � 1󰀃n− 1 · n 2 n− 1 2n n 2n 1 2

Since all the elements are distinct, xk is also smaller than exactly n − elements. Therefore

�

�⎛

⎠󰀃

n+1 2

⎝

other

xi>xk

�

1 n+1 = · n−

n 2

� � 1󰀃n+1

=1−·

n 2 � ⎛� ⎛1 n

� 1−

n 2 1 �

Therefore by the deﬁnition of weighted median, xk is also the weighted median. (b)󰀃Show how to compute the weighted median of n elements in O(nlgn) worst-case

time using sorting.

Solution: To compute the weighted median of n elements, we sort the elements and then sum up the weights of the elements until we have found the median.

14 Handout 12: Problem Set 2 Solutions To prove this algorithm is correct, we show that the following precondition holds for every recursive call: the weighted median y of the initial A is always present in the recursive calls of A, and l is the total weight of all elements xi less than all the the elements of A. This precondition is trivially true for the initial call. We prove that the precondition is also true in every recursive call by induction. Assume for induction that the precondition is true. First let us consider the case in which l+wB > 1. Since 2

y must be in A, at line 14 y must be either in B or C. Since the total weight of all elements less than any element in C is greater than 1/2, then by deﬁnition the weighted median cannot be in C, so it must be in B. Furthermore, we have not discarded any elements less than any element in B, so l is correct and the precondition is satisﬁed.

on line 14, then y must be in C. All elements of C are greater than all If l+wB � 12

elements of B, so the total weight of the elements less than the elements of C is l+wB and the precondition of the recursive call is also satisﬁed. Therefore by induction the precondition is always true.

This algorithm always terminates because the size of A decreases for every recursive call. When the algorithm terminates, the result is correct. Since the weighted median is always in A, then when only one element remains it must be the weighted median. The algorithm runs in �(n)time. Computing the median and splitting A into B and C takes �(n)time. Each recursive call reduces the size of the array from n to �n/2�. Therefore the recurrence is T(n)=T(n/2)+�(n)=�(n).

(d) The post-ofﬁce location problem is deﬁned as follows. We are given npoints p1,p2,...,pn

with associated weights w1,w2,...,wn. We wish to ﬁnd a point p(not necessarily one

�

of the input points) that minimizes the sum ni=1 wi d(p,pi), where d(a,b)is the dis-tance between points aand b.

Argue that the weighted median is a best solution for the one-dimensional post-ofﬁce location problem, in which points are simply real numbers and the distance between points aand b is d(a,b)=|a− b|.

Solution:

We argue that the solution to the one-dimensional post-ofﬁce location problem is the weighted median of the points. The objective of the post-ofﬁce location problem is to choose pto minimize the cost

c(p)=

n �i=1

wid(p,pi)

We can rewrite c(p)as the sum of the cost contributed by points less than pand points

greater than p:

c(p)

� � � � � � ⎞ � ⎞ = wi(p− pi)+wi(pi − p)�

pipi>p

16 Handout 12: Problem Set 2 Solutions �g

Notice also that �pdoes not depend on the y coordinates of the input points and has x

exactly the same form as dp

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文

MIT麻省理工学院 算法导开课Problem Set 2 solution

MIT麻省理工学院算法导开课Problem Set 2 solution