Hacked Exam (8pts, 6pts, 25pts) Google Code Jam 2021 round 1A

 



Hacked Exam (8pts, 6pts, 25pts)


 CODE:


In this problem we want to maximize our expected score. Let pq be the probability of question q's answer being T, and let QT and QF be the sets of questions we answer with T and F, respectively. By linearity of expectation, the expected score to maximize can be written as

(qQTpq)+qQF(1pq).
Notice that linearity of expectation can be used regardless of any conditional probability between different questions. In this case, those conditional probabilities can be quite complicated, so being able to ignore them and treat each question independently despite them not being independent in the probability theory sense simplifies the solution process significantly. This is a trick that is important in many problems about expected values.

We operate with fractions throughout most of the proposed solutions. As noted in the sample, the needed numbers may exceed 64 bits for Test Set 3, and potentially for Test Set 2 as well, depending on the exact implementation of the algorithm and the fraction operations. It is absolutely possible to solve Test Set 2 with simply 64-bit integers and Test Set 3 with just 128-bit integers. Our most popular languages all support this in some way: C and C++ have __int128 support, Java and C# have BigInteger, JavaScript has BigInt, Bash has bc, and Python, Ruby and Haskell have native long integer support. We usually strive to make most of our problems solvable with just 64-bit integer arithmetic, but in this case, limiting the number of questions so much would allow suboptimal solutions in fast languages to pass Test Set 3.

Notice that we can have information of 1 or 2 other students in the first two test sets, and also 3 in the last test set. We could solve each number of students independently, but there is no need for that. Adding a student with the same answers and score as a student in the input results in a completely equivalent case, so we can always assume that there are a maximum number of students by copying any student the needed number of times.

Test Set 1

In Test Set 1, the number of questions Q is small enough that we can simply enumerate all possible 2Q sequences of answers, and filter out those who are inconsistent with the input (i.e., those for which one of the students would obtain a different score than they actually got). From the consistent ones, we can estimate the pqs above as the ratio between the number of sequences that answer T to question q, over the total. Then, we can simply choose to answer T to those questions with pq>12 and F to those with pq<12. We can answer the questions with pq=12 either way.

Test Set 2

In Test Set 2 Q is large, so we cannot enumerate the sequences of answers. We can, on the other hand, figure out the probabilities pq in a different way, and then proceed as before with choosing the answers by comparing those values to 12.

An insight-based solution

One way to solve Test Set 2 is by splitting the questions into types. If two questions q1 and q2 received the same answer from each student, then by symmetry pq1=pq2. Then, let pab be equal to the probability of a question's answer being T given that the first student answered a and the second student answered b to it. By the first observation, each pq is equal to one of the 4 values pTTpTFpFT or pFF. Moreover, by the symmetry of complementing answers, pTT=1pFF and pTF=1pFT. Therefore, we can express every pq as a linear function of up to two variables pTT and pTF. That means we can express the expected score of both students as linear functions on those two variables too. Given that their expected score should match their actual score, that gives us two equations with two unknowns. We can derive the real values of pTT and pTF from that system of equations. With every pq calculated, we can simply choose, for each question, an answer that has maximum probability, as in the solution for Test Set 1.

Notice that there are 4 possible cases depending on how our two variables compare with 12 (if one is exactly 12 that means two cases are equivalent and if both are 12 then all cases are equivalent). The 4 cases exactly match with either answering the same as one of the students, or answering the opposite of one of the students. We can use this observation to greatly simplify the implementation as the score of a sequence given by a student is given to us, and the score of the complement of the answers of student i is QSi, so we can easily obtain the scores of the 4 options and pick a highest one.

A more competitive-programming-standard solution

In case of having 2 students, we can use dynamic programming to calculate the probability of each question having a particular answer. We compute the recursive function f(s1,s2,q) defined as "how many ways are there to answer questions q,q+1,q+2,,Q such that student 1 gets exactly s1 of them right and student 2 gets exactly s2 of them right?" We can define that function recursively as

f(s1,s2,q)=f(s1I1(q,T),s2I2(q,T),q+1)+f(s1I1(q,F),s2I2(q,F),q+1)
where Ii(q,c) is 1 if student i answered c to question q and 0 otherwise. The base cases are f(0,0,Q+1)=1 and f(s1,s2,Q+1)=0 whenever one of s1 or s2 is not 0. To simplify memoization, we may want to add a case to simply answer 0 whenever s1 or s2 are negative, but the recursive equation holds as written for those cases.

By memoizing that function, we can compute it in time O(Q3). We can calculate the probability of question 1's answer being T as

f(S1I1(1,T),S2I2(1,T),2)f(S1,S2,1).
Then, by symmetry, we can reorder the questions to make any question the first one and re-run to compute the probability for any question. After having all probabilities, we simply answer the most likely answer for each question and sum its probability to our expected score. Since we need to run the probability computation O(Q) times (once per question), the overall algorithm takes O(Q4) time. This can be a little too slow.

If we add only the first observation of the insight-based solution, we can notice that two questions that were answered the same by both students have identical probabilities. Then, we only need to calculate the probability of up to 4 question types (in the notation of the previous solution, we use dynamic programming to calculate all the pabs). This improves the overall running time to O(Q3), which fits better within the time limit. An observation about complement could further reduce this to only 2 question types, but that does not change the time complexity and it is not needed to pass this test set.

Test Set 3

The dynamic programming solution for Test Set 2 can be generalized to Test Set 3 by adding an additional score as another parameter to the recursive function. However, the additional dimension and larger limit for Q can easily make such solutions too slow.

Combining the full insights of the first solution to Test Set 2 with the solution to Test Set 1 works, though: there are 8 probability variables pabc, and pairs of complementary variables have complementary probabilities, so we only care about 4 different ones.

Let us call the subindex of the variables (the abc part) the "type" of a question. Let us number the types 1 through 4 in any order. If there are qj questions of type j, we can use quadruples (t1,t2,t3,t4) with 0tjqj for all j to represent sequences of answers that answer T to exactly tj questions of type j. We know that there are

(q1t1)(q2t2)(q3t3)(q4t4)
sequences of answers represented by this particular quadruple. If we filter the quadruples by the ones that give each student their actual score, we are effectively enumerating answers that are consistent with the input. This is what we did for Test Set 1! In this way, we can count which amount tj of questions of type j has the largest probability and choose that one, for each j.

There are at most (Q/4)4 quadruples to check. This makes the time complexity of the algorithm O(Q4), but the 1/256 constant is pretty significant, and a good implementation runs comfortably in time.

But wait! We can refine this solution even more by using the solution for Test Set 2 that ends with solving the system of equations! We can express the score of each student as a linear function of t1,t2,t3 and t4. That gives us a system of 3 equations and 4 unknowns. That means that we only need to try all possible values for one of the tj and then simply solve the system to find unique values for the other three. That refines the solution above to requiring only O(Q) time.

Men' fashion , Dating Tips , Relationship Advice

Post a Comment