1.1 Devise formulas for the functions that calculate my first i and my last i in
the global sum example. Remember that each core should be assigned roughly
the same number of elements of computations in the loop. Hint: First consider
the case when n is evenly divisible by p. Get solution
1.2 We’ve implicitly assumed that each call to Compute next value requires
roughly the same amount of work as the other calls. How would you change
your answer to the preceding question if call i D k requires kC1 times as much
work as the call with i D 0? So if the first call (i D 0) requires 2 milliseconds,
the second call (i D 1) requires 4, the third (i D 2) requires 6, and so on. Get solution
1.3 Try to write pseudo-code for the tree-structured global sum illustrated in
Figure 1.1. Assume the number of cores is a power of two (1, 2, 4, 8, . . . ).
Hints: Use a variable divisor to determine whether a core should send its
sum or receive and add. The divisor should start with the value 2 and be
doubled after each iteration. Also use a variable core difference to determine
which core should be partnered with the current core. It should start
with the value 1 and also be doubled after each iteration. For example, in
the first iteration 0 % divisor = 0 and 1 % divisor = 1, so 0 receives and
adds, while 1 sends. Also in the first iteration 0 + core difference = 1 and
1 - core difference = 0, so 0 and 1 are paired in the first iteration. Get solution
1.4 As an alternative to the approach outlined in the preceding problem, we can use
C’s bitwise operators to implement the tree-structured global sum. In order to
see how this works, it helps to write down the binary (base 2) representation of
each of the core ranks, and note the pairings during each stage:
From the table we see that during the first stage each core is paired with the
core whose rank differs in the rightmost or first bit. During the second stage
cores that continue are paired with the core whose rank differs in the second
bit, and during the third stage cores are paired with the core whose rank differs
in the third bit. Thus, if we have a binary value bitmask that is 0012 for the
first stage, 0102 for the second, and 1002 for the third, we can get the rank of
the core we’re paired with by “inverting” the bit in our rank that is nonzero in
bitmask. This can be done using the bitwise exclusive or ^ operator.
Implement this algorithm in pseudo-code using the bitwise exclusive or and
the left-shift operator. Get solution
1.5 What happens if your pseudo-code in Exercise 1.3 or Exercise 1.4 is run when
the number of cores is not a power of two (e.g., 3, 5, 6, 7)? Can you modify the
pseudo-code so that it will work correctly regardless of the number of cores? Get solution
1.6 Derive formulas for the number of receives and additions that core 0 carries out
using
a. the original pseudo-code for a global sum, and
b. the tree-structured global sum.
Make a table showing the numbers of receives and additions carried out by core
0 when the two sums are used with 2, 4, 8, : : : ,1024 cores. Get solution
1.7 The first part of the global sum example—when each core adds its assigned
computed values—is usually considered to be an example of data-parallelism,
while the second part of the first global sum—when the cores send their partial
sums to the master core, which adds them—could be considered to be an
example of task-parallelism. What about the second part of the second global
sum—when the cores use a tree structure to add their partial sums? Is this an
example of data- or task-parallelism? Why? Get solution
1.8 Suppose the faculty are going to have a party for the students in the department.
a. Identify tasks that can be assigned to the faculty members that will allow
them to use task-parallelism when they prepare for the party. Work out a
schedule that shows when the various tasks can be performed.
b. We might hope that one of the tasks in the preceding part is cleaning the
house where the party will be held. How can we use data-parallelism to
partition the work of cleaning the house among the faculty?
c. Use a combination of task- and data-parallelism to prepare for the party. (If
there’s too much work for the faculty, you can use TAs to pick up the slack.) Get solution
1.9 Write an essay describing a research problem in your major that would benefit
from the use of parallel computing. Provide a rough outline of how parallelism
would be used. Would you use task- or data-parallelism?
Get solution