With college bowl season just around the corner, football fans across the nation will be dazzled not just by the on-field action but also by the intricate "card stunts" performed by members of the stadium's audience. The highly coordinated crowd work is capable of producing detailed images that resemble the pixelated images on computer screens and are coded in much the same manner.
Michael Littman's new book, Code to Joy: Why Everyone Should Learn a Little Programming, is filled with similar examples of how the machines around us operate and how we need not distrust an automaton-filled future so long as we learn to speak their language (at least until they finish learning ours). From sequencing commands to storing variables, Code to Joy provides an accessible and entertaining guide to the very basics of programming for fledgling coders of all ages.
Excerpted from Code to Joy: Why Everyone Should Learn a Little Programming by Michael L Littman. Published by MIT Press. Copyright © 2023 by Michael L Littman. All rights reserved.
“GIMME A BLUE!”
Card stunts, in which a stadium audience holds up colored signs to make a giant, temporary billboard, are like flash mobs where the participants don’t need any special skills and don’t even have to practice ahead of time. All they have to do is show up and follow instructions in the form of a short command sequence. The instructions guide a stadium audience to hold aloft the right poster-sized colored cards at the right time as announced by a stunt leader. A typical set of card-stunt instructions begins with instructions for following the instructions:
listen to instructions carefully
hold top of card at eye level (not over your head)
hold indicated color toward field (not facing you)
pass cards to aisle on completion of stunts (do not rip up the cards)
These instructions may sound obvious, but not stating them surely leads to disaster. Even so, you know there’s gotta be a smart aleck who asks afterward, “Sorry, what was that first one again?” It’s definitely what I’d do.
Then comes the main event, which, for one specific person in the crowd, could be the command sequence:
Breathtaking, no? Well, maybe you have to see the bigger picture. The whole idea of card stunts leverages the fact that the members of a stadium crowd sit in seats arranged in a grid. By holding up colored rectangular sign boards, they transform themselves into something like a big computer display screen. Each participant acts as a single picture element: person pixels! Changing which cards are held up changes the image, or can even make it morph like a larger-than-life animated GIF.
Card stunts began as a crowd-participation activity at college sports in the 1920s. They became much less popular in the 1970s, when it was generally agreed that everyone should do their own thing, man. In the 1950s, though, there was a real hunger to create ever more elaborate displays. Cheer squads would design the stunts by hand, then prepare individual instructions for each of a thousand seats. You’ve got to really love your team to dedicate that kind of energy. In the 1960s, a few schools thought those newfangled computer things might take some of the drudgery out of instruction preparation, so they designed programs to turn sequences of hand-drawn images into individualized instructions for each participant. With the help of computers, people could produce much richer sequences for each person pixel, saying when to lift a card, what color to lift, and when to put it down or change to another card.
So, whereas the questionnaire example from the previous section was about people making command sequences for the computer to follow, this example is about the computer making command sequences for people to follow. And automating the creation of command sequences makes more elaborate stunts possible. As a result, a single participant’s sequence of commands could look like:
up on 001 white
up on 022 white
up on 036 white
up on 045 white
057 metallic red
Okay, it’s still not as fun to read the instructions as to see the final product—in this actual example, it’s part of an animated Stanford “S.” To execute these commands in synchronized fashion, an announcer in the stadium calls out the step number (“Forty-one!”) and each participant can tell from his or her instructions what to do (“I’m still holding up the white card I lifted on 36, but I’m getting ready to swap it for a blue card when the count hits 43”).
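To make the step-numbered scheme concrete, here is a minimal Python sketch of both sides of the process. It is an illustration under stated assumptions, not the software those 1960s schools actually used. It assumes each frame of the stunt is a grid of card color names (None meaning no card is up) and that commands look like "up on NNN color" or "down on NNN"; the "down" command, the function names, and the data layout are all hypothetical. commands_for_seat plays the computer, writing one seat's instructions, and card_at plays the participant, working out what to hold when a step number is called.

```python
from typing import Optional

# Hypothetical layout: frames[i] is the picture shown when the announcer calls
# step i + 1; each frame is a grid (list of rows) of card colors, None = no card.
Frame = list[list[Optional[str]]]


def commands_for_seat(frames: list[Frame], row: int, col: int) -> list[str]:
    """Emit a command only at the steps where this seat's card changes."""
    commands = []
    previous: Optional[str] = None
    for step, frame in enumerate(frames, start=1):
        color = frame[row][col]
        if color == previous:
            continue  # keep holding whatever is already up (or nothing)
        if color is None:
            commands.append(f"down on {step:03d}")
        else:
            commands.append(f"up on {step:03d} {color}")
        previous = color
    return commands


def card_at(commands: list[str], step: int) -> Optional[str]:
    """Play the participant: which card should be up when `step` is called?"""
    holding: Optional[str] = None
    for command in commands:
        # "up on 057 metallic red" -> ["up", "on", "057", "metallic red"]
        parts = command.split(" ", 3)
        if int(parts[2]) > step:
            break  # commands are in step order, so nothing later applies yet
        holding = parts[3] if parts[0] == "up" else None
    return holding
```

Emitting a command only when a seat's card changes keeps each person's list short, which is the point of having the computer do the bookkeeping.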
As I said, it’s not that complicated for people to be part of a card stunt, but it’s a pretty cool example of creating and following command sequences where the computer tells us what to do instead of the other way around. And, as easy as it might be, sometimes things still go wrong. At the 2016 Democratic National Convention, Hillary Clinton’s supporters planned an arena-wide card stunt. Although it was intended to be a patriotic display of unity, some attendees didn’t want to participate. The result was an unreadable mess that, depressingly, was supposed to spell out “Stronger Together.”
These days, computers make it a simple matter to turn a photograph into instructions about which colors to hold up where. Essentially, any digitized image is already a set of instructions for what mixture of red, blue, and green to display at each picture position. One interesting challenge in translating an image into card-stunt instructions is that typical images consist of millions of colored dots (megapixels), whereas a card stunt section of a stadium has maybe a thousand seats. Instead of asking each person to hold up a thousand tiny cards, it makes more sense to compute an average of the colors in that part of the image. Then, from the collection of available colors (say, the classic sixty-four Crayola options), the computer just picks the closest one to the average.
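A minimal Python sketch of that shrinking step, assuming the image is a list of rows of (red, green, blue) tuples and that the card section is a plain rectangle of seat_rows by seat_cols seats (real seating charts are rarely that tidy). Each seat gets the plain per-channel average of the block of pixels that falls on it; downsample and average_color are names made up for this example.

```python
# A color is a (red, green, blue) tuple with each channel from 0 to 255.
Color = tuple[int, int, int]


def average_color(pixels: list[Color]) -> tuple[float, float, float]:
    """Plain per-channel mean of a group of pixels."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def downsample(image: list[list[Color]], seat_rows: int, seat_cols: int):
    """Average each rectangular block of the image down to one color per seat.

    Assumes the image has at least as many rows and columns as the seat grid.
    """
    height, width = len(image), len(image[0])
    grid = []
    for r in range(seat_rows):
        # Integer block boundaries so every pixel lands on exactly one seat.
        top, bottom = r * height // seat_rows, (r + 1) * height // seat_rows
        row = []
        for c in range(seat_cols):
            left, right = c * width // seat_cols, (c + 1) * width // seat_cols
            block = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            row.append(average_color(block))
        grid.append(row)
    return grid
```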
If you think about it, it’s not obvious how a computer can average colors. You could mix green and yellow and decide that the result looks like the spring green crayon, but how do you teach a machine to do that? Let’s look at this question a little more deeply. It’ll help you get a sense of how computers can help us instruct them better. Plus, it will be our entry into the exciting world of machine learning.
There are actually many, many ways to average colors. A simple one takes advantage of the fact that each dot of color in an image file is stored as the amounts of red, green, and blue in it. Each component is represented as a whole number between 0 and 255, where 255 was chosen because it’s the largest value you can make with eight binary digits, or bits. Using quantities of red, green, and blue works well because the color receptors in the human eye translate real-world colors into this same kind of representation. That is, even though violet corresponds to a specific wavelength of light, our eyes see it as a particular blend of red, green, and blue. Show someone that same blend, and they’ll see violet. So, to summarize a big group of pixels, just average the amount of red in those pixels, the amount of green in those pixels, and the amount of blue in those pixels. That basically works. Now, it turns out, for a combination of physical, perceptual, and engineering reasons, you get better results by squaring the values before averaging and taking the square root after averaging. But that’s not important right now. The important thing is that there is a mechanical way to average a bunch of colored dots to get a single dot whose color summarizes the group.
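Both averaging recipes from this paragraph fit in a few lines of Python. This sketch assumes colors are (red, green, blue) tuples; the function names are illustrative. Averaging pure red with pure green shows the difference: the plain mean lands at half strength in each channel, while squaring first and taking the square root afterward gives a brighter result.

```python
import math


def mean_average(colors):
    """Average each channel directly."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def root_mean_square_average(colors):
    """Square each channel, average the squares, then take the square root."""
    n = len(colors)
    return tuple(math.sqrt(sum(c[i] ** 2 for c in colors) / n) for i in range(3))


pure_red, pure_green = (255, 0, 0), (0, 255, 0)
print(mean_average([pure_red, pure_green]))              # (127.5, 127.5, 0.0)
print(root_mean_square_average([pure_red, pure_green]))  # roughly (180.3, 180.3, 0.0)
```

One common explanation for why the squared version helps is that stored 0 to 255 values are gamma-encoded (the sRGB standard uses a curve close to an exponent of 2.2), so squaring roughly undoes that encoding before averaging. Treat the exponent 2 here as an approximation, not the exact sRGB formula.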
Once that average color is produced, the computer needs a way of finding the closest color among the cards we have available. Is that more of a burnt sienna or a red-orange? A typical (if imperfect) way to approximate how similar two colors are using their red, green, and blue values is what’s known as the Euclidean distance formula. Here’s what that looks like as a command sequence:
take the difference between the amount of red in the two colors
square it
take the difference between the amount of blue in the two colors
square it
take the difference between the amount of green in the two colors
square it
add the three squares together
take the square root
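The steps above translate almost line for line into Python. This sketch assumes each color is a (red, green, blue) tuple; color_distance is an illustrative name.

```python
import math


def color_distance(a, b):
    """Euclidean distance between two (red, green, blue) colors."""
    red_sq = (a[0] - b[0]) ** 2    # difference in red, squared
    blue_sq = (a[2] - b[2]) ** 2   # difference in blue, squared
    green_sq = (a[1] - b[1]) ** 2  # difference in green, squared
    return math.sqrt(red_sq + blue_sq + green_sq)  # add the squares, take the root


print(color_distance((0, 0, 0), (3, 4, 0)))  # 5.0
```

Python's standard library also offers math.dist (Python 3.8 and later), which computes the same Euclidean distance for any two points with the same number of coordinates.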
So, to figure out which card should be held up to best capture the average of the colors in the corresponding part of the image, just find which of the available colors (blue, yellow green, apricot, timberwolf, mahogany, periwinkle, etc.) has the smallest distance to the average color at that location. That’s the color of the card that should be given to the pixel person sitting in that spot in the grid.
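Picking the card is then a matter of taking the minimum distance over the palette. In this sketch the RGB values are rough placeholders chosen for the example, not Crayola's official color definitions, and closest_card is a hypothetical helper; math.dist computes the Euclidean distance described above.

```python
import math

# Illustrative palette: rough stand-in RGB values, not official Crayola colors.
PALETTE = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
}


def closest_card(color, palette=PALETTE):
    """Name of the palette color with the smallest Euclidean distance to color."""
    return min(palette, key=lambda name: math.dist(color, palette[name]))


print(closest_card((250, 10, 10)))  # red
```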
The similarity between this distance calculation and the color averaging operation is, I’m pretty sure, just a coincidence. Sometimes a square root is just a square root.
Stepping back, we can use these operations — color averaging and finding the closest color to the average — to get a computer to help us construct the command sequence for a card stunt. The computer takes as input a target image, a seating chart, and a set of available color cards, and then creates a map of which card should be held up in each seat to best reproduce the image. In this example, the computer mostly handles bookkeeping and doesn’t have much to do in terms of decision-making beyond the selection of the closest color. But the upshot here is that the computer is taking over some of the effort of writing command sequences. We’ve gone from having to select every command for every person pixel at every moment in the card stunt to selecting images and having the computer generate the necessary commands.
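Putting the pieces together, here is one way the whole pipeline could look as a self-contained Python sketch. It assumes a rectangular seating chart of seat_rows by seat_cols seats, an image given as rows of (red, green, blue) tuples, and a palette mapping card names to RGB values; card_map and its helpers are illustrative names, and the three-color palette is a made-up example. It uses the square-then-square-root average and picks the nearest card by Euclidean distance.

```python
import math


def rms_average(pixels):
    """Square each channel, average, then take the square root."""
    n = len(pixels)
    return tuple(math.sqrt(sum(p[i] ** 2 for p in pixels) / n) for i in range(3))


def closest_card(color, palette):
    """Name of the palette entry at the smallest Euclidean distance from color."""
    return min(palette, key=lambda name: math.dist(color, palette[name]))


def card_map(image, seat_rows, seat_cols, palette):
    """Return a seat_rows x seat_cols grid of card names approximating image.

    Assumes a rectangular seating chart and an image at least as large as it.
    """
    height, width = len(image), len(image[0])
    cards = []
    for r in range(seat_rows):
        top, bottom = r * height // seat_rows, (r + 1) * height // seat_rows
        row = []
        for c in range(seat_cols):
            left, right = c * width // seat_cols, (c + 1) * width // seat_cols
            # All pixels that land on this seat.
            block = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            row.append(closest_card(rms_average(block), palette))
        cards.append(row)
    return cards


palette = {"white": (255, 255, 255), "blue": (0, 0, 255), "red": (255, 0, 0)}
image = [
    [(255, 255, 255), (250, 250, 250)],
    [(0, 0, 240), (10, 10, 255)],
]
print(card_map(image, 2, 1, palette))  # [['white'], ['blue']]
```

The only "decision" the computer makes here is the nearest-color choice; everything else is the bookkeeping of which pixels belong to which seat.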
This shift in perspective opens up the possibility of turning over more control of the command-sequence generation process to the machine. In terms of our 2 × 2 grid from chapter 1, we can move from telling (providing explicit instructions) to explaining (providing explicit incentives). For example, there is a variation of this color selection problem that is a lot harder and gives the computer more interesting work to do. Imagine that we could print up cards of any color we needed, but our print shop insists that we order the cards in bulk. They can provide only eight different card colors, but we can choose any eight colors we want. (Eight is the number of different values we can make with 3 bits; bits come up a lot in computing.) So we could choose blue, green, blue-green, blue-violet, cerulean, indigo, cadet blue, and sky blue, and render a beautiful ocean wave in eight shades of blue. Great!
But then there would be no red or yellow to make other pictures. Limiting the color palette to eight may sound like a bizarre constraint, but it turns out that early computer monitors worked exactly like that. They could display any of millions of colors, but only eight distinct ones on the screen at any one time.
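This kind of limit is usually described as indexed (palette-based) color, and a small Python sketch shows the idea. Each position stores only a 3-bit number from 0 to 7, which selects an entry from an 8-color palette; the palette entries themselves can be any RGB colors. The eight blue shades here are made-up values standing in for the ocean-wave example, and render is a hypothetical helper.

```python
from itertools import product

# All patterns of 3 bits; there are 2 ** 3 = 8 of them.
PATTERNS = ["".join(bits) for bits in product("01", repeat=3)]

# Hypothetical palette: eight shades of blue, from dark (index 0) to bright (index 7).
# Any RGB colors could fill these slots, but only eight can be in use at once.
BLUES = [(0, 0, 32 * (i + 1) - 1) for i in range(8)]


def render(indices, palette):
    """Turn a grid of palette indices (0-7) into a grid of RGB colors."""
    if len(palette) != 8:
        raise ValueError("this sketch assumes exactly eight palette entries")
    return [[palette[i] for i in row] for row in indices]


print(PATTERNS)  # ['000', '001', '010', '011', '100', '101', '110', '111']
print(render([[0, 7], [3, 3]], BLUES))  # [[(0, 0, 31), (0, 0, 255)], [(0, 0, 127), (0, 0, 127)]]
```

Swapping in a different list of eight colors changes every picture drawn with it, without touching the stored indices.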
With this constraint in mind, rendering an image in colored cards becomes a lot trickier. Not only do you have to decide, just as before, which of the available colors each card should be, but you also have to pick the eight colors that make up that set of options in the first place. If we’re making a face, a variety of skin tones will be much more useful than distinctions among shades of green or blue. How do we go from a list of the colors we wish we could use because they appear in the target image to the much shorter list of colors that will make up our set of options?
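One widely used clustering method for shrinking many colors down to a few is k-means. The sketch below is offered as an illustration and is not necessarily the method the book goes on to describe. It assumes the wish list is simply every pixel color from the target image as (red, green, blue) tuples, starts from deterministic, evenly spaced picks, and then alternates two steps: assign each color to its nearest center, then move each center to the average of its group. kmeans_palette is a hypothetical name; for the print-shop scenario you would call it with k=8.

```python
import math


def kmeans_palette(colors, k, iterations=20):
    """Pick k representative colors from a list of RGB colors with k-means."""
    # Deterministic start: spread initial centers across the sorted color list.
    ordered = sorted(colors)
    centers = [ordered[i * len(ordered) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group each color with its nearest center.
        groups = [[] for _ in range(k)]
        for color in colors:
            nearest = min(range(k), key=lambda j: math.dist(color, centers[j]))
            groups[nearest].append(color)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                n = len(group)
                new_centers.append(tuple(sum(c[i] for c in group) / n for i in range(3)))
            else:
                new_centers.append(center)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: the assignments will not change again
        centers = new_centers
    return centers
```

Plain k-means can settle on different palettes depending on its starting centers, which is why practical tools often try several starts; this sketch uses a single fixed start to keep results repeatable.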
Machine learning, and specifically an approach known as clustering or unsupervised learning, can solve this color-choice problem for us. I will tell you how. But first let’s delve into a related problem that comes from turning a face into a jigsaw puzzle. As in the card-stunt example, we’re going to have the computer design a sequence of commands for rendering a picture. But there’s a twist: the puzzle pieces available for constructing the picture are fixed in advance. As in the dance-step example, the computer will work with a fixed set of commands and consider which sequence of them produces the desired image.