Nature includes many distributed systems made up of units that interact using local rules: insect swarms, neural networks, developing cells, bird flocks, and even human populations. The collective effect of these simple local actions is often complex global behavior. Understanding these systems is a major goal of biology.
A standard approach to advancing this goal is to apply the most common tools from science: differential equations and statistical analysis. For many of the types of distributed systems described above, however, these tools seem to fall short. With this in mind, there has been growing interest in recent years in applying algorithmic methods, developed primarily within computer science, to better describe and analyze distributed behavior in nature.
This topic has many names, depending on who you ask, including natural algorithms, biological algorithms, and biological distributed algorithms. The core approach of this topic is to attempt to describe the local rules in a natural distributed system as an algorithm, and then study the behavior or fundamental capabilities of the resulting system using existing analytical tools from computer science.
These tools allow us to ask and answer deep questions about these systems that cannot be so easily probed by differential equations, statistical analysis, or numerical simulation; for example, identifying the number of bits of memory needed to break symmetry among cells communicating with chemical signaling (the answer turns out to be the logarithm of the inverse of the error bound), or the computational power of a soup of reacting chemicals (under the right conditions, these soups can compute anything a general purpose computer can compute).
A secondary motivation for studying biological algorithms is that they might generate new ideas and approaches for designing electronic computational systems that feature some of the nice properties observed in nature.
This course is a doctoral seminar that will review and discuss the rapidly evolving literature from the study of biological algorithms. Students will come away with a cutting-edge understanding of the main topics, results, and techniques in the field, as well as an understanding of the most promising open problems.
The course is a reading course. We will discuss two or three papers per class meeting. For each paper, we will cover the problems studied and their motivation, as well as the solutions proposed and any results. We will selectively dive into technical details as a group in places where I feel it is tractable and illuminating.
Most of the papers we read will come from the computer science perspective, though we will occasionally read papers from biology, physics, and related fields, where relevant.
A week before each class meeting, I will email a set of discussion questions for each of the papers to be covered in that meeting. This email will also include guidance on what sections to focus on in the readings and what sections can be skimmed or skipped. Students must email me responses to these questions before the class meeting begins. The goal of these questions is to both ensure you do the reading and to help guide your attention to the areas most fruitful for discussion.
In addition to preparing readings, each student will be assigned one class meeting toward the end of the semester in which to prepare and lead a discussion on a topic of their choosing within biological algorithms. This will be your opportunity to practice surveying this literature and extracting interesting trends and directions on your own.
Grading in a doctoral seminar is pass/fail. To achieve a pass requires three things: (1) good answers to most of the reading response questions; (2) active participation in discussion; and (3) a respectable effort in organizing your assigned class meeting.
Though there isn't a formal research requirement for this seminar, I encourage interested students to use it to identify promising research topics to pursue beyond the scope of the course. I am happy to discuss research collaborations. In addition, a fantastic NSF-funded workshop on this topic will be held this summer in Washington, D.C. -- a perfect venue to present early-stage results and learn about cutting-edge developments.
I will post the readings for each class meeting in the schedule below. Note that I'll be filling in and modifying the details of this schedule on the fly as the semester progresses, so please check back often for changes.
As of now, the tentative list of topics I hope to cover with these readings includes:
A paper published in the journal Science in 2011 claimed that cells in fly embryos might be solving a well-known problem in computer science as part of their development. The paper studied the communication capabilities of these cells and proposed an algorithm that the cells might be executing to solve the problem. A series of follow-up papers within the computer science community grappled with the same model and problem, highlighting tensions between the computer scientist's instinct to identify the best possible solution, and nature's instinct to find something simple that works. We are starting with this topic as it provides a good entry into the general approaches used in the study of biological algorithms, and the tensions they create.
Because this is the first class, I do not expect the students to have completed this reading in advance.
A famous 1990 paper by Mirollo and Strogatz (cited over 1800 times), appearing in an applied mathematics journal, opens unexpectedly: “Fireflies provide one of the most spectacular examples of synchronization in nature.” The authors wanted to explore how simple local rules could lead to such impressive global displays.
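To make the setting concrete, here is a small event-driven sketch of pulse-coupled oscillators in the Mirollo-Strogatz style. It is not the exact model analyzed in the paper: the sqrt response curve, the nudge size, and the treatment of simultaneous firings (a merged group emits a single pulse) are simplifying choices of this sketch. What it does preserve is the key ingredient of their analysis: a concave response curve, which makes a fixed nudge produce a bigger phase jump at higher phases and thereby drives oscillators to be absorbed into one synchronized group.

```python
import random

def pulse_coupled_sync(n=5, eps=0.08, max_firings=10000, seed=1):
    """Event-driven sketch of Mirollo-Strogatz-style pulse coupling.

    Phases live in [0, 1) and advance at unit rate. When an oscillator
    reaches phase 1 it fires and resets to 0; every other oscillator is
    nudged upward through a concave response curve (sqrt here). Returns
    True once all oscillators share the same phase (full synchrony).
    """
    rng = random.Random(seed)
    phases = [rng.random() for _ in range(n)]
    for _ in range(max_firings):
        top = max(phases)
        # Advance every phase until the leading oscillator reaches 1.
        phases = [p + (1.0 - top) for p in phases]
        updated = []
        for p in phases:
            if p >= 1.0 - 1e-12:
                updated.append(0.0)  # fires and resets
            else:
                # Concave phase response: a fixed nudge eps produces a
                # larger phase jump when the phase is already high.
                q = (p ** 0.5 + eps) ** 2
                updated.append(0.0 if q >= 1.0 else q)  # pushed over => absorbed
        phases = updated
        if max(phases) - min(phases) < 1e-9:
            return True  # everyone now fires together forever
    return False
```

Oscillators that merge remain exactly equal thereafter, so once all groups have been absorbed into one, the spread collapses to zero and the function reports synchrony.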
The three papers we’ll read this week follow this thread of investigation, moving from the classical systems biology approach of Mirollo and Strogatz, to a more algorithmic but still bio-inspired approach applied by a group of Harvard computer scientists, to, finally, a purely computer science approach applied by a pair of optimization-minded MIT-based theoreticians.
These readings provide a good example of the power gained when shifting one’s conception of a natural phenomenon from a system of equations to a distributed system of interacting algorithms. They also, however, provide another example of the tension between a biologist’s interest in understanding nature, and a computer scientist’s interest in identifying the optimal solution to a problem under a given set of constraints.
For the Mirollo paper, read through Section 2.2 and understand the results proved. You do not have to understand the techniques. For the Degesys paper, read through Section 3. You can skim the discussions of how the algorithms are applied to wireless networks. Also look at the results proved in Section 4. You do not need to know the details of how they are proved, but glance at the techniques used. Finally, for the Cornejo paper, read through the algorithm description in Section 4. Also take a look at the algorithms and results proved in the sections that follow. You do not need to look deeply at the proof details.
During the first two weeks of this seminar we have studied systems made up of components that communicate with simple signals (e.g., protein expression, beeps and/or flashes). The pair of papers assigned for this week apply techniques from computer science to understand the computational power of these bio-inspired systems.
These papers highlight how concepts of complexity and computability from theoretical computer science can apply to biological systems, and identify interesting trade-offs between properties such as logic complexity, synchrony, and signal diversity in determining the power of a system to solve computational problems.
For the Gilbert paper on the computational power of beeps, understand the model definition and its motivation. Understand the definition of the leader election problem. Understand the basic operation of Algorithm 1 in Section 3.2. For the sub-sections that follow in Section 3 (3.3, 3.4, 3.5), understand the main theorem statements and get a rough sense of the algorithmic strategies that combine with Algorithm 1 to obtain the results. Finally, understand what is being claimed by the main theorem statement in Section 4. There is no need to dive into proof details in this paper, though you might find it useful to get a general sense of some of the techniques.
For the Emek paper on stone age computing, understand the model (which is a little tricky in its details) and its motivations. You should understand the problems solved in Sections 3 - 6, but it is sufficient to just get a general sense of how they solve them. There is no need to dive into the proof details.
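As a warm-up for thinking in the beep model, below is a folklore symmetry-breaking sketch. To be clear, this is not Algorithm 1 from the Gilbert paper, just a toy illustrating the style of algorithm the model supports: in each synchronous round, every still-active process either beeps or listens with probability 1/2, and a listener that hears a beep drops out of contention.

```python
import random

def beep_leader_election(n=32, rounds=200, seed=2):
    """Folklore symmetry breaking in a single-hop beep model (a toy
    sketch, not the algorithm from the assigned reading).

    Beepers never drop out, so at least one process always survives;
    the expected number of active processes roughly halves each round,
    so a unique leader remains after O(log n) rounds with high
    probability.
    """
    rng = random.Random(seed)
    active = set(range(n))
    for _ in range(rounds):
        if len(active) == 1:
            break  # a unique leader remains
        beepers = {p for p in active if rng.random() < 0.5}
        if beepers:
            active = beepers  # silent processes heard a beep and quit
    return active
```

Note the connection to the claim earlier in this syllabus about symmetry breaking: driving the failure probability below a bound requires a number of extra rounds (and hence round-counter bits) that scales with the logarithm of the inverse of that bound.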
Earlier in the 20th century, breakthrough biology research on ants focused on observing colony behavior and decoding ant biology (this is the period in which E.O. Wilson identified chemical communication mechanisms). Later in the century, attention turned toward understanding how ants could create complex societies through simple local rules. This systems biology question is perfectly suited to the application of an algorithmic approach. This week, we will study a sampling of the resulting papers.
Notice how the papers grapple with the right amount of complexity to use in describing the ants: too little power, and nothing interesting is possible; too much, and the ants can compute anything. What's the right approach?
Watch the Gordon video (or read the transcript). It provides a good introduction to some key ant colony behaviors and how biologists study them (Gordon is a leader of a more algorithmic approach to the topic). For the four technical papers, your goal should be to understand: (1) how they model the ants; (2) the problems they have the ants solving; (3) the algorithmic strategies used to solve the problems; and (4) the bounds and impossibility results established. You do not have to dive deep into the technical details of the proofs, but do get a sense of the types of techniques used. For the Ants and Elephants paper, in particular, focus mainly on the Elephant Gun algorithm, as that's the result we will mainly discuss.
One of the most impressive displays of distributed intelligence in nature is the ability of insect swarms to construct complex structures (think: wasp nests and termite mounds). It is widely understood that no single insect has the cognitive capabilities to understand all aspects of these constructions, so they represent a classical example of an emergent result of simple rules.
This week's readings include two biology papers on particular distributed construction strategies, and a computer science paper that encodes similar rules into a robot swarm. Our goal is to understand the types of distributed algorithms deployed in these scenarios, and to probe what role (if any) algorithmic analysis might play in increasing our understanding of these behaviors.
For the biology papers, the key is to get a feel for the types of local rules these insects seem to be deploying, along with the techniques used to identify and validate these rules. For the computer science paper, get a sense of the types of algorithms they are deploying on the swarm and what is required to make them function.
One of the most successful existing intersections between the study of distributed algorithms and natural systems is the topic of population protocols. The population protocol model assumes a collection of finite state machines that interact in a pairwise fashion as selected by some scheduler. These networks can compute in the sense that we can treat the multiset of initial states as an input, and ask the network to stabilize to a configuration where the multiset of stable states is the corresponding output.
Because this model is precise in its definition, computer scientists could probe its exact computational power. It turns out, however, that many other systems (including those from the natural world, such as chemical reaction networks) can be simulated by this model, so these results provided insight into the power of many other systems. This week we will learn about this model, its computational power, and the types of strategies used to solve problems in this setting.
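As a concrete, entirely illustrative instance of the model (the scheduler, the rule, and all parameters below are my own minimal examples, not taken from the readings), consider a one-bit protocol computing the OR of the inputs: a single 1 spreads through pairwise interactions like an epidemic, so the multiset of states stabilizes to all-1s exactly when some agent's input was 1.

```python
import random

def run_protocol(states, delta, steps, seed=0):
    """Generic population-protocol scheduler: repeatedly pick an
    ordered pair of distinct agents uniformly at random and apply the
    transition function delta to (initiator, responder)."""
    rng = random.Random(seed)
    states = list(states)
    n = len(states)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        states[i], states[j] = delta(states[i], states[j])
    return states

def or_rule(a, b):
    # One-bit OR: if either agent holds a 1, both leave with a 1.
    bit = max(a, b)
    return bit, bit
```

With enough interactions (a coupon-collector-style Theta(n log n) suffices in expectation), every agent ends in state 1 if and only if some agent started there.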
Start with the first paper in the list, as it was the original paper on the topic. It is worth reading the entire main body, which avoids most technical details and summarizes a lot about this model, its motivation, and what you can do with it. Pay particular attention to the strategy in Section 4.2: we will see it again.
Then proceed with the Stably Computable Predicates paper. This paper was important as it precisely bounded the power of the population protocol model (under certain assumptions). Read the paper through Section 3 to understand what they are proving. You can skip all the technical details of the proof that then follow.
Finally, for the approximate majority paper, learn the problem they solve and their simple solution. Pay attention to the discussion in the introduction about what makes a good population protocol solution. You do not need to follow the proof, but maybe glance at it to get a sense of the type of analysis they are deploying.
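The flavor of that simple solution can be sketched with a three-state protocol in the spirit of approximate majority; treat the transition rules below as my paraphrase rather than the paper's definitive presentation. Opposite opinions blank the responder, and blanks adopt the opinion of any decided agent they meet, so a clear initial majority takes over the whole population with high probability.

```python
import random

def majority_rule(a, b):
    """Approximate-majority-style rules over states 'x', 'y', and the
    undecided blank 'b' (a paraphrase, not the paper's exact rules)."""
    if {a, b} == {'x', 'y'}:
        return a, 'b'          # clash: the responder is blanked
    if a == 'b' and b in ('x', 'y'):
        return b, b            # blank adopts the responder's opinion
    if b == 'b' and a in ('x', 'y'):
        return a, a            # blank adopts the initiator's opinion
    return a, b

def approximate_majority(n_x, n_y, steps=50000, seed=3):
    # Uniform random pairwise scheduler; with a clear initial majority
    # the population converges, with high probability, to everyone
    # holding the majority opinion.
    rng = random.Random(seed)
    agents = ['x'] * n_x + ['y'] * n_y
    for _ in range(steps):
        i, j = rng.sample(range(len(agents)), 2)
        agents[i], agents[j] = majority_rule(agents[i], agents[j])
    return set(agents)
```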
The behaviors of chemical reactants in a well-mixed solution have been closely studied for over a century. In more recent years, however, an interdisciplinary collection of scientists has been studying how these chemical reaction networks can be made to compute solutions to problems.
Part of the interest in this work is understanding the fundamental power of natural systems. Another part is figuring out how to design biological computing devices. This work has strong intersections with computer science (including with the study of population protocols, which we reviewed last week in this seminar). This week we try to understand some of these connections.
Start with the Computation with Finite Stochastic Chemical Reaction Networks paper. Read Sections 1-3. Pay careful attention to the model of SCRNs described in Section 2. You can skim Section 3 at a level of detail that still allows you to understand the basic ideas behind the counter machine simulation.
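As a toy illustration of what it means for a stochastic CRN to compute (my own minimal example, not the counter machine construction from the paper), consider the single mass-action reaction X + Y -> Z. A Gillespie-style simulation draws an exponential waiting time for each firing, and the reaction simply fires until one reactant is exhausted, so the final count of Z is min of the two input counts. The rate constant is an arbitrary assumption of the sketch.

```python
import random

def gillespie_min(x0, y0, k=1.0, seed=0):
    """Gillespie-style simulation of the one-reaction CRN  X + Y -> Z.

    With only one reaction, the next event is always X + Y -> Z, and
    its waiting time is exponential with mass-action rate k*x*y. The
    final count of Z is min(x0, y0), so the network "computes" min.
    """
    rng = random.Random(seed)
    x, y, z, t = x0, y0, 0, 0.0
    while x > 0 and y > 0:
        propensity = k * x * y            # mass-action firing rate
        t += rng.expovariate(propensity)  # time until the next firing
        x, y, z = x - 1, y - 1, z + 1
    return z, t
```

Note that the answer here is deterministic regardless of the random choices; only the elapsed time is stochastic. The paper's interest is in what richer networks can compute, and with what error and speed trade-offs.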
For the Deterministic Function paper, it is sufficient to read enough to understand what this paper does differently from the paper above, and how it differs from what was shown in the population protocol papers read last week.
Finally, for the Speed Faults paper, read the introduction, which captures a lot of information. Then read enough to understand what they mean by a speed fault and what is meant by the main theorems.