Analyzing Programs with Dynamic Data and Control Flow
Earl Wu and Ayaka Yorihiro, advised by Professor Zachary Palmer
Swarthmore College
Let’s be honest—even the most skilled programmer makes mistakes, and nobody likes
debugging. This is where program analysis comes to the rescue: it’s a type of program
that aims to analyze properties of other programs and predict their behaviors without
running them. While it is provably impossible to have a perfectly precise analysis, a
relatively robust and efficient analysis can contribute greatly to the creation of reliable,
bug-free programs.
Our summer research focused on program analyses for higher-order functional programs.
This type of program contain lots of dynamic data flow (what values affect each other?)
and control flow (which line of code gets executed next?) that are deeply intertwined. It is
commonly known that a program analysis that targets higher-order functional programs
should consider the two in tandem.
However, in Context-, Flow-, and Field-Sensitive Data-Flow Analysis using
Synchronized Pushdown Systems, a paper published in POPL 2019 (Principles of
Programming Languages), the authors presented Boomerang SPDS, an analysis that
analyzed data and control flow separately. They claimed that this separation would not
affect their analysis’ precision on functional-style programs. Since they didn’t provide
enough evidence to support this claim in the paper, we sought to assess the validity of
their argument.
We designed a series of programs that exemplified common functional code and Object-
oriented design patterns. To test whether the separation of data and control flow has an
impact on analysis precision, we experimented on three pairs of analyses: Boomerang
SPDS and Boomerang (Java alias analyses), kPlume and SetPlume (demand-driven
functional analyses), and kADI and SetADI (forward-running functional analyses).
Within each pair, one analysis has a higher level of separation between flow than the
other. The results depicted that in general, the analysis with a higher level of separation
has a lower precision on these programs. Thus, we concluded that the connection
between data and control flow should be preserved in order to yield precise results when
analyzing programs that contain highly dynamic data and control flow.