Notes on Unsafe, Thread-specific, Flow-insensitive, Context-insensitive Points-to analysis B.G. Ryder, January 2006 We will be coding a flow-insensitive, context-insensitive Andersen-style points-to analysis using an on-the-fly call graph, derived from our analysis results. We will be doing this analysis using a worklist algorithm to drive the entire process and restricting our attention to reference assignments that occur within those methods reachable from the initialization of a specific thread. Most points-to analyses are a 2-step process on the set of methods reachable from a first specified method in an OO program. Essentially, we use the points-to relations we derive during the analysis to expand the dynamic dispatch sites in the reachable parts of the program. As we do this, we add to the set of reference assignments which affect the points-to relations, so this computation is interleaved, using a worklist organization. It helps to think about what we do when we find that a method f is reachable. First, we derive the set of reference assignments in f including those assignments set up by the parameter - argument matching. Second, we calculate the points-to side effects of executing these assignments, and change some existing points-to sets. If in this process, we discover new reachable methods, we put them on a method-worklist for later exploration. If a points-to set changes, then we put the corresponding reference variable on a ref-worklist so we can make sure to further expand any calls for which is it a receiver, thus potentially generating more method-worklist entries. Processing continues until both worklists are empty and we have a points-to solution. Details: We have to deal with 4 types of reference assignment statements 1. p = new X(); 2. p = r; 3. p = r.f; 4. r.f = p; Let's discuss the points-to side effects of each of these statements. 1. p = new X(); this creates a new object OX. in the points-to graph, OX corresponds to a node and p corresponds to a node (these nodes need to be created if they are not already in the graph). after this statement, the node for p points to the node for OX. 2. p = r; this causes the node representing p to point to all the nodes that the node representing r points to 3. p = r.f; this causes a lookup of all the objects currently pointed to by the node representing r. for each such object, follow the edge labelled 'f' and make the node representing p point to the target of this edge. 4. r.f = p; this causes a lookup of all the objects currently pointed to by the node representing r, set R. we also must lookup all the objects pointed to by the node representing p, set P. now we must make each element of R have an outgoing points-to edge labelled 'f', that points to each element of P> Recall that when there is a call involving reference arguments, then pseudo assignments are setup between each reference argument and formal parameter in the target method. These are then processed the same as described above. It is clear that this analysis only considers methods possibly executed by a specific thread whose 'run' statement is the beginning of the call graph generated above. It may not be so clear why this analysis is 'unsafe'. Recall that we are performing a flow-insensitive, context-insensitive analysis. It is possible that some of the objects we are encountering are 'shared' by more than one thread. Essentially this means that any of these threads can write into a reference field of such a shared object. Therefore, although we are analyzing the points-to relations resulting from execution of one thread, another thread may be able to store into a shared object and create additional points-to relations that our analysis will not discover. A 'safe' analysis calculates an approximate solution that never 'misses' any possible points-to relations that may hold at execution time. Thus, our analysis as described here is 'unsafe'. In project II, we will account for these effects.