
Avoiding The Needless Multiplication Of Forms

Static analysis in LLVM/Clang

Dr Andrew Moss

2020-11-27

Non-virtual devirtualization

Every plan is a fully formed and splendid thing until it encounters reality.
I have a specific problem that I want to solve, but the problem developed a whole new wart on closer inspection. The callgrep tool from the previous post works on statically defined control-flow in the input program, but it cannot see the existence of dynamic call edges. There are two kinds that crop up frequently in C code and can be resolved relatively easily (and many that cannot):
  1. Thread creation
  2. Callbacks that only have a single target
For the first kind we need to inject some domain-knowledge into the analyser so that when the input program does the following:
pthread_create(thread, attr, thread_main, arg);
There is a kind of implied control-flow edge from the call to the function thread_main. It is not strictly an edge in the call-graph; thread_main will execute asynchronously, so its lifetime is not bounded by the call. But it still represents a flow of control from the caller to the function.
The analyser should recognise thread creation calls and insert extra edges into the call-graph before evaluating the regex by walking across the graph.
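The edge-insertion step can be sketched with a toy model. All of the names and the string-based representation below are hypothetical, just to show the shape of the idea; the real analyser would work over the Clang/LLVM representation of the program rather than strings.

```c
#include <assert.h>
#include <string.h>

/* Minimal call-graph model: edges are (caller, callee) name pairs. */
#define MAX_EDGES 16

struct edge { const char *caller, *callee; };

static struct edge graph[MAX_EDGES];
static int n_edges = 0;

static void add_edge(const char *caller, const char *callee) {
    graph[n_edges].caller = caller;
    graph[n_edges].callee = callee;
    n_edges++;
}

/* A call site as the front-end sees it: the callee, plus (when it is a
   constant) the function address passed as the start-routine argument. */
struct call_site {
    const char *caller;
    const char *callee;
    const char *fnptr_arg; /* NULL if not a constant function address */
};

/* Domain knowledge: a call to pthread_create with a constant start
   routine implies a control-flow edge from the caller to that routine. */
static void resolve_thread_edges(const struct call_site *cs) {
    add_edge(cs->caller, cs->callee);
    if (strcmp(cs->callee, "pthread_create") == 0 && cs->fnptr_arg)
        add_edge(cs->caller, cs->fnptr_arg);
}

static int has_edge(const char *caller, const char *callee) {
    for (int i = 0; i < n_edges; i++)
        if (strcmp(graph[i].caller, caller) == 0 &&
            strcmp(graph[i].callee, callee) == 0)
            return 1;
    return 0;
}
```

After this pre-pass the regex evaluation can walk the augmented graph without knowing anything about threads.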
The second kind of edge originates in different ways. I've seen them appear in code for two reasons:
  1. Leaving hooks for future functionality (premature generalization).
  2. Wrapping cross-cutting concerns
Premature generalization is the evil twin of premature optimization who sometimes stands in as a doppelgänger at weddings and other public events. Typically a programmer is encoding an operation that they know may change in the future. Of course the right time to encode some kind of despatch on the operation would be when it has changed and we know what the different options may be. But programmers love "flexibility" and so they will build a despatch structure in the code that despatches control through a callback to a choice of... exactly one target function. A much longer rant about this and the more appropriate "rule of three" will wait for another time. I have seen things...
But troubling flashbacks aside, the second type of origin is a valid use for callbacks that only have one statically defined target: wrapping calls with other functionality. The most common use is to add a layer of logging, or auditing, or tracing... to code without rewriting the target function. It typically looks like this:
int processing_func(int arg1, void *arg2);

log_processing(current_log, processing_func, arg1, arg2);
...
void log_processing(log *l, int (*target)(int, void *), int arg1, void *arg2)
{
    ... perform steps before the target
    target(arg1, arg2);
    ... perform steps after the target
}
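For concreteness, here is a minimal compilable version of that wrapping pattern, with hypothetical names filled in for the elided parts; the log type and the messages are invented for the sketch.

```c
#include <stdio.h>

/* Hypothetical log type, standing in for whatever the project uses. */
typedef struct { const char *name; } log_t;

static int processing_func(int arg1, void *arg2) {
    (void)arg2;
    return arg1 * 2;
}

/* The wrapper: a single statically known target is reached through a
   function pointer so that the logging can bracket the call. */
static int log_processing(log_t *l, int (*target)(int, void *),
                          int arg1, void *arg2) {
    printf("[%s] before target\n", l->name);
    int ret = target(arg1, arg2);
    printf("[%s] after target\n", l->name);
    return ret;
}
```

Every call-site in the program passes processing_func as a literal constant, so the indirect call inside log_processing has exactly one possible destination.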

Motivating example

The two patterns can be combined: we need an analysis that can propagate information about static calls through procedure-calls and use it to resolve dynamic-looking calls into static calls. A motivating example is this piece of code (taken from the Netdata internal library):
static void *thread_start(void *ptr) {
    netdata_thread = (NETDATA_THREAD *)ptr;

    if(!(netdata_thread->options & NETDATA_THREAD_OPTION_DONT_LOG_STARTUP))
        info("thread created with task id %d", gettid());

    if(pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL) != 0)
        error("cannot set pthread cancel type to DEFERRED.");

    if(pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) != 0)
        error("cannot set pthread cancel state to ENABLE.");

    thread_set_name_np(ptr);

    void *ret = NULL;
    pthread_cleanup_push(thread_cleanup, ptr);
    ret = netdata_thread->start_routine(netdata_thread->arg);
    pthread_cleanup_pop(1);

    return ret;
}

int netdata_thread_create(netdata_thread_t *thread, const char *tag,
                          NETDATA_THREAD_OPTIONS options,
                          void *(*start_routine)(void *), void *arg) {
    NETDATA_THREAD *info = mallocz(sizeof(NETDATA_THREAD));
    info->arg = arg;
    info->thread = thread;
    info->tag = strdupz(tag);
    info->start_routine = start_routine;
    info->options = options;

    int ret = pthread_create(thread, attr, thread_start, info);
    if(ret != 0)
        error("failed to create new thread for %s. pthread_create() failed with code %d", tag, ret);
    else {
        if (!(options & NETDATA_THREAD_OPTION_JOINABLE)) {
            int ret2 = pthread_detach(*thread);
            if (ret2 != 0)
                error("cannot request detach of newly created %s thread. pthread_detach() failed with code %d", tag, ret2);
        }
    }

    return ret;
}
Like any piece of code in a long-running project this tells a story of woe and suffering. Thread cancellation is a warty subject. If a thread is capable of terminating itself reliably then the threading model in POSIX is not overly painful. But of course the use of any syscall that can block indefinitely means that a thread cannot terminate itself cleanly. And writing any kind of useful code in POSIX means calling syscalls that may not return. So the programmer has to face thread cancellation.
Handling cancellation means additional steps that need to be added across all threads: setting the initial cancellation state, adding handlers for cleanup etc. So we see this common code expressed in a wrapper that calls the target via a callback. For extra fun this constant target address is passed through a structure in memory, because the API for threads only allows a single value to be passed through thread creation. For every call-site that calls netdata_thread_create with a constant address there is an implied static control-flow that is expressed as dynamic control-flow in the code.
Finding and resolving this control flow is then an analysis that is similar in spirit to de-virtualization, although it is being performed on raw function pointers and we do not have the instantiated type at the call-site to guide the analysis. To make it useful in real-world code (which this fragment from Netdata is a good example of) we must be able to track constant values passed through structures in memory. Finally we need to propagate this model of constant values across call-sites in the program, so the resulting analysis will be quite complex and resemble some parts of existing analyses:
  1. The IPO-propagation of the constant information is similar to SCCP.
  2. Analysis of the structures in memory will be similar to shape-analysis (but tracking function-pointer values rather than recursive definitions of the structure).
  3. Where the lattice of values for a call target can be resolved to a single definite entry we have an analysis similar to de-virtualization.
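The lattice in the third point can be illustrated with the usual three-level structure. This is a sketch with made-up names, not the eventual implementation, which would work over LLVM values rather than raw addresses.

```c
#include <stddef.h>

/* Three-level lattice for a tracked function-pointer slot:
   TOP (no information yet), a single constant target, or
   BOTTOM (conflicting targets, cannot be devirtualised). */
enum level { TOP, CONST_TARGET, BOTTOM };

struct lattice {
    enum level lv;
    const void *target; /* valid only when lv == CONST_TARGET */
};

/* Join two facts about the same slot, as SCCP-style propagation
   would at a control-flow merge point. */
static struct lattice join(struct lattice a, struct lattice b) {
    if (a.lv == TOP) return b;
    if (b.lv == TOP) return a;
    if (a.lv == CONST_TARGET && b.lv == CONST_TARGET &&
        a.target == b.target)
        return a;
    return (struct lattice){ BOTTOM, NULL };
}
```

Only a slot that ends the propagation at CONST_TARGET can be rewritten into a direct call; any merge of two different targets falls to BOTTOM and the call stays indirect.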

A "plan"

Plans are flexible things and there are a lot of unknowns in how I'm going to implement this. But roughly:
  1. Find and tag memory allocation points in the program (i.e. direct calls to malloc).
  2. Define a model for pointer targets as disjoint regions with individual lattices for contents.
  3. Inter-procedural propagation of constants as joins on the lattices (via stores into statically resolvable offsets of tracked heap regions).
To keep this tractable it will not run globally over the entire program: the propagation of the heap regions will only occur up and down the def-use chains for the memory allocation sites "close to" the target thread creation calls.
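The three steps of the plan can be sketched as a toy model. All names here are hypothetical and the real implementation would live inside an LLVM pass, but it shows how an allocation site, per-offset lattices, and store propagation fit together.

```c
#include <stddef.h>

enum level { TOP, CONST_TARGET, BOTTOM }; /* TOP == 0 for zero-init */

struct slot { enum level lv; const void *target; };

#define MAX_SLOTS 8

/* Step 1-2: each tagged allocation site gets a disjoint region, and
   each statically resolvable offset in it carries its own lattice. */
struct region {
    int alloc_site;               /* id of the malloc call that made it */
    struct slot slots[MAX_SLOTS]; /* one lattice entry per tracked offset */
};

/* Step 3: a store of a constant function address into a statically
   resolvable offset joins the new value into that slot's lattice. */
static void store_const(struct region *r, int off, const void *fn) {
    struct slot *s = &r->slots[off];
    if (s->lv == TOP) {
        s->lv = CONST_TARGET;
        s->target = fn;
    } else if (s->lv == CONST_TARGET && s->target != fn) {
        s->lv = BOTTOM; /* two different targets: give up on this slot */
        s->target = NULL;
    }
    /* BOTTOM stays BOTTOM; storing the same constant changes nothing. */
}

/* At an indirect call through the slot, devirtualise only when the
   lattice holds a single definite entry. */
static const void *resolve(const struct region *r, int off) {
    const struct slot *s = &r->slots[off];
    return s->lv == CONST_TARGET ? s->target : NULL;
}
```

In the Netdata example above, the store of start_routine into info would hit one slot of the region created at the mallocz call, and resolve at the start_routine call inside thread_start would give back the constant passed at each netdata_thread_create call-site.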
