Continuations

Up to this point, we have described how entity method invocations can be used to model simulation events. This facility provides us with all the functionality of an explicit event queue, which is all that many existing simulators use, plus the ability to transparently execute the simulation more efficiently. However, it remains cumbersome to model simulation processes, since they must be written as event-driven state machines. While many entities, such as network protocols or routing algorithms, naturally take this event-oriented form, other kinds of entities do not. For example, an entity that models a file-transfer is more readily encoded as a process than as a sequence of events. Specifically, one would rather use a tight loop around a blocking send routine than dispatch send events to some transport entity, which will eventually dispatch send-complete events in return. To that end, we introduce simulation time continuations.

In order to invoke an entity method with continuation, the simulation developer merely needs to declare that a given entity method is blocking. Syntactically, an entity method is blocking if and only if it declares that it throws a JistAPI.Continuation exception. This exception is not actually thrown. It acts merely as a method tag and as an indication to the JiST rewriter. Moreover, it need not be explicitly declared further up the call-chain, since it is a sub-class of Error, the root of an implicit exception hierarchy.

The semantics of a blocking entity method invocation, as shown in Figure 10, are a natural extension atop the existing event-based invocation. We first save the call-stack of the calling entity and attach it to the outgoing event. When the call event is complete, we notice that it has caller information, so we dispatch a callback event to the caller, with its continuation information. Thus, when the callback event is eventually dequeued, the state is restored and execution will continue right after the point of the blocking entity method invocation. In the meantime, however, the local simulation time has progressed to the simulation time at which the calling event was completed and other events may have been processed against the entity.

**Figure 10:** The addition of blocking methods allows simulation developers to regain the simplicity of process-oriented development. When a blocking entity method is invoked, the continuation state of the current event is saved and attached to a call event. When this call event is complete, we schedule a callback event to the caller. The continuation is restored and the caller continues its processing, albeit at a later simulation time.
$\begin{figure}\resizebox{\columnwidth}{!}{\begin{picture}(0,0)% \includegraphics{figs/cont}% \par \end{picture}\input{figs/cont.texi}} \end{figure}$

This approach has a number of advantages. First, it allows blocking and non-blocking entity methods to co-exist. Methods can arbitrarily be tagged as blocking and we extend the basic event structure to store the call and callback information. Secondly, there is no notion of an explicit process. Unlike process-oriented simulation runtime, which must allocate a fixed stack-space for each real or logical process, the JiST stack space is unified across all its active entities, reducing memory consumption. The continuation stacks actually exist in the heap along with the events that contain them and any objects that they reference. Moreover, our model is actually closer to threading, in that multiple continuations can exist simultaneously for a single entity. Finally, there is no system context switching required. The parallelism occurs only in simulation time, so the underlying events may be executed sequentially within a single thread of control.

The code listing in Figure 11 shows an entity with a single blocking method. Notice that blocking is a blocking method, since it declares that it may throw a JistAPI.Continuation exception on line 5. Otherwise, blocking would be a regular entity method, invoked in simulation time. The effect of the blocking tag is best understood by comparing the output for the program both with and without the blocking semantics, i.e. both with an without the blocking tag on line 5. Note how the timestamps are affected, in addition to the ordering of the events.

blocking non-blocking

> i=0 t=0 > called at t=0 > i=1 t=1 > called at t=1 > i=2 t=2 > called at t=2

> i=0 t=0 > i=1 t=0 > i=2 t=0 > called at t=0 > called at t=0 > called at t=0

**Figure 11:** A basic entity which illustrates the use of a blocking method.
$\begin{figure}\small \medskip \hrulefill\hspace{1ex}{\tt cont.java}\hspace{1ex... ...x}\input{include/cont.java.html} \vspace{2ex} \hrule {\small } \end{figure}$

Unfortunately, saving and restoring the call-stack for the purposes of continuation is not a trivial task in Java [20]. The fundamental difficulty arises from the fact that stack manipulations are not supported at either the language, library or bytecode level. We have implemented and evaluated a number of ways to address this problem. The specific choice of implementation does not affect the execution semantics of continuations. However, understanding the underlying implementation may help simulation developers produce more efficient simulations.

Our implementation of continuations draws on ideas in the JavaGoX [21] and PicoThreads [3] projects, which also need to save the stack, albeit in a radically different context. We introduce an important new idea to these approaches, which eliminates the need to modify method signatures. This fact is significant, since it allows our implementation to function even across the standard Java libraries and enables us, for example, to run standard, unmodified Java network applications directly over our wireless simulator. This design also eliminates the use of exceptions to carry state information and is considerably more efficient for our simulation needs.

Since we are not allowed access to the call-stack in Java, we instead convert parts of the original simulation program into a continuation-passing style (CPS). The first step is to determine which parts of the simulation code need to be transformed. For this purpose, the JiST rewriter incrementally produces a call-graph of the simulation at runtime as it is loaded and uses the blocking method tags to compute all continuable methods. Continuable methods are those methods that could exist on a call stack at the point of a blocking entity method invocation. Or, more precisely, a continuable method is defined recursively as any method that contains:

**Figure 12:** These pseudocode-bytecode listings show the basic CPS transformation that occurs on all continuable methods. The transformation instruments the method to allow it to either: a) save and exit, or b) restore and start; from any continuable invocation location.
$\begin{figure}\par \hrulefill \vspace{-2ex} \begin{multicols}{2} {\bf Before ... ...e{2ex} {\small }} \end{multicols} \vspace{-5ex} \hrulefill \par\end{figure}$

Each method within the continuable set undergoes a basic CPS transformation, as shown in Figure 12. We scan the method for continuation points and assign each one a program location number. We then perform a data-flow analysis of the method to determine the types of the local variables and stack slots at that point and use this information to generate a custom class that will store the continuation frame of this program location. These classes, containing properly typed fields for each of the local variables and stack slots in the frame, will be linked together to form the preserved stack. Finally, we insert both saving and restoration code for each continuation point. The saving code marshals the stack and locals into the custom frame object and pushes it onto the event continuation stack via the kernel. The restoration code does the opposite and then jumps right back to the point of the blocking invocation.

All of this must be done in a type-safe manner. This requires special consideration not only for the primitive types, but also for arrays and null-type values. There are other restrictions that stem from the bytecode semantics. Specifically, the bytecode verifier will allow uninitialized values on the stack, but not in local variables or fields. The bytecode verifier will also not allow an initializer (constructor) to be invoked more than once. We eliminate both of these possibilities by statically verifying that no constructor is continuable.

Finally, the kernel functions as the continuation trampoline, as shown in Figure 13. When the kernel receives a request to perform a call with continuation, it registers the call information, switches to save mode and returns to the caller. The stack then unwinds and eventually returns to the event loop, at which point the call event is dispatched with the continuation attached. When the call event is received, it is processed, and a callback event is dispatched in return with both the continuation and the result attached. Upon receiving this callback event, the kernel switches to restore mode and invokes the appropriate method. The stack then winds up to its prior state and the kernel receives a request for a continuation call yet again. This time, the kernel simply returns the result of the call event and allows the event processing to continue from where it left off.

**Figure 13:** The JiST event loop also functions as a continuation trampoline. It saves the continuation state on a blocking entity method invocation and restores it upon receiving the callback. Due to Java constraints, the stack must be manually unwound and preserved.
$\begin{figure}\resizebox{\columnwidth}{!}{\begin{picture}(0,0)% \includegraphics{figs/contstack}% \par \end{picture}\input{figs/contstack.texi}} \end{figure}$

The performance of a blocking call with a short stack is only around 2-3 times slower than two regular events. Thus, continuations present a viable, efficient way to reclaim process-oriented simulation functionality within an event-oriented simulation framework.