Interactive Record/Replay for Web Application Debugging
Brian Burg¹, Richard Bailey¹, Amy J. Ko², Michael D. Ernst¹
¹Computer Science and Engineering, University of Washington, {burg, rjacob, mernst}@cs.washington.edu
²The Information School, University of Washington, ajko@uw.edu
ABSTRACT
During debugging, a developer must repeatedly and manu-
ally reproduce faulty behaviors in order to inspect different
facets of the program’s execution. Existing tools for reproduc-
ing such behaviors prevent the use of debugging aids such as
breakpoints and logging, and are not designed for interactive,
random-access exploration of recorded behavior. This paper
presents Timelapse, a tool for quickly recording, reproduc-
ing, and debugging interactive behaviors in web applications.
Developers can use Timelapse to browse, visualize, and seek
within recorded program executions while simultaneously us-
ing familiar debugging tools such as breakpoints and logging.
Testers and end-users can use Timelapse to demonstrate fail-
ures in situ and share recorded behaviors with developers,
improving bug report quality by obviating the need for de-
tailed reproduction steps. Timelapse is built on Dolos, a novel
record/replay infrastructure that ensures deterministic execu-
tion by capturing and reusing program inputs both from the
user and from external sources such as the network. Dolos
introduces negligible overhead and does not interfere with
breakpoints and logging. In a small user evaluation, partici-
pants used Timelapse to accelerate existing reproduction ac-
tivities, but were not significantly faster or more successful in
completing the larger tasks at hand. Together, the Dolos in-
frastructure and Timelapse developer tool support systematic
bug reporting and debugging practices.
Author Keywords
Debugging, deterministic replay, web applications
ACM Classification Keywords
D.2.5 Software Engineering: Testing and debugging
General Terms
Human Factors; Design
INTRODUCTION
Debugging is often an iterative process in which developers re-
peatedly adjust their view on a program’s execution by adding
and removing breakpoints and inspecting program state at
different times and locations. This iteration requires a devel-
oper to repeatedly reproduce the behavior they are inspecting,
which can be time-consuming and error-prone [36]. In the
case of interactive programs, even reproducing a failure can
be difficult or impossible: failures can occur on mouse drag
events, be time-dependent, or simply occur too infrequently
for a developer to easily reach a program state suitable for
debugging.
Deterministic record/replay [5] is a technique that can be used
to record a behavior once and then deterministically replay it
repeatedly and automatically, without user interaction. Though
it seems that the capability to record and replay executions
should be useful for debugging, no prior work has described
actual use cases for these capabilities, or how to best expose
these capabilities via user interface affordances. At most,
prior systems demonstrate feasibility by providing a simple
VCR-like interface [18, 33] that can record and replay linearly.
Many record/replay systems have no UI, and are controlled via
commands to the debugger or special APIs [9, 11, 27]. Finally,
prior work does not consider how record/replay capabilities
can interoperate with and enhance existing debugging tools
like breakpoints and logging. Prior tool instantiations [18, 25,
1] are inappropriate for debugging because they can interfere
with program performance, determinism, and breakpoint use,
and they are difficult to deploy.
This paper presents Timelapse, a developer tool for capturing
and replaying web application behaviors during debugging,
and Dolos, a novel record/replay infrastructure for web ap-
plications. A developer can use Timelapse’s interactive vi-
sualizations of program inputs to find and seek non-linearly
through the recording, and then use the debugger or other
tools to understand program behavior. To ensure determinis-
tic execution, Dolos captures and reuses user input, network
responses, and other nondeterministic inputs as the program
executes. It does this in a purely additive way—without im-
peding the use of other tools such as debuggers—via a novel
adaptation of virtual machine record/replay techniques to the
execution environment of web browsers. Debugging tools are
particularly important for interactive web applications because
the web’s highly dynamic, event-driven programming model
stymies traditional static program analysis techniques.
This paper makes the following contributions:
- Dolos: a fast, scalable, precise, and practical infrastructure for deterministic record/replay of web applications.
- Timelapse: a developer tool for creating, visualizing, and navigating program recordings during debugging tasks.
- The first user study to explore how and when developers use record/replay tools during realistic debugging tasks.
Dolos and Timelapse are free software: our source code, study materials, and data are available on the project website (http://github.com/burg/timelapse/).
We first discuss the design of Timelapse, and use a scenario to
illustrate how Timelapse supports recording, reproducing, and
debugging interactive behaviors. Next, we present the design
and implementation of the underlying Dolos record/replay
infrastructure. We discuss the results of a small study to see
how developers use Timelapse in open-ended debugging tasks.
Finally, we conclude with related work and present several
implications for future interactive debugging tools.
REPRODUCING AND NAVIGATING PROGRAM STATES
The Timelapse developer tool is designed around two activities:
capturing user-demonstrated program behaviors, and quickly
and repeatedly reaching program states within a recording
when localizing and understanding a fault. The novel features
we describe in this section support these activities by mak-
ing it simple to record program behavior and by providing
visualizations and affordances that quicken the process of find-
ing and seeking to relevant program states without manually
reproducing behavior.
To better understand the utility of replay capabilities during
debugging, we first conducted a small pilot study with a pro-
totype record/replay interface. Using contextual inquiry, we
found that developers primarily used the prototype to iso-
late buggy output, and to quickly reach specific states when
working backwards from buggy output towards its root cause.
Towards these ends, we saw several common use cases: “play
and watch”; isolating output using random-access seeking;
stepping through execution in single-input increments; and reading low-level input details or logged output.
This section introduces the novel features of the Timelapse
developer tool by showing how a fictional developer named
Claire might use Timelapse’s features while debugging.
Debugging Scenario: (Buggy) Space Invaders
Claire, a new hire at a game company, has been asked to fix a bug in a web application version of the Space Invaders video game (http://matthaynes.net/playground/javascript/glow/spaceinvaders/). In this game, the player moves a defending ship and shoots advancing aliens. The game's implementation is representative of modern object-oriented interactive web programs: it uses timer callbacks, event-driven programming, and helper libraries. The game contains a defect that allows multiple player bullets to be in flight at a time; there is only supposed to be one player bullet at a time (Figure 1).
Reproducing Program Behavior
Claire is unfamiliar with the Space Invaders implementation,
so her first step towards understanding and fixing the multiple-
bullet defect is to figure out how to reliably reproduce it. This
is difficult because the failure only occurs in specific game
states and is influenced by execution conditions outside of
her control, such as random numbers, the current time, or
asynchronous network requests.
Figure 1. Screenshots of normal and buggy Space Invaders game mechanics. Only one bullet should be in play at a time (shown on left). Due to misuse of library code, each bullet fires two asynchronous bulletDied events instead of one when it is destroyed. The double dispatch sometimes allows multiple player bullets to be in play at once (shown on right). This happens when two bullets are created: one between the two bulletDied events, and the other after both events.
With Timelapse, Claire begins capturing program behaviors
with a single click (Figure 2-6), plays the game until she repro-
duces the failure, and then finishes the recording. Recordings
created by Timelapse are compact, self-contained, and serial-
izable, so Claire can attach her recording to a bug report or
share it via email.
To reproduce the defect with traditional tools, Claire would have to multitask between synthesizing reproduction steps, playing the video game, and reflecting on whether the reproduction steps are correct. Once Claire finds reliable reproduction steps, she could then use breakpoints to further understand the defect. But, breakpoints might themselves affect timing, making the defect harder to trigger or requiring modified reproduction steps.

Figure 2. The Timelapse tool interface presents multiple linked views of recorded program inputs. Above, the timelines drawer (1) is juxtaposed with a detailed view of program inputs (2). The recording overview (3) shows inputs over time with a stacked line graph, colored by category. The overview's selected region is displayed in greater detail in the heatmap view (4). Circle sizes indicate input density per category. In each view, the red cursor (5) indicates the current replay position and can be dragged. Buttons allow for recording (6), 1× speed replay (7), and breakpoint scanning (8). Details for the selected circle (9a) are shown in a side panel (9b).
Navigating to Specific Program States
To focus her attention on code relevant to the failure, Claire
needs to know which specific user input—and thus which
event handler invocations—caused the second bullet to appear.
Claire uses Timelapse’s visualization and navigation tools
(Figure 2) to locate and seek the recording to an instance
of the multiple bullets failure. First, Claire limits the zoom
interval to when she actually fired bullets, and then filters out
non-keystroke inputs. She replays single keystrokes with a
keyboard shortcut until a second bullet is added to the game
board. Then, she seeks execution backward by one keystroke.
At this point, she is confident that the code which created the
second bullet ran in response to the current keystroke.
Without Timelapse, it would not be possible for Claire to iso-
late the failure to a specific keystroke and then work backwards
from the keystroke. Instead, she would have to insert logging
statements, repeatedly reproduce the failure to generate log-
ging output, and scrutinize logged values for clues leading
towards the root cause.
Navigation Aid: Debugger Bookmarks
Having tracked down the second bullet to a specific user input,
Claire now needs to investigate what code ran, and why.
With Timelapse, Claire sets several debugger bookmarks (Fig-
ure 3-6) at positions in the recording that she wants to quickly
revert back to, such as the keystroke that caused the second
bullet to appear or an important program point reached via the
debugger. Debugger bookmarks support the concept of tempo-
ral focus points [28], which are useful when a developer wants
to relate information [13]—such as the program’s state at a
breakpoint hit—that is only available at certain points of exe-
cution. Timelapse restores a debugger bookmark by seeking
to the preceding input, setting and continuing to the preced-
ing breakpoint, and finally simulating the user’s sequence of
debugger commands (step forward/into/out).
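The restore procedure can be summarized with the following sketch; the session API names (seekToInput, setBreakpoint, runToBreakpoint, debuggerStep) are hypothetical stand-ins rather than Timelapse's actual interfaces.

    // Hypothetical sketch of restoring a debugger bookmark. The session API
    // names below are stand-ins, not Timelapse's actual interfaces.
    function restoreBookmark(session, bookmark) {
      // 1. Seek the recording to the input preceding the bookmark.
      session.seekToInput(bookmark.precedingInputIndex);
      // 2. Set the bookmarked breakpoint and continue until the recorded hit.
      session.setBreakpoint(bookmark.breakpointLocation);
      session.runToBreakpoint(bookmark.breakpointLocation, bookmark.hitNumber);
      // 3. Replay the user's recorded debugger commands in order.
      for (const command of bookmark.debuggerCommands) {
        session.debuggerStep(command); // 'step-over' | 'step-into' | 'step-out'
      }
    }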
With traditional tools, Claire must explore an execution with
debugger commands such as “step into”, “step over”, and
“step out”. This is frustrating because these commands are
irreversible, and Claire would have to manually reproduce
the failure multiple times to compare multiple instants or the
effects of different commands.
Navigation Aid: Breakpoint Radar
Once Claire finds the code that creates bullets, she needs to
understand why some keystrokes fire bullets and others do not.
With Timelapse, Claire first adjusts the zoom interval to in-
clude keystrokes that did and did not trigger the failure. Then,
she sets a breakpoint inside the Bullet.create() method
and records and visualizes when it is actually hit during the
execution using the breakpoint radar feature (Figure 3-3a).
Breakpoint radar automates the process of replaying the execution, pausing and resuming at each hit, and visualizing when the debugger paused. Claire can easily see which keystrokes created bullets and which did not.

Figure 3. Timelapse's visualization of debugger status and breakpoint history, juxtaposed with the existing Sources panel and debugger (1). A blue cursor (2) indicates that replay execution is paused at a breakpoint, instead of between user inputs (as shown in Figure 2). Blue circles mark the location of known breakpoint hits, and are added or removed automatically as breakpoints change. A side panel (3b) shows the selected (3a) circle's breakpoints. Shortcuts allow for jumping to a specific breakpoint hit (4) or source location (5). Debugger bookmarks (6) are set with a button (7) and replayed to by clicking (6) or by using a drop-down menu (8).
With traditional tools, Claire would need to repeatedly set and
unset breakpoints in order to determine which keystrokes did
or did not create bullets. To populate the breakpoint radar
timeline, Claire would have to manually hit and continue from
dozens or hundreds of breakpoints, and collect and visualize
breakpoint hits herself. For this particular defect, breakpoints
interfere with the timing of the bullet’s frame-based anima-
tions, so it would be nearly impossible for Claire to pause
execution when two bullets are in flight.
Interacting with Other Debugging Tools
Once Claire localizes the part of the program responsible for
the multiple bullets, she still needs to isolate and fix the root
cause. To do so, Claire uses debugging strategies that do not
require Timelapse, but nonetheless benefit from it. Timelapse is designed to be used alongside other debugging tools such as breakpoints, logging, and element inspectors; its interface can be juxtaposed (Figure 3) with other tools. (To prevent breakpoints from interfering with tool use cases, Timelapse tweaks breakpoints in several ways: breakpoints are disabled when recording or seeking, and enabled during real-time playback.)
Through code inspection, Claire observes that the creation
of a bullet is guarded by a flag indicating whether any bul-
lets are already on the game board. The flag is set inside the Bullet.create() method and cleared inside the Bullet.die() method. To test her intuition about the
code’s behavior, she inserts logging code and captures a new
execution to see if the method calls are balanced. The logging
output in Figure 4 is synchronized with the replay position:
as Claire seeks execution forward or backward, logging out-
put up to the current instant is displayed. Logging output is
cleared when a fresh execution begins (i.e., Timelapse seeks
backwards) and then populated as the program executes nor-
mally.
Claire has discovered that the multiple-bullet defect is caused
by the bulletDied event being fired twice, allowing a sec-
ond replacement bullet to be created if the bullet “fire” key
is pressed between the two event dispatches. In other words,
the failure is triggered by firing a bullet while another bullet is
being destroyed (by collision or leaving the game board).
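The failure mode can be illustrated with the following minimal sketch, which uses hypothetical names rather than the game's actual source.

    // Minimal sketch of the failure mode, with hypothetical names rather
    // than the game's actual source. A guard flag is meant to allow a
    // single bullet at a time.
    let bulletInFlight = false;
    let bulletsCreated = 0;

    function fireKeyPressed() {        // guarded bullet creation
      if (!bulletInFlight) {
        bulletInFlight = true;
        bulletsCreated += 1;
      }
    }

    function destroyBullet() {
      // Bug: bulletDied is dispatched twice, asynchronously, so the flag is
      // cleared twice. If "fire" is pressed between the two dispatches, a
      // second bullet is created, and the second dispatch clears the flag
      // again while that bullet is still in flight.
      setTimeout(() => { bulletInFlight = false; }, 0);  // first bulletDied
      setTimeout(() => { bulletInFlight = false; }, 0);  // second bulletDied
    }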
With basic record/replay functionality, affordances for navi-
gating the stream of recorded inputs, and the ability to easily
reach program states by jumping directly to breakpoint hits,
Timelapse both eliminates the need for Claire to repeatedly
reproduce the Space Invaders failure and frees her to focus
on understanding the program’s logic. Our user evaluation
explores these benefits further.
RECORD/REPLAY INFRASTRUCTURE
Dolos is the underlying record/replay infrastructure that en-
ables the Timelapse tools and user interfaces. For Timelapse
to be useful during debugging tasks, Dolos must not disrupt ex-
isting developer tools and workflows. Concretely, this entails
the following four design goals:
1. Low overhead. Recording must introduce minimal performance overhead, because many interactive web applications are performance-sensitive. Replaying must be fast so that users can quickly navigate through recordings.
2. Deterministic replay. Recordings must exactly reproduce observable program behavior when replaying. Replaying should not have external effects.
3. Non-interference. Recording and replaying must not interfere with the use of tools such as breakpoints, watchpoints, profilers, element inspectors, and logging.
4. Deployability. Recording and replaying must require no special installation, configuration, or elevated user privileges.

Figure 4. Screenshots of the logging output window and the defect's manifestation in the game. Claire added logging statements to the die and init methods. At left, the logging output shows the order of method entries, with the blue circle summarizing 3 identical logging outputs. According to the last 4 logging statements (outlined in red), calls to die and init are unbalanced. At right, three in-flight bullets correspond to the three calls to Bullet.init.
To the best of our knowledge, no prior record/replay tool sat-
isfies all of these constraints. Source-level instrumentation
of JavaScript [16] can perturb performance [25] and relies on
hard-to-deploy proxies to perform source rewriting (proxies used for instrumentation and debugging, like Fiddler [29], require elevated privileges, manual configuration, and intentional man-in-the-middle attacks to subvert SSL encryption). Rewriting techniques can interfere with a developer's expectations by obfuscating source code or by causing the debugger to unexpectedly interact with code the developer did not write. User-space record/replay library tools [18, 27] execute in the same address space as the target program and use wrappers to interpose on nondeterministic APIs (such as the Date constructor). These tools are inherently incompatible with breakpoint debuggers: because they piggyback on the target program, pausing at a breakpoint will also prevent the record/replay
library’s mechanisms from executing. Virtual machine replay
debugging [33] ensures deterministic execution of the entire
browser instead of one page context—preventing the use of
built-in debugging tools which cannot run independently from
other browser components. Macro-like replay tools such as
Selenium WebDriver [31], CoScripter [15], or Sikuli [34] are
commonly used for workflow automation; towards that end,
they only record and replay user input, and not other pro-
gram inputs such as network traffic, dates and times, and other
sources of non-determinism. Replaying only user input is not
deterministic enough to consistently reproduce web applica-
tion behaviors. These external tools are unaware of internal
system state or external server state: for example, they may try
to provide input while the program is paused at a breakpoint,
or cause the program to ask a server for resources that are no
longer valid.
Web Browser Architecture
We use the term web interpreter to refer to the interpreter-like
core of web browsers. The web interpreter processes inputs
from the network, user, and timers; executes JavaScript;
computes page layout; and renders output. We do not consider
features that do not affect the results of computation, such as
bookmarks and a browser’s address bar. The web interpreter
schedules asynchronous computation using a cooperative
single-threaded event loop. It communicates with embedders
(Safari, Google Chrome) and platforms (Linux, Mac OS X)
via embedder and platform APIs (Figure 5). Both APIs are
bidirectional: the web interpreter can call “out” to collect
environmental information from the platform or delegate
security policies to the embedder; the platform and embedder
can call “in” to schedule computation (asynchronous timers,
user input, network traffic) in the web interpreter’s event loop.
Figure 5. The flow of inputs while recording execution. Boxes delineate module boundaries. Arrows indicate the flow of program inputs. The web interpreter receives input and sends output via embedder and platform APIs (thick lines). During recording, input shims (striped regions) record program inputs delivered via the public APIs, and pass on the inputs to the unmodified event loop.
Design
Our requirements of low overhead, deterministic replay, non-
interference, and deployability are closest to the design goals
of virtual machine-based record/replay systems. Well-known
systems in this space, such as ReVirt [7] and VMWare Re-
play Debugging [33], achieve deterministic record/replay via
the “instruction-counting trick”. A hypervisor interposes on
nondeterministic machine instructions, but executes ordinary
instructions natively on the bare-metal processor. Interrupts
are captured and injected accurately by counting their position
in the instruction stream. This technique satisfies our four
design goals: it is fast, ensures precise replay, is compatible
with debuggers, and requires minimal modifications to the
runtime environment.
The execution model of web programs does not have ma-
chine instructions or interrupts, but there are parallels be-
tween virtual machines and web browsers. The Dolos in-
frastructure makes the following novel substitutions to make
the “instruction-counting trick” work for the web’s execution
model:
- Instead of simulating deterministic hardware, Dolos simulates deterministic responses from the embedder and platform when the web interpreter calls out to them.
- Instead of hardware interrupts, Dolos captures and simulates the subset of inbound embedder and platform API calls that trigger computation within the web application.
- Instead of counting instructions as a unit of execution progress, Dolos counts DOM event dispatches—precursors to the execution of JavaScript code—that may have deterministic or nondeterministic effects.
Implementation
The Dolos infrastructure is a modified version of the WebKit
browser toolkit. We chose WebKit because it contains the
most widely-deployed web interpreter (WebCore) and devel-
oper toolchain (Web Inspector). Our deployment model is
straightforward: a version of WebKit built with Timelapse
and Dolos support is distributed as a dynamic library, and can
be used with existing WebKit embedders such as Safari or
Google Chrome by adjusting the dynamic library load path.

Figure 6. The flow of inputs while replaying execution. During replay, the interpreter's replay controller uses the input log to simulate the sequence of recorded public API calls into the interpreter (blue arrows). External calls to public APIs are ignored (solid black line) to prevent user interactions from causing a different execution. Calls from the web interpreter to platform APIs (i.e., to get the current time) use memoized return values and do not have external effects.
Recording and replaying
Hypervisors (and record/replay systems based on them) are
fast in part because most execution occurs at full speed on
bare hardware. Similarly, Dolos achieves high record/replay
performance in part by using the same code paths as normal ex-
ecution when recording and replaying program inputs. Dolos
captures and replays calls to the embedder and platform APIs
using shims—thin layers used to observe external API calls
or simulate external API calls from within. During recording
(Figure 5), the shims record all API calls that affect program
determinism. During replay, these API calls are simulated
(Figure 6) to deterministically reproduce the recorded pro-
gram behavior. To prevent user interactions from causing a
different execution during replay, the shims block external
API calls while allowing simulated calls to proceed.
During recording, Dolos saves the timing, ordering, and de-
tails of inbound calls to the web interpreter that are pertinent
to the recorded web program’s execution. During replay, these
inbound calls are simulated by Dolos, rather than actually be-
ing issued by the embedder or platform. Dolos’s approach to
capturing and replaying computation-triggering calls is appli-
cable to programs implemented using an event-loop-based UI
framework. Dolos can seek to and/or pause execution prior to
any simulated inbound call. Because the event loop is transpar-
ent to web programs, Dolos “pauses” execution without using
breakpoints by simply not simulating the next computation-
triggering inbound call. One can use breakpoints to pause
JavaScript execution at any statement, but cannot use them
to pause execution before the web interpreter interprets an
inbound call.
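A minimal sketch of this style of replay control, written in JavaScript for brevity (Dolos implements it inside the web interpreter's C++ replay controller); the inputLog and simulate names are hypothetical.

    // Conceptual sketch of seek-and-pause replay control, in JavaScript for
    // brevity; Dolos implements this inside WebKit's C++ web interpreter.
    // The inputLog and simulate parameters are hypothetical.
    class ReplayController {
      constructor(inputLog, simulate) {
        this.inputLog = inputLog;   // recorded inbound API calls, in order
        this.simulate = simulate;   // delivers one recorded call to the interpreter
        this.position = 0;
      }
      // Seek by simulating recorded inbound calls up to (but not including)
      // the target index; "pausing" is simply not delivering the next call.
      seekTo(target) {
        while (this.position < target && this.position < this.inputLog.length) {
          this.simulate(this.inputLog[this.position]);
          this.position += 1;
        }
      }
    }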
Program inputs
Like hypervisor record/replay systems, Dolos classifies in-
puts to the web interpreter—that is, relevant inbound and
outbound embedder and platform API calls—as either inter-
rupts or environmental nondeterminism. Table 1 enumerates
the major sources of interrupt-like inputs and environmental
inputs. Conceptually, interrupts cause computation to happen,
while environmental inputs embody nondeterministic aspects
of the environment. Interrupts are API calls from the embedder
or platform to the interpreter; environmental inputs are API
calls to the embedder or platform from the interpreter.
Input               | Classification | DOM Derivatives
Keyboard strokes    | Interrupt      | keyboard events
Mouse input         | Interrupt      | mouse events
Timer callbacks     | Interrupt      | (none)
Asynchronous events | Interrupt      | animation events
Network response    | Interrupt      | AJAX, images, data
Random values       | Environment    | Math.random
Current time        | Environment    | Date.now
Persistent state    | Environment    | cookies, local storage

Table 1. Major classes of input as defined by Dolos.
Dolos records and replays embedder/platform API calls rather
than individual DOM events [18, 21, 25], which result from the web interpreter's interpretation of inbound API calls. For example, instead of recording individual DOM click events, Dolos captures the embedder's calls to the web interpreter's handleMousePress API. This in turn may trigger one, many, or no DOM event dispatches, depending on the interpreter's state. Dolos's strategy reduces log size, runtime overhead, and implementation complexity because it does not require event targets to be serialized when recording or re-targeted when replaying. (Guo et al. [9] report on the space and time benefits of memoizing application-level API calls instead of low-level system calls [27].)
Dolos captures and replays network traffic in the same way that
other event loop callbacks are handled. When capturing, Dolos
caches the HTTP headers and raw data of network responses
as they are passed from the embedder to the web interpreter.
When replaying, Dolos blocks outbound network requests
from actually reaching the network. Dolos simulates determin-
istic network responses by reusing cached HTTP headers and
data. For example, when loading images asynchronously on
Flickr, Dolos caches all image data when capturing. When re-
playing, Dolos simulates network responses by reusing cached
image data, and never communicates with Flickr servers.
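The following sketch illustrates this cache-and-reuse strategy in page-level JavaScript; Dolos interposes where responses pass from the embedder to the web interpreter, not on page-level fetch(), so the names and granularity here are illustrative assumptions.

    // Conceptual sketch of cache-and-reuse for network responses, written
    // against fetch() for brevity; Dolos interposes at the embedder API
    // inside the browser, not in page script. Names are illustrative.
    const responseCache = new Map();   // request URL -> { headers, body }

    async function recordingFetch(url) {
      const response = await fetch(url);            // real network, while recording
      const body = await response.arrayBuffer();
      responseCache.set(url, {
        headers: Array.from(response.headers.entries()),
        body,
      });
      return body;
    }

    function replayingFetch(url) {
      // During replay, never touch the network: reuse the cached response.
      return Promise.resolve(responseCache.get(url).body);
    }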
For environmental inputs, Dolos uses shims to memoize the
web interpreter’s C++ calls to the platform or embedder APIs
rather than memoizing calls to nondeterministic JavaScript
functions. For example, Dolos does not record the return value
of JavaScript's Date.now() function; instead, Dolos memoizes the JavaScript engine's calls to the currentTimeMS() platform API inside the implementation of Date.now().
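As a conceptual illustration only, the following sketch shows the record-then-replay memoization pattern in page-level JavaScript; Dolos applies this pattern beneath the JavaScript engine, at the C++ currentTimeMS() platform call, rather than by wrapping Date.now() as shown here, and the shim structure and names are assumptions.

    // Conceptual sketch of record-then-replay memoization for an
    // environmental input. Wrapping Date.now() here only illustrates the
    // logic; it is not how Dolos interposes.
    function makeTimeShim(mode, log = []) {
      let index = 0;
      return {
        log,
        currentTimeMS() {
          if (mode === 'record') {
            const value = Date.now();   // consult the real platform
            log.push(value);            // remember the value for replay
            return value;
          }
          return log[index++];          // replay: reuse the memoized value
        },
      };
    }

    const recorder = makeTimeShim('record');
    recorder.currentTimeMS();                        // observed and logged
    const replayer = makeTimeShim('replay', recorder.log);
    replayer.currentTimeMS();                        // identical to the recorded value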
Replay fidelity
The Dolos infrastructure guarantees identical execution when
recording and replaying, up to the order and contents of DOM
events dispatched. JavaScript execution is completely deter-
ministic. There is no guarantee regarding the number of layout
reflows or paints due to time compression or internal browser
nondeterminism. This is unimportant because visual out-
put cannot affect the determinism of JavaScript computation:
JavaScript client code can ask for computed layout data, but
will block until the layout algorithm runs on the current ver-
sion of the DOM tree. Rendering and painting activity is not
exposed to client code at all.
To the best of our knowledge, Dolos is the first web
record/replay infrastructure that detects execution divergence.
Timelapse warns users when divergences are detected or when
known-nondeterministic APIs are used, and allows them to
abort replay, ignore divergences, or report feature requests
to the Dolos developers. Dolos uses differences in measures
such as DOM node counts and event dispatch counts to detect
unexpected execution divergences caused by bugs in Dolos
itself. This has helped us find and address obscure sources of
nondeterminism, such as resource caching effects, asynchrony
in the HTML parser, implicit window size and focus state, and
improper multiplexing of inputs to nested iframe elements.
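A hedged sketch of this kind of divergence check follows; the paper names DOM node counts and event dispatch counts as example measures, while the checkpoint structure and function names below are assumptions.

    // Hedged sketch of a divergence check between recording and replay.
    // The checkpoint shape and function names are assumptions.
    function takeCheckpoint(doc, eventDispatchCount) {
      return {
        domNodes: doc.getElementsByTagName('*').length,
        dispatches: eventDispatchCount,
      };
    }

    // Compare the checkpoint taken while recording against the one taken at
    // the same input index while replaying; any mismatch signals divergence.
    function diverged(recorded, replayed) {
      return recorded.domNodes !== replayed.domNodes ||
             recorded.dispatches !== replayed.dispatches;
    }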
The Dolos prototype does not address all known sources of
nondeterminism, such as the Touch, Battery, Sensor, Screen,
or Clipboard APIs, among others. There are no conceptual
barriers to supporting these features: they are implemented
in terms of standardized DOM events and interfaces, making
them relatively easy to interpose upon using techniques de-
scribed in this paper. Each new program input requires local
changes to route control flow through a shim, plus a helper
class to serialize and simulate the program input.
Engineering cost
Dolos’s design scales to new platforms, embedders, and
sources of nondeterminism because inputs are recorded and
replayed at existing module boundaries. Implementing Dolos
required minimal changes to WebKit’s architecture: namely,
the insertion of shims just beneath the web interpreter’s pub-
lic embedder and platform APIs (shown in Figure 5 and in
Figure 6). The Dolos infrastructure consists of 7.6K SLOC
(74 new files, 75 modified files). For comparison, WebKit
contains about 1.38M SLOC.
Overhead
Dolos's record/replay performance slowdown is unnoticeable (<1.1×) for interactive workloads and modest (1.65×) for non-interactive benchmarks, without any significant optimization efforts (Table 2). Recording overhead and the amount of data collected scale with user, network, and nondeterministic inputs, not CPU time.
We report the geometric mean of 10 runs (except for interactive
runs, which were recorded once but replayed 10 times). We
cleared network resource caches between executions to avoid
memory and disk cache nondeterminism. We used local copies
of benchmarks to avoid network nondeterminism.
Recording has almost no time overhead: execution times are
dominated by the subject program. Replaying at 1× speed
is marginally slower than a normal execution due to extra
work performed by Dolos, and seeking (fast replaying) is
much faster because it elides user and network waits from the
recorded execution.
While being created or replayed, recordings are stored in-
memory and consume modest amounts of memory (first col-
umn in the data size section of Table 2). When serialized, the
recordings are highly compressible. A recording’s length is
limited only by main memory; in our experience, users at-
tempt to minimize recording length to reduce the number of
irrelevant inputs. Timelapse’s linear presentation of time does
not scale well to long executions. This could be ameliorated
by using nonlinear time scaling techniques [32, 14].
Name | Description | Workload | Bottleneck | Baseline | Disabled | Recording | Replaying | Seeking | Log (KB) | Site (KB)
JSLinux | x86 emulator | Run until login | network | 10.5s | 1.00× | 1.65× | 1.65× | 0.37× | 24.7 / 71.4 / 8.5 | 4500
JS Raytracer | ray-tracer | Complete run | CPU | 6.3s | 1.00× | 1.01× | 1.17× | 1.02× | 24.0 / 69.4 / 10.2 | 5.9
Space Invaders | video game | Scripted gameplay | timers | 25.8s | 1.00× | 1.03× | 1.22× | 0.25× | 247 / 683 / 56.8 | 712
Mozilla.org | home page | Read latest news | user | 22.3s | 1.00× | 1.00× | 1.09× | 0.23× | 187 / 502 / 29.5 | 2800
CodeMirror | text editor | Edit a document | user | 16.6s | 1.00× | 1.00× | 1.03× | 0.07× | 57.6 / 163 / 17.1 | 168
Colorpicker | jQuery widget | Reproduce defect | user | 15.3s | 1.00× | 1.00× | 1.07× | 0.13× | 112 / 302 / 26.7 | 577
DuckDuckGo | search engine | Browse results | user | 14.1s | 1.00× | 1.00× | 1.08× | 0.19× | 119 / 309 / 29.6 | 1900

Table 2. Overhead for three non-interactive and four interactive programs. "Baseline" is unmodified WebKit, and "Disabled" is Dolos when neither record nor replay is enabled. Log size is given for the in-memory representation, the uncompressed log file, and the compressed log file. Site content is images, scripts, and HTML.
Limitations
Dolos only ensures deterministic execution for client-side
portions of web applications; it records and simulates client
interactions with a remote server, but does not directly cap-
ture server-side state. Tools that link client- and server-side
execution traces [35] may benefit from the additional run-
time context provided by a Dolos recording. Dolos cannot
control the determinism of local, external software compo-
nents such as Flash, Silverlight, or other plugins. However,
plugins interact via the embedder API; Dolos could handle
plugin nondeterminism in the same way that other embedder
nondeterminism is handled.
Web interpreters provide many API calls that do not affect
program determinism. For example, WebKit’s web interpreter
includes APIs for usability features like native spell-checking,
in-page search, and accessibility. Dolos does not record or
replay these API calls because a developer may wish to use
such features differently during replay, and these features do
not affect the determinism of the web program.
Dolos’s hypervisor-like record/replay strategy relies on the
layered architecture of WebKit. It is not directly applicable to
systems without clear API boundaries between the embedder,
platform, and web interpreter. For example, the Gecko web
interpreter used by Firefox is implemented by dozens of de-
coupled components rather than a monolithic module. This
design makes it easy to extend the browser, but difficult to
record and replay only the subset of components that affect the
specific web program's execution, as opposed to those that are used by browser features, browser extensions, or several web pro-
grams at once. Shims for Gecko would have the same behavior
as Dolos’s shims, but they would be placed throughout the
web interpreter rather than at coarse abstraction boundaries.
HOW DO DEVELOPERS USE TIMELAPSE?
Prior work [7, 18, 27] asserts the usefulness of deterministic
record and replay for debugging. In this section, we present
a formative user study that investigates when, how, and for
whom record/replay tools and specifically Timelapse are bene-
ficial. Our specific research questions were:
RQ 1: How does Timelapse affect the way that developers reproduce behavior?
RQ 2: How do successful and unsuccessful developers use Timelapse differently?
Study Design
We recruited 14 web developers, two of whom participated in pilot studies to refine the study design. Each participant performed two tasks. We used a within-subjects design to see how Timelapse changed the behavior of individual developers. For one task, participants could use the standard debugging tools included with the Safari web browser. For the other task, they could also use Timelapse. To mitigate learning effects, we randomized the ordering of the two tasks and the task in which participants were allowed to use Timelapse.
The goal of this study was to explore the variation in how de-
velopers used Timelapse, so the tasks needed to be challenging
enough to expose a range of task success. To balance realism
with replicability, we chose two tasks of medium difficulty,
each with several intermediate milestones. Based on our re-
sults, our small exploratory study was still sufficiently large
to capture the variability in debugging and programming skill
among web developers.
Participants
We recruited 14 web developers in the Seattle area. Each partic-
ipant had recently worked on a substantial interactive website
or web application. One half of the participants were develop-
ers, designers, or testers. The other half were researchers who
wrote web applications in the course of their research. We did
not control for experience with the jQuery or Glow libraries
used by the programs being debugged.
Programs and Tasks
Space Invaders.
One program was the Space Invaders game
from our earlier example scenario. The program consists of
625 SLOC in 6 files (excluding library code) and uses the
Glow JavaScript library (http://www.bbc.co.uk/glow/). We chose this program for two reasons: its extensive use of timers makes it a heavy record/replay workload, and its event-oriented implementation is representative of object-oriented model-view programs, the dominant paradigm for large, interactive web applications.
We asked participants to fix two Space Invaders defects. The
first was an API mismatch that occurred when we upgraded the
Glow library to a recent version while preparing the program
for use in our study. In prior versions, a sprite's coordinates were represented with x and y properties; in recent versions, coordinates are instead represented with left and top properties, respectively. After upgrading, the game's hit detection
code ceases to work because it references the obsolete property
names. The second defect was described in the motivating
example and was masked by the first defect.
Colorpicker. The other program was Colorpicker (http://www.eyecon.ro/colorpicker/), an interactive widget for selecting colors in the RGB and HSV colorspaces (see Figure 7). The program consists of about 500
LOC (excluding library and example code). The widget sup-
ports color selection via RGB (red, green, blue) or HSV (hue,
saturation, brightness) component values or through several
widgets that visualize dimensions of the HSV colorspace.
We chose this program because it makes extensive use of the popular jQuery library, which—by virtue of being highly layered, abstracted, and optimized—makes reasoning about and following the code significantly more laborious.

Figure 7. The Colorpicker widget.
The Colorpicker task was to create a regression test for a real,
unreported defect in the Colorpicker widget. The defect mani-
fests when selecting a color by adjusting an RGB component
value, as shown in Figure 7. If the user drags the G component
(left panel, orange), the R component spontaneously changes
(right panel, red). The R component should not change when
adjusting the G component. The bug is caused by unneces-
sary rounding in the algorithm that converts values between
RGB and HSV color spaces. Since the color picker uses the
HSV representation internally, repeated conversions between
RGB and HSV can expose numerical instability during certain
patterns of interaction.
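The following simplified sketch illustrates the instability pattern; it is not the widget's actual RGB-HSV conversion code. Storing a 0-255 component in a rounded, coarser internal scale and converting back can change a value the user never touched.

    // Simplified illustration of the rounding instability; not the widget's
    // actual conversion code. A 0-255 component stored in a rounded 0-100
    // internal scale drifts when converted back.
    function toInternal(component)  { return Math.round(component / 255 * 100); }
    function fromInternal(internal) { return Math.round(internal / 100 * 255); }

    let red = 2;                          // component the user never touched
    for (let i = 0; i < 3; i++) {
      red = fromInternal(toInternal(red));
      console.log(red);                   // prints 3, 3, 3 -- drifted from 2
    }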
We claim that both of these faults are representative of many
bugs in interactive programs. Often there is nothing wrong
with the user interface or event handling code per se, but
faults that are buried deep within the application logic are
only uncovered by user input or manifest as visual output.
The Space Invaders faults lie in incorrect uses of library APIs,
but manifest as broken gameplay mechanics. Similarly, the
Colorpicker fault exists in a core numerical routine, but is only
manifested ephemerally in response to mouse move events.
Procedure
Participants performed the study alone in a computer lab. Par-
ticipants were first informed of the general purpose and struc-
ture of the study, but not of our research questions to avoid
observer and subject expectancy effects. Immediately prior
to the task where Timelapse was available, participants spent
30 minutes reading a Timelapse tutorial and performing ex-
ercises on a demo program. In order to proceed, participants
were required to demonstrate mastery of recording, replaying,
zooming, seeking, and using breakpoint radar and debugger
bookmarks. Participants could refer back to the tutorial during
subsequent tasks.
Each task was described in the form of a bug report that in-
cluded a brief description of the bug and steps to reproduce
the fault. At the start of each task, the participant was in-
structed to read the entire bug report and then reproduce the
fault. Each task was considered complete when the participant
demonstrated their correct solution. Participants had up to
45 minutes to complete each task. We stopped participants
when they had demonstrated successful completion to us or
exceeded the time limit.
After both task periods were over, we interviewed participants
for 10 minutes about how they used the tool during the tutorial
and tool-equipped task and how they might have used the tool
on the other task. We also asked about their prior experience
in bug reproduction, debugging, and testing activities.
Data Collection and Analysis
We captured a screen and audio recording of each participant’s
session, and gathered timing and occurrence data by reviewing
the video recordings after the study concluded.
Our tasks were both realistic and difficult so as to draw out
variations in debugging skill and avoid imposing a perfor-
mance ceiling. We measured task success via completion of
several intermediate task steps or critical events. For the Space
Invaders task, the steps were: successful fault reproduction,
identifying the API mismatch, fixing the API bug, reproduc-
ing the rate-of-fire defect, written root cause, and fixing the
rate-of-fire defect. For the Colorpicker task, the steps were:
successful fault reproduction, written root cause, correct test
form, identifying a buggy input, and verifying the test.
We measured the time on task as the duration from the start of
the initial reproduction attempt until the task was completed or
until the participant ran out of time. We recorded the count and
duration of all reproduction activities and whether the activity
was mediated by Timelapse (automatic reproduction) or not
(manual reproduction). Reproduction times only included time
in which participants’ attention was engaged on reproduction,
which we determined by observing changes in window focus,
mouse positioning, and interface modality.
Results
Below, we summarize our findings of how Timelapse affects
developers’ reproduction behavior (RQ1) and how this inter-
acts with debugging expertise (RQ2).
Timelapse did not reduce time spent reproducing behav-
iors.
There was no significant difference in the percentage
of time spent reproducing behaviors across conditions and
tasks. Though Timelapse makes reproduction of behaviors
simpler, it does not follow that this fact will reduce overall
time spent on reproduction. We observed the opposite: be-
cause reproduction with Timelapse was so easy, participants
seemed more willing to reproduce behaviors repeatedly. A
possible confound is that behaviors in our tasks were fairly
easy to reproduce, so Timelapse only made reproduction less
tedious, not less challenging. We had hoped to test whether
Timelapse is more useful for fixing more challenging bugs, but
were forced to reduce task difficulty so that we could retain a
within-subjects study design while minimizing participants’
time commitment.
8–25% of time was spent reproducing behavior.
Even
when provided detailed and correct reproduction steps, de-
velopers in both conditions spent up to 25% (and typically
10–15%) of their time reproducing behaviors. Participants
in all tasks and conditions reproduced behavior many times
(median of 22 instances) over small periods. This suggests that
developers frequently digress from investigative activities to re-
produce program behavior. These measures are unlikely to be
ecologically valid because most participants did not complete
all tasks, and time spent on reproduction activities outside of
the scope of our study tasks (i.e., during bug reporting, triage,
and testing) is not included.
Expert developers incorporated replay capabilities.
High-
performing participants—those who successfully completed
the most task steps—seemed to better integrate Timelapse’s
capabilities into their debugging workflows. Corroborating
the results of previous studies [26, 22], we observed that suc-
cessful developers debugged methodically, formed and tested
debugging hypotheses using a bisection strategy, and revised
their assessment of the root cause as their understanding of
the defect grew. They quickly learned how to use Timelapse
to facilitate these activities. They used Timelapse to accelerate
familiar tasks, rather than redesigning their workflow around
record/replay capabilities. In the Colorpicker task, participant
11 used Timelapse to compare program state before and after
each call in the mousemove event handler, and then used
Timelapse to move back and forth in time when bisecting the
specific calls that caused the widget’s RGB values to update
incorrectly. Participants in the control condition appeared to
achieve the same strategy more slowly by interleaving changes
to breakpoints and manual reproduction.
Timelapse distracted less-skilled developers.
Those who
only achieved partial or limited success had trouble integrat-
ing Timelapse into their workflow. We partially attribute this
to differences in participants’ prior debugging experiences and
strategies. The less successful participants used ad-hoc, op-
portunistic debugging strategies; overlooked important source
code or runtime state; and were led astray by unverified as-
sumptions. Consequently, even when these developers used
Timelapse, they did not use it to a productive end.
Summary. In our study, developers used Timelapse to automat-
ically reproduce program behavior during debugging tasks, but
this capability alone did not significantly affect task times, task
success, or time spent reproducing behaviors. For developers
who employed systematic debugging strategies, Timelapse
was useful for quickly reproducing behaviors and navigating
to important program states. Timelapse distracted developers
who used ad-hoc or opportunistic strategies, or who were un-
familiar with standard debugging tools. Timelapse was used
to accelerate the reproduction steps of existing strategies, but
did not seem to affect strategy selection during our short study.
As with any new tool, it appears that some degree of training
and experience is necessary to fully exploit the tool’s benefits.
In our small study, the availability of Timelapse had no statis-
tically significant effects on participants’ speed or success in
completing tasks. Figure 8 shows task success and task time per task and condition.

Figure 8. A summary of task time and success per condition and task. Box plots show outliers (points), median (thick bar), and 1st and 3rd quartiles (bottom and top of box, respectively). There was no statistically significant difference in performance between participants who used standard tools and those who had access to Timelapse.

In future work, we plan to study how
long-term use of Timelapse during daily development affects
debugging strategies. We also plan to investigate how record-
ings can improve bug reporting practices and communication
between bug reporters and bug fixers.
RELATED WORK
Visualizing and Exploring Recordings and Traces
In contrast to the dearth of interactive record/replay tools,
there have been many tools [12, 24] to visualize, navigate,
and explore execution traces generated by logging instrumentation. (Execution traces consist of intermediate program states logged over time, while Dolos's recordings consist only of the program's inputs.) This is because execution traces are several orders
of magnitude larger and contain very low-level details: the
only way to understand them is to use elaborate search, anal-
ysis, and visualization tools. While Timelapse visualizes an
execution as the temporal ordering of its inputs, trace-based de-
bugging tools [12, 32] infer and display higher-level structural
or causal relationships observed during execution. Timelapse’s
affordances primarily support navigation through the record-
ing with respect to program inputs, while trace-based tools
focus directly on aids to program comprehension (such as sup-
porting causal inference or answering questions about what
happened [12]).
Unfortunately, the practicality of many trace-based debug-
ging tools is limited by the performance of modern hardware
and the size of generated execution traces. Modern profilers,
logging, and lightweight instrumentation systems [2] (and vi-
sualization tools [32] built on top of them) have acceptable
performance because they capture infrequent high-level data
or perform sampling. In contrast, heavyweight fine-grained ex-
ecution trace collection introduces up to an order of magnitude
slowdown [12, 20]. Generated traces and their indices [23, 24]
are very large and often limited by the size of main memory.
Supporting Behavior Reproduction and Dissemination
To the best of our knowledge, deterministic record/replay sys-
tems that support dissemination of behaviors have only been
widely deployed as part of video game engines [6]. Recordings
of gameplay are artifacts shared between users for entertain-
ment and education. These recordings are also a critical tool
for debugging video game engines and their network proto-
cols [30]. In the wider software development community, bug
reporting systems [8] and practices [36] emphasize the sharing
of evidence such as program output (e.g., screenshots, stack
traces, logs, memory dumps) and program input (e.g., test
cases, configurations, and files). Developers investigate bug
reports with user-written reproduction steps.
While we have focused on the utility of record/replay systems
for debugging, such systems are also useful for creating and
evaluating software. Prior work has used record/replay of
real captured data to provide a consistent, interactive means
for prototyping sensor processing [3, 19] and computer vi-
sion [10] algorithms. More generally, macro-replay systems
for reproducing user [31] and network [29] input are used
for prototyping and testing web applications and other user
interfaces. Dolos recordings contain a superset of these in-
puts; it is possible to synthesize a macro (i.e., automated test
case) for use with other tools. The JSBench tool [25] uses this
strategy to synthesize standalone web benchmarks. Derived
inputs may improve the results of state-exploration tools such
as Crawljax [17] by providing real, captured input traces.
CONCLUSION AND FUTURE WORK
Together, Timelapse and Dolos constitute the first toolchain
designed for interactively capturing and replaying web ap-
plication behaviors during debugging. Timelapse focuses on
browsing, visualizing, and navigating program states to sup-
port behavior reproduction during debugging tasks. Our user
study confirmed that behavior reproduction was a significant
activity in realistic debugging tasks, and Timelapse assisted
some developers in locating and automatically reproducing
behaviors of interest. The Dolos infrastructure uses a novel
adaptation of instruction-counting record/replay techniques to
reproduce web application behaviors. Our prototype demon-
strates that deterministic record/replay can be implemented
within browsers in an additive way—without impacting per-
formance or determinism, impeding tool use, or requiring
configuration—and is a platform for new debugging aids.
Prior work assumes that executions are in short supply during
debugging, and that developers know a priori what sorts of
analysis and data they want before reproducing behavior. In
future work, we want to disrupt this status quo. On-demand
replay (in the foreground, background, or offline) could
change the feasibility of useful program understanding
tools [12] or dynamic analyses [4] that, heretofore, have
been considered too expensive for practical (always-on)
use. Using the Dolos infrastructure, we intend to transform
prior works in dynamic analysis and trace visualization into
on-demand, interactive tools that a developer can quickly
employ when necessary. We believe that when combined
with on-demand replay, post mortem trace visualization and
program understanding tools will become in vivo tools for
understanding program behavior at runtime.
ACKNOWLEDGEMENTS
This material is based in part upon work supported by the
National Science Foundation under Grant Numbers CCF-
0952733 and CCF-1153625. Any opinions, findings, and
conclusions or recommendations expressed in this material are
those of the author(s) and do not necessarily reflect the views
of the National Science Foundation.
REFERENCES
1. Andrica, S., and Candea, G. WaRR: A tool for
high-fidelity web application record and replay. In DSN
(2011).
2. Cantrill, B. M., Shapiro, M. W., and Leventhal, A. H.
Dynamic instrumentation of production systems. In
USENIX ATC (2004).
3. Cardenas, T., Bastea-Forte, M., Ricciardi, A., Hartmann,
B., and Klemmer, S. R. Testing physical computing
prototypes through time-shifted and simulated input
traces. In UIST (2008).
4. Chow, J., Garfinkel, T., and Chen, P. M. Decoupling
dynamic program analysis from execution in virtual
environments. In USENIX ATC (2008).
5. Cornelis, F., Georges, A., Christiaens, M., Ronsse, M.,
Ghesquiere, T., and Bosschere, K. D. A taxonomy of
execution replay systems. In SSGRR (2003).
6. Dickinson, P. Instant replay: Building a game engine with reproducible behavior. In Gamasutra (July 2001).
7. Dunlap, G. W., King, S. T., Cinar, S., Basrai, M. A., and
Chen, P. M. ReVirt: enabling intrusion analysis through
virtual-machine logging and replay. SIGOPS Oper. Syst.
Rev. 36, SI (Dec. 2002), 211–224.
8. Glerum, K., Kinshuman, K., Greenberg, S., Aul, G.,
Orgovan, V., Nichols, G., Grant, D., Loihle, G., and Hunt,
G. Debugging in the (very) large: Ten years of
implementation and practice. In SOSP (2009).
9. Guo, Z., Wang, X., Tang, J., Liu, X., Xu, Z., Wu, M.,
Kaashoek, M. F., and Zhang, Z. R2: An application-level
kernel for record and replay. In OSDI (2008).
10. Kato, J., McDirmid, S., and Cao, X. DejaVu: Integrated
support for developing interactive camera-based
programs. In UIST (2012).
11. King, S. T., Dunlap, G. W., and Chen, P. M. Debugging
operating systems with time-traveling virtual machines.
In USENIX ATC (2005).
12. Ko, A. J., and Myers, B. A. Extracting and answering
why and why not questions about Java program output.
ACM Trans. Softw. Eng. Methodol. 20, 2 (Sept. 2010),
4:1–4:36.
13. Ko, A. J., Myers, B. A., Coblenz, M. J., and Aung, H. H.
An exploratory study of how developers seek, relate, and
collect relevant information during software maintenance
tasks. IEEE Trans. Softw. Eng. 32, 12 (Dec. 2006),
971–987.
14. Kuhn, A., and Greevy, O. Exploiting the analogy between traces and signal processing. In ICSM (2006).
15. Leshed, G., Haber, E. M., Matthews, T., and Lau, T.
CoScripter: Automating and sharing how-to knowledge
in the enterprise. In CHI (2008).
16. Mesbah, A., and van Deursen, A. Invariant-based
automatic testing of AJAX user interfaces. In ICSE
(2009).
17. Mesbah, A., van Deursen, A., and Lenselink, S. Crawling Ajax-based web applications through dynamic analysis of user interface state changes. ACM Transactions on the Web (TWEB) 6, 1 (2012), 3:1–3:30.
18. Mickens, J., Elson, J., and Howell, J. Mugshot:
deterministic capture and replay for JavaScript
applications. In NSDI (2010).
19. Newman, M. W., Ackerman, M. S., Kim, J., Prakash, A.,
Hong, Z., Mandel, J., and Dong, T. Bringing the field into
the lab: Supporting capturing and RePlay of contextual
data for design. In UIST (2010).
20. O'Callahan, R. Efficient collection and storage of indexed program traces, 2006. http://www.ocallahan.org/Amber.pdf.
21. Oney, S., and Myers, B. FireCrystal: Understanding
interactive behaviors in dynamic web pages. In VL/HCC
(2009).
22. Parnin, C., and Orso, A. Are automated debugging
techniques actually helping programmers? In ISSTA
(2011).
23. Pothier, G., and Tanter, E. Summarized trace indexing
and querying for scalable back-in-time debugging. In
ECOOP (2011).
24. Pothier, G., Tanter, E., and Piquer, J. Scalable omniscient debugging. In OOPSLA (2007).
25. Richards, G., Gal, A., Eich, B., and Vitek, J. Automated
construction of JavaScript benchmarks. In OOPSLA
(2011).
26. Robillard, M. P., Coelho, W., and Murphy, G. C. How
effective developers investigate source code: an
exploratory study. IEEE Trans. Softw. Eng. 30, 12 (Dec.
2004), 889–903.
27. Saito, Y. Jockey: a user-space library for record-replay
debugging. In AADEBUG (2005).
28. Sillito, J., Murphy, G. C., and Volder, K. D. Asking and
answering questions during a programming change task.
IEEE Trans. Soft. Eng. 34 (2008), 434–451.
29. Telerik. Fiddler web debugging proxy, 2013.
http://www.fiddler2.com/fiddler2/.
30. Terrano, M., and Bettner, P. 1500 Archers on a 28.8: Network programming in Age of Empires and beyond. In Gamasutra (March 2001).
31. The Selenium Project. Selenium WebDriver
documentation, 2012.
http://seleniumhq.org/docs/03_webdriver.html.
32. Trümper, J., Bohnet, J., and Döllner, J. Understanding complex multithreaded software systems by using trace visualization. In SOFTVIS (2010).
33. VMware, Inc. Replay debugging on Linux, October 2009.
http://www.vmware.com/pdf/ws7_replay_linux_
technote.pdf.
34. Yeh, T., Chang, T.-H., and Miller, R. C. Sikuli: Using
GUI screenshots for search and automation. In UIST
(2009).
35. Zaidman, A., Matthijssen, N., Storey, M.-A., and van
Deursen, A. Connecting client and server-side execution
traces. Empirical Software Engineering (EMSE) 18, 2
(2013), 181–218.
36. Zimmermann, T., Premraj, R., Bettenburg, N., Just, S.,
Schröter, A., and Weiss, C. What makes a good bug report?
IEEE Trans. Soft. Eng. 36, 5 (September 2010), 618–643.