Acid Manual
Phil Winterbottom
philw@plan9.bell-labs.com
Introduction
Acid is a general purpose, source level symbolic debugger. The debugger is built around a simple command language. The command language, distinct from the language of the program being debugged, provides a flexible user interface that allows the debugger interface to be customized for a specific application or architecture. Moreover, it provides an opportunity to write test and verification code independently of a program’s source code. Acid is able to debug multiple processes provided they share a common set of symbols, such as the processes in a threaded program.
Like other language-based solutions, Acid presents a poor user interface but provides a powerful debugging tool. Application of Acid to hard problems is best approached by writing functions off-line (perhaps loading them with the include function or using the support provided by acme(1)), rather than by trying to type intricate Acid operations at the interactive prompt.
Acid allows the execution of a program to be controlled by operating on its state while it is stopped and by monitoring and controlling its execution when it is running. Each program action that causes a change of execution state is reflected by the execution of an Acid function, which may be user defined. A library of default functions provides the functionality of a normal debugger.
A Plan 9 process is controlled by writing messages to a control file in the proc(3) file system. Each control message has a corresponding Acid function, which sends the message to the process. These functions take a process id (pid) as an argument. The memory and text file of the program may be manipulated using the indirection operators. The symbol table, including source cross reference, is available to an Acid program. The combination allows complex operations to be performed both in terms of control flow and data manipulation.
Input format and whatis
Comments start with // and continue to the end of the line. Input is a series of statements and expressions separated by semicolons. At the top level of the interpreter, the builtin function print is called automatically to display the result of all expressions except function calls. A unary + may be used as a shorthand to force the result of a function call to be printed.
Also at the top level, newlines are treated as semicolons by the parser, so semicolons are unnecessary when evaluating expressions.
When Acid starts, it loads the default program modules, enters interactive mode, and prints a prompt. In this state Acid accepts either function definitions or statements to be evaluated. In this interactive mode statements are evaluated immediately, while function definitions are stored for later invocation.
The whatis operator can be used to report the state of identifiers known to the interpreter. With no argument, whatis reports the names of all defined Acid functions; when supplied with an identifier as an argument it reports any variable, function, or type definition associated with the identifier. Because of the way the interpreter handles semicolons, the result of a whatis statement can be returned directly to Acid without adding semicolons. A syntax error or interrupt returns Acid to the normal evaluation mode; any partially evaluated definitions are lost.
Using the Library Functions
After loading the program binary, Acid loads the portable and architecture-specific library functions that form the standard debugging environment. These files are Acid source code and are human-readable. The following example uses the standard debugging library to show how language and program interact:
% acid /bin/ls
/bin/ls:mips plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/mips
acid: new()
75721: system call _main ADD $-0x14,R29
75721: breakpoint main+0x4 MOVW R31,0x0(R29)
acid: bpset(ls)
acid: cont()
75721: breakpoint ls ADD $-0x16c8,R29
acid: stk()
At pc:0x0000141c:ls /sys/src/cmd/ls.c:87
ls(s=0x0000004d,multi=0x00000000) /sys/src/cmd/ls.c:87
called from main+0xf4 /sys/src/cmd/ls.c:79
main(argc=0x00000000,argv=0x7ffffff0) /sys/src/cmd/ls.c:48
called from _main+0x20 /sys/src/libc/mips/main9.s:10
acid: PC
0xc0000f60
acid: *PC
0x0000141c
acid: ls
0x0000141c
The function new() creates a new process and stops it at the first instruction. This change in state is reported by a call to the Acid function stopped, which is called by the interpreter whenever the debugged program stops. Stopped prints the status line giving the pid, the reason the program stopped and the address and instruction at the current PC. The function bpset makes an entry in the breakpoint table and plants a breakpoint in memory. The cont function continues the process, allowing it to run until some condition causes it to stop. In this case the program hits the breakpoint placed on the function ls in the C program. Once again the stopped routine is called to print the status of the program. The function stk prints a C stack trace of the current process. It is implemented using a builtin Acid function that returns the stack trace as a list; the code that formats the information is all written in Acid. The Acid variable PC holds the address of the cell where the current value of the processor register PC is stored. By indirecting through the value of PC the address where the program is stopped can be found. All of the processor registers are available by the same mechanism.
Types
An Acid variable has one of four types: integer, float, list, or string. The type of a variable is inferred from the type of the right-hand side of the assignment expression which last set its value. Referencing a variable that has not yet been assigned draws a "used but not set" error. Many of the operators may be applied to more than one type; for these operators the action of the operator is determined by the types of its operands. The action of each operator is defined in the Expressions section of this manual.
Variables
Acid has three kinds of variables: variables defined by the symbol table of the debugged program, variables that are defined and maintained by the interpreter as the debugged program changes state, and variables defined and used by Acid programs.
Some examples of variables maintained by the interpreter are the register pointers listed by name in the Acid list variable registers, and the symbol table listed by name and contents in the Acid variable symbols.
The variable pid is updated by the interpreter to select the most recently created process or the process selected by the setproc builtin function.
Formats
In addition to a type, variables have formats. The format is a code letter that determines the printing style and the effect of some of the operators on that variable. The format codes are derived from the format letters used by db(1). By default, symbol table variables and numeric constants are assigned the format code X, which specifies 32-bit hexadecimal. Printing a variable with this code yields the output 0x00123456. The format code of a variable may be changed from the default by using the builtin function fmt. This function takes two arguments, an expression and a format code. After the expression is evaluated the new format code is attached to the result and forms the return value from fmt. The backslash operator is a short form of fmt. The format supplied by the backslash operator must be the format character rather than an expression. If the result is assigned to a variable the new format code is maintained in the variable. For example:
acid: x=10
acid: print(x)
0x0000000a
acid: x = fmt(x, 'D')
acid: print(x, fmt(x, 'X'))
10 0x0000000a
acid: x
10
acid: x\o
12
The supported format characters are:
o Print two-byte integer in octal.
O Print four-byte integer in octal.
q Print two-byte integer in signed octal.
Q Print four-byte integer in signed octal.
B Print four-byte integer in binary.
d Print two-byte integer in signed decimal.
D Print four-byte integer in signed decimal.
V Print eight-byte integer in signed decimal.
Z Print eight-byte integer in unsigned decimal.
x Print two-byte integer in hexadecimal.
X Print four-byte integer in hexadecimal.
Y Print eight-byte integer in hexadecimal.
u Print two-byte integer in unsigned decimal.
U Print four-byte integer in unsigned decimal.
f Print single-precision floating point number.
F Print double-precision floating point number.
g Print a single precision floating point number in string format.
G Print a double precision floating point number in string format.
b Print byte in hexadecimal.
c Print byte as an ASCII character.
C Like c, with printable ASCII characters represented normally and others printed in the form \xnn.
s Interpret the addressed bytes as UTF characters and print successive characters until a zero byte is reached.
r Print a two-byte integer as a rune.
R Print successive two-byte integers as runes until a zero rune is reached.
i Print as machine instructions.
I As i above, but print the machine instructions in an alternate form if possible: sunsparc and mipsco reproduce the manufacturers’ syntax.
a Print the value in symbolic form.
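As a rough illustration of how a format code selects both a width and a printing style, the following Python sketch models three of the codes above. It is not Acid's implementation: the table FORMATS and the functions render and size_of are hypothetical names, and only the X, D, and o codes are modeled.

```python
# Hypothetical model of a few Acid format codes: each code maps to
# (size in bytes, function that renders an integer as text).
FORMATS = {
    "X": (4, lambda v: "0x%08x" % (v & 0xFFFFFFFF)),  # four-byte hexadecimal
    "D": (4, lambda v: "%d" % v),                     # four-byte signed decimal
    "o": (2, lambda v: "%o" % (v & 0xFFFF)),          # two-byte octal
}

def render(value, code="X"):
    """Render value the way the given format code would print it."""
    _, show = FORMATS[code]
    return show(value)

def size_of(code):
    """Number of bytes the format code covers."""
    return FORMATS[code][0]

print(render(10))        # 0x0000000a
print(render(10, "D"))   # 10
print(render(10, "o"))   # 12
```

The three printed values match the interactive session shown earlier in this section, where 10 is displayed under the X, D, and o formats.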
Complex types
Acid permits the definition of the layout of memory. The usual method is to use the -a flag of the compilers to produce Acid-language descriptions of data structures (see 2c(1)) although such definitions can be typed interactively. The keywords complex, adt, aggr, and union are all equivalent; the compiler uses the synonyms to document the declarations. A complex type is described as a set of members, each containing a format letter, an offset in the structure, and a name. For example, the C structure
struct List {
    int type;
    struct List *next;
};
is described by the Acid statement
complex List {
    'D' 0 type;
    'X' 4 next;
};
Scope
Variables are global unless they are either parameters to functions or are declared as local in a function body. Parameters and local variables are available only in the body of the function in which they are instantiated. Variables are dynamically bound: if a function declares a local variable with the same name as a global variable, the global variable will be hidden whenever the function is executing. For example, if a function f has a local called main, any function called below f will see the local version of main, not the external symbol.
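Dynamic binding can be pictured as a single table of names in which a local declaration saves the old value and restores it on return. The Python sketch below models that behavior only; the dictionary env, the functions f and show_main, and the value given to main are all made up for illustration.

```python
# Hypothetical model of Acid's dynamic binding: one global table of names,
# with a local declaration saving the old value and restoring it on return.
env = {"main": 0x1020}          # pretend symbol-table value (made up)

def show_main():
    # Sees whatever binding of main is current when it is called.
    return env["main"]

def f():
    saved = env["main"]         # "local main" hides the global
    env["main"] = 7
    try:
        return show_main()      # the callee sees the local value
    finally:
        env["main"] = saved     # previous value restored on return

print(f())                 # 7
print(hex(show_main()))    # 0x1020
```

A lexically scoped language would have show_main see the global value in both calls; under dynamic binding the caller's local is what the callee observes.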
Addressing
Since the symbol table specifies addresses, to access the value of program variables an extra level of indirection is required relative to the source code. For consistency, the registers are maintained as pointers as well; Acid variables with the names of processor registers point to cells holding the saved registers.
The location in a file or memory image associated with an address is calculated from a map associated with the file. Each map contains one or more quadruples (t, b, e, f), defining a segment named t (usually text, data, regs, or fpregs) mapping addresses in the range b through e to the part of the file beginning at offset f. The memory model of a Plan 9 process assumes that segments are disjoint. There can be more than one segment of a given type (e.g., a process may have more than one text segment) but segments may not overlap. An address a is translated to a file address by finding a segment for which b ≤ a < e; the location in the file is then a + f - b.
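The translation rule can be sketched in a few lines of Python: find the segment whose range contains the address, then rebase the address onto the file offset. The Segment tuple, the translate function, and the example layout are assumptions made for illustration, not Acid's data structures.

```python
from collections import namedtuple

# One map entry: segment name t, address range [b, e), file offset f.
Segment = namedtuple("Segment", "t b e f")

def translate(segments, a):
    """Return (segment name, file offset) for address a, or None if unmapped."""
    for s in segments:
        if s.b <= a < s.e:
            return s.t, a + s.f - s.b
    return None

# Made-up layout for illustration only.
example_map = [
    Segment("text", 0x1000, 0x5000, 0x20),
    Segment("data", 0x5000, 0x7000, 0x4020),
]

name, offset = translate(example_map, 0x141c)
print(name, hex(offset))   # text 0x43c
```

Because segments are disjoint, at most one entry can match, so the first hit is the only hit.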
Usually, the text and initialized data of a program are mapped by segments called text and data. Since a program file does not contain bss, stack, or register data, these data are not mapped by the data segment. The text segment is mapped similarly in the memory image of a normal (i.e., non-kernel) process. However, the segment called *data maps memory from the beginning to the end of the program’s data space. This region contains the program’s static data, the bss, the heap and the stack. A segment called *regs maps the registers; *fpregs maps the floating point registers.
Sometimes it is useful to define a map with a single segment mapping the region from 0 to 0xFFFFFFFF; such a map allows the entire file to be examined without address translation. The builtin function map examines and modifies Acid’s map for a process.
Name Conflicts
Name conflicts between keywords in the Acid language, symbols in the program, and previously defined functions are resolved when the interpreter starts up. A conflicting name is made unique by prefixing as many $ characters as are needed. Acid reports each name change at startup. The report looks like this:
/bin/sam: mips plan 9 executable
/lib/acid/port
/lib/acid/mips
Symbol renames:
append=$append T/0xa4e40
acid:
The symbol append is both a keyword and a text symbol in the program. The message reports that the text symbol is now named $append.
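The renaming rule amounts to prefixing $ until the name no longer collides. A minimal Python sketch, assuming the program's symbols are the ones renamed (as in the append example); the function rename_conflicts and its arguments are hypothetical.

```python
def rename_conflicts(program_symbols, reserved):
    """Prefix '$' to each program symbol until it clashes with nothing."""
    taken = set(reserved)
    renames = {}
    for name in program_symbols:
        new = name
        while new in taken:
            new = "$" + new
        taken.add(new)
        if new != name:
            renames[name] = new
    return renames

print(rename_conflicts(["main", "append"], {"append", "head", "tail"}))
# {'append': '$append'}
```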
Expressions
Operators have the same binding and precedence as in C. For operators of equal precedence, expressions are evaluated from left to right.
Boolean expressions
If an expression is evaluated for a boolean condition the test performed depends on the type of the result. If the result is of integer or floating type the result is true if the value is non-zero. If the expression is a list the result is true if there are any members in the list. If the expression is a string the result is true if there are any characters in the string.
primary-expression:
identifier
identifier : identifier
constant
( expression )
{ elist }
elist:
expression
elist , expression
An identifier may be any legal Acid variable. The colon operator returns the address of parameters or local variables in the current stack of a program. For example:
*main:argc
prints the number of arguments passed into main. Local variables and parameters can only be referenced after the frame has been established. It may be necessary to step a program over the first few instructions of a breakpointed function to properly set the frame.
Constants follow the same lexical rules as C. A list of expressions delimited by braces forms a list constructor. A new list is produced by evaluating each expression when the constructor is executed. The empty list is formed from {}.
acid: x = 10
acid: l = { 1, x, 2\D }
acid: x = 20
acid: l
{0x00000001 , 0x0000000a , 2 }
Lists
Several operators manipulate lists.
list-expression:
primary-expression
head primary-expression
tail primary-expression
append expression , primary-expression
delete expression , primary-expression
The primary-expression for head and tail must yield a value of type list. If there are no elements in the list the value of head or tail will be the empty list. Otherwise head evaluates to the first element of the list and tail evaluates to the rest.
acid: head {}
{}
acid: head {1, 2, 3, 4}
0x00000001
acid: tail {1, 2, 3, 4}
{0x00000002 , 0x00000003 , 0x00000004 }
The first operand of append and delete must be an expression that yields a list. Append places the result of evaluating primary-expression at the end of the list. The primary-expression supplied to delete must evaluate to an integer; delete removes the n’th item from the list, where n is the integral value of primary-expression. List indices are zero-based.
acid: append {1, 2}, 3
{0x00000001 , 0x00000002 , 0x00000003 }
acid: delete {1, 2, 3}, 1
{0x00000001 , 0x00000003 }
Assigning a list to a variable copies a reference to the list; if a list variable is copied it still points at the same list. To copy a list, the elements must be copied piecewise using head and append.
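The list operators and the reference semantics of assignment can be mimicked in Python, whose lists are also shared by reference. The functions below are stand-ins, not Acid code; in particular, append is modeled as producing a new list, which is an assumption the text does not settle.

```python
# Hypothetical Python stand-ins for Acid's list operators.
def head(l):
    return l[0] if l else []      # head {} is {}

def tail(l):
    return l[1:]                  # tail {} is {}

def append(l, item):
    return l + [item]             # modeled as producing a new list

def delete(l, n):
    return l[:n] + l[n + 1:]      # n is zero-based

def copy_list(l):
    """Copy element by element, as the text suggests, using head and append."""
    out = []
    while l:
        out = append(out, head(l))
        l = tail(l)
    return out

a = [1, 2, 3]
b = a                 # same list: assignment copies a reference
c = copy_list(a)      # distinct list with equal members
print(delete(a, 1), b is a, c == a, c is a)
# [1, 3] True True False
```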
Operators
postfix-expression:
list-expression
postfix-expression [ expression ]
postfix-expression ( argument-list )
postfix-expression . tag
postfix-expression -> tag
postfix-expression ++
postfix-expression --
argument-list:
expression
argument-list , expression
The [ expression ] operator performs indexing. The indexing expression must result in an expression of integer type, say n. The operation depends on the type of postfix-expression. If the postfix-expression yields an integer it is assumed to be the base address of an array in the memory image. The index offsets into this array; the size of the array members is determined by the format associated with the postfix-expression. If the postfix-expression yields a string the index operator fetches the n’th character of the string. If the index points beyond the end of the string, a zero is returned. If the postfix-expression yields a list then the indexing operation returns the n’th item of the list. If the list contains fewer than n items the empty list {} is returned.
The ++ and -- operators increment and decrement integer variables. The amount of increment or decrement depends on the format code. These postfix operators return the value of the variable before the increment or decrement has taken place.
unary-expression:
postfix-expression
++ unary-expression
-- unary-expression
unary-operator: one of
* @ + - ~ !
The operators * and @ are the indirection operators. @ references a value from the text file of the program being debugged. The size of the value depends on the format code. The * operator fetches a value from the memory image of a process. If either operator appears on the left-hand side of an assignment statement, either the file or memory will be written. The file can only be modified when Acid is invoked with the -w option. The prefix ++ and -- operators perform the same operation as their postfix counterparts but return the value after the increment or decrement has been performed. Since the ++ and * operators fetch and increment the correct amount for the specified format, the following function prints correct machine instructions on a machine with variable length instructions, such as the 68020 or 386:
defn asm(addr)
{
    addr = fmt(addr, 'i');
    loop 1, 10 do
        print(*addr++, "\n");
}
The operators ~ and ! perform bitwise and logical negation respectively. Their operands must be of integer type.
cast-expression:
unary-expression
unary-expression \ format-char
( complex-name ) unary-expression
A unary expression may be preceded by a cast. The cast has the effect of associating the value of unary-expression with a complex type structure. The result may then be dereferenced using the . and -> operators.
An Acid variable may be associated with a complex type to enable accessing the type’s members:
acid: complex List {
    'D' 0 type;
    'X' 4 next;
};
acid: complex List lhead
acid: lhead.type
10
acid: lhead = ((List)lhead).next
acid: lhead.type
-46
Note that the next field cannot be given a complex type automatically.
When entered at the top level of the interpreter, an expression of complex type is treated specially. If the type is called T and an Acid function also called T exists, then that function will be called with the expression as its argument. The compiler options -a and -aa will generate Acid source code defining such complex types and functions; see 2c(1).
A unary-expression may be qualified with a format specifier using the \ operator. This has the same effect as passing the expression to the fmt builtin function.
multiplicative-expression:
cast-expression
multiplicative-expression * multiplicative-expression
multiplicative-expression / multiplicative-expression
multiplicative-expression % multiplicative-expression
These operate on integer and float types and perform the expected operations: * multiplication, / division, % modulus.
additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression - multiplicative-expression
These operators perform as expected for integer and float operands. Unlike in C, + and - do not scale the addition based on the format of the expression. This means that i=i+1 will always add 1 but i++ will add the size corresponding to the format stored with i. If both operands are of either string or list type then addition is defined as concatenation. Adding a string and an integer is treated as concatenation with the Unicode character corresponding to the integer. Subtraction is undefined for strings and lists.
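The difference between i=i+1 and i++ can be modeled as follows. This Python sketch assumes the step sizes implied by the format table (one, two, four, or eight bytes); the class Var and the table SIZE are hypothetical, and the variable-length i format is left out.

```python
# Sizes in bytes for a few format codes, taken from the format table.
SIZE = {"b": 1, "x": 2, "d": 2, "X": 4, "D": 4, "Y": 8, "V": 8}

class Var:
    """Hypothetical stand-in for an Acid integer variable with a format."""
    def __init__(self, value, fmt="X"):
        self.value, self.fmt = value, fmt

    def post_inc(self):
        old = self.value
        self.value += SIZE[self.fmt]   # ++ steps by the format's size
        return old                     # postfix form yields the old value

    def plus(self, n):
        return self.value + n          # + never scales

i = Var(0x1000, "X")
print(hex(i.plus(1)))      # 0x1001
print(hex(i.post_inc()))   # 0x1000
print(hex(i.value))        # 0x1004
```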
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
The >> and << operators perform bitwise right and left shifts respectively. Both require operands of integer type.
relational-expression:
shift-expression
relational-expression < shift-expression
relational-expression > shift-expression
relational-expression <= shift-expression
relational-expression >= shift-expression
equality-expression:
relational-expression
relational-expression == equality-expression
relational-expression != equality-expression
The comparison operators are < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to) and != (not equal to). The result of a comparison is 0 if the condition is false, otherwise 1. The relational operators can only be applied to operands of integer and float type. The equality operators apply to all types. Comparing mixed types is legal. Mixed integer and float compare on the integral value. Other mixtures are always unequal. Two lists are equal if they have the same number of members and a pairwise comparison of the members results in equality.
AND-expression:
equality-expression
AND-expression & equality-expression
XOR-expression:
AND-expression
XOR-expression ^ AND-expression
OR-expression:
XOR-expression
OR-expression | XOR-expression
These operators perform bitwise logical operations and apply only to the integer type. The operators are & (and), ^ (exclusive or) and | (inclusive or).
logical-AND-expression:
OR-expression
logical-AND-expression && OR-expression
logical-OR-expression:
logical-AND-expression
logical-OR-expression || logical-AND-expression
The && operator returns 1 if both of its operands evaluate to boolean true, otherwise 0. The || operator returns 1 if either of its operands evaluates to boolean true, otherwise 0.
Statements
if expression then statement else statement
if expression then statement
The expression is evaluated as a boolean. If its value is true the statement after the then is executed, otherwise the statement after the else is executed. The else portion may be omitted.
while expression do statement
In a while loop, the statement is executed while the boolean expression evaluates true.
loop startexpr, endexpr do statement
The two expressions startexpr and endexpr are evaluated prior to loop entry. Statement is evaluated while the value of startexpr is less than or equal to endexpr. Both expressions must yield integer values. The value of startexpr is incremented by one for each loop iteration. Note that there is no explicit loop variable; the expressions are just values.
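One way to read the loop statement is as a counted loop whose counter is hidden from the body. The Python sketch below models that reading; acid_loop and its arguments are illustrative names, and the body is passed in as a function because Python has no equivalent statement.

```python
def acid_loop(start, end, body):
    """Model of `loop start, end do body`: both bounds are evaluated once."""
    count = start
    while count <= end:
        body()
        count += 1    # hidden counter; the body cannot see it

runs = []
acid_loop(1, 10, lambda: runs.append("x"))
print(len(runs))   # 10

acid_loop(5, 1, lambda: runs.append("y"))   # start > end: body never runs
print(len(runs))   # 10
```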
return expression
return terminates execution of the current function and returns to its caller. The value of the function is given by expression. Since return requires an argument, nil-valued functions should return the empty list {}.
local variable
The local statement creates a local instance of variable, which exists for the duration of the instance of the function in which it is declared. Binding is dynamic: the local variable, rather than the previous value of variable, is visible to called functions. After a return from the current function the previous value of variable is restored.
If Acid is interrupted, the values of all local variables are lost, as if the function returned.
defn function-name ( parameter-list ) body
parameter-list:
variable
parameter-list , variable
body:
{ statement }
Functions are introduced by the defn statement. The definition of parameter names suppresses any variables of the same name until the function returns. The body of a function is a list of statements enclosed by braces.
Code variables
Acid permits the delayed evaluation of a parameter to a function. The parameter may then be evaluated at any time with the eval operator. Such parameters are called code variables and are defined by prefixing their name with an asterisk in their declaration.
For example, this function wraps up an expression for later evaluation:
acid: defn code(*e) { return e; }
acid: x = code(v+atoi("100")\D)
acid: print(x)
(v+atoi("100"))\D;
acid: eval x
<stdin>:5: (error) v used but not set
acid: v=5
acid: eval x
105
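A code variable behaves much like a closure that is passed around unevaluated and run on demand. The following Python sketch mirrors the session above under that analogy; the dictionary env stands in for Acid's global variables, int stands in for atoi, and the format suffix \D is not modeled.

```python
env = {}                       # stands in for Acid's global variables

def code(thunk):
    return thunk               # hand back the unevaluated expression

# Counterpart of x = code(v+atoi("100")\D): nothing is evaluated yet.
x = code(lambda: env["v"] + int("100"))

try:
    x()                        # like `eval x` before v is set
except KeyError as err:
    print("used but not set:", err)

env["v"] = 5
print(x())                     # 105
```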
Source Code Management
Acid provides the means to examine source code. Source code is represented by lists of strings. Builtin functions provide mapping from address to lines and vice-versa. The default debugging environment has the means to load and display source files.
Builtin Functions
The Acid interpreter has a number of builtin functions, which cannot be redefined. These functions perform machine- or operating system-specific functions such as symbol table and process management. The following section presents a description of each builtin function. The notation {} is used to denote the empty list, which is the default value of a function that does not execute a return statement. The type and number of parameters for each function are specified in the description; where a parameter can be of any type it is specified as type item.
integer access(string) Check if a file can be read
Access returns the integer 1 if the file name in string can be read by the builtin functions file, readfile, or include, otherwise 0. A typical use of this function is to follow a search path looking for a source file; it is used by findsrc.
if access("main.c") then
    return file("main.c");
float atof(string) Convert a string to float
atof converts the string supplied as its argument into a floating point number. The function accepts strings in the same format as the C function of the same name. The value returned has the format code f. atof returns the value 0.0 if it is unable to perform the conversion.
acid: +atof("10.4e6")
1.04e+07
integer atoi(string) Convert a string to an integer
atoi converts the argument to an integer value. The function accepts strings in the same format as the C function of the same name. The value returned has the format code D. atoi returns the integer 0 if it is unable to perform a conversion.
acid: +atoi("-1255")
-1255
{} error(string) Generate an interpreter error
error generates an error message and returns the interpreter to interactive mode. If an Acid program is running, it is aborted. Processes being debugged are not affected. The values of all local variables are lost. error is commonly used to stop the debugger when some interesting condition arises in the debugged program.
while 1 do {
    step();
    if *main != @main then
        error("memory corrupted");
}
list file(string) Read the contents of a file into a list
file reads the contents of the file specified by string into a list. Each element in the list is a string corresponding to a line in the file. file breaks lines at the newline character, but the newline characters are not returned as part of each string. file returns the empty list if it encounters an error opening or reading the data.
acid: print(file("main.c")[0])
#include <u.h>
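The behavior described for file (split at newlines, drop the newline characters, return an empty list on error) can be approximated in Python as below. The function read_lines is a hypothetical stand-in, and the choice of UTF-8 decoding with replacement of invalid bytes is an assumption.

```python
def read_lines(path):
    """Return the file's lines without newlines, or [] on any I/O error."""
    try:
        with open(path, encoding="utf-8", errors="replace", newline="\n") as f:
            text = f.read()
    except OSError:
        return []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()        # a final newline does not start another line
    return lines

print(read_lines("/nonexistent/acid-example"))   # []
```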
integer filepc(string) Convert source address to text address
filepc interprets its string argument as a source file address in the form of a file name and line offset. filepc uses the symbol table to map the source address into a text address in the debugged program. The integer return value has the format X. filepc returns an address of -1 if the source address is invalid. The source file address uses the same format as acme(1). This function is commonly used to set breakpoints from the source text.
acid: bpset(filepc("main:10"))
acid: bptab()