C2Rust Manual

C2Rust helps you migrate C99-compliant code to Rust. It provides:

a C to Rust translator
a Rust code refactoring tool
tools to cross-check execution of the C code against the new Rust code

The translator (or transpiler) produces unsafe Rust code that closely mirrors the input C code. Its primary goal is to produce code that is functionally identical to the input C code. Generating safe or idiomatic Rust is not a goal for the translator. Rather, we think the best approach is to gradually rewrite the translated Rust code using dedicated refactoring tools. To this end, we are building a refactoring tool that rewrites unsafe auto-translated Rust into safer idioms.
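To give a feel for what "closely mirrors the input C code" means, here is a hand-written sketch of how a small C loop might look after translation. It is not actual C2Rust output: the function sum and its signature are made up for illustration, and std::os::raw::c_int stands in for the libc types used elsewhere in this manual so that the snippet needs no external crate.

```rust
use std::os::raw::c_int;

// Hypothetical C input:
//
//     int sum(const int *xs, int n) {
//         int total = 0;
//         for (int i = 0; i < n; i++) total += xs[i];
//         return total;
//     }

/// Illustrative translation: raw pointers and C integer types are kept,
/// so the function stays `unsafe`.
pub unsafe extern "C" fn sum(xs: *const c_int, n: c_int) -> c_int {
    let mut total: c_int = 0;
    let mut i: c_int = 0;
    while i < n {
        // `xs[i]` in C becomes explicit pointer arithmetic plus a dereference.
        total += *xs.offset(i as isize);
        i += 1;
    }
    total
}
```

The caller is responsible for passing a pointer to at least n valid elements, exactly as in the C version; removing that obligation is the job of later refactoring.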

Some refactoring will have to be done by hand, which may introduce errors. We provide plugins for clang and rustc so you can compile and run two binaries and check that they behave identically (at the level of function calls). For details on cross-checking, see the cross-checks directory and the cross-checking tutorial.

Here's the big picture:

C2Rust overview

To learn more, check out our RustConf'18 talk on YouTube and try the C2Rust translator online at www.c2rust.com.

C2Rust requires LLVM 6 or 7 and its corresponding libraries and clang compiler. Python 3.4 or later, CMake 3.4.3 or later, and openssl (1.0) are also required. These prerequisites may be installed with the following commands, depending on your platform:

Ubuntu 16.04, 18.04 & 18.10:

  apt install build-essential llvm-6.0 clang-6.0 libclang-6.0-dev cmake libssl-dev pkg-config

Arch Linux:

  pacman -S base-devel llvm clang cmake openssl

OS X: Xcode command-line tools and a recent LLVM (we recommend the Homebrew version) are required.
```
  xcode-select --install
  brew install llvm python3 cmake openssl
```

Finally, a Rust installation with Rustup is required on all platforms. You will also need to install rustfmt:

    rustup component add rustfmt-preview

To build C2Rust, run:

cargo build --release

This builds the c2rust tool in the target/release/ directory.

On OS X with Homebrew LLVM, you need to point the build system at the LLVM installation as follows:

LLVM_CONFIG_PATH=/usr/local/opt/llvm/bin/llvm-config cargo build

If you have trouble with cargo build, the developer docs provide more details on the build system.

To translate C files specified in compile_commands.json (see below), run the c2rust tool with the transpile subcommand:

c2rust transpile compile_commands.json

(The c2rust refactor tool is also available for refactoring Rust code; see refactoring.)

The translator requires the exact compiler commands used to build the C code. To provide this information, you will need a standard compile_commands.json file. Many build systems can automatically generate this file, as it is used by many other tools, but see below for recommendations on how to generate this file for common build processes.
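For orientation, each entry in a compile_commands.json file is a JSON object with "directory", "command" (or "arguments"), and "file" keys, as defined by Clang's JSON Compilation Database format. The sketch below builds a one-entry database by hand. The helper name compile_commands_entry, the cc -c command line, and the paths are hypothetical, and no JSON escaping is done, so it only suits simple paths. In practice you would let the build system generate the file.

```rust
/// Build a one-entry compilation database for a single C file.
/// No JSON escaping is performed: inputs must not contain quotes or backslashes.
fn compile_commands_entry(directory: &str, file: &str) -> String {
    format!(
        "[\n  {{\n    \"directory\": \"{dir}\",\n    \"command\": \"cc -c {file}\",\n    \"file\": \"{file}\"\n  }}\n]\n",
        dir = directory,
        file = file
    )
}
```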

Once you have a compile_commands.json file describing the C build, translate the C code to Rust with the following command:

c2rust transpile path/to/compile_commands.json

To generate a Cargo.toml template for a Rust library, add the -e option:

c2rust transpile --emit-build-files path/to/compile_commands.json

To generate a Cargo.toml template for a Rust binary, do this:

c2rust transpile --main myprog path/to/compile_commands.json

Here, --main myprog tells the transpiler to use the main function from myprog.rs as the entry point.

The translated Rust files will not depend directly on each other like normal Rust modules. They will export and import functions through the C API. These modules can be compiled together into a single static Rust library or binary.

There are several known limitations in this translator. The translator will emit a warning and attempt to skip function definitions that cannot be translated.

The compile_commands.json file can be automatically created using either cmake, intercept-build, or bear.

It may be a good idea to remove optimization flags (-OX) from the compile commands file, as there are optimization builtins that the translator does not support.
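Stripping the flags can be scripted. A minimal sketch, assuming the command is a whitespace-separated string and that every argument starting with -O is an optimization level; the function name strip_opt_flags is hypothetical:

```rust
/// Remove `-O0`, `-O2`, `-Os`, ... from a whitespace-separated compiler command.
/// Quoted arguments that contain spaces are not handled.
fn strip_opt_flags(command: &str) -> String {
    command
        .split_whitespace()
        .filter(|arg| !arg.starts_with("-O"))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Note that the lowercase output flag -o is left alone, since only the uppercase -O prefix is matched.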

When creating the initial build directory with cmake, specify -DCMAKE_EXPORT_COMPILE_COMMANDS=1. This only works on projects configured to be built by cmake, and works on both Linux and macOS.

cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 ...

intercept-build (part of the scan-build tool) is recommended for non-cmake projects. intercept-build is bundled with clang under tools/scan-build-py, but a standalone version can be easily installed via pip with:

pip install scan-build

Usage:

intercept-build <build command>

You can also use intercept-build to generate a compilation database for compiling a single C file, for example:

intercept-build sh -c "cc program.c"

If you have bear installed, it can be used similarly to intercept-build:

bear <build command>

The transpiler module is invoked using the transpile sub-command of c2rust:

c2rust transpile [args] compile_commands.json [-- extra-clang-args]

The following arguments control the basic transpiler behavior:

--emit-modules - Emit each translated Rust file as a module (the default is to make each file its own crate).
--fail-on-error - Fail instead of warning if a source file cannot be fully translated.
--reduce-type-annotations - Do not emit explicit type annotations when unnecessary.
--translate-asm - Translate C inline assembly into corresponding Rust inline assembly. The translated assembly is unlikely to work as-is due to differences between GCC and LLVM (used in Rust) inline assembly styles, but it can provide a starting point for manual translation.
-f <regex>, --filter <regex> - Only translate files that match the given regular expression.

The transpiler can create skeleton cargo build files for the translated Rust sources, controlled by the following options:

-e, --emit-build-files - Emit cargo build files to build the translated Rust code as a library. Build files are emitted in the directory specified by --output-dir, or if not specified, the directory containing compile_commands.json. This will not overwrite existing files, so remove these build files before re-creating build files. (implies --emit-modules)
-m <main_module>, --main <main_module> - Emit cargo build files to build the translated Rust code as a binary. The main function must be found in the specified module (C source file) <main_module>. <main_module> should be the bare module name, not including the .rs extension. Build files are emitted in the directory specified by --output-dir, or if not specified, the directory containing compile_commands.json. This will not overwrite existing files, so remove this build file directory before re-creating build files. (implies --emit-build-files)

The transpiler can instrument the transpiled Rust code for cross-checking. The following options control this instrumentation:

-x, --cross-checks - Add macros and build files for cross-checking.
--use-fakechecks - Link against the fakechecks library for cross-checking instead of using the default online checks.
-X <config>, --cross-check-config <config> - Use the given config file as the cross-checking config.

The c2rust-transpile library uses the c2rust-ast-exporter library to translate C code to Rust. The ast-exporter library links against the native clang compiler front end to parse C code and exports the AST for use in the transpiler, which is implemented purely in Rust.

This document tracks things that we know the translator can't handle, as well as things it probably won't ever handle.

variadic function definitions (blocking Rust issue)
preserving comments (work in progress)
long double and _Complex types (partially blocked by Rust language)
Non-x86/64 SIMD functions/types, and x86/64 SIMD functions/types that have no Rust equivalent

GNU packed structs (Rust has #[repr(packed)] compatible with #[repr(C)])
inline functions (Rust has #[inline])
restrict pointers (Rust has references)
inline assembly
macros

longjmp/setjmp: Although there are LLVM intrinsics for these, it is unclear how they interact with Rust (esp. idiomatic Rust).
jumps into and out of statement expressions: We support GNU C statement expressions, but we cannot handle jumping into or out of them. Both entry into and exit from the expression have to be through the usual fall-through evaluation of the expression.

C2Rust-Bitfields enables you to write structs containing bitfields. It has three primary goals:

Byte compatibility with equivalent C bitfield structs
The ability to take references/pointers to non-bitfield fields
Provide methods to read from and write to bitfields

We currently provide a single custom derive, BitfieldStruct, as well as a dependent field attribute, bitfield.

Rust 1.30+
Rust Stable, Beta, or Nightly
Little Endian Architecture

Suppose you want to write a super compact date struct which only takes up three bytes. In C this would look like this:

struct date {
    unsigned char day: 5;
    unsigned char month: 4;
    unsigned short year: 15;
} __attribute__((packed));

Clang helpfully provides us with this information:

*** Dumping AST Record Layout
         0 | struct date
     0:0-4 |   unsigned char day
     0:5-8 |   unsigned char month
    1:1-15 |   unsigned short year
           | [sizeof=3, align=1]

And this is enough to build our Rust struct:

extern crate libc;

#[repr(C, align(1))]
#[derive(BitfieldStruct)]
struct Date {
    #[bitfield(name = "day", ty = "libc::c_uchar", bits = "0..=4")]
    #[bitfield(name = "month", ty = "libc::c_uchar", bits = "5..=8")]
    #[bitfield(name = "year", ty = "libc::c_ushort", bits = "9..=23")]
    day_month_year: [u8; 3]
}

fn main() {
    let mut date = Date {
        day_month_year: [0; 3]
    };

    date.set_day(18);
    date.set_month(7);
    date.set_year(2000);

    assert_eq!(date.day(), 18);
    assert_eq!(date.month(), 7);
    assert_eq!(date.year(), 2000);
}

Furthermore, C bitfield rules for overflow and signed integers are taken into account.
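To make the bit ranges concrete, the following plain-Rust sketch reads and writes the same ranges by hand on a 3-byte array. It is not how the crate is implemented: get_bits and set_bits are hypothetical helpers that assume little-endian bit numbering, unsigned fields, and fields no wider than 64 bits. Writing a value that is too wide for the field keeps only its low bits, which matches C's behavior for unsigned bitfields.

```rust
/// Read bits `lo..=hi` (inclusive, bit 0 = least significant bit of byte 0).
fn get_bits(bytes: &[u8], lo: usize, hi: usize) -> u64 {
    let mut value = 0u64;
    for (i, bit) in (lo..=hi).enumerate() {
        let b = (bytes[bit / 8] >> (bit % 8)) & 1;
        value |= (b as u64) << i;
    }
    value
}

/// Write the low `hi - lo + 1` bits of `value` into bits `lo..=hi`.
/// Higher bits of `value` are silently dropped.
fn set_bits(bytes: &mut [u8], lo: usize, hi: usize, value: u64) {
    for (i, bit) in (lo..=hi).enumerate() {
        let mask = 1u8 << (bit % 8);
        if (value >> i) & 1 == 1 {
            bytes[bit / 8] |= mask;
        } else {
            bytes[bit / 8] &= !mask;
        }
    }
}
```

With these helpers, set_bits(&mut date, 5, 8, 7) plays the role of date.set_month(7), and get_bits(&date, 5, 8) the role of date.month().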

This crate can generate no_std compatible code when the no_std feature flag is provided.

Since Rust doesn't support a build.rs exclusively for tests, you must manually compile the C test code and link it in.

$ clang tests/bitfields.c -c -fPIC -o tests/bitfields.o
$ ar -rc tests/libtest.a tests/bitfields.o
$ RUSTFLAGS="-L `pwd`/tests" cargo test

This crate is inspired by the rust-bitfield, packed_struct, and bindgen crates.

This is a refactoring tool for Rust programs, aimed at removing unsafety from automatically-generated Rust code.

c2rust refactor command line usage is as follows:

c2rust refactor [flags] <command> [command args] -- <input file> [rustc flags]

Flags for c2rust refactor are described by c2rust refactor --help.

See the command documentation for a list of commands, including complete usage and descriptions. Multiple commands can be separated by an argument consisting of a single semicolon, as in c2rust refactor cmd1 arg1 \; cmd2 arg2. (Note the semicolon needs to be escaped to prevent it from being interpreted by the shell.)

c2rust refactor requires rustc command line arguments for the program to be refactored, so that it can use rustc to load and typecheck the source code. For projects built with cargo, pass the --cargo flag to c2rust refactor and it will obtain the right arguments from cargo automatically. Otherwise, you must provide the rustc arguments on the c2rust refactor command line, after a -- separator.

Some commands require the user to "mark" some AST nodes for them to operate on. For example, the rename_struct command requires that the user mark the declaration of the struct that should be renamed.

Each mark associates a "label" with a specific AST node (identified by its NodeId). Labels are used to distinguish different types of marks, and a single node can have any number of marks with distinct labels. For example, when running the func_to_method command, which turns functions into methods in an inherent impl, the user must mark the functions to move with the target label and must mark the destination impl with the dest label. Nodes marked with other labels will be ignored. The set of labels recognized by a command is described in the command's documentation; by default, most commands that use marks operate on target.

The most flexible way of marking nodes is by using the select command. See the command documentation and src/select/mod.rs for details. Note that marks are not preserved across c2rust refactor invocations, so you usually want to run select followed by the command of interest using the ; separator mentioned above.

abstract
autoretype
bitcast_retype
bytestr_to_str
canonicalize_externs
canonicalize_structs
char_literals
clear_marks
commit
convert_cast_as_ptr
convert_format_args
convert_printfs
copy_marks
create_item
delete_items
delete_marks
fix_unused_unsafe
fold_let_assign
func_to_method
generalize_items
ionize
let_x_uninitialized
link_funcs
link_incomplete_types
mark_arg_uses
mark_callers
mark_field_uses
mark_pub_in_mod
mark_related_types
mark_uses
ownership_annotate
ownership_mark_pointers
ownership_split_variants
pick_node
print_marks
print_spans
reconstruct_for_range
reconstruct_while
remove_null_terminator
remove_redundant_casts
remove_redundant_let_types
remove_unused_labels
rename_items_regex
rename_marks
rename_struct
rename_unnamed
reoganize_definitions
replace_items
retype_argument
retype_return
retype_static
rewrite_expr
rewrite_stmts
rewrite_ty
select
select_phase2
set_mutability
set_visibility
sink_lets
sink_unsafe
static_collect_to_struct
static_to_local
static_to_local_ref
struct_assign_to_update
struct_merge_updates
test_analysis_ownership
test_analysis_type_eq
test_debug_callees
test_f_plus_one
test_insert_remove_args
test_one_plus_one
test_reflect
test_replace_stmts
test_typeck_loop
type_fix_rules
uninit_to_default
wrap_api
wrap_extern
wrapping_arith_to_normal

`abstract`

Usage: abstract SIG PAT [BODY]

Replace all instances of pat with calls to a new function whose name and signature are given by sig. Example:

Input:

 1 + 2

After running abstract 'add(x: u32, y: u32) -> u32' 'x + y':

 add(1, 2)

 // Elsewhere:
 fn add(x: u32, y: u32) -> u32 { x + y }

All type and value parameter names in sig act as bindings when matching pat. The captured exprs and types are passed as parameters when building the new call expression. The body of the function is body, if provided, otherwise pat itself.

Non-ident patterns in sig are not supported. It is also an error for any type parameter's name to collide with any value parameter.

If matching with pat fails to capture expressions for any of the value parameters of sig, it is an error. If it fails to capture for a type parameter, the parameter is filled in with _ (infer).

Usage: autoretype 'A: T'...

Marks: A... (specified in command)

Change the type of nodes with mark A to the new type T, propagating changes and inserting casts when possible to satisfy type checking. Multiple simultaneous retypings can be specified in this command as separate arguments. Each argument should be of the form: label: type where label is a mark label and type can be parsed as a valid rust type.

Usage: bitcast_retype PAT REPL

Marks: may read marks depending on PAT

For every type in the crate matching PAT, change the type to REPL. PAT and REPL are types, and can use placeholders in the manner of rewrite_ty. For each definition whose type has changed, it also inserts mem::transmute calls at each use of the definition to fix discrepancies between the old and new types. (This implies that the original type and its replacement must be transmutable to each other.)

Usage: bytestr_to_str

Marks: target

Convert bytestring literal expressions marked target to string literal expressions.

Note the mark must be placed on the expression, as it is currently difficult to mark a literal node.

Usage: canonicalize_externs MOD_PATH

Marks: target

Replace foreign items ("externs") with references to externs in a different crate or module.

For each foreign fn or static marked target, if a foreign item with the same symbol exists in the module at MOD_PATH (which can be part of an external crate), it deletes the marked foreign item and replaces all its uses with uses of the matching foreign item in MOD_PATH. If a replacement item has a different type than the original, it also inserts the necessary casts at each use of the item.

Usage: canonicalize_structs

Marks: target

For each type definition marked target, delete all other type definitions with the same name, and replace their uses with uses of the target type.

This only works when all the identically-named types have the same definition, such as when all are generated from #includes of the same C header.

Example:

 mod a {
     pub struct Foo { ... }  // Foo: target
 }

 mod b {
     struct Foo { ... }  // same as ::a::Foo

     unsafe fn use_foo(x: &Foo) { ... }
 }

After running canonicalize_structs:

 mod a {
     pub struct Foo { ... }
 }

 mod b {
     // 1. `struct Foo` has been deleted
     // 2. `use_foo` now references `::a::Foo` directly
     unsafe fn use_foo(x: &::a::Foo) { ... }
 }

Note that this transform does not check or adjust item visibility. If the target type is not visible throughout the crate, this may introduce compile errors.

`char_literals`

Obsolete - the translator now does this automatically.

Usage: char_literals

Replace integer literals cast to libc::c_char with actual char literals. For example, replaces 65 as libc::c_char with 'A' as libc::c_char.

Usage: clear_marks

Marks: clears all marks

Remove all marks from all nodes.

Usage: commit

Write the current crate to disk (by rewriting the original source files), then read it back in, clearing all marks. This can be useful as a "checkpoint" between two sets of transformations, if applying both sets of changes at once proves to be too much for the rewriter.

This is only useful when the rewrite mode is inplace. Otherwise the "write" part of the operation won't actually change the original source files, and the "read" part will revert the crate to its original form.

Usage: convert_format_args

Marks: target

For each function call, if one of its argument expressions is marked target, then parse that argument as a printf format string, with the subsequent arguments as the format args. Replace both the format string and the args with an invocation of the Rust format_args! macro.

This transformation applies casts to the remaining arguments to account for differences in argument conversion behavior between C-style and Rust-style string formatting. However, it does not attempt to convert the format_args! output into something compatible with the original C function. This results in a type error, so this pass should usually be followed up by an additional rewrite to change the function being called.

Example:

 printf("hello %d\n", 123);

If the string "hello %d\n" is marked target, then running convert_format_args will replace this call with

 printf(format_args!("hello {:}\n", 123 as i32));

At this point, it would be wise to replace the printf expression with a function that accepts the std::fmt::Arguments produced by format_args!.
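As a sketch of that follow-up step, the function below accepts std::fmt::Arguments and renders it with std::fmt::format. The name rust_printf is hypothetical, and returning a String (rather than writing to stdout) is a choice made here only to keep the example easy to check.

```rust
use std::fmt::Arguments;

/// Hypothetical replacement for `printf`: renders pre-parsed format
/// arguments to a `String`.
fn rust_printf(args: Arguments<'_>) -> String {
    std::fmt::format(args)
}

/// The rewritten call site from the example, with `printf` swapped out.
fn demo() -> String {
    rust_printf(format_args!("hello {:}\n", 123 as i32))
}
```

The swap itself could be done with a further rewrite such as rewrite_expr, which is described later in this command list.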

Usage: copy_marks OLD_MARK NEW_MARK

Marks: reads OLD_MARK; sets NEW_MARK

For every node bearing OLD_MARK, also apply NEW_MARK.

Usage: create_item ITEMS <inside/after> [MARK]

Marks: MARK/target

Parse ITEMS as item definitions, and insert the parsed items either inside (as the first child of) or after (as a sibling of) the AST node bearing MARK (default: target). Supports adding items to both mods and blocks.

Note that other itemlikes, such as impl and trait items, are not handled by this command.

Usage: delete_items

Marks: target

Delete all items marked target from the AST. This handles items in both mods and blocks, but doesn't handle other itemlikes.

Usage: delete_marks MARK

Marks: clears MARK

Remove MARK from every node where it appears.

Usage: fix_unused_unsafe

Find unused unsafe blocks and turn them into ordinary blocks.

Usage: fold_let_assign

Fold together lets with no initializer or a trivial one, and subsequent assignments. For example, replace let x; x = 10; with let x = 10;.

Usage: func_to_method

Marks: target, dest

Turn functions marked target into static methods (no self) in the impl block marked dest. Turn functions that have an argument marked target into methods, replacing the named argument with self. Rewrite all uses of marked functions to call the new method versions.

Marked arguments of type T, &T, and &mut T (where T is the Self type of the dest impl) will be converted to self, &self, and &mut self respectively.
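A hand-written before/after sketch of the three cases follows. The Counter type and function names are hypothetical, and the exact call syntax the tool emits at rewritten call sites may differ from the method-call form shown here.

```rust
struct Counter {
    count: u32,
}

// Before: free functions next to an (initially empty) impl marked `dest`.
//
//     fn counter_new() -> Counter { Counter { count: 0 } }  // fn marked `target`
//     fn counter_bump(c: &mut Counter) { c.count += 1; }    // arg `c` marked `target`
//     fn counter_get(c: &Counter) -> u32 { c.count }        // arg `c` marked `target`

// After: the same functions as methods in the `dest` impl.
impl Counter {
    // Function marked `target`: becomes a static method (no `self`).
    fn counter_new() -> Counter {
        Counter { count: 0 }
    }

    // `&mut Counter` argument marked `target`: becomes `&mut self`.
    fn counter_bump(&mut self) {
        self.count += 1;
    }

    // `&Counter` argument marked `target`: becomes `&self`.
    fn counter_get(&self) -> u32 {
        self.count
    }
}

/// Uses of the old free functions now go through the methods.
fn demo() -> u32 {
    let mut c = Counter::counter_new();
    c.counter_bump();
    c.counter_bump();
    c.counter_get()
}
```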

Usage: generalize_items VAR [TY]

Marks: target

Replace marked types with generic type parameters.

Specifically: add a new type parameter called VAR to each item marked target, replacing type annotations inside that item that are marked target with uses of the type parameter. Also update all uses of target items, passing TY as the new type argument when used inside a non-target item, and passing the type variable VAR when used inside a target item.

If TY is not provided, it defaults to a copy of the first type annotation that was replaced with VAR.

Example:

 struct Foo {    // Foo: target
     x: i32,     // i32: target
     y: i32,
 }

 fn f(foo: Foo) { ... }  // f: target

 fn main() {
     f(...);
 }

After running generalize_items T:

 // 1. Foo gains a new type parameter `T`
 struct Foo<T> {
     // 2. Marked type annotations become `T`
     x: T,
     y: i32,
 }

 // 3. `f` gains a new type parameter `T`, and passes
 // it through to uses of `Foo`
 fn f<T>(foo: Foo<T>) { ... }
 struct Bar<T> {
     foo: Foo<T>,
 }

 fn main() {
     // 4. Uses outside target items use `i32`, the
     // first type that was replaced with `T`.
     f::<i32>(...);
 }

Usage: ionize

Marks: target

Convert each union marked target to a type-safe Rust enum. The generated enums will have as_variant and as_variant_mut methods for each union field, which panic if the enum is not the named variant. Also updates assignments to union variables to assign one of the new enum variants, and updates uses of union fields to call the new methods instead.
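The shape of the result can be sketched as follows. This is an illustration only: the IntOrFloat type is hypothetical, and the accessor names as_i and as_f_mut follow the as_variant / as_variant_mut pattern described above, but the names the tool actually generates are an assumption.

```rust
// Before (as translated from C):
//
//     union IntOrFloat { i: i32, f: f32 }

/// After: one enum variant per union field.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IntOrFloat {
    I(i32),
    F(f32),
}

impl IntOrFloat {
    /// Accessor for the `i` field; panics if another variant is active.
    fn as_i(&self) -> &i32 {
        match self {
            IntOrFloat::I(v) => v,
            _ => panic!("IntOrFloat is not the `I` variant"),
        }
    }

    /// Mutable accessor for the `f` field; panics if another variant is active.
    fn as_f_mut(&mut self) -> &mut f32 {
        match self {
            IntOrFloat::F(v) => v,
            _ => panic!("IntOrFloat is not the `F` variant"),
        }
    }
}

fn demo() -> (i32, f32) {
    // An assignment like `u.i = 7` becomes an assignment of the `I` variant.
    let a = IntOrFloat::I(7);
    let mut b = IntOrFloat::F(1.5);
    // A field use like `u.f += 1.0` goes through the accessor method.
    *b.as_f_mut() += 1.0;
    (*a.as_i(), *b.as_f_mut())
}
```

Unlike a C union, reading a field that was not the last one written is a panic rather than a reinterpretation of the bytes, so code that relies on type punning through the union is not a candidate for this command.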

`let_x_uninitialized`

Obsolete - the translator now does this automatically.

Usage: let_x_uninitialized

For each local variable that is uninitialized (let x;), add mem::uninitialized() as an initializer expression.

Usage: link_funcs

Link up function declarations and definitions with matching symbols across modules. For every foreign fn whose symbol matches a fn definition elsewhere in the program, it replaces all uses of the foreign fn with a direct call of the fn definition, and deletes the foreign fn.

Example:

 mod a {
     #[no_mangle]
     unsafe extern "C" fn foo() { ... }
 }

 mod b {
     extern "C" {
         // This resolves to `a::foo` during linking.
         fn foo();
     }

     unsafe fn use_foo() {
         foo();
     }
 }

After running link_funcs:

 mod a {
     #[no_mangle]
     unsafe extern "C" fn foo() { ... }
 }

 mod b {
     // 1. Foreign fn `foo` has been deleted
     unsafe fn use_foo() {
         // 2. `use_foo` now calls `foo` directly
         ::a::foo();
     }
 }

Usage: link_incomplete_types

Link up type declarations and definitions with matching names across modules. For every foreign type whose name matches a type definition elsewhere in the program, it replaces all uses of the foreign type with the type definition, and deletes the foreign type.

Example:

 mod a {
     struct Foo { ... }
 }

 mod b {
     extern "C" {
         type Foo;
     }

     unsafe fn use_foo(x: &Foo) { ... }
 }

After running link_incomplete_types:

 mod a {
     struct Foo { ... }
 }

 mod b {
     // 1. Foreign type `Foo` has been deleted
     // 2. `use_foo` now references `Foo` directly
     unsafe fn use_foo(x: &::a::Foo) { ... }
 }

Usage: mark_arg_uses ARG_IDX MARK

Marks: reads MARK; sets/clears MARK

For every fn definition bearing MARK, apply MARK to expressions passed in as argument ARG_IDX in calls to that function. Removes MARK from the original function.

Usage: mark_callers MARK

Marks: reads MARK; sets/clears MARK

For every fn definition bearing MARK, apply MARK to call expressions that call that function. Removes MARK from the original function.

`mark_field_uses`

Obsolete - use select with match_expr!(typed!(::TheStruct).field) instead

Usage: mark_field_uses FIELD MARK

Marks: reads MARK; sets/clears MARK

For every struct definition bearing MARK, apply MARK to expressions that use FIELD of that struct. Removes MARK from the original struct.

`mark_pub_in_mod`

Obsolete - use select instead.

Usage: mark_pub_in_mod MARK

Marks: reads MARK; sets MARK

In each mod bearing MARK, apply MARK to every public item in the module.

Usage: mark_related_types [MARK]

Marks: MARK/target

For each type annotation bearing MARK (default: target), apply MARK to all other type annotations that must be the same type according to (a simplified version of) Rust's typing rules.

For example, in this code:

 fn f(x: i32, y: i32) -> i32 {
     x
 }

The i32 annotations on x and the return type of f are related, because changing these annotations to two unequal types would produce a type error. But the i32 annotation on y is unrelated, and can be changed independently of the other two.

Usage: mark_uses MARK

Marks: reads MARK; sets/clears MARK

For every top-level definition bearing MARK, apply MARK to uses of that definition. Removes MARK from the original definitions.

Usage: ownership_annotate [MARK]

Marks: MARK/target

Run ownership analysis on functions bearing MARK (default: target), and add attributes to each function describing its inferred ownership properties. See analysis/ownership/README.md for details on ownership inference.

Usage: ownership_mark_pointers [MARK]

Marks: reads MARK/target; sets ref, mut, and box

Run ownership analysis on functions bearing MARK (default: target), then for each pointer type appearing in their argument and return types, apply one of the marks ref, mut, or box, reflecting the results of the ownership analysis. See analysis/ownership/README.md for details on ownership inference.

Usage: ownership_split_variants [MARK]

Marks: MARK/target

Run ownership analysis on functions bearing MARK (default: target), and split each ownership-polymorphic function into multiple monomorphic variants. See analysis/ownership/README.md for details on ownership inference.

`pick_node`

Test command - not intended for general use.

Usage: pick_node KIND FILE LINE COL

Find a node of kind KIND at location FILE:LINE:COL. If successful, logs the node's ID and span at level info.

`print_marks`

Test command - not intended for general use.

Usage: print_marks

Marks: reads all

Logs the ID and label of every mark, at level info.

`print_spans`

Test command - not intended for general use.

Usage: print_spans

Print IDs, spans, and pretty-printed source for all exprs, pats, tys, stmts, and items.

Usage: reconstruct_for_range

Replaces i = start; while i < end { ...; i += step; } with for i in (start .. end).step_by(step) { ...; }.
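The two loop shapes can be compared side by side. In this sketch the function names are hypothetical, and it assumes that the loop body does not modify i and that step is nonzero (step_by panics on a zero step); the conditions the tool itself checks are not described here.

```rust
/// Loop shape before the rewrite.
fn sum_before(start: i32, end: i32, step: usize) -> i32 {
    let mut total = 0;
    let mut i = start;
    while i < end {
        total += i;
        i += step as i32;
    }
    total
}

/// Loop shape after `reconstruct_for_range`.
fn sum_after(start: i32, end: i32, step: usize) -> i32 {
    let mut total = 0;
    for i in (start..end).step_by(step) {
        total += i;
    }
    total
}
```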

`reconstruct_while`

Obsolete - the translator now does this automatically.

Usage: reconstruct_while

Replaces all instances of loop { if !cond { break; } ... } with while loops.

Usage: remove_null_terminator

Marks: target

Remove a trailing \0 character from marked string and bytestring literal expressions.

Note the mark must be placed on the expression, as it is currently difficult to mark a literal node.
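This command pairs naturally with bytestr_to_str. The constants below show one literal at each stage; the constant names are hypothetical, and real translated code typically wraps such literals in pointer casts that are omitted here.

```rust
/// A C string literal as translated: a bytestring with an explicit terminator.
const TRANSLATED: &[u8] = b"hello\0";

/// After `remove_null_terminator`: the trailing `\0` is gone.
const NO_TERMINATOR: &[u8] = b"hello";

/// After a further `bytestr_to_str`: an ordinary string literal.
const AS_STR: &str = "hello";
```

Removing the terminator is only appropriate once the literal is no longer passed to C functions that expect a null-terminated string.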

Usage: remove_unused_labels

Removes loop labels that are not used in a named break or continue.

Usage: rename_items_regex PAT REPL [FILTER]

Marks: reads FILTER

Replace PAT (a regular expression) with REPL in all item names. If FILTER is provided, only items bearing the FILTER mark will be renamed.

Usage: rename_marks OLD_MARK NEW_MARK

Marks: reads/clears OLD_MARK; sets NEW_MARK

For every node bearing OLD_MARK, remove OLD_MARK and apply NEW_MARK.

`rename_struct`

Obsolete - use rename_items_regex instead.

Usage: rename_struct NAME

Marks: target

Rename the struct marked target to NAME. Only supports renaming a single struct at a time.

Usage: rename_unnamed

Renames all Idents whose names contain unnamed throughout the crate, so that anonymous types get unique names crate-wide. This command should be run after transpiling with c2rust-transpile, and is mainly intended for use alongside the reorganize_definition pass, although it can be run on any c2rust-transpiled project.

Example:

pub mod foo {
    pub struct unnamed {
        a: i32
    }
}

pub mod bar {
    pub struct unnamed {
        b: usize
    }
}

Becomes:

pub mod foo {
    pub struct unnamed {
        a: i32
    }
}

pub mod bar {
    pub struct unnamed_1 {
        b: usize
    }
}

Usage: replace_items

Marks: target, repl

Replace all uses of items marked target with references to the item marked repl, then remove all target items.

Usage: retype_argument NEW_TY WRAP UNWRAP

Marks: target

For each argument marked target, change the type of the argument to NEW_TY, and use WRAP and UNWRAP to convert values to and from the original type of the argument at call sites and within the function body.

WRAP should contain an expression placeholder __old, and should convert __old from the argument's original type to NEW_TY. UNWRAP should contain an expression placeholder __new, and should perform the opposite conversion.
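As an illustration, consider turning a C-style integer flag into a bool. The sketch assumes a hypothetical invocation retype_argument bool '__old != 0' '__new as i32' with the verbose argument marked target; the function names are made up, and the code the tool actually emits may be formatted differently.

```rust
// Before: the argument `verbose` is an integer flag.
fn level_before(verbose: i32) -> u32 {
    if verbose != 0 { 2 } else { 1 }
}

// After: the argument has type NEW_TY (`bool`). Inside the body, each use of
// the argument is wrapped in UNWRAP (`__new as i32`) to recover the old type.
fn level_after(verbose: bool) -> u32 {
    if (verbose as i32) != 0 { 2 } else { 1 }
}

// At call sites, the old argument expression is wrapped in WRAP (`__old != 0`).
fn caller(flag: i32) -> (u32, u32) {
    (level_before(flag), level_after(flag != 0))
}
```

The leftover conversions such as (verbose as i32) != 0 are deliberately mechanical; they can be simplified afterwards by hand or with further rewrites.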

Usage: retype_return NEW_TY WRAP UNWRAP

Marks: target

For each function marked target, change the return type of the function to NEW_TY, and use WRAP and UNWRAP to convert values to and from the original return type at call sites and within the function body.

WRAP should contain an expression placeholder __old, and should convert __old from the function's original return type to NEW_TY. UNWRAP should contain an expression placeholder __new, and should perform the opposite conversion.

Usage: retype_static NEW_TY REV_CONV_ASSIGN CONV_RVAL CONV_LVAL [CONV_LVAL_MUT]

Marks: target

For each static marked target, change the type of the static to NEW_TY, using the remaining arguments (which are all expression templates) to convert between the old and new types at the definition and use sites.

The expression arguments are used as follows:

REV_CONV_ASSIGN: In direct assignments to the static and in its initializer expression, the original assigned value is wrapped (as __old) in REV_CONV_ASSIGN to produce a value of type NEW_TY.
CONV_RVAL: In rvalue contexts, the static is wrapped (as __new) in CONV_RVAL to produce a value of the static's old type.
CONV_LVAL and CONV_LVAL_MUT are similar to CONV_RVAL, but for immutable and mutable lvalue contexts respectively. Especially for CONV_LVAL_MUT, the result of wrapping should be an lvalue expression (such as a dereference or field access), not a temporary, as otherwise updates to the static could be lost. CONV_LVAL_MUT is not required for immutable statics, which cannot appear in mutable lvalue contexts.

Usage: rewrite_expr PAT REPL [FILTER]

Marks: reads FILTER, if set; may read other marks depending on PAT

For every expression in the crate matching PAT, replace it with REPL. PAT and REPL are both Rust expressions. PAT can use placeholders to capture nodes from the matched AST, and REPL can refer to those same placeholders to substitute in the captured nodes. See the matcher module for details on AST pattern matching.

If FILTER is provided, only expressions marked FILTER will be rewritten. This usage is obsolete - change PAT to marked!(PAT, FILTER) to get the same behavior.

Example:

 fn double(x: i32) -> i32 {
     x * 2
 }

After running rewrite_expr '$e * 2' '$e + $e':

 fn double(x: i32) -> i32 {
     x + x
 }

Here $e * 2 matches x * 2, capturing x as $e. Then x is substituted for $e in $e + $e, producing the final expression x + x.

Usage: rewrite_ty PAT REPL [FILTER]

Marks: reads FILTER, if set; may read other marks depending on PAT

For every type in the crate matching PAT, replace it with REPL. PAT and REPL are both Rust types. PAT can use placeholders to capture nodes from the matched AST, and REPL can refer to those same placeholders to substitute in the captured nodes. See the matcher module for details on AST pattern matching.

If FILTER is provided, only types marked FILTER will be rewritten. This usage is obsolete - change PAT to marked!(PAT, FILTER) to get the same behavior.

See the documentation for rewrite_expr for an example of this style of rewriting.
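
As a rough illustration of a type rewrite, the sketch below shows a function before and after a command such as rewrite_ty 'u16' 'u32'. The command arguments and the function total are made up for this example, and the "after" module shows the expected shape of the output rather than output captured from the tool.

```rust
// Hypothetical input, before running `rewrite_ty 'u16' 'u32'`.
mod before {
    pub fn total(xs: &[u16]) -> u16 {
        let mut sum: u16 = 0;
        for x in xs {
            sum += *x;
        }
        sum
    }
}

// Expected shape of the result: each `u16` type annotation is now `u32`.
// Expressions are untouched; only types were matched and replaced.
mod after {
    pub fn total(xs: &[u32]) -> u32 {
        let mut sum: u32 = 0;
        for x in xs {
            sum += *x;
        }
        sum
    }
}
```

Callers that pass u16 values to total would need a matching update, since only type annotations are rewritten.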

Usage: select MARK SCRIPT

Marks: sets MARK; may set/clear other marks depending on SCRIPT

Run node-selection script SCRIPT, and apply MARK to the nodes it selects. See select::SelectOp, select::Filter, and select::parser for details on select script syntax.

Usage: select_phase2 MARK SCRIPT

Marks: sets MARK; may set/clear other marks depending on SCRIPT

Works like select, but stops the compiler's analyses before typechecking happens. This means type information will not be available, and script commands that refer to it will fail.

Usage: set_mutability MUT

Marks: target

Set the mutability of all items marked target to MUT. MUT is either imm or mut. This command only affects static items (including extern statics).

Usage: set_visibility VIS

Marks: target

Set the visibility of all items marked target to VIS. VIS is a Rust visibility qualifier such as pub, pub(crate), or the empty string.

Doesn't handle struct field visibility, for now.

Usage: sink_lets

For each local variable with a trivial initializer, move the local's declaration to the innermost block containing all its uses.
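
A small sketch of the intended effect, using a made-up function sum_if. The "after" module shows the expected shape of the result, not output captured from the tool.

```rust
// Before: `total` is declared at the top of the function with no
// initializer, but every use of it is inside the `if` block.
mod before {
    pub fn sum_if(enabled: bool, a: i32, b: i32) -> i32 {
        let total;
        if enabled {
            total = a + b;
            return total;
        }
        0
    }
}

// After (expected shape): the declaration has moved into the innermost
// block that contains all uses of `total`.
mod after {
    pub fn sum_if(enabled: bool, a: i32, b: i32) -> i32 {
        if enabled {
            let total;
            total = a + b;
            return total;
        }
        0
    }
}
```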

"Trivial" is currently defined as no initializer (let x;) or an initializer without any side effects. This transform requires trivial assignments to avoid reordering side effects.

Usage: sink_unsafe

Marks: target

For functions marked target, convert unsafe fn f() { ... } into fn f() { unsafe { ... } }. Useful once unsafe argument handling has been eliminated from the function.
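
A minimal sketch of the transformation, assuming a made-up function counter that reads a made-up static COUNTER. The "after" module shows the expected shape of the result.

```rust
// Hypothetical global read by the function below.
static mut COUNTER: i32 = 7;

// Before: the whole function is `unsafe`, so every caller needs an
// `unsafe` block of its own.
mod before {
    pub unsafe fn counter() -> i32 {
        super::COUNTER
    }
}

// After (expected shape): the function is safe to call, and the unsafe
// operation is confined to a block inside the body.
mod after {
    pub fn counter() -> i32 {
        unsafe { super::COUNTER }
    }
}
```

After the change, callers can write after::counter() directly, without an unsafe block.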

Usage: static_collect_to_struct STRUCT VAR

Marks: target

Collect marked statics into a single static struct.

Specifically:

Find all statics marked target. For each one, record its name, type, and initializer expression, then delete it.
Generate a new struct definition named STRUCT. For each marked static, include a field of STRUCT with the same name and type as the static.
Generate a new static mut named VAR whose type is STRUCT. Initialize it using the initializer expressions for the marked statics.
For each marked static foo, replace uses of foo with VAR.foo.

Example:

 static mut FOO: i32 = 100;
 static mut BAR: bool = true;

 unsafe fn f() -> i32 {
     FOO
 }

After running static_collect_to_struct Globals G, with both statics marked:

 struct Globals {
     FOO: i32,
     BAR: bool,
 }

 static mut G: Globals = Globals {
     FOO: 100,
     BAR: true,
 };

 unsafe fn f() -> i32 {
     G.FOO
 }

Usage: static_to_local

Marks: target

Delete each static marked target. For each function that uses a marked static, insert a new local variable definition replicating the marked static.

Example:

 static mut FOO: i32 = 100;  // FOO: target

 unsafe fn f() -> i32 {
     FOO
 }

 unsafe fn g() -> i32 {
     FOO + 1
 }

After running static_to_local:

 // `FOO` deleted

 // `f` gains a new local, replicating `FOO`.
 unsafe fn f() -> i32 {
     let FOO: i32 = 100;
     FOO
 }

 // If multiple functions use `FOO`, each one gets its own copy.
 unsafe fn g() -> i32 {
     let FOO: i32 = 100;
     FOO + 1
 }

Usage: static_to_local_ref

Marks: target, user

For each function marked user, replace uses of statics marked target with uses of newly-introduced reference arguments. Afterward, no user function directly accesses any target static. At call sites of user functions, a reference to the original static is passed in for each new argument if the caller is not itself a user function; otherwise, the caller's own reference argument is passed through. Note this sometimes results in functions gaining arguments corresponding to statics that the function itself does not use, but that its callees do.

Example:

 static mut FOO: i32 = 100;  // FOO: target

 unsafe fn f() -> i32 {  // f: user
     FOO
 }

 unsafe fn g() -> i32 {  // g: user
     f()
 }

 unsafe fn h() -> i32 {
     g()
 }

After running static_to_local_ref:

 static mut FOO: i32 = 100;

 // `f` is a `user` that references `FOO`, so it
 // gains a new argument `FOO_`.
 unsafe fn f(FOO_: &mut i32) -> i32 {
     // References to `FOO` are replaced with `*FOO_`
     *FOO_
 }

 // `g` is a `user` that references `FOO` indirectly,
 // via fellow `user` `f`.
 unsafe fn g(FOO_: &mut i32) -> i32 {
     // `g` passes through its own `FOO_` reference
     // when calling `f`.
     f(FOO_)
 }

 // `h` is not a `user`, so its signature is unchanged.
 unsafe fn h() -> i32 {
     // `h` passes in a reference to the original
     // static `FOO`.
     g(&mut FOO)
 }

Usage: struct_assign_to_update

Replace all struct field assignments with functional update expressions.

Example:

 let mut x: S = ...;
 x.f = 1;
 x.g = 2;

After running struct_assign_to_update:

 let mut x: S = ...;
 x = S { f: 1, ..x };
 x = S { g: 2, ..x };

Usage: struct_merge_updates

Merge consecutive struct updates into a single update.

Example:

 let mut x: S = ...;
 x = S { f: 1, ..x };
 x = S { g: 2, ..x };

After running struct_merge_updates:

 let mut x: S = ...;
 x = S { f: 1, g: 2, ..x };

Test command - not intended for general use.

Usage: test_analysis_ownership

Runs the ownership analysis and dumps the results to stderr.

Test command - not intended for general use.

Usage: test_analysis_type_eq

Runs the type_eq analysis and logs the result (at level info).

Test command - not intended for general use.

Usage: test_debug_callees

Inspect the details of each Call expression. Used to debug RefactorCtxt::opt_callee_info.

Test command - not intended for general use.

Usage: test_f_plus_one

Replace the expression f(__x) with __x + 1 everywhere it appears.

Test command - not intended for general use.

Usage: test_insert_remove_args INS REM

In each function marked target, insert new arguments at each index listed in INS (a comma-separated list of integers), then delete the arguments whose original indices are listed in REM.

This is used for testing sequence rewriting of fn argument lists.

Test command - not intended for general use.

Usage: test_one_plus_one

Replace the expression 2 with 1 + 1 everywhere it appears.

Test command - not intended for general use.

Usage: test_reflect

Applies path and ty reflection on every expr in the program.

Test command - not intended for general use.

Usage: test_replace_stmts OLD NEW

Replace statement(s) OLD with NEW everywhere it appears.

Test command - not intended for general use.

Usage: test_typeck_loop

Runs a no-op typechecking loop for three iterations. Used to test the typechecking loop and AST re-analysis code.

Usage: type_fix_rules RULE...

Attempts to fix type errors in the crate using the provided rules. Each rule has the form "ectx, actual_ty, expected_ty => cast_expr".

ectx is one of rval, lval, lval_mut, or *, and determines in what kinds of expression contexts the rule applies.
actual_ty is a pattern to be matched against the (reflected) actual expression type.
expected_ty is a pattern to be matched against the (reflected) expected expression type.
cast_expr is a template for generating a cast expression.

For expressions in context ectx, whose actual type matches actual_ty and whose expected type matches expected_ty (and where actual != expected), the expr is substituted into cast_expr to replace the original expr with one of the expected type. During substitution, cast_expr has access to variables captured from both actual_ty and expected_ty, as well as __old containing the original (ill-typed) expression.
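
As an illustration only, a rule following the form above might be written "rval, i32, u64 => __old as u64". The rule and the functions below are hypothetical and have not been run through the tool. The sketch shows the kind of code such a rule would be expected to produce.

```rust
// Hypothetical callee that expects a `u64`.
fn area(side: u64) -> u64 {
    side * side
}

// Before the fix, the body would read `area(n)`, which does not typecheck:
// the actual type of `n` is `i32`, while the expected type is `u64`.
// After the fix, `__old` (here `n`) has been substituted into the cast
// template `__old as u64`.
pub fn area_of(n: i32) -> u64 {
    area(n as u64)
}
```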

Obsolete - works around translator problems that no longer exist.

Usage: uninit_to_default

In local variable initializers, replace mem::uninitialized() with an appropriate default value of the variable's type.
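
A minimal sketch of the result, using a made-up function max_of. It assumes the default chosen for an i32 is 0; the original initializer appears only in a comment because mem::uninitialized is deprecated.

```rust
pub fn max_of(a: i32, b: i32) -> i32 {
    // Before: let mut m: i32 = unsafe { std::mem::uninitialized() };
    // After (assumed default for `i32`):
    let mut m: i32 = 0;
    // Both branches assign `m` before it is read, so the placeholder
    // value never affects the result.
    if a > b {
        m = a;
    } else {
        m = b;
    }
    m
}
```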

Usage: wrap_api

Marks: target

For each function foo marked target:

Reset the function's ABI to "Rust" (the default)
Remove any #[no_mangle] or #[export_name] attributes
Generate a new wrapper function called foo_wrapper with foo's old ABI and an #[export_name="foo"] attribute.

Calls to foo are left unchanged. The result is that callers from C use the wrapper function, while internal calls use foo directly, and the signature of foo can be changed freely without affecting external callers.

Usage: wrap_extern

Marks: target, dest

For each foreign function marked target, generate a wrapper function in the module marked dest, and rewrite all uses of the function to call the wrapper instead.

Example:

 extern "C" {
     fn foo(x: i32) -> i32;
 }

 mod wrappers {
     // empty
 }

 fn main() {
     let x = unsafe { foo(123) };
 }

After transformation, with fn foo marked target and mod wrappers marked dest:

 extern "C" {
     fn foo(x: i32) -> i32;
 }

 mod wrappers {
     unsafe fn foo(x: i32) -> i32 {
         ::foo(x)
     }
 }

 fn main() {
     let x = unsafe { ::wrappers::foo(123) };
 }

Note that this also replaces the function in expressions that take its address, which may cause problems, as the wrapper function has a different type than the original (it lacks the extern "C" ABI qualifier).

Usage: wrapping_arith_to_normal

Replace all uses of wrapping arithmetic methods with ordinary arithmetic operators. For example, replace x.wrapping_add(y) with x + y.
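
A short sketch of the rewrite on a made-up function add, with the code before and after placed in separate modules.

```rust
// Before: translated code using an explicit wrapping method.
mod before {
    pub fn add(x: i32, y: i32) -> i32 {
        x.wrapping_add(y)
    }
}

// After: the ordinary arithmetic operator.
mod after {
    pub fn add(x: i32, y: i32) -> i32 {
        x + y
    }
}
```

On inputs that do not overflow, the two versions agree. On overflow they differ: wrapping_add wraps around, while + panics in builds with overflow checks enabled (the default for debug builds). It is worth confirming that overflow cannot occur before applying this command.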

Reference

Refactoring module

Tables

Stmt	AST Stmt
Expr	AST Expr

Fields

refactor

Global refactoring state

Class RefactorState

RefactorState:run_command (name, args)	Run a builtin refactoring command
RefactorState:transform (callback)	Run a custom refactoring transformation

Class MatchCtxt

MatchCtxt:parse_stmts (pat)	Parse statements and add them to this MatchCtxt
MatchCtxt:parse_expr (pat)	Parse an expression and add it to this MatchCtxt
MatchCtxt:fold_with (needle, crate, callback)	Find matches of `pattern` within `crate` and rewrite using `callback`
MatchCtxt:get_expr (Expression)	Get matched binding for an expression variable
MatchCtxt:get_stmt (Statement)	Get matched binding for a statement variable
MatchCtxt:try_match (pat, target)	Attempt to match `target` against `pat`, updating bindings if matched.
MatchCtxt:subst (replacement)	Substitute the currently matched AST node with a new AST node

Class TransformCtxt

TransformCtxt:replace_stmts_with (needle, callback)	Replace matching statements using given callback
TransformCtxt:replace_expr_with (needle, callback)	Replace matching expressions using given callback
TransformCtxt:match ()	Create a new, empty MatchCtxt
TransformCtxt:get_ast (node)	Retrieve a Lua version of an AST node

<h2 class="section-header "><a name="Tables"></a>Tables</h2>

<dl class="function">
<dt>
<a name = "Stmt"></a>
<strong>Stmt</strong>
</dt>
<dd>
AST Stmt


<h3>Fields:</h3>
<ul>
    <li><span class="parameter">type</span>
     "Stmt"
    </li>
    <li><span class="parameter">kind</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     <code>StmtKind</code> of this statement</p>

StmtKind::Local only:

ty AstNode Type of local (optional)

init AstNode Initializer of local (optional)

pat AstNode Name of local

StmtKind::Item only:

item AstNode Item node

StmtKind::Semi and StmtKind::Expr only:

expr AstNode Expression in this statement

Expr

AST Expr

<h3>Fields:</h3>
<ul>
    <li><span class="parameter">type</span>
     "Expr"
    </li>
    <li><span class="parameter">kind</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     <code>ExprKind</code> of this expression</p>

ExprKind::Lit only:

value Literal value of this expression

Fields

<dl class="function">
<dt>
<a name = "refactor"></a>
<strong>refactor</strong>
</dt>
<dd>
Global refactoring state


<ul>
    <li><span class="parameter">refactor</span>
     RefactorState object
    </li>
</ul>

Class RefactorState

      <div class="section-description">
      Refactoring context
      </div>
<dl class="function">
<dt>
<a name = "RefactorState:run_command"></a>
<strong>RefactorState:run_command (name, args)</strong>
</dt>
<dd>
Run a builtin refactoring command


<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">name</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     Command to run
    </li>
    <li><span class="parameter">args</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">{string,...}</a></span>
     List of arguments for the command
    </li>
</ul>

RefactorState:transform (callback)

Run a custom refactoring transformation

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">callback</span>
        <span class="types"><span class="type">function(TransformCtxt,AstNode)</span></span>
     Transformation function called with a fresh <a href="scripting_api.html#TransformCtxt">TransformCtxt</a> and the crate to be transformed.
    </li>
</ul>

Class MatchCtxt

      <div class="section-description">
      A match context
      </div>
<dl class="function">
<dt>
<a name = "MatchCtxt:parse_stmts"></a>
<strong>MatchCtxt:parse_stmts (pat)</strong>
</dt>
<dd>
Parse statements and add them to this MatchCtxt


<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">pat</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     Pattern to parse
    </li>
</ul>

<h3>Returns:</h3>
<ol>

       <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
    The parsed statements
</ol>

MatchCtxt:parse_expr (pat)

Parse an expression and add it to this MatchCtxt

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">pat</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     Pattern to parse
    </li>
</ul>

<h3>Returns:</h3>
<ol>

       <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
    The parsed expression
</ol>

MatchCtxt:fold_with (needle, crate, callback)

Find matches of pattern within crate and rewrite using callback

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">needle</span>
        <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
     Pattern to search for
    </li>
    <li><span class="parameter">crate</span>
        <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
     Crate to fold over
    </li>
    <li><span class="parameter">callback</span>
        <span class="types"><span class="type">function(AstNode,MatchCtxt)</span></span>
     Function called for each match. Takes the matching node and a new <a href="scripting_api.html#MatchCtxt">MatchCtxt</a> for that match.
    </li>
</ul>

MatchCtxt:get_expr (Expression)

Get matched binding for an expression variable

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">Expression</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     variable pattern
    </li>
</ul>

<h3>Returns:</h3>
<ol>

       <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
    Expression matched by this binding
</ol>

MatchCtxt:get_stmt (Statement)

Get matched binding for a statement variable

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">Statement</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     variable pattern
    </li>
</ul>

<h3>Returns:</h3>
<ol>

       <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
    Statement matched by this binding
</ol>

MatchCtxt:try_match (pat, target)

Attempt to match target against pat, updating bindings if matched.

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">pat</span>
        <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
     AST (potentially with variable bindings) to match with
    </li>
    <li><span class="parameter">target</span>
        <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
     AST to match against
    </li>
</ul>

<h3>Returns:</h3>
<ol>

       <span class="types"><span class="type">bool</span></span>
    true if match was successful
</ol>

MatchCtxt:subst (replacement)

Substitute the currently matched AST node with a new AST node

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">replacement</span>
        <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
     New AST node to replace the currently matched AST. May include variable bindings if these bindings were matched by the search pattern.
    </li>
</ul>

<h3>Returns:</h3>
<ol>

       <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
    New AST node with variable bindings replaced by their matched values
</ol>

Class TransformCtxt

      <div class="section-description">
      Transformation context
      </div>
<dl class="function">
<dt>
<a name = "TransformCtxt:replace_stmts_with"></a>
<strong>TransformCtxt:replace_stmts_with (needle, callback)</strong>
</dt>
<dd>
Replace matching statements using given callback


<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">needle</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     Statements pattern to search for, may include variable bindings
    </li>
    <li><span class="parameter">callback</span>
        <span class="types"><span class="type">function(AstNode,MatchCtxt)</span></span>
     Function called for each match. Takes the matching node and a new <a href="scripting_api.html#MatchCtxt">MatchCtxt</a> for that match. See <a href="scripting_api.html#MatchCtxt:fold_with">MatchCtxt:fold_with</a>
    </li>
</ul>

TransformCtxt:replace_expr_with (needle, callback)

Replace matching expressions using given callback

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">needle</span>
        <span class="types"><a class="type" href="https://www.lua.org/manual/5.3/manual.html#6.4">string</a></span>
     Expression pattern to search for, may include variable bindings
    </li>
    <li><span class="parameter">callback</span>
        <span class="types"><span class="type">function(AstNode,MatchCtxt)</span></span>
     Function called for each match. Takes the matching node and a new <a href="scripting_api.html#MatchCtxt">MatchCtxt</a> for that match. See <a href="scripting_api.html#MatchCtxt:fold_with">MatchCtxt:fold_with</a>
    </li>
</ul>

TransformCtxt:match ()

Create a new, empty MatchCtxt

<h3>Returns:</h3>
<ol>

       <span class="types"><a class="type" href="scripting_api.html#MatchCtxt">MatchCtxt</a></span>
    New match context
</ol>

TransformCtxt:get_ast (node)

Retrieve a Lua version of an AST node

<h3>Parameters:</h3>
<ul>
    <li><span class="parameter">node</span>
        <span class="types"><a class="type" href="scripting_api.html#AstNode">AstNode</a></span>
     AST node handle
    </li>
</ul>

<h3>Returns:</h3>
<ol>

    Struct representation of this AST node. Valid return types are <a href="scripting_api.html#Stmt">Stmt</a>, and <a href="scripting_api.html#Expr">Expr</a>.
</ol>

generated by LDoc 1.4.6 Last updated 2019-02-21 10:38:12

c2rust refactor provides a general-purpose rewriting command, rewrite_expr, for transforming expressions. In its most basic form, rewrite_expr replaces one expression with another, everywhere in the crate:

rewrite_expr '1+1' '2'

Here, all instances of the expression 1+1 (the "pattern") are replaced with 2 (the "replacement").

rewrite_expr parses both the pattern and the replacement as Rust expressions, and compares the structure of the expression instead of its raw text when looking for occurrences of the pattern. This lets it recognize that 1 + 1 and 1 + /* comment */ 1 both match the pattern 1+1 (despite being textually distinct), while 1+11 does not (despite being textually similar).

In rewrite_expr's expression pattern, any name beginning with double underscores is a metavariable. Just as a variable in an ordinary Rust match expression will match any value (and bind it for later use), a metavariable in an expression pattern will match any Rust code. For example, the expression pattern __x + 1 will match any expression that adds 1 to something:

rewrite_expr '__x + 1' '11'

In these examples, the __x metavariable matches the expressions 1, 2 * 3, and f().

When a metavariable matches against some piece of code, the code it matches is bound to the variable for later use. Specifically, rewrite_expr's replacement argument can refer back to those metavariables to substitute in the matched code:

rewrite_expr '__x + 1' '11 * __x'

In each case, the expression bound to the __x metavariable is substituted into the right-hand side of the multiplication in the replacement.

Finally, the same metavariable can appear multiple times in the pattern. In that case, the pattern matches only if each occurrence of the metavariable matches the same expression. For example:

rewrite_expr '__x + __x' '2 * __x'

Here a + a and f() + f() are both replaced, but f() + 1 is not because __x cannot match both f() and 1 at the same time.
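
The sketch below spells out this rewrite on a made-up function f that counts its own calls. It is meant to show one consequence of binding a call expression to a repeated metavariable: the replacement 2 * __x evaluates the bound expression once, where the original evaluated it twice.

```rust
use std::cell::Cell;

// Hypothetical function with a visible side effect: it counts its calls.
fn f(calls: &Cell<u32>) -> i32 {
    calls.set(calls.get() + 1);
    21
}

// Before: matches `__x + __x` with `__x` bound to `f(calls)`.
pub fn before(calls: &Cell<u32>) -> i32 {
    f(calls) + f(calls)
}

// After: `2 * __x`, so `f` is now called only once.
pub fn after(calls: &Cell<u32>) -> i32 {
    2 * f(calls)
}
```

Both versions return the same value here, but the call count differs, so a rewrite like this is behavior-preserving only when the matched expression has no side effects.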

Suppose we wish to add an argument to an existing function. All current callers of the function should pass a default value of 0 for this new argument. We can update the existing calls like this:

rewrite_expr 'my_func(__x, __y)' 'my_func(__x, __y, 0)'

Every call to my_func now passes a third argument, and we can update the definition of my_func to match.
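
For instance, the crate might end up looking like the sketch below once the definition is updated by hand. The parameter name offset and the function bodies are assumptions made for this example.

```rust
// Updated definition: `my_func` gains a third parameter. Passing 0 for
// it keeps the old result.
fn my_func(x: i32, y: i32, offset: i32) -> i32 {
    x + y + offset
}

// Call sites as left by the rewrite: each one passes the default `0`.
pub fn caller_one() -> i32 {
    my_func(1, 2, 0)
}

pub fn caller_two(a: i32) -> i32 {
    my_func(a, a * 2, 0)
}
```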

rewrite_expr supports several special matching forms that can appear in patterns to add extra restrictions to matching.

A pattern such as def!(::foo::f) matches any ident or path expression that resolves to the function whose absolute path is ::foo::f. For example, to replace all expressions referencing the function foo::f with ones referencing foo::g:

rewrite_expr 'def!(::foo::f)' '::foo::g'

This works for all direct references to f, whether by relative path (foo::f), absolute path (::foo::f), or imported identifier (just f, with use foo::f in scope). It can even handle imports under a different name (f2 with use foo::f as f2 in scope), since it checks only the path of the referenced definition, not the syntax used to reference it.
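
To make the different reference styles concrete, here is a small sketch in which every function refers to the same definition foo::f. The module and function names are made up; the point is only that all of these are references that def!(::foo::f) is described as matching.

```rust
mod foo {
    pub fn f() -> i32 {
        1
    }
}

// Reference by relative path.
pub fn by_path() -> i32 {
    foo::f()
}

// Reference by imported identifier.
pub fn by_import() -> i32 {
    use self::foo::f;
    f()
}

// Reference through an import under a different name.
pub fn by_renamed_import() -> i32 {
    use self::foo::f as f2;
    f2()
}
```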

When rewrite_expr attempts to match def!(path) against some expression e, it actually completely ignores the content of e itself. Instead, it performs these steps:

Check rustc's name resolution results to find the definition d that e resolves to. (If e doesn't resolve to a definition, then the matching fails.)
Construct an absolute path dpath referring to d. For definitions in the current crate, this path looks like ::mod1::def1. For definitions in other crates, it looks like ::crate1::mod1::def1.
Match dpath against the path pattern provided as the argument of def!. Then e matches def!(path) if dpath matches path, and fails to match otherwise.

Matching with def! can sometimes fail in surprising ways, since the user-provided path is matched against a generated path that may not appear explicitly anywhere in the source code. For example, this attempt to match HashMap::new does not succeed:

rewrite_expr
    'def!(::std::collections::hash_map::HashMap::new)()'
    '::std::collections::hash_map::HashMap::with_capacity(10)'

The debug_match_expr command exists to diagnose such problems. It takes only a pattern, and prints information about attempts to match it at various points in the crate:

debug_match_expr 'def!(::std::collections::hash_map::HashMap::new)()'

Here, its output includes this line:

def!(): trying to match pattern path(::std::collections::hash_map::HashMap::new) against AST path(::std::collections::HashMap::new)

Which reveals the problem: the absolute path def! generates for HashMap::new uses the reexport at std::collections::HashMap, not the canonical definition at std::collections::hash_map::HashMap. Updating the previous rewrite_expr command allows it to succeed:

rewrite_expr
    'def!(::std::collections::HashMap::new)()'
    '::std::collections::HashMap::with_capacity(10)'

The argument to def! is a path pattern, which can contain metavariables just like the overall expression pattern. For instance, we can rewrite all calls to functions from the foo module:

rewrite_expr 'def!(::foo::__name)()' '123'

Since every definition in the foo module has an absolute path of the form ::foo::(something), they all match the expression pattern def!(::foo::__name).

Like any other metavariable, the ones in a def! path pattern can be used in the replacement expression to substitute in the captured name. For example, we can replace all references to items in the foo module with references to the same-named items in the bar module:

rewrite_expr 'def!(::foo::__name)' '::bar::__name'

Note, however, that each metavariable in a path pattern can match only a single ident. This means foo::__name will not match the path to an item in a submodule, such as foo::one::two. Handling these would require an additional rewrite step, such as rewrite_expr 'def!(::foo::__name1::__name2)' '::bar::__name1::__name2'.
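
The one-ident-per-metavariable rule can be modeled with a few lines of Rust. This is a simplified sketch that works on path strings, not the tool's actual matcher; the function match_path is hypothetical.

```rust
use std::collections::HashMap;

/// Match an absolute path such as `::foo::bar` against a pattern such as
/// `::foo::__name`. Returns the metavariable bindings on success.
pub fn match_path(pattern: &str, path: &str) -> Option<HashMap<String, String>> {
    let pat_segs: Vec<&str> = pattern.split("::").collect();
    let path_segs: Vec<&str> = path.split("::").collect();
    // Each pattern segment matches exactly one path segment, so the
    // segment counts must agree.
    if pat_segs.len() != path_segs.len() {
        return None;
    }
    let mut bindings = HashMap::new();
    for (&p, &s) in pat_segs.iter().zip(path_segs.iter()) {
        if p.starts_with("__") {
            // Metavariable: bind it to this single segment. If it was
            // already bound, the earlier binding must be the same.
            if let Some(old) = bindings.insert(p.to_string(), s.to_string()) {
                if old != s {
                    return None;
                }
            }
        } else if p != s {
            // Literal segment: must match exactly.
            return None;
        }
    }
    Some(bindings)
}
```

With this model, ::foo::__name matches ::foo::bar and binds __name to bar, but does not match ::foo::one::two, which needs a two-metavariable pattern such as ::foo::__name1::__name2.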

A pattern of the form typed!(e, ty) matches any expression that matches the pattern e, but only if the type of that expression matches the pattern ty. For example, we can perform a rewrite that only affects i32s:

rewrite_expr 'typed!(__e, i32)' '0'

Every expression matches the metavariable __e, but only the i32s (whether literals or variables of type i32) are affected by the rewrite.

Internally, typed! works much like def!. To match an expression e against typed!(e_pat, ty_pat), rewrite_expr follows these steps:

Consult rustc's typechecking results to get the type of e. Call that type rustc_ty.
rustc_ty is an internal, abstract representation of the type, which is not suitable for matching. Construct a concrete representation of rustc_ty, and call it ty.
Match e against e_pat and ty against ty_pat. Then e matches typed!(e_pat, ty_pat) if both matches succeed, and fails to match otherwise.

When matching fails unexpectedly, debug_match_expr is once again useful for understanding the problem. For example, this rewriting command has no effect:

rewrite_expr "typed!(__e, &'static str)" '"hello"'

Passing the same pattern to debug_match_expr produces output that includes the following:

typed!(): trying to match pattern type(&'static str) against AST type(&str)

Now the problem is clear: the concrete type representation constructed for matching omits lifetimes. Replacing &'static str with &str in the pattern causes the rewrite to succeed:

rewrite_expr 'typed!(__e, &str)' '"hello"'

The expression pattern and type pattern arguments of typed!(e, ty) are handled using the normal rewrite_expr matching engine, which means they can contain metavariables and other special matching forms. For example, metavariables can capture both parts of the expression and parts of its type for use in the replacement:

rewrite_expr
    'typed!(Vec::with_capacity(__n), ::std::vec::Vec<__ty>)'
    '::std::iter::repeat(<__ty>::default())
        .take(__n)
        .collect::<Vec<__ty>>()'

Notice that the rewritten code has the correct element type in the call to default, even in cases where the type is not written explicitly in the original expression! The matching of typed! obtains the inferred type information from rustc, and those inferred types are captured by metavariables in the type pattern.
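
As a sketch of what this looks like on a concrete function (the function names are made up, and the "after" version is the expected shape of the result rather than captured tool output):

```rust
// Before: the element type is never written in the expression; rustc
// infers `u8` from the return type.
pub fn before(n: usize) -> Vec<u8> {
    Vec::with_capacity(n)
}

// After (expected shape): `__n` was bound to `n` and `__ty` to the
// inferred type `u8`, and both were substituted into the replacement.
pub fn after(n: usize) -> Vec<u8> {
    ::std::iter::repeat(<u8>::default())
        .take(n)
        .collect::<Vec<u8>>()
}
```

The two versions are not equivalent: the first returns an empty vector with reserved capacity, while the second returns a vector of n default values. The example is about how types are captured, not about a behavior-preserving rewrite.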

This example demonstrates usage of def! and typed!.

Suppose we have some unsafe code that uses transmute to convert a raw pointer that may be null (*const T) into an optional reference (Option<&T>). This conversion is better expressed using the as_ref method of *const T, and we'd like to apply this transformation automatically.

Here is a basic first attempt:

rewrite_expr 'transmute(__e)' '__e.as_ref()'

This has two major shortcomings, which we will address in order:

It works only on code that calls exactly transmute(foo). The instances that import std::mem and call mem::transmute(foo) do not get rewritten.
It rewrites transmutes between any types, not just *const T to Option<&T>. Only transmutes between those types should be replaced with as_ref.

We want to rewrite calls to std::mem::transmute, regardless of how those calls are written. This is a perfect use case for def!:

rewrite_expr 'def!(::std::intrinsics::transmute)(__e)' '__e.as_ref()'

Now our rewrite catches all uses of transmute, whether they're written as transmute(foo), mem::transmute(foo), or even ::std::mem::transmute(foo).

Notice that we refer to transmute as std::intrinsics::transmute: this is the location of its original definition, which is re-exported in std::mem. See the "def!: debugging match failures" section for an explanation of how we discovered this.

We now have a command for rewriting all transmute calls, but we'd like it to rewrite only transmutes from *const T to Option<&T>. We can achieve this by filtering the input and output types with typed!:

rewrite_expr '
    typed!(
        def!(::std::intrinsics::transmute)(
            typed!(__e, *const __ty)
        ),
        ::std::option::Option<&__ty>
    )
' '__e.as_ref()'

Now only those transmutes that turn *const T into Option<&T> are affected by the rewrite. And because typed! has access to the results of type inference, this works even on transmute calls that are not fully annotated (transmute(foo), not just transmute::<*const T, Option<&T>>(foo)).
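
The following sketch shows the kind of code this rewrite targets, with T taken to be i32. The function names are made up for illustration.

```rust
use std::mem;

// Before: an unannotated transmute from `*const i32` to `Option<&i32>`.
// The types are inferred from the argument and the return type.
pub unsafe fn before<'a>(p: *const i32) -> Option<&'a i32> {
    mem::transmute(p)
}

// After: the same conversion, expressed with the `as_ref` method of
// `*const T`.
pub unsafe fn after<'a>(p: *const i32) -> Option<&'a i32> {
    p.as_ref()
}
```

Both versions return None for a null pointer and Some(reference) for a valid one.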

The marked! form is simple: marked!(e, label) matches an expression only if that expression matches the pattern e and is marked with the given label. See the documentation on marks and select for more information.

Several other refactoring commands use the same pattern-matching engine as rewrite_expr:

rewrite_ty PAT REPL (docs) works like rewrite_expr, except it matches and replaces type annotations instead of expressions.
abstract SIG PAT (docs) replaces expressions matching a pattern with calls to a newly-created function.
type_fix_rules (docs) uses type patterns to find the appropriate rule to fix each type error.
select's match_expr (docs) and similar filters use syntax patterns to identify nodes to mark.

Many refactoring commands in c2rust refactor are designed to work only on selected portions of the crate, rather than affecting the entire crate uniformly. To support this, c2rust refactor has a mark system, which allows marking AST nodes (such as functions, expressions, or type annotations) with simple string labels. Certain commands add or remove marks, while others check the existing marks to identify nodes to transform.

For example, in a program containing several byte string literals, you can use select to mark a specific one:

select target 'item(B2); desc(expr);'

Then, you can use bytestr_to_str to change only the marked byte string to an ordinary string literal, leaving the others unaffected:

bytestr_to_str

This ability to limit transformations to specific parts of the program is useful for refactoring a large codebase incrementally, on a module-by-module or function-by-function basis.

The remainder of this tutorial describes select and related mark-manipulation commands. For details of how marks affect various transformation commands, see the command documentation or read about the marked! pattern for rewrite_expr and other pattern-matching commands.

A "mark" is a short string label that is associated with a node in the AST. Marks can be applied to nodes of most kinds, including items, expressions, patterns, type annotations, and so on. The mark string can be any valid Rust identifier, though most commands that process marks use short words such as target, dest, or new. It's possible to apply multiple distinct marks to the same node, and it's also possible to mark children of marked nodes separately from their parents (for example, to mark an expression and one of its subexpressions).

Here are some examples.

select target 'crate; desc(match_expr(2 + 2));'

The ▶ ... ◀ indicators in the diff show that the expression 2 + 2 has been marked. Hover over the indicators for more details, such as the label of the added mark.

As mentioned above, most kinds of nodes can be marked, not only expressions. Here we mark a function, a pattern, and a type annotation:

select a 'item(f);' ;
select b 'item(g); desc(match_ty(i32));' ;
select c 'item(g); desc(match_pat(Some(x)));' ;

As mentioned above, it's possible to mark the same node twice with different labels. (Marking it twice with the same label is no different from marking it once.) Here's an example of marking a function multiple times:

select a 'item(f);' ;
select a 'item(f);' ;
select b 'item(f);' ;

As you can see by hovering over the indicators, labels a and b were both added to the function f.

Marks on a node have no connection to marks on its parent or child nodes. We can, for example, mark an expression like 2 + 2, then separately mark its subexpressions with either the same or different labels:

select a 'item(f); desc(match_expr(2 + 2));' ;
select a 'item(f); desc(match_expr(2)); first;' ;
select b 'item(f); desc(match_expr(2)); last;' ;

Hovering over the mark indicators shows precisely what has happened: we marked both 2 + 2 and the first 2 with the label a, and marked the second 2 with the label b.

The select command provides a simple scripting language for applying marks to specific nodes. The basic syntax of the command is:

select LABEL SCRIPT

select runs a SCRIPT (written in the language described below) to obtain a set of AST nodes, then marks every node in the set with LABEL, which should be a single identifier such as target.

More concretely, when running the script, select maintains a "current selection", which is a set of AST nodes. Script operations (described below) can extend or modify the current selection. At the end of the script, select marks every node in the current selection with LABEL.

We next describe a few common select script patterns, followed by details on the available operations and filters.

For items such as functions, type declarations, or traits, the item(path) operation selects the item by its path:

select target 'item(f);' ;
select target 'item(T);' ;
select target 'item(S);' ;
select target 'item(m::g);' ;

Note that this only works for the kinds of items that can be imported via use. It doesn't handle other kinds of item-like nodes, such as impl methods, which cannot be imported directly.

The operations crate; desc(filter); together select all nodes (or, equivalently, all descendants of the crate) that match a filter. For example, we can select all expressions matching the pattern 2 + 2 using a match_expr filter:

select target 'crate; desc(match_expr(2 + 2));'

Here we see that crate; desc(filter); can find matching nodes anywhere in the crate: inside function bodies, constant declarations, and even inside the length expression of an array type annotation.

In the previous example, crate; desc(filter); is made up of two separate script operations. crate selects the entire crate:

select target 'crate;'

Then desc(filter) looks for descendants of selected nodes that match filter, and replaces the current selection with the nodes it finds:

clear_marks ;
select target 'crate; desc(match_expr(2 + 2));'

(Note: we use clear_marks here only for illustration purposes, to make the diff clearly show the changes between the old and new versions of our select command.)

Combining desc with operations other than crate allows selecting descendants of only specific nodes. For example, we can find expressions matching 2 + 2, but only within the function f:

select target 'item(f); desc(match_expr(2 + 2));'

In a more complex example, we can use multiple desc calls to target an expression inside of a specific method (recall that methods can't be selected directly with item(path)). We first select the module containing the impl:

select target 'item(m);'

Then we select the method of interest, using the name filter (described below):

clear_marks ;
select target 'item(m); desc(name("f"));'

And finally, we select the expression inside the method:

clear_marks ;
select target 'item(m); desc(name("f")); desc(match_expr(2 + 2));'

Combined with some additional filters described below, this approach is quite effective for marking nodes that can't be named with an ordinary import path, such as impl methods or items nested inside functions.

A select script can consist of any number of operations, which will be run in order to completion. (There is no control flow in select scripts.) Each operation ends with a semicolon, much like Rust statements.

The remainder of this section documents each script operation.

crate (which takes no arguments) adds the root node of the entire crate to the current selection. All functions, modules, and other declarations are descendants of this single root node.

Example:

select target 'crate;'

item(p) adds the item identified by the path p to the current selection. The provided path is handled like in Rust's use declarations (except that only plain paths are supported, not wildcards or curly-braced blocks).

select target 'item(m::S);'

Because the item operation only adds to the current selection (as opposed to replacing the current selection with a set containing only the identified item), we can run item multiple times to select several different items at once:

select target 'item(f); item(m::S); item(m);'

child(f) checks each child of each currently selected node against the filter f, and replaces the current selection with the set of matching children.

This can be used, for example, to select a static's type annotation without selecting type annotations that appear inside its initializer:

select target 'item(S); child(kind(ty));'

To illustrate how this works, here is the AST for the static S item:

item static S
- identifier S (the name of the static)
- type i32 (the type annotation of the static)
- expression 123_u8 as i32 (the initializer of the static)
  - expression 123_u8 (the input of the cast expression)
  - type i32 (the target type of the cast expression)

The static's type annotation is a direct child of the static (and has kind ty, matching the kind(ty) filter), so the type annotation is selected by the example command above. The target type for the cast is not a direct child of the static - rather, it's a child of the initializer expression, which is a child of the static - so it is ignored.

desc(f) ("descendant") checks each descendant of each currently selected node against the filter f, and replaces the current selection with the set of matching descendants. This is similar to child, but checks for matching descendants at any depth, not only matching direct children.

Using the same example as for child, we see that desc selects more nodes:

select target 'item(S); desc(kind(ty));'

Specifically, it selects both the type annotation of the static and the target type of the cast expression, as both are descendants of the static (though at different depths). Of course, it still does not select the type annotation of the const C, which is not a descendant of static S at any depth.

Note that desc only considers the strict descendants of selected nodes - that is, it does not consider a node to be a "depth-zero" descendant of itself. So, for example, the following command selects nothing:

select target 'item(S); desc(item_kind(static));'

S itself is a static, but contains no additional statics inside of it, and desc does not consider S itself when looking for item_kind(static) descendants.
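The difference between child and desc can be sketched with a toy tree that mirrors the AST of static S listed above. This is only an illustration: the Node type and the child and desc functions below are hypothetical stand-ins, not the data structures used by c2rust refactor.

```rust
// Toy AST node. `kind` reuses the kind names from this section.
struct Node {
    kind: &'static str,
    label: &'static str,
    children: Vec<Node>,
}

fn node(kind: &'static str, label: &'static str, children: Vec<Node>) -> Node {
    Node { kind, label, children }
}

// The AST of the static S item, as listed above.
fn static_s() -> Node {
    node("item", "static S", vec![
        node("ident", "S", vec![]),
        node("ty", "i32 (annotation)", vec![]),
        node("expr", "123_u8 as i32", vec![
            node("expr", "123_u8", vec![]),
            node("ty", "i32 (cast target)", vec![]),
        ]),
    ])
}

// child(f): keep only the direct children of selected nodes that match f.
fn child<'a>(sel: &[&'a Node], f: &dyn Fn(&Node) -> bool) -> Vec<&'a Node> {
    let mut out = Vec::new();
    for &n in sel {
        for c in &n.children {
            if f(c) {
                out.push(c);
            }
        }
    }
    out
}

// Helper for desc: visit strict descendants of `n` at any depth.
fn collect_desc<'a>(n: &'a Node, f: &dyn Fn(&Node) -> bool, out: &mut Vec<&'a Node>) {
    for c in &n.children {
        if f(c) {
            out.push(c);
        }
        collect_desc(c, f, out);
    }
}

// desc(f): keep matching descendants; the selected nodes themselves are never tested.
fn desc<'a>(sel: &[&'a Node], f: &dyn Fn(&Node) -> bool) -> Vec<&'a Node> {
    let mut out = Vec::new();
    for &n in sel {
        collect_desc(n, f, &mut out);
    }
    out
}
```

With this model, child with a ty filter finds only the annotation, desc with the same filter finds both types, and desc with an item filter finds nothing because S is not its own descendant.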

filter(f) checks each currently selected node against the filter f, and replaces the current selection with the set of matching nodes. Equivalently, filter(f) removes from the current selection any nodes that don't match f.

Most uses of the filter operation can be replaced by passing a more appropriate filter expression to desc or child, so the examples in this section are somewhat contrived. (filter can still be useful in combination with marked, described below, or in more complex select scripts.)

Here is a slightly roundabout way to select all items named f. First, we select all items:

select target 'crate; desc(kind(item));'

Then, we use filter to keep only items named f:

clear_marks ;
select target 'crate; desc(kind(item)); filter(name("f"));'

With this command, only descendants of crate matching both filters kind(item) and name("f") are selected. (This could be written more simply as crate; desc(kind(item) && name("f"));.)

first replaces the current selection with a set containing only the first selected node. last does the same with the last selected node. "First" and "last" are determined by a postorder traversal of the AST, so sibling nodes are ordered as expected, and a parent node comes "after" all of its children.

The first and last operations are most useful for finding places to insert new nodes (such as with the create_item command) while ignoring details such as the specific names or kinds of the nodes around the insertion point. For example, we can use last to easily select the last item in a module. First, we select all the module's items:

select target 'item(m); child(kind(item));'

Then we use last to select only the last such child:

clear_marks ;
select target 'item(m); child(kind(item)); last;'

Now we could use create_item to insert a new item after the last existing one.
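The postorder rule behind first and last can be illustrated with a small sketch. The Node type, the sample module, and the helper functions are assumptions made for this example only; they show the ordering, not the actual traversal code in c2rust refactor.

```rust
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

fn leaf(name: &'static str) -> Node {
    Node { name, children: vec![] }
}

// Hypothetical module m containing items a, b (which contains b1), and c.
fn sample_module() -> Node {
    Node {
        name: "m",
        children: vec![
            leaf("a"),
            Node { name: "b", children: vec![leaf("b1")] },
            leaf("c"),
        ],
    }
}

// Postorder: visit children left to right, then the node itself.
fn postorder<'a>(n: &'a Node, out: &mut Vec<&'a str>) {
    for c in &n.children {
        postorder(c, out);
    }
    out.push(n.name);
}

fn order(root: &Node) -> Vec<&str> {
    let mut out = Vec::new();
    postorder(root, &mut out);
    out
}

// first: the selected node that appears earliest in postorder.
fn first<'a>(root: &'a Node, selection: &[&str]) -> Option<&'a str> {
    order(root)
        .into_iter()
        .find(|n| selection.iter().any(|s| *s == *n))
}

// last: the selected node that appears latest in postorder.
fn last<'a>(root: &'a Node, selection: &[&str]) -> Option<&'a str> {
    order(root)
        .into_iter()
        .rev()
        .find(|n| selection.iter().any(|s| *s == *n))
}
```

In this model the traversal order is a, b1, b, c, m, so a selection containing both b and b1 has b as its last node, since a parent follows all of its children.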

marked(l) adds all nodes marked with label l to the current selection. This is useful for more complex marking operations, since (together with the delete_marks command) it allows using temporary marks to manipulate multiple sets of nodes simultaneously.

For example, suppose we wish to select both the first and the last item in a module. Normally, this would require duplicating the select command, since both first and last replace the entire current selection with the single first or last item. This would be undesirable if the operations for setting up the initial set of items were fairly complex. But with marked, we can save the selection before running first and restore it afterward.

We begin by selecting all items in the module and saving that selection by marking it with the tmp_all_items label:

select tmp_all_items 'item(m); child(kind(item));'

Next, we use marked to retrieve the tmp_all_items set and take the first item from it. This reduces the current selection to only a single item, but the tmp_all_items marks remain intact for later use.

select target 'marked(tmp_all_items); first;'

We do the same to mark the last item with target:

select target 'marked(tmp_all_items); last;'

Finally, we clean up, removing the tmp_all_items marks using the delete_marks command:

delete_marks tmp_all_items

Now the only marks remaining are the target marks on the first and last items of the module, as we originally intended.

reset clears the current selection. This is only useful in combination with mark and unmark, as otherwise the operations before a reset have no effect.

The mark and unmark operations allow select scripts to manipulate marks directly, rather than relying solely on the automatic marking of selected nodes at the end of the script. mark(l) marks all nodes in the current selection with label l (immediately, rather than waiting until the select command is finished), and unmark(l) removes label l from all selected nodes.

mark, unmark, and reset can be used to effectively combine multiple select commands in a single script. Here's the "first and last" example from the marked section, using only a single select command:

select _dummy '
    item(m); child(kind(item)); mark(tmp_all_items); reset;
    marked(tmp_all_items); first; mark(target); reset;
    marked(tmp_all_items); last; mark(target); reset;
    marked(tmp_all_items); unmark(tmp_all_items); reset;
'

Note that we pass _dummy as the LABEL argument of select, since the desired target marks are applied using the mark operation, rather than relying on the implicit marking done by select.
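To make the interaction between the current selection and the mark set concrete, here is a minimal simulation of the script above. It is a sketch under simplifying assumptions: nodes are plain integers whose numeric order stands in for the postorder traversal, the State type and its methods are hypothetical, and select_nodes stands in for item(m); child(kind(item));.

```rust
use std::collections::BTreeSet;

type NodeId = u32;

// The two pieces of state a select script works with.
#[derive(Default)]
struct State {
    selection: BTreeSet<NodeId>,
    marks: BTreeSet<(NodeId, String)>,
}

impl State {
    // Stand-in for `item(m); child(kind(item));`.
    fn select_nodes(&mut self, nodes: &[NodeId]) {
        self.selection.extend(nodes.iter().copied());
    }
    // Nodes currently carrying `label`, in ascending order.
    fn labeled(&self, label: &str) -> Vec<NodeId> {
        self.marks
            .iter()
            .filter(|(_, l)| l.as_str() == label)
            .map(|(n, _)| *n)
            .collect()
    }
    // marked(l): add every node marked with l to the selection.
    fn marked(&mut self, label: &str) {
        let found = self.labeled(label);
        self.selection.extend(found);
    }
    // mark(l): mark every selected node immediately.
    fn mark(&mut self, label: &str) {
        for &n in &self.selection {
            self.marks.insert((n, label.to_string()));
        }
    }
    // unmark(l): remove label l from every selected node.
    fn unmark(&mut self, label: &str) {
        for &n in &self.selection {
            self.marks.remove(&(n, label.to_string()));
        }
    }
    // reset: clear the current selection; marks are untouched.
    fn reset(&mut self) {
        self.selection.clear();
    }
    fn first(&mut self) {
        let keep = self.selection.iter().next().copied();
        self.selection = keep.into_iter().collect();
    }
    fn last(&mut self) {
        let keep = self.selection.iter().next_back().copied();
        self.selection = keep.into_iter().collect();
    }
}

// The four lines of the script, one per statement group.
fn first_and_last(items: &[NodeId]) -> State {
    let mut st = State::default();
    st.select_nodes(items); st.mark("tmp_all_items"); st.reset();
    st.marked("tmp_all_items"); st.first(); st.mark("target"); st.reset();
    st.marked("tmp_all_items"); st.last(); st.mark("target"); st.reset();
    st.marked("tmp_all_items"); st.unmark("tmp_all_items"); st.reset();
    st
}
```

In this model the selection is empty when the script ends, so the implicit marking step has nothing to label with _dummy.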

unmark is also useful in combination with marked to interface with non-select mark manipulation commands. For example, suppose we want to mark all occurrences of 2 + 2 that are passed as arguments to a function f. One option is to do this using the mark_arg_uses command, with additional processing by select before and after. Here we start by marking the function f:

select target 'item(f);'

Next, we run mark_arg_uses to replace the mark on f with a mark on each argument expression passed to f:

mark_arg_uses 0 target

And finally, we use select again to mark only those arguments that match 2 + 2:

select target 'marked(target); unmark(target); filter(match_expr(2 + 2));'

Beginning the script with marked(target); unmark(target); copies the set of target-marked nodes into the current selection, then removes the existing marks. The remainder of the script can then operate as usual, manipulating only the current selection with no need to worry about additional marks being already present.

Filter expressions can be combined using the boolean operators &&, ||, and !. A node matches the filter f1 && f2 only if it matches f1 and also matches f2, and so on.
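As a rough illustration of how the boolean operators compose, the following sketch evaluates a small filter tree against a node. The Filter enum and Node struct are hypothetical, and the Name case uses exact string comparison for simplicity, whereas the real name filter takes a regular expression.

```rust
// Minimal node description: just the properties the toy filters inspect.
struct Node {
    kind: &'static str,
    name: &'static str,
}

enum Filter {
    Kind(&'static str),
    // Exact match only; an assumption made to avoid a regex dependency.
    Name(&'static str),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
    Not(Box<Filter>),
}

impl Filter {
    fn matches(&self, n: &Node) -> bool {
        match self {
            Filter::Kind(k) => n.kind == *k,
            Filter::Name(s) => n.name == *s,
            // f1 && f2: both sides must match.
            Filter::And(a, b) => a.matches(n) && b.matches(n),
            // f1 || f2: either side may match.
            Filter::Or(a, b) => a.matches(n) || b.matches(n),
            // !f: matches exactly when f does not.
            Filter::Not(a) => !a.matches(n),
        }
    }
}

// Counterpart of the filter expression: kind(item) && name("f")
fn items_named_f() -> Filter {
    Filter::And(
        Box::new(Filter::Kind("item")),
        Box::new(Filter::Name("f")),
    )
}
```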

kind(k) matches AST nodes whose node kind is k. The supported node kinds are:

item - a top-level item, as in struct Foo { ... } or fn foo() { ... }. Includes both items in modules and items defined inside functions or other blocks, but does not include "item-like" nodes inside traits, impls, or extern blocks.
trait_item - an item inside a trait definition, such as a method or associated type declaration
impl_item - an item inside an impl block, such as a method or associated type definition
foreign_item - an item inside an extern block ("foreign module"), such as a C function or static declaration
stmt
expr
pat - a pattern, including single-ident patterns like foo in let foo = ...;
ty - a type annotation, such as Foo in let x: Foo = ...;
arg - a function or method argument declaration
field - a struct, enum variant, or union field declaration
itemlike - matches nodes whose kind is any of item, trait_item, impl_item, or foreign_item
any - matches any node

The node kind k can be used alone as shorthand for kind(k). For example, the operation desc(item); is the same as desc(kind(item));.

item_kind(k) matches itemlike AST nodes whose subkind is k. The itemlike subkinds are:

extern_crate
use
static
const
fn
mod
foreign_mod
global_asm
ty - type alias definition, as in type Foo = Bar;
existential - existential type definition, as in existential type Foo: Bar;. Note that existential types are currently an unstable language feature.
enum
struct
union
trait - ordinary trait Foo { ... } definition, including unsafe trait
trait_alias - trait alias definition, as in trait Foo = Bar; Note that trait aliases are currently an unstable language feature.
impl - including both trait and inherent impls
mac - macro invocation. Note that select works on the macro-expanded AST, so macro invocations are never present under normal circumstances.
macro_def - 2.0/decl_macro-style macro definition, as in macro foo(...) { ... }. Note that 2.0-style macro definitions are currently an unstable language feature.

Note that a single item_kind filter can match multiple distinct node kinds, as long as the subkind is correct. For example, item_kind(fn) will match fn items, method trait_items and impl_items, and fn declarations inside extern blocks (foreign_items). Similarly, item_kind(ty) matches ordinary type alias definitions, associated type declarations (in traits) and definitions (in impls), and foreign type declarations inside extern blocks.

item_kind filters match only those nodes that also match kind(itemlike), as other node kinds have no itemlike subkind.

The itemlike subkind k can be used alone as shorthand for item_kind(k). For example, the operation desc(fn); is the same as desc(item_kind(fn));.

pub matches any item, impl item, or foreign item whose visibility is pub. It currently does not support struct fields, even though they can also be declared pub.

mut matches static mut items, static mut foreign item declarations, and mutable binding patterns such as the mut foo in let mut foo = ...;.

name(re) matches itemlikes, arguments, and fields whose name matches the regular expression re. For example, name("[fF].*") matches fn f() { ... } and struct Foo { ... }, but not trait Bar { ... }. It currently does not support general binding patterns, aside from those in function arguments.

path(p) matches itemlikes and enum variants whose absolute path is p.

path_prefix(n, p) is similar to path(p), but drops the last n segments of the node's path before comparing to p.
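The relationship between path and path_prefix can be sketched as a comparison over path segments. The functions below are illustrative only: paths are modeled as strings separated by ::, the example paths are made up, and treating n larger than the path length as a non-match is an assumption, not documented behavior.

```rust
// Split a path such as "m::S::f" into its segments.
fn segments(path: &str) -> Vec<&str> {
    path.split("::").collect()
}

// path(p): the node's whole path must equal p.
fn path_matches(node_path: &str, p: &str) -> bool {
    segments(node_path) == segments(p)
}

// path_prefix(n, p): drop the last n segments of the node's path, then compare.
fn path_prefix_matches(node_path: &str, n: usize, p: &str) -> bool {
    let segs = segments(node_path);
    if n > segs.len() {
        // Assumption: a path with fewer than n segments never matches.
        return false;
    }
    let want = segments(p);
    segs[..segs.len() - n] == want[..]
}
```

Under this model, a node at the hypothetical path m::S::f matches path_prefix(1, m::S), which makes the filter convenient for selecting everything directly inside a given item.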

has_attr(a) matches itemlikes, exprs, and field declarations that have an attribute named a.

match_expr(e) uses rewrite_expr-style AST matching to compare exprs to e, and matches any node where AST matching succeeds. For example, match_expr(__e + 1) matches the expressions 1 + 1, x + 1, and f() + 1, but not 2 + 2.

match_pat, match_ty, and match_stmt are similar, but operate on pat, ty, and stmt nodes respectively.

marked(l) matches nodes that are marked with the label l.

any_child(f) matches nodes that have a child that matches f. all_child(f) matches nodes where all children of the node match f.

any_desc and all_desc are similar, but consider all descendants instead of only direct children.

In addition to select, c2rust refactor contains a number of other mark-manipulation commands. A few of these can be replicated with appropriate select scripts (though using the command is typically easier), but some are more complex.

copy_marks OLD NEW adds a mark with label NEW to every node currently marked with OLD.

delete_marks OLD removes the label OLD from every node that is currently marked with it.

rename_marks OLD NEW behaves like copy_marks OLD NEW followed by delete_marks OLD: it adds a mark with label NEW to every node marked with OLD, then removes OLD from each such node.
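The three commands above can be modeled as operations on a set of (node, label) pairs. This is a minimal sketch: the Marks type and the integer node identifiers are assumptions for illustration, and the functions only mirror the behavior described in the text.

```rust
use std::collections::BTreeSet;

// A mark is a (node id, label) pair; a set makes duplicate marks impossible.
type Marks = BTreeSet<(u32, String)>;

// copy_marks OLD NEW: add NEW to every node currently marked OLD.
fn copy_marks(marks: &mut Marks, old: &str, new: &str) {
    let nodes: Vec<u32> = marks
        .iter()
        .filter(|(_, l)| l.as_str() == old)
        .map(|(n, _)| *n)
        .collect();
    for n in nodes {
        marks.insert((n, new.to_string()));
    }
}

// delete_marks OLD: remove OLD from every node that has it.
fn delete_marks(marks: &mut Marks, old: &str) {
    marks.retain(|(_, l)| l.as_str() != old);
}

// rename_marks OLD NEW: copy, then delete.
fn rename_marks(marks: &mut Marks, old: &str, new: &str) {
    copy_marks(marks, old, new);
    delete_marks(marks, old);
}
```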

mark_uses LABEL transfers LABEL marks from definitions to uses. That is, it finds each definition marked with LABEL, marks each use of such a definition with LABEL, then removes LABEL from the definitions. For example, if a static FOO: ... = ... is marked with target, then mark_uses target will add a target mark to every expression FOO that references the marked definition and then remove target from FOO itself.

For the purposes of this command, a "use" of a definition is a path or identifier that resolves to that definition. This includes expressions (both paths and struct literals), patterns (paths to constants, structs, and enum variants), and type annotations. When a function definition is marked, only the function path itself (the foo::bar in foo::bar(x)) is considered a use, not the entire call expression. Method calls (whether using dotted or UFCS syntax) normally can't be handled at all, as their resolution is "type-dependent" (however, the mark_callers command can sometimes work when mark_uses does not).

mark_callers LABEL transfers LABEL marks from function or method definitions to uses. That is, it works like mark_uses, but is specialized to functions and methods. mark_callers uses a more sophisticated means of name resolution that allows it to detect uses via type-dependent method paths, which mark_uses cannot handle.

For purposes of mark_callers, a "use" is a function call (foo::bar()) or method call (x.foo()) expression where the function or method being called is one of the marked definitions.

mark_arg_uses INDEX LABEL transfers LABEL marks from function or method definitions to the argument in position INDEX at each use. That is, it works like mark_callers, but marks the expression passed as argument INDEX instead of the entire call site.

INDEX is zero-based. However, the self/receiver argument of a method call counts as the first argument (index 0), with the first argument in parentheses having index 1 (arg0.f(arg1, arg2)). For ordinary function calls (including UFCS method calls), the first argument has index 0 (f(arg0, arg1, arg2)).
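The indexing rule can be summarized in a few lines of code. The Call enum and arg_at function are hypothetical and represent expressions as plain strings; they only illustrate which expression a given INDEX refers to.

```rust
// A call site, with argument expressions represented as source text.
enum Call<'a> {
    // f(arg0, arg1, ...), including UFCS calls such as T::f(arg0, arg1)
    Function { args: Vec<&'a str> },
    // receiver.f(arg1, arg2, ...)
    Method { receiver: &'a str, args: Vec<&'a str> },
}

// Returns the expression that INDEX refers to, if the call has that many arguments.
fn arg_at<'a>(call: &Call<'a>, index: usize) -> Option<&'a str> {
    match call {
        Call::Function { args } => args.get(index).copied(),
        Call::Method { receiver, args } => {
            if index == 0 {
                // The receiver counts as argument 0.
                Some(*receiver)
            } else {
                // Parenthesized arguments are shifted up by one.
                args.get(index - 1).copied()
            }
        }
    }
}
```

For a method call written x.f(y, z), index 0 selects x and index 1 selects y, while for the UFCS form T::f(x, y, z) index 0 still selects x and index 1 still selects y.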

The analysis::ownership module implements a pointer analysis for inferring ownership information in code using raw pointers. The goal is to take code that has been automatically translated from C, and thus uses only raw pointers, and infer which of those raw pointers should be changed to safe &, &mut, or Box pointers. Pointers can appear in a number of places in the input program, but this analysis focuses mainly on function signatures and struct field types.

The goal of the analysis is to assign to each raw pointer type constructor a permission value, one of READ, WRITE, and MOVE, corresponding to the Rust pointer types &, &mut, and Box. These permissions form a trivial lattice, where READ < WRITE < MOVE. The READ permission indicates that the pointed-to data may be read, the WRITE permission indicates that the pointed-to data may be modified, and the MOVE permission indicates that the pointed-to data may be "moved", or consumed in a linear-typed fashion. The MOVE permission also includes the ability to free the pointed-to data, which amounts to "moving to nowhere".
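One simple way to model this lattice is an enum whose derived ordering follows the declaration order of its variants. The Perm type and helper functions below are a sketch for illustration and are not taken from the analysis::ownership module.

```rust
use std::cmp;

// Variants are declared from least to most permissive, so the derived
// ordering gives Read < Write < Move.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Perm {
    Read,
    Write,
    Move,
}

// The safe Rust pointer type each permission corresponds to.
fn rust_pointer(p: Perm) -> &'static str {
    match p {
        Perm::Read => "&T",
        Perm::Write => "&mut T",
        Perm::Move => "Box<T>",
    }
}

// The less permissive of two permission values.
fn less_permissive(a: Perm, b: Perm) -> Perm {
    cmp::min(a, b)
}
```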

Here is a simple example to illustrate the major features of the analysis:

struct Array {
    data: *mut i32,
}

unsafe fn new_array(len: usize) -> *mut Array {
    let data = malloc(size_of::<i32>() * len);
    let arr = malloc(size_of::<Array>());
    (*arr).data = data;
    arr
}

unsafe fn delete_array(arr: *mut Array) {
    free((*arr).data);
    free(arr);
}

unsafe fn element_ptr(arr: *mut Array, idx: usize) -> *mut i32 {
    (*arr).data.offset(idx)
}

unsafe fn get(arr: *mut Array, idx: usize) -> i32 {
    let elt: *mut i32 = element_ptr(arr, idx);
    *elt
}

unsafe fn set(arr: *mut Array, idx: usize, val: i32) {
    let elt: *mut i32 = element_ptr(arr, idx);
    *elt = val;
}

The analysis infers pointer permissions by observing how pointers are used, and applying the rules of the Rust reference model. For instance, the set function's elt pointer must have permission WRITE (or higher), because there is a write to the pointed-to data. Similarly, delete_array's first call to free requires that the pointer in the Array::data field must have permission MOVE. Furthermore, the first free also requires arr to have permission MOVE, because consuming the pointer (*arr).data constitutes a move out of *arr. (In general, the pointer permission sets an upper bound on the permissions of all pointers within the pointed-to data. For example, if arr has permission READ, then *(*arr).data can only be read, not written or moved.)

The element_ptr function presents an interesting case for analysis, because it is used polymorphically: in get, we would like element_ptr to take a READ *mut Array and return a READ *mut i32, whereas in set we would like the same function to take and return WRITE pointers. In strictly const-correct C code, get and set would respectively call separate const and non-const variants of element_ptr, but a great deal of C code is not const-correct.

This analysis handles functions like element_ptr by allowing inferred function signatures to be permission polymorphic. Signatures may include permission parameters, which can be instantiated separately at each call site, subject to a set of constraints. For example, here is the inferred polymorphic signature of element_ptr, with permission annotations written in comments (since there is no Rust syntax for them):

fn element_ptr /* <s0, s1> */ (arr: /* s0 */ *mut Array,
                               idx: usize)
                               -> /* s1 */ *mut i32
    /* where s1 <= s0 */;

The function has two permission parameters, s0 and s1, which are the permissions of the argument and return pointers respectively. The signature includes the constraint s1 <= s0, indicating that the output pointer's permission is no higher than that of the input pointer. The function is called in get with permission arguments s0 = s1 = READ and in set with s0 = s1 = WRITE.

Rust does not support any analogue of the permission polymorphism used in this analysis. To make the results useful in actual Rust code, the analysis includes a monomorphization step, which chooses a set of concrete instantiations for each polymorphic function, and selects an instantiation to use for each call site. In the example above, element_ptr would have both READ, READ and WRITE, WRITE instantiations, with the first being used for the callsite in get and the second at the callsite in set.

The analysis first computes a polymorphic signature for each function, then monomorphizes to produce functions that can be handled by Rust's type system.

Both parts of the analysis operate on constraint sets, which contain constraints of the form p1 <= p2. The permissions p1, p2 can be concrete permissions (READ, WRITE, MOVE), permission variables, or expressions of the form min(p1, p2) denoting the less-permissive of two permission values.

Permission variables appear on pointer type constructors in the types of static variables and struct fields ("static" variables), in the types within function signatures ("sig"), in the types of temporaries and local variables ("local"), and at callsites for instantiating a permission polymorphic function ("inst"). Variables are marked with their origin, as variables from different locations are handled in different phases of the analysis.

The overall goal of the analysis is to produce assignments to static and sig variables that satisfy all the relevant constraints (or multiple assignments, when monomorphizing polymorphic functions).

The permission variables of each function's polymorphic signature are easily determined: for simplicity, the analysis introduces one variable for each occurrence of a pointer type constructor in the function signature. Cases that might otherwise involve a single variable appearing at multiple locations in the signature are instead handled by adding constraints between the variables. The main task of the first part of the analysis is to compute the constraints over the signature variables of each function. This part of the analysis must also build an assignment of permission values to all static vars, which are not involved in any sort of polymorphism.

Constraints arise mainly at assignments and function call expressions.

At assignments, the main constraint is that, if the assigned value has a pointer type, the permission on the LHS pointer type must be no greater than the permission on the RHS pointer type (lhs <= rhs). In other words, an assignment of a pointer may downgrade the permission value of that pointer, but may never upgrade it. In non-pointer types, and in the pointed-to type of an outermost pointer type, all permission values occurring in the two types must be equal (lhs <= rhs and rhs <= lhs).

Assignments also introduce two additional constraints, both relating to path permissions. The path permission for an expression is the minimum of the permission values on all pointers dereferenced in the expression. For example, in *(*x).f, the path permission is the minimum of the permission on the local variable x and the permission on the struct field f. The calculation of path permissions reflects the transitive nature of access restrictions in Rust: for example, if a struct field x.f has type &mut T, but x is an immutable reference (&S), then only immutable access is allowed to *x.f.

The two additional constraints introduced by assignments are (1) the path permission of the LHS must be no lower than WRITE, and (2) the path permission of the RHS must be no lower than the permission of the LHS pointer type. Constraint (1) prevents writing through a READ pointer, or through any path containing a READ pointer. Constraint (2) prevents assigning a WRITE pointer accessed through a READ path (or a MOVE pointer accessed through a WRITE or READ path) to a WRITE pointer variable, which would allow bypassing the READ restriction.
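The assignment rules can be checked on concrete permission values with a short sketch. Note the simplifications: the real analysis generates constraints over permission variables rather than testing concrete values, the Assign struct is hypothetical, and treating a path with no dereferences as MOVE (unrestricted) is an assumption made here.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Perm {
    Read,
    Write,
    Move,
}

// Path permission: the minimum over every pointer dereferenced in the expression.
// Assumption: an expression that dereferences nothing is unrestricted (Move).
fn path_perm(derefs: &[Perm]) -> Perm {
    derefs.iter().copied().min().unwrap_or(Perm::Move)
}

// One pointer assignment `lhs = rhs`, described by concrete permissions.
struct Assign<'a> {
    lhs_ptr: Perm,        // permission on the LHS pointer type
    rhs_ptr: Perm,        // permission on the RHS pointer type
    lhs_path: &'a [Perm], // pointers dereferenced to reach the LHS
    rhs_path: &'a [Perm], // pointers dereferenced to reach the RHS
}

fn assignment_ok(a: &Assign<'_>) -> bool {
    // Main constraint: the assignment may downgrade, never upgrade.
    let main = a.lhs_ptr <= a.rhs_ptr;
    // (1) The LHS path must allow writing.
    let c1 = path_perm(a.lhs_path) >= Perm::Write;
    // (2) The RHS path must be at least as permissive as the LHS pointer.
    let c2 = path_perm(a.rhs_path) >= a.lhs_ptr;
    main && c1 && c2
}
```

For the expression *(*x).f, the path would list the permission on x and the permission on f, and path_perm returns the smaller of the two.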

Function calls require additional work. At each call site, the analysis copies in the callee's constraints, substituting a fresh "instantiation" ("inst") variable for each variable in the callee's signature. It then links the new inst variables to the surrounding locals by processing a "pseudo-assignment" from each argument expression to the corresponding formal parameter type in the substituted signature, and from the return type to the lvalue expression where the result is to be stored. The effect is to allow the analysis to "reason through" the function call, relating the (local) return value to the caller's argument expressions. Copying the constraints instead of relying on a concrete instantiation permits precise reasoning about polymorphic functions that call other polymorphic functions.

The final step for each function is to simplify the constraint set by eliminating "local", "inst", and "static" permission variables. Local variables have no connection to types outside the current function, and can be simplified away without consequence. Eliminating static and instantiation variables requires fixed-point iteration, which is described below. The result of the simplification is a set of constraints over only the function's sig variables, which is suitable for use as the constraint portion of the function signature.

Since each function's signature depends on the signatures of its callees, and functions may be recursive, a fixed-point iteration step is required to compute the final constraint set for each function. To simplify the implementation, the polymorphic signature construction part of the analysis is split into two phases. The intraprocedural phase visits every function once and generates constraints for that function, but doesn't copy in constraints from callees, which may not have been processed yet. This phase records details of each call site for later use. The intraprocedural phase eliminates local variables at the end of each function, but it does not have enough information to safely eliminate static and inst variables. The interprocedural phase updates each function in turn, substituting in callees' sig constraints and simplifying away static and inst variables to produce a new, more accurate set of sig constraints for the current function, and iterates until it reaches a fixed point. The interprocedural phase also computes an assignment of concrete permission values to static variables, during the process of removing static variables from functions' constraint sets.

The first part of the analysis infers a permission polymorphic signature for each function, but Rust does not support this form of polymorphism. To make the analysis results applicable to actual Rust code, the analysis must provide enough information to allow monomorphizing functions - that is, producing multiple copies of each function with different concrete instantiations of the permission variables.

Monomorphization begins by collecting all "useful" monomorphic signatures for each function. The analysis identifies all signature variables that appear in output positions (in the return type, or behind a pointer whose permission value is always at least WRITE), then enumerates all assignments to those output variables that are allowed by the function's constraints. For each combination of outputs, it finds the least-restrictive valid assignment of permissions to the remaining (input) variables. For example, given this function:

fn element_ptr /* <s0, s1> */ (arr: /* s0 */ *mut Array,
                               idx: usize)
                               -> /* s1 */ *mut i32
    /* where s1 <= s0 */;

The only output variable is s1, which appears in the return type. The monomorphization step will try each assignment to s1 that is allowed by the constraints. Since the only constraint is s1 <= s0, READ, WRITE, and MOVE are all valid. For each of these, it finds the least restrictive assignment to s0 that is compatible with the assignment to s1. For example, when s1 = MOVE, only s0 = MOVE is valid, so the analysis records MOVE, MOVE as a monomorphization for the element_ptr function. When s1 = WRITE, both s0 = MOVE and s0 = WRITE satisfy the constraints, but s0 = WRITE is less restrictive - it allows calling the function with both MOVE and WRITE pointers, while setting s0 = MOVE allows only MOVE pointers. So the analysis records arguments WRITE, WRITE as another monomorphization, and by similar logic records READ, READ as the final one.
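The enumeration described above can be reproduced for element_ptr with a small sketch. It hard-codes the single constraint s1 <= s0 and the two variables of this example; the function names are hypothetical, and a real implementation would have to handle arbitrary constraint sets.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Perm {
    Read,
    Write,
    Move,
}

// All permissions, from least to most permissive.
const ALL: [Perm; 3] = [Perm::Read, Perm::Write, Perm::Move];

// The constraint set of element_ptr: s1 <= s0.
fn satisfies(s0: Perm, s1: Perm) -> bool {
    s1 <= s0
}

// For each value of the output variable s1, pick the least restrictive input s0.
// The lowest permission that works is the least restrictive one, because a
// parameter with a lower permission accepts more kinds of caller pointers.
fn monomorphizations() -> Vec<(Perm, Perm)> {
    let mut out = Vec::new();
    for &s1 in &ALL {
        // ALL is sorted ascending, so `find` returns the lowest valid s0.
        if let Some(s0) = ALL.iter().copied().find(|&s0| satisfies(s0, s1)) {
            out.push((s0, s1));
        }
    }
    out
}
```

Running this yields the three pairs named in the text: READ, READ; WRITE, WRITE; and MOVE, MOVE.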

The next step of monomorphization is to select a monomorphic variant to call at each callsite of each monomorphized function. Given a pair of functions:

fn f /* <s0, s1> */ (arr: /* s0 */ *mut Array) -> /* s1 */ *mut i32
        /* where s1 <= s0 */ {
    g(arr)
}

fn g /* <s0, s1> */ (arr: /* s0 */ *mut Array) -> /* s1 */ *mut i32
        /* where s1 <= s0 */ {
    ...
}

For pointer permissions to line up properly, a monomorphic variant of f specialized to READ, READ will need to call a variant of g also specialized to READ, READ, and a variant of f specialized to WRITE, WRITE will need to call a WRITE, WRITE variant of g.

To infer this information, the analysis separately considers each monomorphic signature of each function. It performs a backtracking search to select, for each callsite in the function, a monomorphic signature of the callee, such that all of the calling function's constraints are satisfied, including constraints setting the caller's sig variables equal to the concrete permissions in the monomorphic signature. The table of callee monomorphization selections is included in the analysis results so that callsites can be updated appropriately when splitting functions for monomorphization.
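
A backtracking search of this shape might look like the sketch below. It is only an illustration: the `Mono` pair representation, the `compatible` rule (the callee may require no more than the caller holds, and must return at least what the caller promises), and the `search` function are assumptions, and the real analysis checks the caller's full constraint set.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Perm { Read, Write, Move }

/// A monomorphic signature as (argument permission, return permission).
type Mono = (Perm, Perm);

/// Assumed rule for a call like `g(arr)` whose result is returned directly.
fn compatible(caller: Mono, callee: Mono) -> bool {
    callee.0 <= caller.0 && caller.1 <= callee.1
}

/// Pick one candidate signature per call site so that `ok` accepts the
/// whole selection. `sites[i]` lists the callee signatures available at
/// call site i; `chosen` holds the selection made so far.
fn search(sites: &[Vec<Mono>], chosen: &mut Vec<Mono>, ok: &dyn Fn(&[Mono]) -> bool) -> bool {
    if chosen.len() == sites.len() {
        return true; // every call site has a selection
    }
    for &cand in &sites[chosen.len()] {
        chosen.push(cand);
        if ok(&chosen[..]) && search(sites, chosen, ok) {
            return true;
        }
        chosen.pop(); // backtrack and try the next candidate
    }
    false
}
```

For the f/g pair above, searching with f fixed to READ, READ selects the READ, READ variant of g, and fixing f to WRITE, WRITE selects the WRITE, WRITE variant.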

The ownership analysis supports annotations to specify the permission types of functions and struct fields. These annotations serve two purposes. First, the user can annotate functions to provide custom signatures for functions on which the analysis produces inaccurate results. Signatures provided this way will be propagated throughout the analysis, so manually correcting a single wrongly-inferred function can fix the inference results for its callers as well. Second, the ownership system provides an ownership_annotate command that adds annotations to functions reflecting their inferred signatures. The user can then read the generated annotations to check the analysis results, and optionally edit them to improve precision, before proceeding with further code transformations.

There are four annotation types currently supported by the ownership system.

#[ownership_static(<perms>)] provides concrete permission values for all pointer types in a static declaration or struct field. The perms argument is a comma-separated sequence of concrete permission tokens (READ, WRITE, MOVE). The given permission values will be applied to the pointers in the static or field type, following a preorder traversal of the type. For example:
```
struct S {
    #[ownership_static(READ, WRITE, MOVE)]
    f: *mut (*mut u8, *mut u16)
}
```
Here the outermost pointer will be given permission READ, the pointer to u8 will be given permission WRITE, and the pointer to u16 will be given permission MOVE.
#[ownership_constraints(<constraints>)] provides the signature constraints for the annotated function, overriding polymorphic signature inference. The argument constraints is a comma-separated sequence of constraints of the form le(<perm1>, <perm2>), each representing a single constraint perm1 <= perm2. The permissions used in each constraint may be any combination of concrete permissions (READ, WRITE, MOVE), permission variables (_0, _1, ...), or expressions of the form min(p1, p2, ...). (The permission syntax is limited by the requirement for compatibility with Rust's attribute syntax.)

The permission variables used in constraints always refer to signature variables of the annotated function. A signature variable is introduced for each pointer type constructor in the function's signature, and they are numbered according to a preorder traversal of each node in the argument and return types of the function. This example shows the location of each variable in a simple signature:
```
fn get_err(arr: /* _0 */ *mut Array,
           element_out: /* _1 */ *mut /* _2 */ *mut i32)
           -> /* _3 */ *const c_char;
```
#[ownership_mono(<suffix>, <perms>)] supplies a monomorphic signature to be used for the annotated function. The suffix argument is a quoted string, which (if non-empty) will be used when splitting polymorphic functions into monomorphic variants to construct a name for the monomorphized copy of the function. The perms argument is a comma-separated list of concrete permission tokens, giving the permissions to be used in the function signature in this monomorphization.

The ownership_mono annotation can appear multiple times on a single function to provide multiple monomorphic signatures. However, if it appears at all, monomorphization inference will be completely overridden for the annotated function, and only the provided signatures will be used in callee argument inference and later transformations.

Example:
```
#[ownership_mono("mut", WRITE, WRITE)]
#[ownership_mono("", READ, READ)]
fn first(arr: *mut Array) -> *mut i32;
```
This function will have two monomorphic variants, one where both pointers' permission values are WRITE and one where both are READ. When the ownership_split_variants command splits the function into its monomorphic variants, the WRITE variant will be named first_mut and the READ variant will keep the original name first.
#[ownership_variant_of(<name>)] is used to combine source-level functions into variant groups. See the section on variant groups for details.
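
The preorder numbering used by ownership_static and by the signature variables of ownership_constraints can be sketched with a small type tree. The `Ty` enum and the `assign` and `show` helpers are hypothetical and exist only for this illustration; the sketch hands out one permission per pointer constructor in the order a preorder traversal reaches it.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Perm { Read, Write, Move }

/// A tiny stand-in for a Rust type: pointers, tuples and primitives.
enum Ty {
    Ptr(Box<Ty>),
    Tuple(Vec<Ty>),
    Prim(&'static str),
}

/// Render a type for display purposes.
fn show(ty: &Ty) -> String {
    match ty {
        Ty::Ptr(inner) => format!("*mut {}", show(inner)),
        Ty::Tuple(elems) => {
            format!("({})", elems.iter().map(show).collect::<Vec<_>>().join(", "))
        }
        Ty::Prim(name) => name.to_string(),
    }
}

/// Visit the type in preorder and give each pointer the next permission.
fn assign(ty: &Ty, perms: &mut impl Iterator<Item = Perm>, out: &mut Vec<(String, Perm)>) {
    match ty {
        Ty::Ptr(inner) => {
            // The pointer itself is numbered before anything it points to.
            let p = perms.next().expect("too few permissions in annotation");
            out.push((show(ty), p));
            assign(inner, perms, out);
        }
        Ty::Tuple(elems) => {
            for e in elems {
                assign(e, perms, out);
            }
        }
        Ty::Prim(_) => {}
    }
}
```

Applied to `*mut (*mut u8, *mut u16)` with the permissions READ, WRITE, MOVE, the outermost pointer receives READ, the pointer to u8 receives WRITE, and the pointer to u16 receives MOVE, as in the ownership_static example.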

The "variant group" mechanism allows combining several source-level functions into a single logical function for purposes of the analysis. This is useful for combining a function that was previously split into monomorphic variants back into a single logical function. This allows for a sort of "modular refactoring", in which the user focuses on one module at a time, analyzing, annotating, and splitting variants in only that module before moving on to another.

As a concrete example of the purpose of this feature, consider the following code:

fn f(arr: *mut Array) -> *mut i32 { ... g(arr) ... }

fn g(arr: *mut Array) -> *mut i32 { ... }

The user works first on (the module containing) g, resulting in splitting g into two variants:

fn f(arr: *mut Array) -> *mut i32 { ... g_mut(arr) ... }

fn g(arr: *mut Array) -> *mut i32 { ... }
fn g_mut(arr: *mut Array) -> *mut i32 { ... }

Note that, because there is still only one variant of f, the transformation must choose a single g variant for f to call. In this case, it chose the g_mut variant.

Later, the user works on f. If g and g_mut are treated as separate functions, then there are two possibilities. First, if the constraints on g_mut are set up (or inferred) to require WRITE permission for arr, then only a WRITE variant of f will be generated. Or second, if the constraints are relaxed, then f may get both READ and WRITE variants, but both will (wrongly) call g_mut.

Treating g and g_mut as two variants of a single function allows the analysis to switch between g variants in the different variants of f, resulting in correct code like the following:

fn f(arr: *mut Array) -> *mut i32 { ... g(arr) ... }
fn f_mut(arr: *mut Array) -> *mut i32 { ... g_mut(arr) ... }

fn g(arr: *mut Array) -> *mut i32 { ... }
fn g_mut(arr: *mut Array) -> *mut i32 { ... }

The ownership_split_variants command automatically annotates the split functions so they will be combined into a variant group during further analysis. Variant groups can also be constructed manually using the #[ownership_variant_of(<name>)] annotation, where name is an arbitrary quoted string. All source-level functions bearing an ownership_variant_of annotation with the same name will form a single variant group, which will be treated as a single function throughout the analysis. However, signature inference for the variants themselves is not well supported. Thus, each variant must have an ownership_mono annotation, and exactly one function in each variant group must also have an ownership_constraints annotation. Together, these provide enough information that inference is not required. Note that unlike non-variant functions, variants may not have multiple ownership_mono annotations, as each variant is expected to correspond to a single monomorphization of the original function.

The analysis as described so far tries to mimic the Rust ownership model as implemented in the Rust compiler. However, collection data structures in Rust often use unsafe code to bypass parts of the ownership model. A particularly common case is in removal methods, such as Vec::pop:

impl<T> Vec<T> {
    fn pop(&mut self) -> Option<T> { ... }
}

This method moves a T out of self's internal storage, but only takes self by mutable reference. Under the "normal" rules, this is impossible, and the analysis described above will infer a stricter signature for the raw pointer equivalent:

fn pop(this: /* MOVE */ *mut Vec) -> /* MOVE */ *mut c_void { ... }

The analysis as implemented includes a small adjustment (the "collection hack") to let it infer the correct signature for such methods.

The collection hack is this: when handling a pointer assignment, instead of constraining the path permission of the RHS to be at least the permission of the LHS, we constrain it to be at least min(lhs_perm, WRITE). The result is that it becomes possible to move a MOVE pointer out of a struct when only WRITE permission is available for the pointer to that struct. Then the analysis will infer the correct type for pop:

fn pop(this: /* WRITE */ *mut Vec) -> /* MOVE */ *mut c_void { ... }
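
The effect of the adjustment on a single assignment can be written down directly. The function name and the boolean switch below are hypothetical; the sketch only encodes the min(lhs_perm, WRITE) rule stated above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Perm { Read, Write, Move }

/// Minimum path permission required on the right-hand side of a pointer
/// assignment whose left-hand side has permission `lhs`.
fn required_rhs_path_perm(lhs: Perm, collection_hack: bool) -> Perm {
    if collection_hack {
        // With the hack, WRITE access to the container is always enough.
        lhs.min(Perm::Write)
    } else {
        // Under the "normal" rules the path must be at least as strong as the LHS.
        lhs
    }
}
```

Without the hack, moving a MOVE pointer out of a struct requires MOVE on the path to it; with the hack the requirement drops to WRITE, while READ and WRITE assignments are unaffected.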

This is the top-level directory for all cross-checking components, and contains the following:

A clang plugin that automatically inserts cross-check instrumentation into C code.
An equivalent rustc compiler plugin for Rust.
The libfakechecks cross-checking backend library that prints out all cross-checks to standard output. This library is supported by both the C and Rust compiler plugins.
Our experimental fork of the ReMon MVEE modified for C/Rust side-by-side checking, along with the mvee-configs directory that contains some MVEE configuration examples.

The C2Rust transpiler aims to convert C code to semantically equivalent unsafe Rust code, and later incremental refactoring passes gradually transform this code into safer, more idiomatic Rust code. However, the initial Rust translation might not be a perfect semantic match to the original C code, and the refactoring passes may also change the code in ways that break semantics. Cross-checking is an automated way to verify that the translated program behaves the same as the original C code.

The way cross-checking achieves this goal is by comparing the execution traces of all versions (henceforth called "variants") of the program (original C, unsafe and refactored Rust) and checking for any differences. Our cross-checking implementation modifies the source code of the program at compile-time (separately during C and Rust compiler invocation) so that the variants output the traces at run-time, and then checks the traces against each other either online during execution (using the ReMon MVEE), or offline by comparing log files. The C2Rust cross-checkers currently instrument function entry and exit points, function return values, and function call arguments (currently experimental and disabled by default, but can be enabled per argument, function or file).

To illustrate how cross-checking works, let us take the following code snippet:

int foo() {
    return 1;
}

Calling the foo function will cause the following cross-check events to be emitted:

XCHECK(Ent):193491849/0x0b887389
XCHECK(Exi):193491849/0x0b887389
XCHECK(Ret):8680820740569200759/0x7878787878787877
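
The Ent and Exi values in this trace are consistent with the djb2 string hash of the function name, which is also the hash named by the entry(djb2="...") configuration option. The sketch below reproduces the value; truncating the hash to 32 bits is an assumption inferred from the printed value, and the function is an illustration rather than the plugins' own code.

```rust
/// djb2 string hash: start at 5381, then hash = hash * 33 + byte.
/// The 32-bit wrapping arithmetic is an assumption made for this sketch.
fn djb2(name: &str) -> u32 {
    name.bytes()
        .fold(5381u32, |h, b| h.wrapping_mul(33).wrapping_add(u32::from(b)))
}
```

Here `djb2("foo")` evaluates to 193491849, which is 0x0b887389 in hexadecimal.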

C2Rust contains one cross-checking implementation per language, in the form of a compiler plugin in both cases. We provide a clang plugin for C code, and a rustc plugin for Rust code.

To build C variants with cross-checks enabled, first build the cross-checking plugin using $C2RUST/scripts/build_cross_checks.py, then run clang (or pass it to the build system) with the following options:

-Xclang -load -Xclang $C2RUST/build/clang-xcheck-plugin.$(uname -n)/plugin/CrossChecks.so to load the plugin
-Xclang -add-plugin -Xclang crosschecks to activate the plugin
-Xclang -plugin-arg-crosschecks -Xclang <...> for every additional option to pass to the plugin
-ffunction-sections may be required to correctly deduplicate some linkonce functions inserted by the plugin

Note that every option passed to clang requires a -Xclang prefix before the actual option, so that the compiler driver passes it to the clang backend correctly. We provide a cc_wrapper.sh script in the plugin source code directory that inserts these automatically, as well as several project-specific scripts in directories under examples/.

Additionally, the following arguments should be passed to the linker:

The cross-checking runtime library from $C2RUST/build/clang-xcheck-plugin.$(uname -n)/runtime/libruntime.a
A cross-checking backend library that provides the rb_xcheck function, e.g., libfakechecks for offline logging or libclevrbuf for online MVEE-based checks

Building Rust code with cross-checks is simpler than building C code, and only requires a few additions to Cargo.toml and the main Rust source file. Add the following to your Cargo.toml file (replacing $C2RUST with the actual path to this repository):

[dependencies.c2rust-xcheck-plugin]
path = "$C2RUST/cross-checks/rust-checks/rustc-plugin"

[dependencies.c2rust-xcheck-derive]
path = "$C2RUST/cross-checks/rust-checks/derive-macros"

[dependencies.c2rust-xcheck-runtime]
path = "$C2RUST/cross-checks/rust-checks/runtime"
features = ["libc-hash", "fixed-length-array-hash"]

and this preamble to your lib.rs or main.rs:

#![feature(plugin, custom_attribute)]
#![cross_check(yes)]

#[macro_use] extern crate c2rust_xcheck_derive;
#[macro_use] extern crate c2rust_xcheck_runtime;

You may also add #![plugin(c2rust_xcheck_plugin(...))] to pass additional arguments to the cross-checking plugin.

Cross-checks can be customized at a fine granularity using cross-check configuration files or inline attributes.

When cross-checking in offline mode, all variants are executed independently on the same inputs, and their cross-checks are written to either standard output or log files. After running all the variants, divergence can be detected by manually comparing the logs for mismatches. There are several backend libraries that support different types of logging outputs:

libfakechecks outputs a list of the cross-checks linearly to either standard output or a file (specified using the FAKECHECKS_OUTPUT_FILE environment variable)
zstd-logging library from cross-checks/rust-checks/backends (can also be used with the clang plugin) outputs a binary encoding of the cross-checks that is compressed using zstd, and is much more space-efficient than the text output of libfakechecks. The compressed output files can be converted to text using the xcheck-printer tool.

Before running the C and Rust variants, you may need to load in one of these libraries using LD_PRELOAD if you haven't linked against it and passed in its path using -rpath (this is fairly easy to do for a C build, but more complicated when using Cargo for Rust code), like this:

$ env LD_PRELOAD=$C2RUST/cross-checks/libfakechecks/libfakechecks.so ./a.out

Running each variant with cross-checks enabled will print a list of cross-check results to the specified output. A simple diff or cmp command will show differences in cross-checks, if any.
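
When the logs are long, it can help to locate only the first mismatch. The helper below is a hypothetical sketch (it is not one of the C2Rust tools) that compares two text logs line by line, assuming one cross-check per line as in the libfakechecks output.

```rust
/// Returns the zero-based index of the first line at which the two logs
/// differ, or `None` if they are identical. A log that ends early counts
/// as diverging at its first missing line.
fn first_divergence(a: &str, b: &str) -> Option<usize> {
    let (mut lines_a, mut lines_b) = (a.lines(), b.lines());
    let mut index = 0;
    loop {
        match (lines_a.next(), lines_b.next()) {
            (None, None) => return None,
            (x, y) if x != y => return Some(index),
            _ => index += 1,
        }
    }
}
```

The two arguments would typically be the contents of the C and Rust variants' log files, read with `std::fs::read_to_string`.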

The other execution mode for cross-checks is the online mode, where a monitor program (the MVEE) runs all variants in parallel with exactly the same inputs (by intercepting input system calls like read and replicating their return values) and cross-checks all the output system calls and instrumentation points inserted by our plugins. This approach has several advantages over offline mode:

Input operations are fully replicated, including those from stateful resources like sockets; only the master variant performs each actual operation, and each other variant only gets a copy of the data.
Outputs are cross-checked but not duplicated, so each output operation is only executed by the master variant; the others are only cross-checked for matching outputs. For example, only the master variant opens and writes to output files.
The lock-step MVEE automatically eliminates most sources of non-determinism, like threading and non-deterministic syscalls, e.g., reading from /dev/urandom (see the Troubleshooting section below for more details)

However, the main disadvantage of this approach is that some applications may not run correctly under the MVEE, due to either incomplete support from the MVEE or fundamental MVEE limitations. In such cases, we recommend using offline mode instead.

To run your application inside our MVEE, first build the MVEE following the instructions in its README. After building it successfully, write an MVEE configuration file for your application (there is a sample file in the MVEE directory, and a few others in our examples directory), then run the MVEE:

$ ./MVEE/bin/Release/MVEE -f <path/to/MVEE_config.ini> -N<number of variants> -- <variant arguments>

The MVEE.ini configuration file is fairly self-explanatory, but a few settings deserve particular attention:

xchecks_initially_enabled, when set to false, disables system call replication and cross-checks up to the first function cross-check (usually for the main function). It should be set to false for cross-language checks, because the Rust runtime performs a few additional system calls that C code does not, and the MVEE would otherwise terminate with a divergence.
relaxed_mman_checks and unsynced_brk disable MVEE cross-checks on the mmap family of calls and brk, respectively, and should both be set to true if the Rust code performs significantly different memory allocations.
path specifies the path to the variant's executable, and should be specified separately per variant (ReMon also supports running multiple variants for the same binary, but with different command line arguments; this is not used by C2Rust cross-checks). All variants should be files in the same directory, otherwise the MVEE will abort with a divergence inside the ELF loader.
argv specifies the arguments to pass to each variant (can be configured per-variant or globally for all variants).
env specifies the environment variables to pass to the variants, and should at least contain a LD_LIBRARY_PATH entry for the libclevrbuf.so library, and a LD_PRELOAD entry for the zeroing allocator libzero_malloc.so, like this:

{
  "variant": {
    "global": {
      "exec": {
        "env": [
          "LD_LIBRARY_PATH=../../../cross-checks/ReMon/libclevrbuf",
          "LD_PRELOAD=../../../cross-checks/zero-malloc/target/release/libzero_malloc.so"
        ]
      }
    }
  }
}

In case you run into any issues while building or running the variants, please refer to this section for possible fixes.

Builds may occasionally fail because of partially or completely unsupported features in our plugins:

Bitfields in C structures: these do not currently have a Rust equivalent, and the transpiler converts them to regular integer types. The clang plugin will exit with an error when trying to cross-check a bitfield.
Variadic functions: Rust does not support these yet, and our clang plugin cannot handle them either.
Fixed-sized arrays of large or unusual sizes: as of the writing of this document, Rust does not have const generics yet, and we need them to support arbitrary-sized arrays on the Rust side. Until then, the rustc plugin runtime only supports cross-checking on fixed-sized arrays from a limited set of sizes (all integers up to 32, all powers of 2 up to 1024).
Function pointers with more than 12 arguments: these require variadic generics in Rust.

In all these cases, we recommend that you either disable cross-checks for values of these types, or manually provide a custom cross-check function (see the cross-check configuration for more details).
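
To decide ahead of time whether a fixed-size array falls within the supported set of lengths described above, a check along these lines could be used. The function is a hypothetical helper, and treating length 0 as supported is an assumption.

```rust
/// True if `len` is in the set of array lengths described above:
/// every integer up to 32, plus every power of two up to 1024.
fn is_supported_len(len: usize) -> bool {
    len <= 32 || (len.is_power_of_two() && len <= 1024)
}
```

For example, lengths 17, 64 and 1024 pass the check, while 33, 100 and 2048 do not.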

Cross-checker output for C variants may sometimes contain additional cross-checks for inline functions from system headers which are missing from the corresponding Rust translation. In such cases, we recommend manually disabling cross-checks for the C inline functions using an external cross-check configuration file.

Ideally, divergence (cross-check differences between variants) is only caused by actual semantic differences between the variants. However, there is another cause of unintended divergence: non-determinism in program execution. In most cases, non-determinism will simply cause each variant to produce different cross-checks, but it may also occasionally cause crashes. There are many causes of program non-determinism that interfere with cross-checking:

Calls to time, RNG, PID (getpid() and friends) and other non-deterministic OS functions and operations, e.g., reads from /dev/urandom
File descriptors. We have hard-coded a fixed cross-check value for all FILE* objects for this exact reason.
Threading, i.e., non-deterministic thread scheduling
Pointer-to-integer casts under ASLR.

Generally, running the variants in online mode inside the MVEE fixes these issues by replicating all system calls between the variants, which ensures that they all receive the same values from the OS. In case the MVEE does not support a specific application and you need to run it in offline mode (or for any other reason), the recommended fix is to remove all such non-determinism from your code manually, e.g., replace all reads from /dev/urandom with constant values.

To verify that non-determinism truly is the cause of divergence, we recommend running each separate variant multiple times and cross-checking it against itself. If non-determinism really is the problem, each run will produce different cross-checks.

A note on ASLR and pointers: our cross-checkers currently check pointers by dereference instead of by address, thereby making the checks insensitive to ASLR. However, manually casting pointers to integers poses problems because integers cannot be dereferenced. Integers are cross-checked by value regardless of their source, and their values will differ across runs when they originate from pointers with ASLR enabled.

Uninitialized memory is one common source of non-determinism, since an uninitialized value may have different actual values across different runs of a program. Since our plugins cross-check pointers by dereferencing them, invalid pointers can also crash our cross-checking runtime.

To eliminate this problem, we force zero-initialization for all C and Rust values. The plugins enforce this for stack values (function-local variables), and all global values are already zero-initialized (as required by the C standard), which only leaves heap-allocated values, i.e., those allocated using malloc. We provide a zeroing malloc wrapper under cross-checks/zero-malloc which can be preloaded into an application using LD_PRELOAD. This wrapper library intercepts all memory allocation calls, and zeroes the allocated buffer after each call. To use this in Rust executables in place of the default jemalloc, add the following lines to your code to use the system allocator, which our library intercepts:

#[global_allocator]
static A: ::std::alloc::System = ::std::alloc::System;

We recommend that you use our zeroing memory allocator for all cross-checks.

Some C code will use pointers of some type T to refer to values of another type U, tricking our runtime into cross-checking the values incorrectly. This may not only cause divergence, but also potential crashes when our runtime attempts to cross-check actual integers as invalid pointers (see example below). If the value of the integer incidentally represents a valid memory address, the runtime will try to cross-check that memory as a T; otherwise, the runtime will most likely crash.

struct T {
  int n;
};
struct U {
  char *s;
};
void foo(struct U *x) {
  // Cross-check problem here: will try to cross-check `x` as a `struct U*`,
  // when it's a `struct T*`, so the check will most likely crash
  // when attempting to dereference x->s
}
int main() {
  struct T x = { 0x1234 };
  foo((struct U*)&x);
  return 0;
}

Our cross-checking runtimes can recover from attempts to dereference invalid pointers, but rely on the pointer-tracer tool, which uses ptrace to check and restart all invalid pointer dereferences. To use this recovery feature, you must use pointer-tracer to start the variants:

$ $C2RUST/cross-checks/pointer-tracer/target/release/pointer-tracer ./a.out

Alternatively, some issues caused by pointer aliasing can be fixed by disabling cross-checks altogether for certain types and values, or by providing custom cross-check functions for certain types. For example, one common pattern is the tagged union, where multiple structures have an overlapping prefix with a tag, followed by a type-specific part:

enum Tag {
  TAG_A,
  TAG_B
};
struct TypeA {
  // Common part
  enum Tag tag;
  char *foo;
  // Specific part
  int valA;
};
struct TypeB {
  // Common part
  enum Tag tag;
  char *foo;
  // Specific part
  void *valB;
};

For this example, you can either disable cross-checks manually for all the type-specific fields, e.g., valA and valB above, or provide a custom cross-check function that cross-checks each value based on its tag.

Another common C pattern related to memory allocations is the pointer to end of buffer pattern:

struct Buf {
  char *start;
  char *end;
};
void alloc(struct Buf *buf, unsigned size) {
    buf->start = malloc(size);
    buf->end = buf->start + size;
}

In this example, cross-checks will diverge on the end pointer, since it points to the first byte after the allocation returned by malloc. Since we only require the allocation itself to be zero-initialized, the value of that byte is undefined, and could change at any time during the execution of the variant.

For any pointers outside allocated memory, we recommend disabling cross-checks altogether.

The cross-checking runtimes may attempt to cross-check pointers to values that have been deallocated, e.g., by calling free, but still linger in memory without being used by the program. This means that the runtimes may dereference these pointers, even if the program never does, which may lead to divergence since the allocator is free to reuse that memory.

struct Data {
  int allocated;
  char *buf;
};
struct Data *free_data(struct Data *data) {
    data->allocated = 0;
    free(data->buf);
    // Potential non-determinism: data->buf has been freed,
    // but our runtime will try to dereference it
    return data;
}

As of the writing of this document, we have no automatic way to detect when the runtimes attempt to dereference deallocated memory, so we recommend manually disabling cross-checks when this occurs.

In some cases, clang and rustc may optimize the generated code differently in ways that produce divergence. For example, clang (more specifically, an LLVM optimization pass enabled by clang) converts single-argument printf calls into direct calls to puts, so printf("Hello world!\n"); becomes puts("Hello world!");. The two functions have different internal implementations and make different syscalls (mainly the write syscall), so this optimization causes divergence. We recommend compiling all C code with the -fno-builtin argument to prevent this.

In many cases, we can add identical cross-checks to the original C and the transpiled Rust code, e.g., when the C code is naively translated to the perfectly equivalent Rust code, and everything just works. However, this might not always be the case, and we need to handle mismatches such as:

Type mismatches between C and Rust, e.g., a C const char* (with or without an attached length parameter) being translated to a str. Additionally, if a string+length value pair (with the types const char* and size_t) gets translated to a single str, we may want to omit the cross-check on the length parameter.
Whole functions added or removed by the transpiler or refactoring tool, e.g., helpers.

Note that this list is not exhaustive, so there may be many more cases of mismatches.

To handle all these cases, we need a language that lets us add new cross-checks, or modify or delete existing ones.

The cross-check metadata is stored as a YAML encoding of an array of configuration entries. Each configuration entry describes the configuration for that specific check.

An example configuration file for a function foo with 3 arguments a, alen and b looks something like:

main.c:
  - item: defaults
    disable_xchecks: true

  - item: function
    name: foo
    disable_xchecks: false
    args:
      a: default
      alen: none
      b: default
    return: no

main.rs:
  - item: function
    name: foo
    args:
      a: default
      b: default
    return: no

We can store the cross-check configuration entries in a few places:

Externally in separate configuration files.
Inline in the source code, attached to the checked functions and structures.

Each approach has advantages and drawbacks. Inline configuration entries are simpler to maintain, but do not scale as well to larger codebases or more complex cross-check configuration entries. Conversely, external configuration entries are more flexible and can potentially express complex configurations in a cleaner and more elegant way, but can easily get out of sync with their corresponding source code. We currently support both approaches, with external configuration settings taking priority over inline attributes where both are present.

In the current implementation of the Rust cross-checker, inline configuration settings are passed to the enclosing scope's #[cross_check] attribute, e.g.:


#[cross_check(yes, entry(djb2="foo"))]
fn bar() { }

#[cross_check(yes, entry(fixed=0x1234))]
fn baz() { }

At the top level, each configuration file is a YAML associative array mapping file names to their configuration entries. Each array element maps a file name (represented as a string) to a list of individual items, each item representing a Rust/C scope entity, i.e., function or structure. Each item is encoded in YAML as an associative array. All items have a few common array members:

item specifies the type of the current item, e.g., function, struct or others.
name specifies the name of the item, i.e., the name of the function or structure.

Function cross-checks are configured using entries with item: function. Function entries support the following fields:

Field	Role
`disable_xchecks`	Disables all cross-checks for this function and everything in it if set to `true`.
`entry`	Configures the function entry cross-check (see below for information on accepted values).
`exit`	Configures the function exit cross-check.
`all_args`	Specifies a cross-check override for all of this function's arguments. For example, setting `all_args: none` disables cross-checks for all arguments.
`args`	An associative array that maps argument names to their corresponding cross-checks. This can be used to customize the cross-checks for some of the function arguments individually. This setting overrides both the global default and the one specified in `all_args` for the current function.
`return`	Configures the function return value cross-check.
`ahasher` and `shasher`	Override the default values for the aggregate and simple hasher for this function (see the hashing documentation for the meaning of these fields).
`nested`	Recursively configures the items nested inside the current item. Since Rust allows arbitrarily deep function and structure nesting, we use this to recursively configure nested functions.
`entry_extra`	Specifies a list of additional custom cross-checks to perform after the argument cross-checks. Each cross-check accepts an optional `tag` parameter that overrides the default `UNKNOWN` tag.
`exit_extra`	Specifies a list of additional custom cross-checks to perform on function return.
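
The precedence between the per-argument `args` setting, the function-wide `all_args` setting and the global default can be sketched as a simple lookup chain. The `Check` enum, the `FunctionConfig` struct and the `arg_check` function are hypothetical names used only for this illustration; they are not the plugin's data structures.

```rust
use std::collections::HashMap;

/// The two simplest cross-check kinds, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Check { Default, Disabled }

/// A pared-down function entry with just the argument-related fields.
struct FunctionConfig {
    all_args: Option<Check>,
    args: HashMap<&'static str, Check>,
}

/// Resolve the cross-check for one argument: a per-argument entry wins,
/// then `all_args`, then the global default.
fn arg_check(global_default: Check, cfg: &FunctionConfig, arg: &str) -> Check {
    cfg.args
        .get(arg)
        .copied()
        .or(cfg.all_args)
        .unwrap_or(global_default)
}
```

With `all_args` set to disabled and an `args` entry that sets `a` to default, argument `a` is still cross-checked while every other argument is skipped.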

Structure entries configure cross-checks for Rust structure, tuple and enumeration types, and are tagged with item: struct. For a general overview of cross-checking for structures (aggregate types), see the hashing documentation. Structure entries support the following fields:

Field	Role
`disable_xchecks`	Disable automatic cross-check emission for this structure (this is generally best left out, unless the default is `true` and needs to be reset to `false`).
`field_hasher`	Configures the replacement hasher for this structure. The hasher is a Rust object that implements the `cross_check_runtime::hash::CrossCheckHasher` trait.
`custom_hash`	Specifies a function to call to hash objects of this type, instead of the default implementation. This function should have the signature `fn foo<XCHA, XCHS>(arg: &T, depth: usize) -> u64` where `T` is the name of the current type. `XCHA` and `XCHS` are template parameters passed by the caller that specify the aggregate and simple hasher to use for this computation (and can be overridden using `ahasher` and `shasher` below).
`fields`	An associative array that specifies custom hash computations for some or all of the structure's fields. Accepts values in the format of cross-check types.
`ahasher` and `shasher`	Override the aggregate and simple hasher for the default hash implementation for the current type (mainly useful if `field_hasher` is left out). These are recursively passed to the hash function call for each structure field.

The field_hasher and custom_hash settings provide two alternative methods of customizing the hashing algorithm for a given structure: users may either provide a custom implementation of CrossCheckHasher and pass that to field_hasher, or implement a hashing function and pass it to custom_hash. The two alternatives are mostly equivalent, and users may use whichever is more convenient. Additionally, users can choose to completely disable the automatic derivation of CrossCheckHash, and manually implement CrossCheckHash for some of the types instead.

Cross-check types

There are several types of cross-check implemented in the compiler:

Check	Value Type	Behavior
`default`		Lets the compiler perform the default cross-check.
`none` or `disabled`		Disables cross-checking or hashing for the current value.
`fixed`	`u64`	Sets the cross-checked value to the given 64-bit integer.
`djb2`	`String`	Sets the cross-checked value to the djb2 hash of the given string. This is mainly useful for overriding function entry cross-checks, in case the function names don't match between languages.
`as_type`	`String`	Perform the default value cross-check, but after casting the value to the given type, e.g., cast it to a `u32` then cross-check it as a `u32`.
`custom`	`String`	Parses the given string as a C or Rust expression and uses it to compute the cross-checked value. In most cases, the string is inserted verbatim into the cross-check code, e.g., for function argument cross-checks.

Each cross-check is encoded in YAML as either a single word with the type, e.g., default, or a single-element associative array mapping the type to its argument, e.g., { fixed: 0x1234 }.

More cross-check types may be added as needed.
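
As a rough illustration of what the `fixed` and `djb2` check types compute, the sketch below models them in Rust. The `XCheck` enum and the `entry_value` function are hypothetical names introduced here, and the classic djb2 recurrence (start at 5381, multiply by 33, add each byte) is assumed; the variant used by the plugins may differ in detail.

```rust
/// Hypothetical model of a few cross-check types from the table above.
pub enum XCheck {
    Disabled,
    Fixed(u64),
    Djb2(String),
}

/// Classic djb2 string hash: h = h * 33 + byte, starting from 5381.
/// Assumption: wrapping 64-bit arithmetic.
pub fn djb2(s: &str) -> u64 {
    s.bytes()
        .fold(5381u64, |h, b| h.wrapping_mul(33).wrapping_add(u64::from(b)))
}

/// Returns the value to cross-check, or `None` when the check is disabled.
pub fn entry_value(check: &XCheck) -> Option<u64> {
    match check {
        XCheck::Disabled => None,
        XCheck::Fixed(v) => Some(*v),
        XCheck::Djb2(name) => Some(djb2(name)),
    }
}
```

Under this model, giving a C function `baz` and a Rust function `baz1` the same `{ djb2: "baz" }` entry check makes both sides emit the same value, which is the use case the table describes.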

If custom_hash: { custom: "hash_foo" } is a configuration entry for structure Foo, then the compiler will insert a call to hash_foo to perform the cross checks. This function should have the following signature:


fn hash_foo<XCHA, XCHS>(foo: &Foo, depth: usize) -> u64 { ... }

The hash function receives a reference to a Foo object and a maximum depth, and should return the 64-bit hash value for the given object.
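
A minimal sketch of such a function is shown below. The `Foo` layout, the `DEPTH_EXHAUSTED` constant and the way the fields are packed are all assumptions made for illustration; only the function signature comes from the text above.

```rust
/// Hypothetical structure standing in for the `Foo` type above.
pub struct Foo {
    pub id: u32,
    pub len: u16,
}

/// Arbitrary constant returned when no depth budget is left (an assumption).
pub const DEPTH_EXHAUSTED: u64 = 0xDEAD_BEEF;

/// Custom hash with the signature expected by `custom_hash`.
/// `XCHA` and `XCHS` name the aggregate and simple hashers chosen by the
/// caller; this sketch ignores them and packs the fields by hand.
pub fn hash_foo<XCHA, XCHS>(foo: &Foo, depth: usize) -> u64 {
    if depth == 0 {
        return DEPTH_EXHAUSTED;
    }
    (u64::from(foo.id) << 16) | u64::from(foo.len)
}
```

Because the two type parameters are not used in the argument list, a direct call has to name them explicitly, for example `hash_foo::<(), ()>(&foo, 4)`.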

If bar: { custom: "hash_bar" } is a configuration entry for field bar, then the compiler will insert a call to hash_bar to compute the hash for bar. This function should have the following signature:


fn hash_bar<XCHA, XCHS, S, F>(h: &mut XCHA, foo: &S, bar: &F, depth: usize)
       where XCHA: cross_check_runtime::hash::CrossCheckHasher { ... }

The function receives the following arguments:

The current aggregate hasher for this structure. The function can call the hasher's write_u64 function as many times as needed.
The structure containing this field. This argument has generic type S, so the same function can be reused for different structures.
The field itself, with generic type F. The function may require additional type bounds for F to make it compatible with its callers.
The maximum hashing depth (explained in the hashing documentation).
The type parameters XCHA and XCHS bound to the current aggregate and simple value hasher for the current invocation.

This function should not return the hash value of the field. Instead, the function should call the hasher's write_u64 method directly.
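
The following sketch shows one way such a field hash function could look. To keep it self-contained, it defines a local stand-in for the `CrossCheckHasher` trait with only the `write_u64` method mentioned above; the `Recorder` hasher and the `AsRef<str>` bound on `F` are assumptions made for the example.

```rust
/// Local stand-in for `cross_check_runtime::hash::CrossCheckHasher`,
/// reduced to the one method this sketch needs (an assumption).
pub trait CrossCheckHasher {
    fn write_u64(&mut self, value: u64);
}

/// Toy aggregate hasher that records every value written to it.
#[derive(Default)]
pub struct Recorder(pub Vec<u64>);

impl CrossCheckHasher for Recorder {
    fn write_u64(&mut self, value: u64) {
        self.0.push(value);
    }
}

/// Field hash function: feeds the field's length and bytes to the hasher
/// instead of returning a value.
pub fn hash_bar<XCHA, XCHS, S, F>(h: &mut XCHA, _foo: &S, bar: &F, depth: usize)
where
    XCHA: CrossCheckHasher,
    F: AsRef<str>, // extra bound on F, as the text above allows
{
    if depth == 0 {
        return; // no depth budget left: contribute nothing
    }
    let text = bar.as_ref();
    h.write_u64(text.len() as u64);
    for byte in text.bytes() {
        h.write_u64(u64::from(byte));
    }
}
```

Writing the length first keeps fields such as `"ab"` followed by `"c"` distinct from `"a"` followed by `"bc"` when several string fields are fed to the same hasher.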

The special defaults item type specifies the default cross-check settings for all items in a file. We currently support the following entries:

Field	Role
`disable_xchecks`	Disables all cross-checks for this file. Can be individually overridden per function or structure.
`entry`	Configures the default entry cross-check for all functions in this file.
`exit`	Similarly configures the function exit cross-check.
`all_args`	Specifies a cross-check override for all arguments to all functions in this file. For example, setting `all_args: default` enables cross-checks for all arguments.
`return`	Configures the function return value cross-check.

Example configuration for a function baz1(a, b):

main.rs:
  - item: function
    name: baz1
    entry: { djb2: "baz" }    # Cross-check the function as "baz"
    args:
      a: { custom: "foo(a)" } # Cross-check a as foo(a)
      b: none                 # Do not cross-check b
    entry_extra:              # Cross-check foo(b) with a FUNCTION_ARG tag
      - { custom: "foo(b)", tag: FUNCTION_ARG }
      - { custom: "a" }       # Cross-check the value "a" with UNKNOWN_TAG

Example configuration for a structure Foo (illustrated on an object foo of type Foo):

main.rs:
  - item: struct
    name: Foo
    field_hasher: "FooHasher"  # Use FooHasher as the aggregate hasher
    fields:
      a: { fixed: 0x12345678 } # Use 0x12345678 as the hash of foo.a
      b: { custom: "hash_b" }  # Hash foo.b using hash_b(foo.b)
      c: none                  # Ignore foo.c when hashing foo

In addition to the external configuration format, a subset of cross-checks can also be configured inline in the program source code. The compiler plugin provides a custom #[cross_check] attribute used to annotate functions, structures and fields with custom cross-check metadata.

The #[cross_check] function attribute currently supports the following arguments:

Argument	Type	Role
`none` or `disabled`		Disable cross-checks for this function and all its sub-items (this attribute is inherited). Each sub-item can individually override this with `yes` or `enabled`.
`yes` or `enabled`		Enable cross-checks for this function and its sub-items. Each nested item can also override this setting with `none` or `disabled`.
`entry`	`XCheckType`	Cross-check to use on function entry, same as for external configuration.
`exit`	`XCheckType`	Cross-check to use on function exit, same as for external configuration.
`all_args`	`XCheckType`	Enable cross-checks for this function's arguments (disabled by default). Takes the cross-check type as its argument.
`args(...)`		Per-argument cross-check overrides (same as for external configuration).
`return`	`XCheckType`	Cross-check to perform on the function return value, same as for external configuration.
`ahasher` and `shasher`	`String`	Same as for external configuration.
`entry_extra` and `exit_extra`		Same as for external configuration.


#[cross_check(yes, entry(djb2="foo"))] // Cross-check this function as "foo"
fn foo1() {
  #[cross_check(none)]
  fn bar() { ... }
  bar();

  #[cross_check(yes, all_args(default), args(a(fixed=0x123)))]
  fn baz(a: u8, b: u16, c: u32) { ... }
  baz(1, 2, 3);
}

For structures, the compiler plugin also supports a subset of the full external configuration settings as #[cross_check] arguments:

Argument	Type	Role
`field_hasher`	`String`	Same as for external configuration.
`custom_hash`	`String`	Same as for external configuration.
`ahasher` and `shasher`	`String`	Same as for external configuration.

The #[cross_check] attribute can also be attached to structure fields to configure hashing:

Argument	Type	Role
`none` or `disabled`		This field is skipped during hashing.
`fixed`	`u64`	Fixed 64-bit integer to use as the hash value for this field. Identical to the `fixed` external cross-check type.
`custom_hash`	`String`	Same as for external configuration.


#[cross_check(field_hasher="MyHasher")]
struct Foo {
  #[cross_check(none)]
  foo: u64,

  #[cross_check(fixed=0x1234)]
  bar: String,

  #[cross_check(custom_hash="hash_baz")]
  baz: String,
}

At any level or scope, there may be duplicate items, i.e., multiple items with the same name. It is not clear at this point how best to handle this case, since we have several conflicting requirements. On the one hand, we may wish to allow the configuration for one source file to be spread across multiple configuration files, with entries from later configuration files appended to, or replacing, entries from earlier files. On the other hand, we may have identically-named structures or functions in nested scopes that we want to configure separately. For an example, consider the following code:

fn foo(x: u32) -> u32 {
    if x > 22 {
        fn bar(x: u32) -> u32 {
            x - 22
        };
        bar(x)
    } else {
        fn bar(x: u32) -> u32 {
            x + 34
        }
        bar(x)
    }
}

In this example, there are two distinct foo::bar functions, and we wish to configure them separately. However, at the top level of a file, there may only be one foo function, so we can merge all entries for foo together. Alternatively, we could check for multiple top-level items with the same name and exit with an error if we encounter any duplicates.

Currently, if a certain cross-check is configured using both an external entry and an inline #[cross_check(...)] attribute, the external entry takes priority. Alternatively, we may reverse this priority, or exit with an error if both are present.

The configuration settings described above apply to the scope of an item. Most settings apply only to the scope itself and not to any of its nested sub-items; for example, args and all_args apply only to the current function (foo above) and not to either of the bar functions. A few settings, however, apply to everything inside the scope. These attributes are internally "inherited" from each scope by its child scopes. Currently, the only inherited attributes are disable_xchecks (so that disabling cross-checks for a module or function disables them for everything inside it), ahasher and shasher.
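
The inheritance rule can be pictured with a small Rust model. The `ScopeConfig` type and the `effective` function are hypothetical and only illustrate the rule stated above; they are not the data structures used by the plugins.

```rust
/// Hypothetical per-scope settings; `None` means "not set in this scope".
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ScopeConfig {
    pub disable_xchecks: Option<bool>, // inherited
    pub ahasher: Option<String>,       // inherited
    pub shasher: Option<String>,       // inherited
    pub all_args: Option<String>,      // applies to this scope only
}

/// Computes the effective settings of a child scope from its parent's
/// effective settings and the child's own explicit entries.
pub fn effective(parent: &ScopeConfig, child: &ScopeConfig) -> ScopeConfig {
    ScopeConfig {
        disable_xchecks: child.disable_xchecks.or(parent.disable_xchecks),
        ahasher: child.ahasher.clone().or_else(|| parent.ahasher.clone()),
        shasher: child.shasher.clone().or_else(|| parent.shasher.clone()),
        all_args: child.all_args.clone(), // never taken from the parent
    }
}
```

In this model a nested function with no entries of its own picks up `disable_xchecks` from the enclosing function but not its `all_args` override, and an explicit setting in the nested scope always wins.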

Custom cross-check definitions have a different format for each language. The rustc plugin accepts any Rust expression that is valid on function entry as a custom cross-check.

The clang plugin, on the other hand, only accepts a limited subset of C expressions: each cross-check specification contains the name of the function to call, optionally followed by a list of parameters to pass to the function, e.g., function or function(arg1, arg2, ...). Each parameter is the name of a global variable or function argument, and is optionally preceded by & (to pass the parameter by address instead of value) or by * (to dereference the value if it is a pointer).
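
To make the accepted shape concrete, here is a small Rust sketch of a parser for this restricted syntax. It is written from the description above rather than taken from the clang plugin (which is not implemented in Rust), so the `Param` type and `parse_custom` function are hypothetical.

```rust
/// How a parameter is passed to the custom cross-check function.
#[derive(Debug, PartialEq)]
pub enum Param {
    Value(String),   // `x`
    Address(String), // `&x`
    Deref(String),   // `*x`
}

/// Returns true for a plain C identifier.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Parses `function` or `function(arg1, &arg2, *arg3)`.
/// Returns the function name and its parameters, or `None` if malformed.
pub fn parse_custom(spec: &str) -> Option<(String, Vec<Param>)> {
    let spec = spec.trim();
    let (name, args) = match spec.find('(') {
        None => (spec, ""),
        Some(open) => {
            // Everything after '(' must end with the closing ')'.
            let inner = spec[open + 1..].strip_suffix(')')?;
            (spec[..open].trim(), inner)
        }
    };
    if !is_ident(name) {
        return None;
    }
    let mut params = Vec::new();
    if !args.trim().is_empty() {
        for arg in args.split(',') {
            let arg = arg.trim();
            let param = if let Some(n) = arg.strip_prefix('&') {
                Param::Address(n.trim().to_string())
            } else if let Some(n) = arg.strip_prefix('*') {
                Param::Deref(n.trim().to_string())
            } else {
                Param::Value(arg.to_string())
            };
            // Each parameter must be a bare variable or argument name.
            let ident = match &param {
                Param::Value(n) | Param::Address(n) | Param::Deref(n) => n,
            };
            if !is_ident(ident) {
                return None;
            }
            params.push(param);
        }
    }
    Some((name.to_string(), params))
}
```

An arbitrary expression such as `check_buf(a + 1)` is rejected by this sketch, reflecting the limitation described above, whereas the rustc plugin would accept the equivalent Rust expression.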

C allows developers to define anonymous structures that define the type for a single value, e.g.:

struct {
  int x;
} y;

For a variety of reasons, we need to assign names to these structures ourselves. The most important reason is that we need to identify these structures in the external configuration files. We assign the names using one of the following formats, depending on the context where the anonymous structure is defined:

Assigned name	Meaning
`Foo$field$x`	This structure defines the type for the field `x` of the outer structure `Foo`. Note that `Foo` itself may also be an anonymous structure that follows the same naming policy.
`foo$arg$x`	This structure defines the type for the argument `x` of function `foo` (as illustrated below).
`foo$result`	This structure defines the return type for function `foo`.

struct Foo {
  struct {                  // This gets named `Foo$field$x`
    int a;
  } x;
};

struct { int a; }           // This gets the `foo$result` name
foo(struct { int b; } x) { // The `x` argument type gets the `foo$arg$x` name
}
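
The naming scheme from the table can be summarized in a few lines of Rust. The `AnonContext` type and `anon_struct_name` function are hypothetical helpers written for this illustration; only the three name formats come from the table above.

```rust
/// Where an anonymous structure appears in the C source.
pub enum AnonContext<'a> {
    Field { outer: &'a str, field: &'a str },
    Arg { function: &'a str, arg: &'a str },
    Result { function: &'a str },
}

/// Builds the assigned name for an anonymous structure in the given context.
pub fn anon_struct_name(ctx: &AnonContext<'_>) -> String {
    match ctx {
        AnonContext::Field { outer, field } => format!("{}$field${}", outer, field),
        AnonContext::Arg { function, arg } => format!("{}$arg${}", function, arg),
        AnonContext::Result { function } => format!("{}$result", function),
    }
}
```

Since the outer structure may itself be anonymous, names compose: an anonymous structure used for field `y` inside `Foo$field$x` would be named `Foo$field$x$field$y` under this scheme.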

For a given value x of a type T, our cross-checking implementation needs to hash x to a hash value H(x) of fixed size (64 bits in the current implementation), regardless of the size and layout of T. This document describes the design and implementation of the type-aware hashing algorithms used by the cross-checker.

Using an established hash function over the raw bytes of x has a few disadvantages:

C/Rust structures contain padding bytes between consecutive fields (due to alignment requirements), and we must not include this padding in the hash.
Pointer addresses are non-deterministic due to ASLR and other factors, so we must hash them by dereference instead of address.
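
The padding problem is easy to observe from Rust. The sketch below uses a hypothetical `Padded` structure with C layout and assumes a platform where `u32` is 4-byte aligned, which holds on common targets.

```rust
use std::mem::size_of;

/// Laid out like a C struct: one byte, then a 4-byte-aligned integer.
#[repr(C)]
pub struct Padded {
    pub tag: u8,
    pub value: u32,
}

/// Number of bytes in `Padded` that belong to no field.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}
```

On such a platform the structure occupies 8 bytes although its fields only account for 5, so a hash over the raw bytes would also cover 3 bytes whose contents are unspecified.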

For these reasons, we have chosen to design our own type-aware hashing algorithms. The algorithms hash each value differently depending on its type, and are implemented by functions with the following signature:

uint64_t __c2rust_hash_T(T x, size_t depth);

We use recursive hashing algorithms for complex types. To prevent infinite recursion and long hashing times, we limit the recursion depth to a fixed value. When recursion reaches this limit, the hash function returns a constant hash instead of going deeper.

We distinguish between the following kinds of types:

Simple types, e.g., integers, booleans, characters, floats, are trivial types which can be hashed directly by value. In the current implementation, we hash these values by XORing them with a constant that depends on the type (see the C and Rust implementations for details). Since simple types cannot recurse, we perform no depth checks for this case.
Aggregate (or non-trivial) types:
- Structures. We hash the contents of each structure by recursively hashing each field (with depth increased by one), then aggregating all the hashes into one. We currently use the JodyHash function for the latter.
- Fixed-size arrays are hashed in fundamentally the same way as structures, by recursively hashing each array element then aggregating the resulting hashes.
- Pointers. We avoid hashing pointers by address for the reasons listed above. Instead, we hash each pointer by recursively hashing its dereferenced value (with depth increased by one). We have two special cases here that we need to handle:
  - Null pointers, which our hash functions check and return a special hard-coded hash value for.
  - Non-null invalid pointers. Our cross-checking implementation will crash when dereferencing these pointers. However, running the crashing program either with the pointer-tracer tool or under the MVEE avoids the crash and safely hashes these pointers by returning another special hard-coded value.

Other data types, e.g., unions and structures containing bitfields, are difficult to hash programmatically and require the user to specify a manual hash function.
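
The scheme described above can be sketched in Rust as follows. This is an illustration only: the constants are arbitrary placeholders, the `mix` step is an FNV-style stand-in rather than JodyHash, `Option<Box<Node>>` plays the role of a nullable pointer, and `depth` counts the remaining budget downward instead of counting upward toward a limit.

```rust
/// Arbitrary placeholder constants, not the values used by C2Rust.
pub const U32_XOR: u64 = 0x5555_5555_5555_5555;
pub const NULL_PTR_HASH: u64 = 0x4e55_4c4c;
pub const DEPTH_LIMIT_HASH: u64 = 0x4c45_4146;

/// Simple type: hashed by value, no depth check needed.
pub fn hash_u32(x: u32) -> u64 {
    u64::from(x) ^ U32_XOR
}

/// Stand-in aggregation step (FNV-1a style mixing, not JodyHash).
fn mix(acc: u64, h: u64) -> u64 {
    (acc ^ h).wrapping_mul(0x0000_0100_0000_01b3)
}

/// A structure with a simple field and a pointer field.
pub struct Node {
    pub value: u32,
    pub next: Option<Box<Node>>, // `None` plays the role of a null pointer
}

/// Pointer: hashed through its target, never by address.
pub fn hash_ptr(p: &Option<Box<Node>>, depth: usize) -> u64 {
    match p {
        None => NULL_PTR_HASH,
        Some(_) if depth == 0 => DEPTH_LIMIT_HASH,
        Some(node) => hash_node(node, depth - 1),
    }
}

/// Structure: hash every field with a smaller budget, then aggregate.
pub fn hash_node(n: &Node, depth: usize) -> u64 {
    if depth == 0 {
        return DEPTH_LIMIT_HASH;
    }
    let acc = mix(0xcbf2_9ce4_8422_2325, hash_u32(n.value));
    mix(acc, hash_ptr(&n.next, depth - 1))
}
```

Two lists with equal contents hash to the same value even though their nodes live at different addresses, and a very long list stops contributing once the depth budget runs out.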

The cross-checking configuration settings can be used to specify a different hashing algorithm for simple types and for aggregate types.

This is a simple cross-check inserter for Rust code that is implemented as a Rust compiler plugin.

To use the compiler plugin, you need to take several steps. First, add the plugin as a Cargo dependency to your Cargo.toml file:

[dependencies]
c2rust-xcheck-plugin = { path = ".../C2Rust/cross-checks/rust-checks/rustc-plugin" }
c2rust-xcheck-derive = { path = ".../C2Rust/cross-checks/rust-checks/derive-macros" }
c2rust-xcheck-runtime = { path = ".../C2Rust/cross-checks/rust-checks/runtime" }

with ... as the full path to the C2Rust repository. Next, add the following preamble to your main.rs or lib.rs file:


#![feature(plugin)]
#![plugin(c2rust_xcheck_plugin)]

#[macro_use]
extern crate c2rust_xcheck_derive;
#[macro_use]
extern crate c2rust_xcheck_runtime;

Cross-checking is enabled and configured using the #[cross_check] directive, which can either be enabled globally (using #![cross_check] at the beginning of main.rs or lib.rs) or individually per function (the per-function settings override the global ones).

The directive optionally takes the following options:

yes and enabled enable cross-checking for the current scope (crate or function).
none and disabled disable cross-checking for the current scope.
entry(djb2="foo") sets the cross-checking name for the current function entry point to the DJB2 hash of foo.
entry(fixed=NNN) sets the cross-checking ID for the current function entry point to NNN.

Example:


#[cross_check(yes, entry(djb2="foo"))]
fn bar() { }

#[cross_check(yes, entry(fixed=0x1234))]
fn baz() { }

#[cross_check(none)]
fn foo() { }

This is a cross-check inserter for C programs implemented as a clang compiler plugin.

Build libfakechecks (optional, useful for testing):
```
 $ cd ../../libfakechecks
 $ make all
```
Build the clang plugin using the build script:
```
 $ ../../../scripts/build_cross_checks.py
```
To compile code using the plugin, either wrap the compilation command with the cc_wrapper.sh script from this directory:

  $ cc_wrapper.sh <path/to/clang> .../CrossChecks.so <rest of command line...>

or add the following arguments manually to the clang command line, e.g., using CFLAGS:

-Xclang -load -Xclang .../CrossChecks.so -Xclang -add-plugin -Xclang crosschecks

and link against libruntime.a. In both cases, the target binary must then be linked against one of the rb_xcheck implementation libraries: libfakechecks.so or libclevrbuf.so.

This plugin can be tested in this directory by running make test.

The following example translations illustrate how to run C2Rust on real codebases. Each example has been modified if necessary to prepare it for translation with C2Rust and each has accompanying documentation on how to translate the example.

The robotfindskitten example is accompanied by a demonstration of the refactoring tool rewriting the unsafe translated Rust into idiomatic, safe Rust.

# in examples/json-c/repo:
../configure    # use the custom c2rust configure script
intercept-build make
make check
python3 ../translate.py
ninja -C rust

This will produce rust/libjson-c.so.4.0.0.

# in examples/json-c/repo:

# Replace the C libjson-c.so with a symlink to the Rust one.
# You only need to do this the first time.
rm .libs/libjson-c.so.4.0.0
ln -s ../rust/libjson-c.so.4.0.0 .libs/libjson-c.so.4.0.0

# Run tests
make check

If you modify the C files, make check will try to rebuild some targets and then fail because of the object files that translate.py deleted. If this happens, run make clean && make, then repeat the "running tests" steps from the top.

If the repo submodule appears to be empty or out of date, you may need to run git submodule update --init path/to/repo.

$ intercept-build make
$ c2rust transpile compile_commands.json
$ rustc test.rs

This tiny project provides an example of how to use CMake to build a C project and to generate the clang "compile_commands.json" file which is used by tools like the c2rust-ast-exporter.

Build with the following commands:

$ mkdir ../build
$ cd ../build
$ cmake ../qsort -DCMAKE_EXPORT_COMPILE_COMMANDS=1
$ cmake --build .
$ c2rust transpile compile_commands.json

Only Linux is supported at the moment, but OS X might work with some tweaks.

In path/to/examples/tmux, initialize the git submodule:

git submodule update --init repo

in tmux/repo:

./autogen.sh && ./configure

in tmux/repo:

intercept-build make check

If your compile_commands.json enables optimizations (-O2, -O3, etc.), you will need to remove them so that unsupported compiler_builtins are less likely to be generated, which would leave you with code that does not compile.

Run rm *.o compat/*.o here to get rid of gcc-generated object files, or else you may see CRITICAL:root:error: some ELF objects were not compiled with clang: in the next step.

in tmux:

./translate.py to translate all required C files into the tmux/repo/rust/src and tmux/repo/rust/src/compat directories.

Run cargo run to build and execute tmux.

If the repo submodule appears to be empty or out of date, you may need to run git submodule update --init path/to/repo.

The steps to get the transpiled code are as follows:

$ intercept-build make
$ c2rust transpile compile_commands.json
$ rustc grabc.rs -L/usr/X11R6/lib -lX11

If you want to have the transpiler create a crate:

$ intercept-build make
$ c2rust transpile compile_commands.json --emit-build-files -m grabc --output-dir rust
$ cd rust
$ RUSTFLAGS="-L/usr/X11R6/lib -lX11" cargo build

In path/to/examples/libxml2, initialize the git submodule:

git submodule update --init repo

in libxml2/repo:

./autogen.sh

and optionally ./configure (autogen.sh currently runs this automatically, so you're not required to).

in libxml2/repo:

intercept-build make check

If your compile_commands.json enables optimizations (-O2), you will need to remove them so that unsupported compiler_builtins are less likely to be generated, which would leave you with code that does not compile.

Run rm .libs/*.o here to get rid of gcc-generated object files, or else you may see CRITICAL:root:error: some ELF objects were not compiled with clang: in the next step.

in libxml2:

./translate.py to translate all required C files (including tests) into the libxml2/repo/rust/src and libxml2/repo/rust/examples directories.

in libxml2:

./patch_translated_code.py to apply patches to some known issues in the generated code.

Since each of these tests has its own main file, we decided to move them to the Rust examples directory instead of trying to wrap them in the test framework.

You can run a test like so: cargo run --example EXAMPLE where EXAMPLE is one of the files in libxml2/repo/rust/examples, not including the file extension.

testReader seems to be mostly working identically but with some slight differences. Try testReader --valid test/japancrlf.xml. It produces an extra "Ns verBoom: Validation failed: no DTD found !, (null), (null)"

runtest seems to be consistently successful now
testRelax seems to work equivalently with files as in C
testXPath seems to work equivalently with files as in C
xmllint seems to work equivalently with files as in C
testSAX prints out nothing on success, just like C version
testModule prints "Success!"
testHTML works with input files from test/HTML and produces same output as C version
testRegexp works with files from test/regexp and produces same output as C version
testrecurse prints "Total 9 tests, no errors"
testlimits prints "Total 514 tests, no errors"
- Note: text output seems noticeably slower than the C version
testThreads prints nothing (but no longer prints parsing errors)
testapi runs successfully and prints "Total: 1172 functions, 280928 tests, 0 errors"
testC14N prints parsed output when given a file to read from test/c14n
testSchemas no longer crashes when provided a file from test/schemas/*.xsd
testchar prints tests completed
testdict prints "dictionary tests succeeded 20000 strings"
testAutomata takes a file from test/automata and produces equivalent output to C run
testURI waits on input from stdin, needs example input from test/URI. See Makefile.am and result/URI/uri.data for examples

testchar all cross-checks match
testdict all cross-checks match
testapi all cross-checks match (345 million)
runtest all cross-checks match
testlimits all cross-checks match, but requires -fno-builtin as a compiler argument
testSAX works
testHTML works
testRegexp works
testModule requires testdso.so, doesn't work yet
testAutomata works
testSchemas works on all files from test/schemas
testRelax works on all files from test/relaxng
testURI works
testC14N works
testXPath works on files under test/XPath/expr and test/xmlid
testThreads deadlocks, still investigating
xmllint does not compile

To build snudown with the C2Rust translator and/or cross-checks, initialize the git submodule by running git submodule update --init path/to/repo.

Make sure to build the derive-macros, runtime and rustc-plugin projects in the cross-checks folder beforehand. The runtime project must be built with the libc-hash feature (e.g. cargo build --features libc-hash).

Next, cd into the repo directory and run python setup.py build with one of the following arguments:

--translate to translate the C code to Rust without any checks
--clang-crosschecks to build the C version of snudown with full cross-checking
--rust-crosschecks to translate to cross-checked Rust code
--use-fakechecks may be appended to use the fakechecks library to print out the cross-checks, instead of libclevrbuf from the MVEE
running with no flags will build the C version of the code
Note that -f may need to be appended to the end of the command to force a rebuild, if building multiple times consecutively

After building any of the 3 versions, run python setup.py test to test it.

If the repo submodule appears to be empty or out of date, you may need to run git submodule update --init path/to/repo.

# generate compile_commands.json
$ intercept-build make
$ c2rust transpile compile_commands.json --emit-build-files

Instead of translating with --emit-build-files to generate a library crate, you can build with --main exampleN where N is one of 1, 3, or 4 (example2.c never seems to halt in either C or Rust, but translates and executes just fine). This will create a binary crate that will run the specified example.

If the repo submodule appears to be empty or out of date, you may need to run git submodule update --init path/to/repo.

$ intercept-build make
$ c2rust transpile compile_commands.json --emit-build-files -m main --output-dir rust
$ cd rust
$ cargo build

If the repo submodule appears to be empty or out of date, you may need to run git submodule update --init path/to/repo.

You may need to add #include <unistd.h> to xzoom.c for it to properly generate (otherwise main_0 goes missing). This include is normally only added with the TIMER macro enabled, but seems to be required for standard functionality. (We could fork the repo if we want to make this change explicit for the purposes of automated testing.)

clang >= 5.0
sed

$ clang -MJ compile_commands.o.json xzoom.c -L/usr/X11R6/lib -lX11
$ sed -e '1s/^/[\n/' -e '$s/,$/\n]/' *.o.json > compile_commands.json
$ c2rust transpile compile_commands.json
$ rustc xzoom.rs  -L/usr/X11R6/lib -lX11

This section details the refactoring script used to transform the initial Rust translation of robotfindskitten, as generated by c2rust transpile, into a safe Rust program. We divide the refactoring process into several major steps:

ncurses macro cleanup: The ncurses library implements parts of its API using C preprocessor macros, and a few of those macros expand to relatively complex code. We replace these expanded macro bodies with calls to equivalent functions, which are easier to recognize and refactor.
String formatting: robotfindskitten calls several printf-style string-formatting functions. We replace these unsafe variable-argument function calls with safe wrappers using Rust's format family of macros. Aside from improving memory safety, this also allows the Rust compiler to more accurately typecheck the format arguments, which is helpful for later type-directed refactoring passes.
Static string constants: robotfindskitten has two global variables containing string constants, which are translated to Rust as static mut definitions containing C-style *const c_char pointers. We refactor to remove both sources of unsafety, replacing raw pointers with checked &'static str references and converting the mutable statics to immutable ones.
Heap allocations: robotfindskitten uses a heap allocated array to track the objects in the game world. This array is represented as a raw pointer, and the underlying storage is managed explicitly with malloc and free. We replace the array with a memory-safe collection type, avoiding unsafe FFI calls and preventing out-of-bounds memory accesses.
Using the pancurses library: Calling ncurses library functions directly through the Rust FFI requires unsafe code at every call site. We replace unsafe ncurses function calls with calls to the safe wrappers provided by the pancurses crate.
Moving global state to the stack: robotfindskitten uses mutable global variables to store the game state, which turn into unsafe static mut definitions in Rust. We collect all such variables into a single stack-allocated struct, which can be mutated without unsafety.
libc calls: We replace calls to miscellaneous libc functions, such as sleep and rand, with calls to safe Rust equivalents.
Function argument types: Two remaining functions in robotfindskitten take raw pointers as arguments. We change each function's signature to use only safe Rust types, and update their callers to match.
String conversion cleanup: Several of the previous refactoring passes insert conversions between Rust and C string types. In several places, these conversions form cycles, such as &str -> *const c_char -> &str, which are both redundant and a source of unsafe code. We remove such conversion cycles to avoid unnecessary raw pointer manipulation.
Final cleanup: At this point, we have removed all the unsafe code we can. Only a few cleanup steps remain, such as removing unused unsafe qualifiers and deleting unused extern "C" definitions. In the end, we are left with a correct Rust translation of robotfindskitten that contains only a single line of unsafe code.

robotfindskitten uses a variety of macros provided by the ncurses library. Since c2rust transpile runs the C preprocessor before translating to Rust, the expansions of those macros effectively get inlined in the Rust code at each call site. In many cases, this is harmless: for example, move(y, x) expands to wmove(stdscr, y, x), which is not much harder to refactor than the original. However, the attr_get and attrset macros are more complex: they expand to multiple lines of code involving several conditionals and complex expressions. In this step, we convert the expanded code into simple function calls, which are easier to manipulate in later refactoring passes.

Fortunately, the ncurses library provides functions implementing the same operations as the troublesome macros, and we can call those functions through Rust's FFI. We begin by providing Rust declarations for these foreign functions. For ease of reading, we put the new declarations just after the existing extern "C" block:

select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        extern "C" {
            fn wattr_get(win: *mut WINDOW, attrs: *mut attr_t,
                pair: *mut libc::c_short, opts: *mut libc::c_void) -> libc::c_int;
            fn wattrset(win: *mut WINDOW, attrs: libc::c_int) -> libc::c_int;
        }
    '
    after ;

Now we can use rewrite_expr to find Rust code that comes from the expansions of the wattrset macro and replace it with calls to the wattrset function:

rewrite_expr
    '
        if !(__win as *const libc::c_void).is_null() {
            (*__win)._attrs = __attrs
        } else {
        }
    '
    'wattrset(__win, __attrs as libc::c_int)' ;

The __win and __attrs metavariables in the pattern correspond to the arguments of the original C macro, and are used in the replacement to construct the equivalent Rust function call.

Next, we do the same thing for the more complicated wattr_get macro:

rewrite_expr
    '
        if !(__win as *const libc::c_void).is_null() {
            if !(&mut __attrs as *mut attr_t as *const libc::c_void).is_null() {
                __attrs = (*__win)._attrs
            } else {
            };
            if !(&mut __pair as *mut libc::c_short as *const libc::c_void).is_null() {
                __pair = (((*__win)._attrs as libc::c_ulong
                    & ((1u32 << 8i32).wrapping_sub(1u32) << 0i32 + 8i32) as libc::c_ulong)
                    >> 8i32) as libc::c_int as libc::c_short
            } else {
            };
        } else {
        }
    '
    'wattr_get(__win, &mut __attrs, &mut __pair, ::std::ptr::null_mut())' ;

Finally, we are done with this bit of cleanup, so we write the changes to disk before continuing on:

commit ;

Diff #4

robotfindskitten calls several printf-style variable-argument functions to perform string formatting. Since variable-argument function calls are considered unsafe in Rust, we must replace these with Rust-style string formatting using format! and related macros. Specifically, for each string formatting function such as printf, we will create a safe wrapper fmt_printf that takes a Rust fmt::Arguments object, and replace printf(...) calls with fmt_printf(format_args!(...)). This approach isolates all the unsafety into the fmt_printf wrapper, where it can be eliminated by later passes.

The replacement itself happens in two steps. First, we convert printf calls from printf(<C format args...>) to printf(format_args!(<Rust format args...>)). Note that the code does not typecheck in this intermediate state: C's printf function cannot accept the std::fmt::Arguments produced by the format_args! macro. The second step then replaces the printf call with a call to the fmt_printf wrapper, which does accept std::fmt::Arguments.

We run a few commands to mark the nodes involved in string formatting, before finally running the convert_format_args command to perform the actual transformation.

First, we use select and mark_arg_uses to mark the first argument of every printf call as a target:

select target 'item(printf);' ;
mark_arg_uses 0 target ;

Diff #5

convert_format_args will treat the target argument at each call site as a printf-style format string, and will treat all later arguments as format args.

Next, we mark the format string literal with fmt_str, which tells convert_format_args the exact string literal it should use as the format string. This usually is not the same as the target argument, since c2rust-transpile inserts several casts to turn a Rust string literal into a *const libc::c_char.

select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;

Diff #6

With both target and fmt_str marks in place, we can apply the actual transformation:

convert_format_args ;

Diff #7

Finally, we clean up from this step by clearing all the marks.

clear_marks ;

commit would also clear the marks, but we don't want to commit these changes until we've fixed the type errors introduced in this step.

As a reminder, we currently have code that looks like this:

printf(format_args!("Hello, {}!\n", "world"))

printf itself can't accept the std::fmt::Arguments returned by format_args!, so we will define a wrapper that does accept std::fmt::Arguments and then rewrite these printf calls to call the wrapper instead.

First, we insert the wrapper:

select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        fn fmt_printf(args: ::std::fmt::Arguments) -> libc::c_int {
            print!("{}", args);
            0
        }
    '
    after ;

Diff #9

Since Rust provides a print! macro with similar functionality to printf, our "wrapper" actually just calls print! directly, avoiding the string conversions necessary to call the actual C printf. (See the next subsection for an example of a "real" wrapper function.)
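The wrapper pattern can be tried in isolation. The following is a minimal sketch, not the tutorial's code: fmt_write is a hypothetical name, and it writes to any std::io::Write sink (rather than stdout) so that the result can be inspected. Unlike the fmt_printf wrapper above, which always returns 0, this sketch returns the number of bytes written, which is closer to what C's printf reports.

```rust
use std::fmt;
use std::io::Write;

// Hypothetical helper (not part of robotfindskitten): formats `args` and
// writes the text to `out`, returning the number of bytes written.
fn fmt_write<W: Write>(out: &mut W, args: fmt::Arguments) -> i32 {
    // Render the pre-parsed format arguments into an owned String.
    let text = fmt::format(args);
    out.write_all(text.as_bytes()).expect("write failed");
    text.len() as i32
}
```

A call site then looks just like the rewritten printf calls: fmt_write(&mut sink, format_args!("Hello, {}!\n", "world")).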

With the wrapper in place, we can now update the call sites:

rewrite_expr 'printf' 'fmt_printf' ;

Diff #10

Now that we've finished this step and the crate typechecks again, we can safely commit the changes:

commit ;

Diff #11

Aside from printf, robotfindskitten also uses the ncurses printw and mvprintw string-formatting functions. The refactoring script for printw is similar to the previous two steps combined:

select target 'item(printw);' ;
mark_arg_uses 0 target ;
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;

convert_format_args ;

clear_marks ;

select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        fn fmt_printw(args: ::std::fmt::Arguments) -> libc::c_int {
            unsafe {
                ::printw(b"%s\0" as *const u8 as *const libc::c_char,
                         ::std::ffi::CString::new(format!("{}", args))
                             .unwrap().as_ptr())
            }
        }
    '
    after ;
rewrite_expr 'printw' 'fmt_printw' ;
commit ;

Diff #12

Aside from replacing the name printf with printw, the other notable difference from the printf script is the body of fmt_printw. There is no convenient replacement for printw in the Rust standard library, so instead we call the original printw function, passing in the result of Rust string formatting (converted to a C string) as an argument.
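The string conversion inside fmt_printw can be sketched without ncurses. In the sketch below, with_c_string and c_strlen are hypothetical names: the first formats the arguments into a NUL-terminated C string and lends the pointer to a callback, and the second stands in for a C function that consumes the pointer. Passing the formatted text as the argument of a fixed "%s" format, as fmt_printw does, also ensures that any % characters in the text are not reinterpreted by the C function.

```rust
use std::ffi::{CStr, CString};
use std::fmt;
use std::os::raw::c_char;

// Hypothetical helper: formats `args`, converts the result to a
// NUL-terminated C string, and lends the pointer to `f`. The CString is
// kept alive until `f` returns, so the pointer stays valid throughout.
fn with_c_string<R>(args: fmt::Arguments, f: impl FnOnce(*const c_char) -> R) -> R {
    let owned = CString::new(fmt::format(args))
        .expect("formatted text contains an interior NUL byte");
    f(owned.as_ptr())
}

// Stand-in for a C function that receives the pointer: it measures the
// length of the C string, excluding the terminator.
fn c_strlen(ptr: *const c_char) -> usize {
    // Assumption: `ptr` points to a valid NUL-terminated string.
    let c = unsafe { CStr::from_ptr(ptr) };
    c.to_bytes().len()
}
```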

The mvprintw replacement is also similar, just with a few extra arguments:

select target 'item(mvprintw);' ;
mark_arg_uses 2 target ;
select fmt_str 'marked(target); desc(expr && !match_expr(__e as __t));' ;

convert_format_args ;

clear_marks ;

select target 'crate; child(foreign_mod); last;' ;
create_item
    '
        fn fmt_mvprintw(y: libc::c_int, x: libc::c_int,
                        args: ::std::fmt::Arguments) -> libc::c_int {
            unsafe {
                ::mvprintw(y, x, b"%s\0" as *const u8 as *const libc::c_char,
                         ::std::ffi::CString::new(format!("{}", args))
                             .unwrap().as_ptr())
            }
        }
    '
    after ;
rewrite_expr 'mvprintw' 'fmt_mvprintw' ;
commit ;

Diff #13

robotfindskitten defines a static string constant, ver, to store the game's version. Using ver is currently unsafe, first because its Rust type is a raw pointer (*mut c_char), and second because it's mutable. To make ver usage safe, we first change its type to &'static str (and fix up the resulting type errors), and then we change it from a static mut to an ordinary immutable static. Note that we must change the type first because Rust does not allow raw pointers to be stored in safe (non-mut) statics.

We change the type using rewrite_ty:

select target 'item(ver); child(ty);' ;
rewrite_ty 'marked!(*mut libc::c_char)' "&'static str" ;
delete_marks target ;

Diff #14

The combination of select and the marked! matching form ensures that only ver's type annotation is modified. We delete the mark afterward, since it's no longer needed.

Simply replacing *mut c_char with &str introduces type errors throughout the crate. The initializer for ver still has type *mut c_char, and all uses of ver are still expecting a *mut c_char.

Fixing the ver initializer is straightforward: we simply remove all the casts, then convert the binary string (&[u8]) literal to an ordinary string literal. For the casts, we mark all cast expressions in ver's definition, then replace each one with its subexpression:

select target 'item(ver); desc(match_expr(__e as __t));' ;
rewrite_expr 'marked!(__e as __t)' '__e' ;
delete_marks target ;

Diff #15

Only the binary string literal remains, so we mark it and change it to an ordinary str:

select target 'item(ver); child(expr);' ;
bytestr_to_str ;
delete_marks target ;

Diff #16

ver's initializer is now well-typed, but its uses are still expecting a *mut c_char instead of a &str. To fix these up, we use the type_fix_rules command, which rewrites expressions anywhere a type error occurs:

type_fix_rules '*, &str, *const __t => __old.as_ptr()' ;

Diff #17

Here we run type_fix_rules with only one rule: in any position (*), if an expression has type &str but is expected to have a raw pointer type (*const __t), then wrap the original expression in a call to .as_ptr(). This turns out to be enough to fix all the errors at uses of ver.
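As a rough illustration of the end state, the sketch below uses a hypothetical VER constant (the value is a placeholder, not the game's real version string) and assumes the string literal keeps the trailing NUL byte from the original C literal. Under that assumption, the pointer obtained from as_ptr() can still be handed to C code, while safe Rust code can read the same data through CStr.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical stand-in for `ver`. The trailing NUL is assumed to survive
// the conversion from a byte string literal to an ordinary string literal.
static VER: &str = "0.0.0\0";

// What a rewritten use site effectively computes: a C-compatible pointer.
fn ver_ptr() -> *const c_char {
    VER.as_ptr() as *const c_char
}

// A safe way to view the same data as text without the terminator.
fn ver_text() -> &'static str {
    CStr::from_bytes_with_nul(VER.as_bytes())
        .expect("VER must end with exactly one NUL byte")
        .to_str()
        .expect("VER must be valid UTF-8")
}
```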

Now that all type errors have been corrected, we can finish our refactoring of ver. We make it immutable, then commit the changes.

select target 'item(ver);' ;
set_mutability imm ;

commit ;

Diff #18

Static string array - `messages`

Aside from ver, robotfindskitten contains a static array of strings, called messages. Like ver, accessing messages is unsafe because each element is a raw *mut c_char pointer and because messages itself is a static mut.

We rewrite the type and initializer of messages using the same strategy as for ver:

select target 'item(messages); child(ty); desc(ty);' ;
rewrite_ty 'marked!(*mut libc::c_char)' "&'static str" ;
delete_marks target ;
select target 'item(messages); child(expr); desc(expr);' ;
rewrite_expr 'marked!(__e as __t)' '__e' ;
bytestr_to_str ;
delete_marks target ;

Diff #19

We use type_fix_rules to fix up the uses of messages, as we did for ver:

type_fix_rules
    '*, &str, *const __t => __old.as_ptr()'
    '*, &str, *mut __t => __old.as_ptr() as *mut __t' ;

Diff #20

Here we needed a second rule for *mut pointers, similar to the one for *const, because robotfindskitten mistakenly declares messages as an array of char* instead of const char*.

With all type errors fixed, we can make messages immutable and commit the changes:

select target 'item(messages);' ;
set_mutability imm ;

commit ;

Diff #21

The screen variable stores a heap-allocated two-dimensional array, represented in C as an int**. In Rust, this becomes *mut *mut c_int, which is unsafe to access. We replace it with CArray<CArray<c_int>>, where CArray is a memory-safe collection type provided by the c2rust_runtime library. CArray is convenient for this purpose because it supports C-style initialization and access patterns (including pointer arithmetic) while still guaranteeing memory safety.

We actually perform the conversion from *mut to CArray in two steps. First, we replace *mut with the simpler CBlockPtr type, also defined in c2rust_runtime. CBlockPtr provides some limited bounds checking, but otherwise functions much like a raw pointer. It serves as a useful intermediate step, letting us fix up the differences between the raw-pointer and CArray APIs in two stages instead of attempting to do it all at once. Once screen has been fully converted to CBlockPtr<CBlockPtr<c_int>>, we finish the conversion to CArray in the second step.

As a preliminary, we need to add an import of the c2rust_runtime library:

select target 'crate;' ;
create_item 'extern crate c2rust_runtime;' inside ;

Diff #22

Now we can proceed with the actual refactoring.

We further break down the transition from *mut *mut c_int to CBlockPtr<CBlockPtr<c_int>> into two steps, first converting the inner pointer (leaving the overall type as *mut CBlockPtr<c_int>) and then the outer. We change the type annotation first, as we did for ver and messages:

select target 'item(screen); child(ty);' ;
rewrite_ty 'marked!(*mut *mut __t)'
    '*mut ::c2rust_runtime::CBlockPtr<__t>' ;

Diff #23

This introduces type errors, letting us easily find (and fix) related expressions using type_fix_rules:

type_fix_rules
    'rval, *mut __t, ::c2rust_runtime::CBlockPtr<__u> =>
        unsafe { ::c2rust_runtime::CBlockPtr::from_ptr(__old) }'
    'rval, *mut __t, *mut __u => __old as *mut __u'
    ;

Diff #24

The first rule provided here handles the later part of screen's initialization, where the program allocates a *mut c_int array (now CBlockPtr<c_int>) for each row of the screen. The second rule handles the earlier part, where it allocates the top-level *mut *mut c_int (now *mut CBlockPtr<c_int>). Both allocations now need a cast, since the type of the rows has changed.

One category of type errors remains: the initialization code tries to dereference the result of offsetting the array pointer, which is not possible directly with the CBlockPtr API. We add the necessary method call using rewrite_expr:

rewrite_expr
    '*typed!(__e, ::c2rust_runtime::block_ptr::CBlockOffset<__t>)'
    '*__e.as_mut()' ;

Diff #25

Here, the pattern filters for dereferences of CBlockOffset expressions, which result from calling offset on a CBlockPtr, and adds a call to as_mut() before the dereference.

The conversion of screen to *mut CBlockPtr<c_int> is now complete. The conversion to CBlockPtr<CBlockPtr<c_int>> uses a similar refactoring script:

select target 'crate; item(screen); child(ty);' ;
rewrite_ty 'marked!(*mut __t)'
    '::c2rust_runtime::CBlockPtr<__t>' ;
type_fix_rules
    'rval, *mut __t, ::c2rust_runtime::CBlockPtr<__u> =>
        unsafe { ::c2rust_runtime::CBlockPtr::from_ptr(__old) }'
    'rval, *mut __t, *mut __u => __old as *mut __u'
    ;
rewrite_expr
    '*typed!(__e, ::c2rust_runtime::block_ptr::CBlockOffset<__t>)'
    '*__e.as_mut()' ;

Diff #26

The only change is in the rewrite_ty step.

There's one last bit of cleanup to perform: now that screen has the desired CBlockPtr<CBlockPtr<c_int>> type, we can rewrite the allocations that initialize it. At this point the allocations use the unsafe malloc function followed by the unsafe CBlockPtr::from_ptr, but we can change that to use the safe CBlockPtr::alloc method instead:

rewrite_expr 'malloc(__e) as *mut __t as *mut __u' 'malloc(__e) as *mut __u' ;
rewrite_expr
    '::c2rust_runtime::CBlockPtr::from_ptr(malloc(__e) as *mut __t)'
    '::c2rust_runtime::CBlockPtr::alloc(
        __e as usize / ::std::mem::size_of::<__t>())'
    ;

Diff #27

This doesn't remove the unsafe blocks wrapping each allocation - we leave those until the end of our refactoring, when we remove unnecessary unsafe blocks throughout the entire crate at once.
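The replacement expression divides the malloc byte count by the element size because malloc takes a size in bytes, whereas the replacement passes alloc an element count. The sketch below illustrates only that arithmetic; it uses Vec as a stand-in and makes no claim about how c2rust_runtime implements alloc. Both function names are hypothetical.

```rust
use std::mem::size_of;

// Converts a malloc-style byte count into a number of elements of type T.
// Assumption: T is not zero-sized (otherwise this divides by zero).
fn elem_count<T>(bytes: usize) -> usize {
    bytes / size_of::<T>()
}

// Stand-in for the two-level `screen` allocation, using Vec instead of the
// c2rust_runtime types: `rows` rows, each sized from a byte count.
fn alloc_rows(rows: usize, row_bytes: usize) -> Vec<Vec<i32>> {
    vec![vec![0; elem_count::<i32>(row_bytes)]; rows]
}
```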

At this point, the refactoring of screen to CBlockPtr<CBlockPtr<c_int>> is done, and we can commit the changes:

commit ;

Diff #28

The CArray and CBlockPtr APIs are deliberately quite similar, which makes this part of the screen refactoring fairly straightforward.

First, we replace all uses of CBlockPtr with CArray, both in types and in function calls:

rewrite_ty '::c2rust_runtime::CBlockPtr<__t>' '::c2rust_runtime::CArray<__t>' ;
rewrite_expr
    '::c2rust_runtime::CBlockPtr::from_ptr'
    '::c2rust_runtime::CArray::from_ptr' ;
rewrite_expr
    '::c2rust_runtime::CBlockPtr::alloc'
    '::c2rust_runtime::CArray::alloc' ;

Diff #29

Next, we fix up calls to offset. Unlike CBlockPtr (and raw pointers in general), CArray distinguishes between mutable and immutable offset pointers. We handle this by simply replacing all offset calls with offset_mut:

rewrite_expr
    'typed!(__e, ::c2rust_runtime::CArray<__t>).offset(__f)'
    '__e.offset_mut(__f)' ;

Diff #30

This works fine for robotfindskitten, though in other codebases it may be necessary to properly distinguish mutable and immutable uses of offset.

With this change, the code typechecks with screen's new memory-safe type, so we could stop here. However, unlike CBlockPtr, CArray supports array indexing - arr[i] - in place of the convoluted *arr.offset(i).as_mut() syntax. So we perform a simple rewrite to make the code a little easier to read:

rewrite_expr
    'typed!(__e, ::c2rust_runtime::CArray<__t>).offset_mut(__f).as_mut()'
    '&mut __e[__f as usize]' ;
rewrite_expr '*&mut __e' '__e' ;

Diff #31

Finally, we remove unsafety from screen's static initializer. It currently calls CArray::from_ptr(0 as *mut _), which is unsafe because CArray::from_ptr requires its pointer argument to satisfy certain properties. But CArray also provides a safe method specifically for initializing a CArray to null, which we can use instead:

rewrite_expr
    '::c2rust_runtime::CArray::from_ptr(cast!(0))'
    '::c2rust_runtime::CArray::empty()' ;

Diff #32

This completes the refactoring of screen, as all raw pointer manipulations have been replaced with safe CArray method calls. The only remaining unsafety arises from the fact that screen is a static mut, which we address in a later refactoring step.

commit ;

The pancurses library provides safe wrappers around ncurses APIs. Since the pancurses and ncurses APIs are so similar, we can automatically convert the unsafe ncurses FFI calls in robotfindskitten to safe pancurses calls, avoiding the need to maintain safe wrappers in robotfindskitten itself.

There are two preliminary steps before we do the actual conversion. First, we must import the pancurses library:

select target 'crate;' ;
create_item 'extern crate pancurses;' inside ;

Diff #33

And second, we must create a global variable to store the main pancurses Window:

select target 'crate;' ;
create_item 'static mut win: Option<::pancurses::Window> = None;' inside ;

Diff #34

pancurses doesn't have an equivalent of the global stdscr window that ncurses provides. Instead, the pancurses initialization function creates an initial Window object that must be passed around to each function that updates the display. We store that initial Window in the global win variable so that it's accessible everywhere that stdscr is used.

Note that making win a static mut makes it unsafe to access. However, a later refactoring pass will gather up all static muts, including win, and collect them into a stack-allocated struct, at which point accessing win will no longer be unsafe.

We convert ncurses library calls to pancurses ones in a few stages.

First, for functions that don't require a window object, we simply replace each ncurses function with its equivalent in the pancurses library:

rewrite_expr 'nonl' '::pancurses::nonl' ;
rewrite_expr 'noecho' '::pancurses::noecho' ;
rewrite_expr 'cbreak' '::pancurses::cbreak' ;
rewrite_expr 'has_colors' '::pancurses::has_colors' ;
rewrite_expr 'start_color' '::pancurses::start_color' ;
rewrite_expr 'endwin' '::pancurses::endwin' ;
rewrite_expr 'init_pair' '::pancurses::init_pair' ;

Diff #35

Next, functions taking a window are replaced with method calls on the static win variable we defined earlier:

rewrite_expr 'wrefresh(stdscr)' 'win.refresh()' ;
rewrite_expr 'wrefresh(curscr)' 'win.refresh()' ;
rewrite_expr 'keypad(stdscr, __bf)' 'win.keypad(__bf)' ;
rewrite_expr 'wmove(stdscr, __my, __mx)' 'win.mv(__my, __mx)' ;
rewrite_expr 'wclear(stdscr)' 'win.clear()' ;
rewrite_expr 'wclrtoeol(stdscr)' 'win.clrtoeol()' ;
rewrite_expr 'waddch(stdscr, __ch)' 'win.addch(__ch)' ;

rewrite_expr
    'wattr_get(stdscr, __attrs, __pair, __e)'
    '{
        let tmp = win.attrget();
        *__attrs = tmp.0;
        *__pair = tmp.1;
        0
    }' ;
rewrite_expr
    'wattrset(stdscr, __attrs)'
    'win.attrset(__attrs as ::pancurses::chtype)' ;

Diff #36

For simplicity, we write win.f(...) in the rewrite_expr replacement arguments, even though win is actually an Option<Window>, not a Window. Later, we replace win with win.as_ref().unwrap() throughout the crate to correct the resulting type errors.

We next replace some ncurses global variables with calls to corresponding pancurses functions:

rewrite_expr 'LINES' 'win.get_max_y()' ;
rewrite_expr 'COLS' 'win.get_max_x()' ;

Diff #37

Finally, we handle a few special cases.

waddnstr takes a string argument, which in general could be any *const c_char. However, robotfindskitten calls it only on string literals, which lets us perform a more specialized rewrite that avoids unsafe C string conversions:

rewrite_expr
    'waddnstr(stdscr, __str as *const u8 as *const libc::c_char, __n)'
    "win.addnstr(::std::str::from_utf8(__str).unwrap().trim_end_matches('\0'),
                 __n as usize)" ;

Diff #38
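The string conversion used in the waddnstr replacement can be checked on its own. In this small sketch, literal_to_str is a hypothetical helper that applies the same two standard-library calls to a byte string literal: it validates the bytes as UTF-8 and strips the trailing NUL that the C literal carried.

```rust
// Hypothetical helper mirroring the replacement expression: validate the
// bytes as UTF-8, then drop any trailing NUL terminators.
fn literal_to_str(bytes: &[u8]) -> &str {
    std::str::from_utf8(bytes)
        .expect("literal is not valid UTF-8")
        .trim_end_matches('\0')
}
```

Because robotfindskitten passes only string literals here, the unwrap in the rewritten code can fail only if a literal contains bytes that are not valid UTF-8.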

intrflush has no pancurses equivalent, so we replace it with a no-op of the same type:

rewrite_expr 'intrflush(__e, __f)' '0' ;

Diff #39

That covers all of the "ordinary" ncurses functions used in robotfindskitten. The remaining subsections cover the more complex cases.

We previously replaced calls to the ncurses printw and mvprintw string-formatting functions with code using Rust's safe string formatting macros. This removes unsafety from the call site, but uses wrapper functions (fmt_printw and fmt_mvprintw) that call unsafe code internally. But now that we are using the pancurses library, we can replace those wrappers with safer equivalents.

select target 'item(fmt_printw);' ;
create_item '
    fn fmt_printw(args: ::std::fmt::Arguments) -> libc::c_int {
        unsafe {
            win.printw(&format!("{}", args))
        }
    }
' after ;
delete_items ;
clear_marks ;

select target 'item(fmt_mvprintw);' ;
create_item '
    fn fmt_mvprintw(y: libc::c_int, x: libc::c_int,
                    args: ::std::fmt::Arguments) -> libc::c_int {
        unsafe {
            win.mvprintw(y, x, &format!("{}", args))
        }
    }
' after ;
delete_items ;
clear_marks ;

Diff #40

The wrappers still use unsafe code to access win, a static mut, but no longer make FFI calls or manipulate raw C strings. When we later remove all static muts from the program, these functions will become entirely safe.

Adapting ncurses-based input handling to use pancurses requires some extra care. The pancurses getch function returns a Rust enum, while the ncurses version simply returns an integer. robotfindskitten matches those integers against various ncurses keycode constants, which, after macro expansion, become integer literals in the Rust code.

The more idiomatic approach would be to replace each integer literal with the matching pancurses::Input enum variant when switching from ncurses getch to the pancurses version. However, we instead take the easier approach of converting pancurses::Input values back to ncurses integer keycodes, so the existing robotfindskitten input handling code can remain unchanged.

First, we inject a translation function from pancurses to ncurses keycodes:

select target 'item(initialize_ncurses);' ;
create_item '
    fn encode_input(inp: Option<::pancurses::Input>) -> libc::c_int {
        use ::pancurses::Input::*;
        let inp = match inp {
            Some(x) => x,
            None => return -1,
        };
        match inp {
            // TODO: unicode inputs in the range 256 .. 512 can
            // collide with ncurses special keycodes
            Character(c) => c as u32 as libc::c_int,
            Unknown(i) => i,
            special => {
                let idx = ::pancurses::SPECIAL_KEY_CODES.iter()
                    .position(|&k| k == special).unwrap();
                let code = idx as i32 + ::pancurses::KEY_OFFSET;
                if code > ::pancurses::KEY_F15 {
                    code + 48
                } else {
                    code
                }
            },
        }
    }
' after ;

Diff #41

Then, we translate ncurses wgetch calls to use the pancurses getch method, wrapping the result in encode_input to keep the results unchanged.

rewrite_expr 'wgetch(stdscr)' '::encode_input(win.getch())' ;

Diff #42
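The idea behind encode_input can be sketched without pancurses. The Input enum and SPECIAL_KEYS table below are simplified, hypothetical stand-ins for pancurses::Input and its key table, covering only two special keys; the keycodes 0o402 and 0o403 are the values ncurses assigns to KEY_DOWN and KEY_UP. The sketch omits the function-key adjustment performed by the real encode_input.

```rust
use std::os::raw::c_int;

// Simplified stand-in for pancurses::Input (hypothetical, two keys only).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Input {
    Character(char),
    Unknown(i32),
    KeyDown,
    KeyUp,
}

// Stand-in lookup table from special keys to ncurses integer keycodes.
const SPECIAL_KEYS: [(Input, c_int); 2] = [
    (Input::KeyDown, 0o402), // ncurses KEY_DOWN
    (Input::KeyUp, 0o403),   // ncurses KEY_UP
];

// Converts an optional input event back to an ncurses-style integer.
fn encode(inp: Option<Input>) -> c_int {
    match inp {
        // No input available: -1, the value of the ncurses ERR constant.
        None => -1,
        Some(Input::Character(c)) => c as u32 as c_int,
        Some(Input::Unknown(i)) => i as c_int,
        Some(special) => SPECIAL_KEYS
            .iter()
            .find(|(key, _)| *key == special)
            .map(|&(_, code)| code)
            .expect("special key missing from table"),
    }
}
```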

As mentioned previously, we use win to obtain the current window object throughout the ncurses refactoring process, even though win is actually an Option<Window>, not a Window. Now that we are done with all the rewrites, we can update those uses to access the Window properly:

rewrite_expr 'win' 'win.as_ref().unwrap()' ;

Diff #43

The final step is to initialize win. This corresponds to the call to the ncurses initscr initialization function:

rewrite_expr 'initscr()' 'win = Some(::pancurses::initscr())' ;

Diff #44

We save this for last only so that the win to win.as_ref().unwrap() rewrite doesn't produce an erroneous assignment win.as_ref().unwrap() = ....

At this point, we are done with the current refactoring step: robotfindskitten has been fully adapted to use the safe pancurses API in place of raw ncurses FFI calls.

commit

Diff #45

robotfindskitten uses global variables - static muts in Rust - to store the game state. Accessing these globals is unsafe, due to the difficulty of preventing simultaneous borrowing and mutation. In this refactoring step, we move the global state onto the stack and pass it by reference to every function that needs it, which allows the borrow checker to analyze its usage and ensure safety.

Most of the work in this step is handled by the static_to_local_ref refactoring command. This command identifies all functions that use a given static, and modifies those functions to access the global through a reference (passed as an argument to the function) instead of accessing it directly. (See the static_to_local_ref command documentation for examples.)

However, running static_to_local_ref separately on each of robotfindskitten's seven global variables would add up to seven new arguments to many of robotfindskitten's functions, making their signatures difficult to read. Instead, we proceed in two steps. First, we gather up all the global variables into a single global struct. Then, we run static_to_local_ref on just the struct, achieving safety while adding only a single new argument to each affected function.

We collect the statics into a struct using static_collect_to_struct:

select target 'crate; child(static && mut);' ;
static_collect_to_struct State S

Diff #46

Then we run static_to_local_ref to pass a reference to the new State object everywhere it is used:

select target 'crate; child(static && name("S"));' ;
select user 'crate; desc(fn && !name("main|main_0"));' ;
static_to_local_ref ;

Diff #47

The functions that previously accessed the global S now use a reference argument S_, removing a source of unsafety.

The only function that still accesses S directly is main_0. And since main_0 is called only once per run of the program, we can replace the global S with a local variable declared inside main_0 without affecting the behavior of the program. The static_to_local command performs the necessary transformation (using the marks we previously set up for static_to_local_ref):

static_to_local

Diff #48

Now there are no static muts remaining in the program.
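The shape of the resulting code can be sketched as follows. The field and function names here are hypothetical and much simplified compared to the real game state; the point is only that the state lives in a local variable of the entry function and reaches every other function through a reference, so the borrow checker can verify each access.

```rust
// Hypothetical, simplified game state collected into a single struct.
struct State {
    robot_x: i32,
    robot_y: i32,
    num_bogus: i32,
}

// Functions that mutate the state take `&mut State` ...
fn move_robot(s: &mut State, dx: i32, dy: i32) {
    s.robot_x += dx;
    s.robot_y += dy;
}

// ... and functions that only read it take `&State`.
fn describe(s: &State) -> String {
    format!("robot at ({}, {}), {} objects", s.robot_x, s.robot_y, s.num_bogus)
}

// Stand-in for main_0: the state is a local variable, not a static mut.
fn run_game() -> String {
    let mut s = State { robot_x: 0, robot_y: 0, num_bogus: 20 };
    move_robot(&mut s, 2, -1);
    describe(&s)
}
```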

There is one final cleanup step to perform. The struct State appears in the signature of several public functions, but State itself is not public, so rustc reports an error. We could make State public, but since there is no reason for the functions in question to be public in the first place, we make the functions private instead:

select target 'crate; desc(fn && !name("main"));' ;
set_visibility '' ;

commit

Diff #49

robotfindskitten makes a number of calls to libc functions, such as sleep and rand, using the FFI. Rust's standard library provides most of the same functionality, so we can replace these libc calls with safe equivalents.

We replace sleep with std::thread::sleep:

rewrite_expr 'sleep(__e)'
    '::std::thread::sleep(
        ::std::time::Duration::from_secs(__e as u64))' ;

Diff #50

We replace atoi with a call to from_str:

rewrite_expr 'atoi(__e)'
    '<libc::c_int as ::std::str::FromStr>::from_str(
        ::std::ffi::CStr::from_ptr(__e).to_str().unwrap()).unwrap()' ;

Diff #51
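Note that the from_str replacement is stricter than C's atoi: atoi skips leading whitespace and stops at the first non-digit character, while from_str rejects such input and the unwrap then panics. The sketch below contrasts the two behaviors. Both function names are hypothetical, and parse_lenient is only an approximation of atoi (for example, it returns 0 on overflow, where atoi's behavior is undefined).

```rust
use std::ffi::CStr;
use std::os::raw::c_int;

// Mirrors the replacement above: panics unless the whole string is a
// valid integer.
fn parse_strict(s: &CStr) -> c_int {
    s.to_str().unwrap().parse::<c_int>().unwrap()
}

// Approximates atoi: skip leading whitespace, accept an optional sign and
// a run of digits, ignore anything after that, and fall back to 0.
fn parse_lenient(s: &CStr) -> c_int {
    let text = s.to_str().unwrap_or("").trim_start();
    let end = text
        .char_indices()
        .find(|&(i, c)| !(c.is_ascii_digit() || (i == 0 && (c == '-' || c == '+'))))
        .map_or(text.len(), |(i, _)| i);
    text[..end].parse().unwrap_or(0)
}
```

Whether the stricter behavior is acceptable depends on the inputs the program expects; the tutorial uses the strict form.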

In the version of glibc we used for translating robotfindskitten, atoi is actually provided as an inline wrapper function in the libc headers. That means the Rust translation of robotfindskitten actually includes a full definition of fn atoi(...) { ... }. Now that we've replaced the atoi call, we can delete the definition as well:

select target 'item(atoi);' ;
delete_items ;
clear_marks ;

Diff #52

We replace exit with std::process::exit:

rewrite_expr 'exit(__e)' '::std::process::exit(__e as i32)' ;

Diff #53

For rand, no equivalent is available in the Rust standard library. Instead, we import the rand crate from crates.io:

select target 'crate;' ;
create_item 'extern crate rand;' inside ;
clear_marks ;

Diff #54

robotfindskitten uses the common srand(time()) pattern to initialize the random number generator, suggesting it does not rely on the ability to control or reuse seeds. That means we can use the thread-local RNG provided by the rand crate, instead of explicitly constructing an RNG with a specific seed. So we replace rand with calls to rand::random:

rewrite_expr 'rand()'
    '(::rand::random::<libc::c_uint>() >> 1) as libc::c_int' ;

Diff #55

And we delete srand calls entirely, relying on the rand crate's automatic initialization of the thread-local RNG:

rewrite_expr 'srand(__e)' '()' ;

Diff #56
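The right shift in the rand replacement keeps the result non-negative: discarding the lowest bit of a 32-bit unsigned value leaves at most 31 significant bits, which always fits in a non-negative c_int. This matches the range of rand on glibc, where RAND_MAX is 2^31 - 1. The sketch below isolates that conversion so it can be checked without the rand crate; to_rand_range is a hypothetical name.

```rust
use std::os::raw::{c_int, c_uint};

// Maps an arbitrary unsigned value into the non-negative c_int range,
// as the replacement expression does with the output of rand::random.
fn to_rand_range(raw: c_uint) -> c_int {
    (raw >> 1) as c_int
}
```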

At this point, the only remaining FFI call is to signal. robotfindskitten sets up a SIGINT handler to ensure that ncurses (now pancurses) is shut down properly and the terminal is returned to a normal state when the user terminates the program with ^C. Unfortunately, there is no general way to make signal handling safe: to achieve memory safety, signal handling functions must obey a number of special restrictions above and beyond Rust's normal notions of safety, and these properties cannot be checked by the Rust compiler.

We therefore leave the call to signal as unsafe code. Since this will be the only unsafe operation in the program once we finish refactoring, we wrap it in its own unsafe block:

rewrite_expr 'signal(__e, __f)' 'unsafe { signal(__e, __f) }' ;

Diff #57

We've now covered all of the libc functions used by robotfindskitten, and replaced nearly all of them with safe code.

commit

Two functions in robotfindskitten accept raw pointers: message takes a pointer to a string to display on the screen, and main_0 takes an array of string pointers argv containing the program's command line arguments. To make these functions safe, we must replace their raw pointer arguments with safe equivalents.

`message`

We begin with message because it is simpler. This function takes a single argument of type *mut c_char, which we want to replace with &str:

select target
    'item(message); child(arg); child(match_ty(*mut libc::c_char));' ;
rewrite_ty 'marked!(*mut libc::c_char)' '&str' ;
delete_marks target ;

Diff #59

Of course, simply changing the type annotation is not sufficient. As when we retyped the ver and messages constants, this change has introduced two kinds of type errors: callers of message still pass *mut c_char where &str is now expected, and the body of message still uses the message_0: &str argument in contexts that require a *mut c_char. We fix these using type_fix_rules:

type_fix_rules
    '*, *mut __t, &str =>
        ::std::ffi::CStr::from_ptr(__old).to_str().unwrap()'
    '*, &str, *const __t =>
        ::std::ffi::CString::new(__old.to_owned()).unwrap().as_ptr()'
    ;

Diff #60

The first rule handles callers of message, using CStr methods to convert their *mut c_char raw pointers into safe &str references. The second handles errors in the body of message, using CString to convert &strs back into *const c_char. Note we must use CString instead of CStr in the second rule because an allocation is required: a &str is not guaranteed to end with a null terminator, so CString must copy it into a larger buffer and add the null terminator to produce a valid *const c_char string pointer. Since the CString is temporary, it will be deallocated at the end of the containing expression, but this is good enough for the code we encounter inside of message. More complex string manipulation, however, would likely require a different refactoring approach.
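Both conversion directions can be sketched with the standard library alone. The helper names below are hypothetical. The sketch binds the CString to a local variable before taking its pointer, which keeps the allocation alive for as long as the pointer is in use; the rules above instead rely on the temporary living until the end of the containing expression.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// C string to &str: borrows the data, but fails on invalid UTF-8.
fn c_to_str(c: &CStr) -> &str {
    c.to_str().expect("C string is not valid UTF-8")
}

// &str to C string: allocates a copy with a NUL terminator appended.
fn str_to_c(s: &str) -> CString {
    CString::new(s).expect("string contains an interior NUL byte")
}

// Round trip through a raw pointer, as happens across the `message` boundary.
fn round_trip(s: &str) -> String {
    let owned = str_to_c(s);
    // The pointer is valid only while `owned` is alive.
    let ptr: *const c_char = owned.as_ptr();
    let back = unsafe { CStr::from_ptr(ptr) };
    c_to_str(back).to_owned()
}
```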

`main_0`

The Rust function main_0 is the translation of the C main function of robotfindskitten. The Rust main is a c2rust-generated wrapper that handles the differences between C's main signature and Rust's before invoking main_0.

As in the message case, we wish to replace the unsafe pointer types in main_0's argument list with safe equivalents. However, in this case our choice of safe reference type is more constrained. main_0 calls argv.offset to access the individual command-line arguments, so we must use CArray (which supports such access patterns) for the outer pointer. For the inner pointer, we use Option<&CStr>: CStr supports the conversions we will need to perform in main and main_0, and Option<&CStr> can be safely zero-initialized, which is required by CArray.

We begin, as with message, by rewriting the argument type:

select target
    'item(main_0); child(arg && name("argv")); child(ty);' ;
rewrite_ty 'marked!(*mut *mut libc::c_char)'
    '::c2rust_runtime::CArray<Option<&::std::ffi::CStr>>' ;
delete_marks target ;

Diff #61

Next, we fix type errors in main, which is the only caller of main_0. Since c2rust always generates the same main wrapper function, rather than refactor it, we can simply replace it entirely with a new version that is compatible with main_0's new signature:

select target 'item(main);' ;
create_item '
    fn main() {
        // Collect argv into a vector.
        let mut args_owned: Vec<::std::ffi::CString> = Vec::new();
        for arg in ::std::env::args() {
            args_owned.push(::std::ffi::CString::new(arg).unwrap());
        }

        // Now that the length is known, we can build a CArray.
        let mut args: ::c2rust_runtime::CArray<Option<&::std::ffi::CStr>> =
            ::c2rust_runtime::CArray::alloc(args_owned.len() + 1);
        for i in 0 .. args_owned.len() {
            args[i] = Some(&args_owned[i]);
        }
        // The last element of `args` remains `None`.

        unsafe {
            ::std::process::exit(main_0(
                (args.len() - 1) as libc::c_int,
                args) as i32);
        }
    }
' after ;
delete_items ;
clear_marks ;

Diff #62
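The shape of the replacement wrapper can be sketched without the c2rust runtime. In this illustration a plain `Vec<Option<&CStr>>` stands in for `CArray` (an assumption made so the example depends only on the standard library), and the function names are hypothetical.

```rust
use std::ffi::{CStr, CString};

/// Convert owned argument strings into null-terminated C strings.
fn to_c_strings(args: &[String]) -> Vec<CString> {
    args.iter()
        .map(|a| CString::new(a.as_str()).unwrap())
        .collect()
}

/// Build an argv-style array that borrows from `owned` and ends with
/// `None`, mirroring the trailing NULL entry of a C `argv`.
fn build_argv(owned: &[CString]) -> Vec<Option<&CStr>> {
    let mut argv: Vec<Option<&CStr>> =
        owned.iter().map(|s| Some(s.as_c_str())).collect();
    argv.push(None);
    argv
}
```

As in the wrapper above, the owned strings must outlive the borrowed array, which is why they are collected first and the array is built in a second pass.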

Now to fix errors in main_0 itself. We changed both the inner and outer pointer types of argv, so there are two kinds of errors to clean up.

For the outer pointer, where we changed *mut T to CArray<T>, the problem we see is that argv.offset(...) returns &CArrayOffset<T>, not *mut T, and &CArrayOffset<T> requires two derefs to obtain a T (&CArrayOffset<T> derefs to CArrayOffset<T>, which derefs to T) instead of just one. We handle this with type_fix_rules, looking for cases where a single deref resulted in CArrayOffset<T> but some other type was expected, and adding the second deref:

type_fix_rules
    '*, ::c2rust_runtime::array::CArrayOffset<__t>, __u => *__old'
    ;

Diff #63
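To see why the second deref is needed, consider a toy stand-in for `CArrayOffset<T>`. The `Offset` type below is hypothetical and much simpler than the real runtime type; it only demonstrates that a reference to a `Deref` wrapper needs two `*` operators to reach the element.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for `CArrayOffset<T>`: a view of one element
/// of a slice, identified by its index.
struct Offset<'a, T> {
    data: &'a [T],
    index: usize,
}

impl<'a, T> Deref for Offset<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.data[self.index]
    }
}

/// Read the element behind a `&Offset<T>`.
/// The first `*` yields `Offset<T>`, the second yields `T`.
fn read<T: Copy>(r: &Offset<'_, T>) -> T {
    **r
}
```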

For the inner pointer type, which we changed from *mut c_char to Option<&CStr>, we need only insert a simple conversion anywhere the new type is used but *mut c_char is expected:

type_fix_rules
    '*, ::std::option::Option<&::std::ffi::CStr>, *const i8 =>
        opt_c_str_to_ptr(__old)'
    ;

Diff #64

The only quirk here is that we wrap up the conversion in a helper function, making it easier to recognize in the later refactoring step where we clean up redundant string conversions. Of course, now we must define that helper function:

select target 'item(main);' ;
create_item '
    fn opt_c_str_to_ptr(x: Option<&::std::ffi::CStr>) -> *const libc::c_char {
        match x {
            None => ::std::ptr::null(),
            Some(x) => x.as_ptr(),
        }
    }
' after ;
clear_marks ;

Diff #65

And with that, we are done. All raw pointer arguments in robotfindskitten have now been replaced with safe equivalents.

commit

A number of the previous refactoring steps involved changing the type of some variable from a raw C string (*const c_char) to a safe Rust string (&str), inserting conversions between the two forms everywhere the variable was initialized or used. But now that we have finished transitioning the entire crate to Rust strings, many of those conversions have become redundant. Essentially, we began with code like this:

fn f(s1: *const c_char) { ... }

fn g(s2: *const c_char) {
    ... f(s2) ...
}

By incrementally refactoring C strings into Rust strings, we first transitioned to code like this:

fn f(s1: &str) { ... }

fn g(s2: *const c_char) {
    ... f(c_str_to_rust(s2)) ...
}

And then to code like this:

fn f(s1: &str) { ... }

fn g(s2: &str) {
    ... f(c_str_to_rust(rust_str_to_c(s2))) ...
}

But c_str_to_rust(rust_str_to_c(s2)) is the same as just s2 - the two conversions are redundant and can be removed:

fn f(s1: &str) { ... }

fn g(s2: &str) {
    ... f(s2) ...
}

This doesn't merely affect readability - the actual conversion operations represented by c_str_to_rust are unsafe, so we must remove them to complete our refactoring of robotfindskitten.
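The value-level equivalence can be checked with a small sketch. The helpers below reuse the schematic names `rust_str_to_c` and `c_str_to_rust`, but they are hypothetical: they work on safe `CString` and `&CStr` values rather than raw pointers, so they show only that the round trip preserves the text, not the unsafety of the pointer-based versions.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical `rust_str_to_c`: copies the text and appends a null
/// terminator. Panics if `s` contains an interior null byte.
fn rust_str_to_c(s: &str) -> CString {
    CString::new(s).unwrap()
}

/// Hypothetical `c_str_to_rust`: borrows the text without the terminator.
fn c_str_to_rust(c: &CStr) -> &str {
    c.to_str().unwrap()
}

/// Converting to a C string and back yields the original text.
fn round_trip(s: &str) -> String {
    let c = rust_str_to_c(s);
    c_str_to_rust(&c).to_owned()
}
```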

The actual refactoring process we apply to robotfindskitten mostly consists of removing specific types of redundant conversions with rewrite_expr. The patterns we use here are general, taking advantage of overlap between different conversion cases rather than hardcoding a rewrite for each distinct conversion in robotfindskitten.

To begin with, converting CString to *const c_char to CStr can be replaced with a no-op (CString derefs to CStr, so it can be used almost anywhere a CStr is required):

rewrite_expr
    '::std::ffi::CStr::from_ptr(
        cast!(typed!(__e, ::std::ffi::CString).as_ptr()))'
    '__e' ;

Diff #67

Converting String to CString to Option<&str> is not strictly a no-op, but can still be simplified:

rewrite_expr
    '::std::ffi::CString::new(__e).unwrap().to_str()'
    'Some(&__e)' ;

Diff #68

In some places, the code actually converts &str to *const c_char directly, rather than using CString, and then converts *const c_char to CStr to &str. This is memory-safe only when the &str already includes a null terminator, and the CStr to str conversion will trim it off. We rewrite the code to simply trim off the null terminator directly, avoiding these complex (and unsafe) conversions:

rewrite_expr
    '::std::ffi::CStr::from_ptr(
        cast!(typed!(__e, &str).as_ptr())).to_str()'
    "Some(__e.trim_end_matches('\0'))" ;

Diff #69
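The following sketch compares the two approaches on a string that carries its own terminator. It uses the safe `CStr::from_bytes_until_nul` in place of the raw-pointer `CStr::from_ptr` so that the example needs no `unsafe`; the function names are hypothetical.

```rust
use std::ffi::CStr;

/// Conversion through `CStr`, which stops at the first null byte.
fn via_c_str(s: &str) -> &str {
    CStr::from_bytes_until_nul(s.as_bytes())
        .unwrap()
        .to_str()
        .unwrap()
}

/// Direct trim of trailing null bytes, as in the rewritten code.
fn via_trim(s: &str) -> &str {
    s.trim_end_matches('\0')
}
```

Both functions map `"kitten\0"` to `"kitten"`. They differ only for strings with an interior null byte: `via_c_str` stops at the first null, while `via_trim` removes only trailing ones.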

For code in main_0 using the opt_c_str_to_ptr helper function we introduced earlier, the Option<&CStr> to &CStr conversion can be replaced with a simple unwrap():

rewrite_expr
    '::std::ffi::CStr::from_ptr(cast!(opt_c_str_to_ptr(__e)))'
    '__e.unwrap()' ;

Diff #70

Conversions of bytestring literals (b"...", whose type is &[u8; _]) to *const c_char to CStr to str can be simplified down to a direct conversion from &[u8; _] to &str, plus removal of the null terminator:

rewrite_expr
    '::std::ffi::CStr::from_ptr(
        cast!(typed!(__e, &[u8; __f]))).to_str()'
    "Some(::std::str::from_utf8(__e).unwrap().trim_end_matches('\0'))" ;

Diff #71

This removes the unsafety, but with a little more work, we can further improve readability. First, we convert the byte strings to ordinary string literals (b"..." to "..."):

select target
    'crate; desc(match_expr(::std::str::from_utf8(__e))); desc(expr);' ;
bytestr_to_str ;
clear_marks ;

Diff #72

This introduces type errors, as the type of the literal has changed from &[u8] to &str. We fix these by inserting calls to str::as_bytes:

type_fix_rules '*, &str, &[u8] => __old.as_bytes()' ;

Diff #73

Finally, we remove the redundant conversion from &str to &[u8] to &str:

rewrite_expr
    '::std::str::from_utf8(__e.as_bytes())'
    'Some(__e)' ;

Diff #74
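Note that `from_utf8` returns a `Result` while the replacement is an `Option`. The rewrite presumably works because the surrounding code only calls a method that both types provide, such as `unwrap()`. A minimal sketch of that equivalence, with hypothetical function names:

```rust
/// The pattern before the rewrite: a `Result` that is always `Ok`,
/// because the bytes of a `&str` are valid UTF-8 by construction.
fn before(s: &str) -> &str {
    std::str::from_utf8(s.as_bytes()).unwrap()
}

/// The pattern after the rewrite: an `Option` that is always `Some`.
fn after(s: &str) -> &str {
    Some(s).unwrap()
}
```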

With the replacements above, we have removed all redundant string conversions from the crate. This was the last major source of unnecessary unsafety in robotfindskitten.

The last few changes we make are purely cosmetic - they do not affect safety. First, Some(x).unwrap() is the same as just x:

rewrite_expr
    'Some(__x).unwrap()'
    '__x' ;

Diff #75

And second, "foo\0".trim_end_matches('\0') is the same as just "foo". This one is a little more complicated to rewrite. We first remove null terminators throughout the crate, then remove the calls to trim_end_matches:

select target 'crate; desc(expr);' ;
remove_null_terminator ;
clear_marks ;

rewrite_expr "__e.trim_end_matches('\0')" '__e' ;

Diff #76

This indiscriminate use of remove_null_terminator could introduce bugs (including memory unsafety) if the program still contained code that relies on the presence of the null terminator, such as calls to CStr::from_ptr or libc string functions. But previous refactoring steps have already removed all uses of those functions from robotfindskitten, so this transformation is safe.

commit

At this point, we have removed all the major sources of unsafety from robotfindskitten. We finish the refactoring with an assortment of minor cleanup steps.

We want to remove unnecessary unsafe blocks, but right now every unsafe block is considered unused because they all occur inside unsafe fns. None of these functions actually need to be unsafe at this point, so we mark them safe:

select target 'crate; desc(item && fn);' ;
set_unsafety safe ;
clear_marks ;

Diff #78

This part can't be fully automated. In general, there is no easy way to tell whether the safety of a given function relies on unchecked assumptions about its input, or whether it might break invariants that other functions rely on. In the case of robotfindskitten, every function really is safe, but for other applications or libraries, it might be necessary to be more selective when removing the unsafe qualifier.

Now that all functions are safe, fix_unused_unsafe will remove any unsafe blocks that contain no unsafe operations:

fix_unused_unsafe

Diff #79

Next, we remove a number of unused items from the crate. We have replaced all uses of the FFI declarations generated by c2rust with alternatives, except for one call to signal. We generate a new extern "C" block containing only the declaration of signal, then delete the old unused extern "C" blocks:

select target 'crate; desc(foreign_mod); last;' ;
create_item '
    extern "C" {
        fn signal(sig: libc::c_int, handler: __sighandler_t) -> __sighandler_t;
    }
' after ;

select target 'crate; desc(foreign_mod && !marked(new));' ;
delete_items ;

Diff #80

Furthermore, we can delete a number of type declarations that were previously used only in foreign functions:

select target '
    item(__time_t);
    item(time_t);
    item(pdat);
    item(_win_st);
    item(WINDOW);
' ;
delete_items ;

Diff #81

Similarly, we can delete the opt_c_str_to_ptr helper function, which we used only temporarily while cleaning up string-pointer function arguments:

select target '
    item(opt_c_str_to_ptr);
' ;
delete_items ;

Diff #82

Now we are done refactoring robotfindskitten. We have preserved the functionality of the original C program, but all unsafe code has been removed, with the exception of a single signal call that cannot be made safe. The refactored Rust version of robotfindskitten is still unidiomatic and somewhat difficult to read, but by removing nearly all of the unsafe code, we have established a solid foundation for future improvements.

Here is the final refactored version of robotfindskitten:

Diff #83

There are three ways to build the C2Rust project:

Using Vagrant.
Using Docker.
Manually, as explained below.

The first two options automatically install all prerequisites during provisioning. You can also provision a macOS or Linux system manually.

If you are on a Debian-based OS, you can run scripts/provision_deb.sh to do so.
If you are on macOS, install the Xcode command-line tools (e.g., xcode-select --install) and homebrew first. Then run scripts/provision_mac.sh.
If you prefer to install dependencies yourself, or are using a non-Debian-based Linux OS, our dependencies are as follows:
- cmake >= 3.9.1
- dirmngr
- curl
- git
- gnupg2
- gperf
- ninja
- unzip
- clang 5.0+
- intercept-build or bear - see why here
- python-dev
- python 3.6+
- python dependencies
- rustc version
- rustfmt-preview component for the above rustc version
- libssl (development library, dependency of the refactoring tool)

The quickest way to build the C2Rust transpiler is with LLVM and clang system libraries (LLVM 6 and 7 are currently supported). If you have libLLVM.so and the libclang libraries (libclangAST.a, libclangTooling.a, etc. or their shared variants) installed, you can build the transpiler with:

$ cd c2rust-transpile
$ cargo build

You can customize the location where the build system will look for LLVM using the following environment variables at compile time:

LLVM_CONFIG_PATH = Path to the llvm-config tool of the LLVM installation
LLVM_LIB_DIR = Path to the lib directory of the LLVM installation (not necessary if you use LLVM_CONFIG_PATH)
LLVM_SYSTEM_LIBS = Additional system libraries LLVM needs to link against (e.g. -lz -lrt -ldl). Not necessary with llvm-config.
CLANG_PATH = Path to a clang that is the same version as your libclang.so. If this is necessary the build system will return an error message explaining as much.

C2Rust (indirectly) uses the clang-sys crate which can be configured with its own environment variables.

To develop on components that interact with LLVM, we recommend building against a local copy of LLVM. This will ensure that you have debug symbols and IDE integration for both LLVM and C2Rust. However, building C2Rust from source with LLVM takes a while. For a shorter build that links against prebuilt LLVM and clang system libraries, you should be able to cargo build in the c2rust-transpile directory (see the general README).

The following full from-source build script has been tested on recent versions of macOS and Ubuntu:

$ ./scripts/build_translator.py

This downloads and builds LLVM under a new top-level folder named build. Use the C2RUST_BUILD_SUFFIX variable to do multiple side-by-side builds against a local copy of LLVM like this:

$ C2RUST_BUILD_SUFFIX=.debug ./scripts/build_translator.py --debug

NOTE: Set C2RUST_BUILD_SUFFIX if building inside and outside of the provided Docker or Vagrant environments from a single C2Rust checkout.

Tests are found in the tests folder. If you build the translator successfully, you should be able to run the tests with:

$ ./scripts/test_translator.py tests

This basically tests that the original C file and translated Rust file produce the same output when compiled and run. More details about tests can be found in the tests folder.

Ubuntu 18.04 (GNU/Linux 4.15.0-12-generic x86_64)
CMake 3.10.2
Ninja 1.8.2

Download a copy of Vagrant from https://www.vagrantup.com/downloads.html. Vagrant supports a range of virtualization engines. We recommend you use either VirtualBox or one of the VMware editions, e.g., VMware Workstation Player.

On Windows, you may need to run with administrative privileges.

vagrant up

Requires paid plug-in. See https://www.vagrantup.com/vmware/index.html

install plugin: vagrant plugin install vagrant-vmware-fusion
install license: vagrant plugin license vagrant-vmware-fusion /path/to/license.lic
start vagrant: vagrant up --provider vmware_fusion

Tested with Docker Community Edition 18.03. The version distributed with your host OS may be too old. Follow the installation instructions to get the latest version.

Building the docker image:

$ cd /path/to/c2rust/docker
$ ../scripts/docker_build.sh

The docker_build.sh script takes two optional arguments:

the name of the base image (ubuntu:bionic by default)
the name of the provisioning script (provision_deb.sh by default)

Creating a container:

$ ./docker_run.sh

The docker_run.sh script takes the image name as an optional argument:

$ ./docker_run.sh immunant/c2rust:ubuntu-xenial-20190131

Stopping and starting containers:

$ docker start c2rust
$ docker stop c2rust

Connect to a running container:

$ ./docker_exec.sh

Delete the c2rust container (force stop if running):

$ docker rm -f c2rust

Removing all containers:

docker rm `docker ps -aq`

Pruning all images:

docker system prune
# remove *all* images, not just unused ones
docker system prune -a

To add a new test case, simply create a new .c file. For example:

void example(unsigned buffer_size, int buffer[]) {
    /* your code here */
}

Then create a new .rs file with the following skeleton (does not need to be a buffer, can check return values as well):


extern crate libc;

use c_file::rust_example;

use self::libc::{c_int, c_uint};

#[link(name = "test")]
extern "C" {
    #[no_mangle]
    fn example(_: c_uint, _: *mut c_int);
}

// The length can be any value
const BUFFER_SIZE: usize = 1024;

pub fn test_example() {
    let mut buffer = [0; BUFFER_SIZE];
    let mut rust_buffer = [0; BUFFER_SIZE];
    let expected_buffer = [/* this can be used as another measure of correctness */];

    unsafe {
        example(BUFFER_SIZE as u32, buffer.as_mut_ptr());
        rust_example(BUFFER_SIZE as u32, rust_buffer.as_mut_ptr());
    }

    assert_eq!(buffer, rust_buffer);
    assert_eq!(buffer, expected_buffer);
}

The C code can do one of two things: modify some sort of buffer or return a value.

To completely skip the translation of a C file, you must add the comment //! skip_translation at the top of the file. That will prevent the case from showing up as red in the console output.

You can also mark a Rust file as expected to fail to compile by adding //! xfail to the top of the file, or expect an individual test function to fail at run time by adding // xfail just before the function definition.

Adding //! extern_crate_X to the top of a test file will ensure extern crate X; gets added to the main binary driver. Be sure to also add the X crate to the test directory's Cargo.toml.

Similarly, //! feature_X adds #![feature(X)] to the top of the main driver file.

From the project root, run ./scripts/test_translator.py tests to run all of the tests in the tests folder. Here are a couple other handy options:

# run a subset of the tests
$ ./scripts/test_translator.py --only-directories="loops" tests
# show output of failed tests
$ ./scripts/test_translator.py --log ERROR                tests
# keep all of the files generated during testing
$ ./scripts/test_translator.py --keep=all                 tests
# get help with the command line options
$ ./scripts/test_translator.py --help

This tests directory contains regression, feature, and unit tests. A test directory goes through the following set of steps:

A compile_commands.json file is created for the Clang plugin in c2rust-ast-exporter to recognize its C source input
This JSON and the C source file are fed to the c2rust-ast-exporter to produce CBOR data of the Clang type-annotated abstract syntax tree.
This CBOR data is fed to the c2rust-transpile to produce a Rust source file supposedly preserving the semantics of the initial C source file.
Rust test files (test_xyz.rs) are compiled into a single main wrapper and main test binary and are automatically linked against other Rust and C files thanks to cargo.
The executable from the previous step is run one or more times parameterized to a specific test function.

cargo install --git https://github.com/immunant/mdBook.git --branch installable (May require --force if you already have mdbook installed. Requires custom changes to resolve symlinks, hopefully will be merged into upstream soon)
mdbook build in the root source directory
The manual should now be available in the book subdirectory.

Add a new Markdown file somewhere in the repository.
Edit manual/SUMMARY.md and add a link to the new file. Use a path relative to the repository root.
Add the new Markdown file to the git index (git add ...)
Run scripts/link_manual.py from the root directory. This will create a symlink for the new file in the manual/ directory. This symlink should be added to git as well.

The manual/generator_dispatch.py script runs as an mdbook preprocessor and replaces {{#generate GEN ARGS}} anywhere in the book with the output of running generator GEN on ARGS. The set of available generators is defined in generator_dispatch.py.

As one example, this is used in manual/c2rust-refactor/commands.md to replace the {{#generate refactor_commands}} placeholder with auto-generated docs for refactoring commands, by running c2rust-refactor/doc/gen_command_docs.py.
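The substitution itself is simple text replacement. The actual preprocessor is the Python script named above; the Rust function below is only a hypothetical sketch of the same idea, in which the `run` callback stands in for looking up and invoking a generator.

```rust
/// Replace every `{{#generate NAME ARGS}}` in `text` with the string
/// returned by `run(NAME, ARGS)`.
fn expand_generators(text: &str, run: impl Fn(&str, &[&str]) -> String) -> String {
    const OPEN: &str = "{{#generate ";
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        // Copy everything before the placeholder unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + OPEN.len()..];
        match after.find("}}") {
            Some(end) => {
                // First word is the generator name, the rest are arguments.
                let mut words = after[..end].split_whitespace();
                let name = words.next().unwrap_or("");
                let args: Vec<&str> = words.collect();
                out.push_str(&run(name, &args));
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated placeholder: keep the remaining text as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```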

API documentation for the Lua scripting interface is generated by calling ldoc . in the c2rust-refactor directory. This updates the c2rust-refactor/doc/scripting_api.html file which we keep checked into source control and linked into the manual.

This guide provides insight into the program structure of the c2rust translator and should be helpful to anyone wanting to contribute to its development.

This project provides tooling for translating C programs into Rust, refactoring Rust programs, and cross-checking the execution of C and Rust programs.

The c2rust project is divided into 6 different crates. The purpose of each crate is described below.

The c2rust crate provides a unified command-line interface to the translator and to the refactorer. This is intended to be the top-level crate that a user would install and interact with.

This crate contains logic for dispatching command-line arguments to the correct sub crate. It should not contain any logic for translating or refactoring code itself.

The c2rust-ast-builder crate provides an AST building abstraction on top of rustc's libsyntax. This is used for code generation both in translation and refactoring.

The builder implemented in this package provides a more stable interface to AST generation than we'd get by depending directly on libsyntax. Libsyntax itself is considered unstable and subject to change dramatically at each nightly release. Libsyntax provides its own AST building functionality, but it doesn't have many of the conveniences that we use in our own implementation.

The c2rust project uses clang as a library in order to get reliable pre-processing, parsing, and type-checking of C code. The c2rust-ast-exporter crate provides a mix of C++ and Rust in order to provide a dump of the clang-generated AST. The exporter exports the AST using CBOR.

This crate implements all of the translation logic for getting from C to Rust. It consumes input from the c2rust-ast-exporter crate and generates Rust using c2rust-ast-builder. It is invoked by the c2rust crate.

This crate implements various rewrites and analyses for refactoring our generated Rust code into more idiomatic code.

This crate provides tools for instrumenting Rust executables to be suitable for running in the multi-variant execution engine.

The Builder type allows for short-cuts when building up AST elements by providing a place to store default values for attributes that typically are not changed. This effectively allows us to simulate optional arguments to methods.

New Builder values can be constructed using mk().

For example the default behavior is for patterns to be immutable. If we want to emit a mutable pattern we can store the mutability flag on the Builder and then generate a pattern.


let mut_x_pat = mk().mutbl().ident_pat("x"); // generates: mut x
let y_pat = mk().ident_pat("y"); // generates: y
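The idea of storing defaults on the builder can be shown with a self-contained toy. This is not the c2rust-ast-builder implementation: the `Builder` below is hypothetical, tracks only a mutability flag, and renders patterns as strings instead of AST nodes.

```rust
/// Toy builder that remembers default settings for the nodes it creates.
#[derive(Clone, Default)]
struct Builder {
    mutable: bool,
}

/// Start from the default settings (patterns are immutable).
fn mk() -> Builder {
    Builder::default()
}

impl Builder {
    /// Return a builder whose patterns will be mutable.
    fn mutbl(self) -> Builder {
        Builder { mutable: true }
    }

    /// "Generate" an identifier pattern, rendered here as source text.
    fn ident_pat(self, name: &str) -> String {
        if self.mutable {
            format!("mut {}", name)
        } else {
            name.to_string()
        }
    }
}
```

With this toy, `mk().mutbl().ident_pat("x")` produces `mut x` and `mk().ident_pat("y")` produces `y`, matching the comments in the example above.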

The Make trait allows for convenient, implicit coercions when using the Builder. Many methods will be parameterized over an arbitrary Make implementation to avoid needing manual conversions. It's quite common to see methods requiring Make<Ident> arguments instead of Ident so that we can accept a number of types. Any new methods implemented for Builder should look to see if there are useful Make implementations for its argument types.


pub trait Make<T> { fn make(self, mk: &Builder) -> T; }
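A toy example may help show how such a trait removes manual conversions at call sites. The `Ident` and `Builder` types and the `ident` method below are hypothetical simplifications written for this sketch; only the shape of the `Make` trait follows the definition above.

```rust
/// Toy builder with no state; the real one carries default attributes.
struct Builder;

/// Toy identifier type standing in for the AST `Ident`.
#[derive(Debug, PartialEq)]
struct Ident(String);

trait Make<T> {
    fn make(self, mk: &Builder) -> T;
}

// An `Ident` is trivially convertible to itself.
impl Make<Ident> for Ident {
    fn make(self, _mk: &Builder) -> Ident {
        self
    }
}

// String slices and owned strings can also be turned into an `Ident`.
impl<'a> Make<Ident> for &'a str {
    fn make(self, _mk: &Builder) -> Ident {
        Ident(self.to_string())
    }
}

impl Make<Ident> for String {
    fn make(self, _mk: &Builder) -> Ident {
        Ident(self)
    }
}

impl Builder {
    /// Accepts anything that can be made into an `Ident`.
    fn ident<I: Make<Ident>>(&self, name: I) -> Ident {
        name.make(self)
    }
}
```

A caller can then write `Builder.ident("x")` or `Builder.ident(String::from("x"))` without converting the argument first.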

The P type comes from the libsyntax crate and provides functionality similar to Box for immutable, shared values. Many components of the Rust AST will store P<T> instead of T when there are potential savings to be had from shared references.

The Rust AST types are designed to be able to be cross-referenced to source-file locations and various type-information metadata maps. These references are tracked through span and node IDs scattered throughout the AST type definitions. In the case of generating new syntax we don't have any corresponding metadata maps to align with. Instead we fill all of these ID fields with various dummy values: DUMMY_SP and DUMMY_NODE_ID.

Builder methods are named using the pattern kind_type. For example to make a P<Ty> that is a pointer to another Ty use the ptr_ty method because internally you're making a TyKind::Ptr.

The c2rust-transpile crate is broken up into 4 major pieces. First the c_ast modules are used to import and handle the C AST representation of the program to be translated. The cfg modules implement Relooper logic to compile away control-flow constructs that exist in C but not in Rust. In particular we use this to get rid of goto and switch statements. The translator modules do the bulk of the actual translation for declarations, statements, and expressions. The rust_ast modules provide helpers for generating the final Rust code.

In order to preserve comments across translations we instruct clang to parse and export all comments. The mechanism provided by libsyntax for emitting comments relies on matching up comments with particular span IDs, so we have to carefully track the span IDs of the Rust AST that we generate when we want to associate a comment. Additionally libsyntax requires that span IDs are found in order, so before we emit the final code we renumber all span IDs to be in order.

The c_ast module provides the Rust types that mirror the AST from Clang along with methods for deserializing from CBOR into these types. There are 4 kinds of AST element that we distinguish between: Types, Expressions, Declarations, and Statements. Each of these has a corresponding enum: CTypeKind, CExprKind, CDeclKind, and CStmtKind. All of these IDs can be dereferenced using the indexing operator on a TypedAstContext. The typed part of that name is to distinguish it from the conversion time ConversionContext where raw CBOR nodes are processed before we know which IDs correspond to one of the 4 categories above.

The c_ast.iterator module provides a depth-first iterator of the four C AST types. This is used in the code to query different kinds of properties of the input code when translating. For example it can be used to check if a section of code uses a goto statement.
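As a rough illustration of this kind of query, the sketch below defines a tiny statement tree and a depth-first iterator over it. The `Stmt` enum, `DepthFirst`, and `uses_goto` are hypothetical and far simpler than the real C AST types.

```rust
/// Tiny stand-in for a C statement tree.
enum Stmt {
    Expr,
    Goto(String),
    Block(Vec<Stmt>),
    If(Box<Stmt>, Option<Box<Stmt>>),
}

/// Depth-first, pre-order iterator driven by an explicit stack.
struct DepthFirst<'a> {
    stack: Vec<&'a Stmt>,
}

impl<'a> DepthFirst<'a> {
    fn new(root: &'a Stmt) -> Self {
        DepthFirst { stack: vec![root] }
    }
}

impl<'a> Iterator for DepthFirst<'a> {
    type Item = &'a Stmt;

    fn next(&mut self) -> Option<&'a Stmt> {
        let node = self.stack.pop()?;
        match node {
            // Push children in reverse so the first child is visited first.
            Stmt::Block(children) => self.stack.extend(children.iter().rev()),
            Stmt::If(then_branch, else_branch) => {
                if let Some(e) = else_branch {
                    self.stack.push(&**e);
                }
                self.stack.push(&**then_branch);
            }
            Stmt::Expr | Stmt::Goto(_) => {}
        }
        Some(node)
    }
}

/// Query: does any statement reachable from `root` use `goto`?
fn uses_goto(root: &Stmt) -> bool {
    DepthFirst::new(root).any(|s| matches!(s, Stmt::Goto(_)))
}
```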

The rust_ast module provides functionality for working with the Rust AST and not for building it. Build functionality can be found in the c2rust-ast-builder crate.

The rust_ast.traverse module provides a depth-first visitor pattern for Rust ASTs. Transformations can be written and made to be instances of the Traversal trait to specify the desired behavior at each AST element. The default implementations of the trait's methods will simply recursively apply the traversal to the child nodes. This can be used both to transform ASTs as well as to query them.
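The pattern of a trait whose default methods recurse into children can be sketched on a toy expression type. Everything here is hypothetical (the `Expr` enum, the `traverse_expr_def` helper name, and the `FoldNeg` example), and the real `Traversal` trait covers many more AST node kinds.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

trait Traversal: Sized {
    /// Called for every expression; the default just recurses.
    fn traverse_expr(&mut self, e: Expr) -> Expr {
        traverse_expr_def(self, e)
    }
}

/// Default behavior: rebuild the node after traversing its children.
fn traverse_expr_def<T: Traversal>(t: &mut T, e: Expr) -> Expr {
    match e {
        Expr::Lit(n) => Expr::Lit(n),
        Expr::Add(a, b) => {
            let a = t.traverse_expr(*a);
            let b = t.traverse_expr(*b);
            Expr::Add(Box::new(a), Box::new(b))
        }
        Expr::Neg(a) => Expr::Neg(Box::new(t.traverse_expr(*a))),
    }
}

/// Example instance: folds `-(literal)` and counts literals it visits,
/// so it both transforms and queries the tree.
struct FoldNeg {
    literals_seen: usize,
}

impl Traversal for FoldNeg {
    fn traverse_expr(&mut self, e: Expr) -> Expr {
        // Traverse children first, then inspect the rebuilt node.
        match traverse_expr_def(self, e) {
            Expr::Lit(n) => {
                self.literals_seen += 1;
                Expr::Lit(n)
            }
            Expr::Neg(inner) => match *inner {
                Expr::Lit(n) => Expr::Lit(-n),
                other => Expr::Neg(Box::new(other)),
            },
            other => other,
        }
    }
}
```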

The rust.comment_store module handles the logic for tracking comments that are going to be reinserted back into the final generated Rust.

This is a substantial type that carries the running state of a translation. This struct keeps track of all of the items generated so far, the language features used, the association between declarations and their Rust identifiers, and the configuration values set for this translation.

There are various methods for translating elements of C syntax defined on this struct. When these are used any supporting items, imports, or features will be tracked in addition to returning the translated item as the method result.

The WithStmts type is a convenient way to keep track of all of the supporting statements that go along with a value after it has been translated. When translating an expression that will not be used for anything other than its side effects, the val component will be set to a panic macro to make it easy to detect the mistake in the generated code.
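A minimal sketch of such a type follows. It is a hypothetical simplification: statements and values are plain strings, and the method names `map` and `and_then` are assumptions of this sketch rather than a description of the real API.

```rust
/// Toy version of a translated value together with the statements
/// that must be emitted before it.
#[derive(Debug, PartialEq)]
struct WithStmts<T> {
    stmts: Vec<String>,
    val: T,
}

impl<T> WithStmts<T> {
    /// Transform the value while keeping the supporting statements.
    fn map<U>(self, f: impl FnOnce(T) -> U) -> WithStmts<U> {
        WithStmts {
            stmts: self.stmts,
            val: f(self.val),
        }
    }

    /// Run a second translation that depends on this value, and
    /// concatenate the statements of both in order.
    fn and_then<U>(self, f: impl FnOnce(T) -> WithStmts<U>) -> WithStmts<U> {
        let mut next = f(self.val);
        let mut stmts = self.stmts;
        stmts.append(&mut next.stmts);
        WithStmts { stmts, val: next.val }
    }
}
```

Keeping the statements alongside the value lets each translation step add setup code without the caller having to thread a separate statement list through every call.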

ExprContext type

The ExprContext struct tracks information about how to translate expressions. This value is updated as translation progresses through the AST keeping track of the different contexts that expression translation can be in.

The used attribute has one of the biggest effects on translation. This indicates when the val field of the WithStmts result is going to be emitted or discarded. This can be modified in the current context with the .used() and .unused() methods.

The is_static attribute indicates that an expression is being used in the initializer for a static variable. Rust has many extra restrictions on the expressions that can be used for a static initializer. In some cases we can still generate valid code at the cost of readability. This fallback is enabled by this attribute.

The decay_ref attribute keeps track of whether or not we're in a context in which Rust will infer that a reference can decay into a pointer. This can happen at method calls, variable initializers, and possibly more locations. This allows the translation to omit some otherwise superfluous casts.

The va_decl attribute indicates which, if any, declaration corresponds to the variable-argument list for the current variadic function. This enables us to drop the associated declaration, va_start, and va_end for that variable during translation.

Handling Comments

Comments are tricky to translate. Part of the issue is that comments are typically removed from C source code as part of the pre-processing phase. To handle this we extract comments from the original source code and track their source position information. In addition we track source position information for the various syntactic elements of the C translation unit.

During translation of a statement or a declaration we look into the set of comments to see if we're as close as we're going to get to the home location of a comment. If we are, we allocate a unique, temporary span ID to associate the comment with the syntax element that should carry it.

Once all of the translation is complete we revisit all of the synthetic span IDs to reassign them so that they are in ascending order, as required by libsyntax. Once we've done this renumbering, libsyntax is able to emit the comments in the correct location during the final rendering of the Rust AST.
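The renumbering step can be pictured as building a map from temporary IDs to fresh, increasing ones. The function below is a hypothetical sketch that uses plain integers for span IDs; it does not reflect how the translator or libsyntax actually represent spans.

```rust
use std::collections::HashMap;

/// Given the temporary span IDs in the order their syntax elements will
/// be printed, assign fresh IDs that increase along that order.
fn renumber(spans_in_print_order: &[u32]) -> HashMap<u32, u32> {
    let mut mapping = HashMap::new();
    let mut next = 1;
    for &old in spans_in_print_order {
        // Only the first occurrence of a temporary ID gets a new number.
        mapping.entry(old).or_insert_with(|| {
            let id = next;
            next += 1;
            id
        });
    }
    mapping
}
```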

The translator.named_references module provides support for naming expressions that need to be read or written multiple times without reevaluating the expression. This module helps by identifying when a temporary variable will be needed to hold onto a reference so that it can support read and write operations.
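A compound assignment such as C's `a[i++] *= 2` shows why this matters: the place `a[i++]` is both read and written, but its index expression has a side effect and must run only once. The sketch below is a hand-written illustration with hypothetical function names, not output of the translator.

```rust
/// Returns the current index and advances it, like C's `i++`.
fn post_inc(i: &mut usize) -> usize {
    let old = *i;
    *i += 1;
    old
}

/// Doubles the element selected by `i++`.
fn double_at_next(a: &mut [i32], i: &mut usize) {
    // Name the place once; the side effect of `post_inc` happens here only.
    let place = &mut a[post_inc(i)];
    // Read and write through the named reference.
    *place = *place * 2;
}
```

Writing `a[post_inc(i)] = a[post_inc(i)] * 2` instead would advance `i` twice and touch two different elements.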

1	fn main() {	1	fn main() {
2	println!("{}", 1 + 1);	2	println!("{}", 2);
3	println!("{}", 1 + /comment/ 1);	3	println!("{}", 2);
4	println!("{}", 1 + 11);	4	println!("{}", 1 + 11);
5	}	5	}

1	use std::collections::hash_map::HashMap;	1	use std::collections::hash_map::HashMap;
2		2
3	fn main() {	3	fn main() {
4	let m: HashMap<i32, i32> = HashMap::new();	4	let m: HashMap<i32, i32> = HashMap::new();
5	}	5	}

1	use std::collections::hash_map::HashMap;	1	use std::collections::hash_map::HashMap;
2		2
3	fn main() {	3	fn main() {
4	let m: HashMap<i32, i32> = HashMap::new();	4	let m: HashMap<i32, i32> = ::std::collections::HashMap::with_capacity(10);
5	}	5	}

1	fn main() {	1	fn main() {
2	let x = 100_i32;	2	let x = 0;

4	let z = x + y;	4	let z = 0;
5		5
6	let a = "hello";	6	let a = "hello";
7	let b = format!("{}, {}", a, "world");	7	let b = format!("{}, {}", a, "world");
8	}	8	}

1	fn main() {	1	fn main() {
2	let a = "hello";	2	let a = "hello";
3	let b = format!("{}, {}", a, "world");	3	let b = format!("{}, {}", a, "world");
4	}	4	}

1	use std::mem;	1	use std::mem;
2		2
3	unsafe fn foo(ptr: *const u32) {	3	unsafe fn foo(ptr: *const u32) {
4	let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap();	4	let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap();
5		5
6	let opt_r2: Option<&u32> = mem::transmute(ptr);	6	let opt_r2: Option<&u32> = mem::transmute(ptr);
7	let r2 = opt_r2.unwrap();	7	let r2 = opt_r2.unwrap();
8	let ptr2: *const u32 = mem::transmute(r2);	8	let ptr2: *const u32 = mem::transmute(r2);
9		9
10	{	10	{
11	use std::mem::transmute;	11	use std::mem::transmute;
12	let opt_r3: Option<&u32> = transmute(ptr);	12	let opt_r3: Option<&u32> = ptr.as_ref();
13	let r3 = opt_r2.unwrap();	13	let r3 = opt_r2.unwrap();
14	}	14	}
15		15
16	/* ... */	16	/* ... */
17	}	17	}

1	use std::mem;	1	use std::mem;
2		2
3	unsafe fn foo(ptr: *const u32) {	3	unsafe fn foo(ptr: *const u32) {
4	let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap();	4	let r: &u32 = ptr.as_ref().unwrap();
5		5
6	let opt_r2: Option<&u32> = mem::transmute(ptr);	6	let opt_r2: Option<&u32> = ptr.as_ref();
7	let r2 = opt_r2.unwrap();	7	let r2 = opt_r2.unwrap();
8	let ptr2: *const u32 = mem::transmute(r2);	8	let ptr2: *const u32 = r2.as_ref();
9		9
10	{	10	{
11	use std::mem::transmute;	11	use std::mem::transmute;
12	let opt_r3: Option<&u32> = transmute(ptr);	12	let opt_r3: Option<&u32> = ptr.as_ref();
13	let r3 = opt_r2.unwrap();	13	let r3 = opt_r2.unwrap();
14	}	14	}
15		15
16	/* ... */	16	/* ... */
17	}	17	}

1	use std::mem;	1	use std::mem;
2		2
3	unsafe fn foo(ptr: *const u32) {	3	unsafe fn foo(ptr: *const u32) {
4	let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap();	4	let r: &u32 = ptr.as_ref().unwrap();
5		5
6	let opt_r2: Option<&u32> = mem::transmute(ptr);	6	let opt_r2: Option<&u32> = ptr.as_ref();
7	let r2 = opt_r2.unwrap();	7	let r2 = opt_r2.unwrap();
8	let ptr2: *const u32 = mem::transmute(r2);	8	let ptr2: *const u32 = mem::transmute(r2);
9		9
10	{	10	{
11	use std::mem::transmute;	11	use std::mem::transmute;
12	let opt_r3: Option<&u32> = transmute(ptr);	12	let opt_r3: Option<&u32> = ptr.as_ref();
13	let r3 = opt_r2.unwrap();	13	let r3 = opt_r2.unwrap();
14	}	14	}
15		15
16	/* ... */	16	/* ... */
17	}	17	}

1	fn f() {}	1	▶fn f() {}◀
2	trait T {}	2	▶trait T {}◀
3	struct S {}	3	▶struct S {}◀
4	mod m {	4	mod m {
5	fn g() {}	5	▶fn g() {}◀
6	}	6	}

