c2rust refactor provides a general-purpose rewriting command, rewrite_expr, for transforming expressions.
In its most basic form, rewrite_expr replaces one expression with another, everywhere in the crate:
rewrite_expr '1+1' '2'
1 | fn main() { | 1 | fn main() { |
2 | println!("{}", 1 + 1); | 2 | println!("{}", 2); |
3 | println!("{}", 1 + /*comment*/ 1); | 3 | println!("{}", 2); |
4 | println!("{}", 1 + 11); | 4 | println!("{}", 1 + 11); |
5 | } | 5 | } |
Here, all instances of the expression 1+1 (the "pattern") are replaced with 2 (the "replacement").
rewrite_expr parses both the pattern and the replacement as Rust expressions. When it looks for occurrences of the pattern, it compares the structure of expressions rather than their raw text. This lets it recognize that 1 + 1 and 1 + /*comment*/ 1 both match the pattern 1+1 (despite being textually distinct), while 1 + 11 does not (despite being textually similar).
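To see why structural comparison behaves differently from a text search, here is a toy sketch in Rust. It is not c2rust's implementation, since rewrite_expr relies on rustc's own parser. The hypothetical tokenize and parse helpers below understand only integer literals, +, whitespace, and /* */ comments, which is just enough to reproduce the three cases above.

```rust
// Toy AST (hypothetical): integer literals and left-associative `+` only.
#[derive(Debug, PartialEq)]
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Split `src` into tokens, dropping whitespace and `/* ... */` comments.
fn tokenize(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i].is_whitespace() {
            i += 1;
        } else if chars[i] == '/' && chars.get(i + 1) == Some(&'*') {
            i += 2;
            // Skip to just past the closing `*/` (or to the end of input).
            while i < chars.len() && !(chars[i] == '*' && chars.get(i + 1) == Some(&'/')) {
                i += 1;
            }
            i += 2;
        } else if chars[i].is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            toks.push(chars[start..i].iter().collect());
        } else {
            toks.push(chars[i].to_string());
            i += 1;
        }
    }
    toks
}

/// Parse `lit + lit + ...` into a left-nested tree; `None` for anything else.
fn parse(src: &str) -> Option<Expr> {
    let toks = tokenize(src);
    let mut it = toks.iter();
    let mut expr = Expr::Lit(it.next()?.parse().ok()?);
    while let Some(op) = it.next() {
        if op != "+" {
            return None;
        }
        let rhs = Expr::Lit(it.next()?.parse().ok()?);
        expr = Expr::Add(Box::new(expr), Box::new(rhs));
    }
    Some(expr)
}

fn main() {
    let pattern = parse("1+1");
    // Prints true, true, false.
    for src in ["1 + 1", "1 + /*comment*/ 1", "1 + 11"] {
        println!("{:<20} matches `1+1`: {}", src, parse(src) == pattern);
    }
}
```

Spacing and comments disappear during tokenizing, so the first two inputs produce the same tree as the pattern. In contrast, 1 + 11 produces Add(Lit(1), Lit(11)), which differs from the pattern even though its text is a near match.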
Metavariables
In rewrite_expr's expression pattern, any name beginning with two underscores is a metavariable. A variable in an ordinary Rust match expression matches any value and binds it for later use. In the same way, a metavariable in an expression pattern matches any Rust expression. For example, the expression pattern __x + 1 will match any expression that adds 1 to something:
rewrite_expr '__x + 1' '11'
1 | fn f() -> i32 { | 1 | fn f() -> i32 { |
2 | 123 | 2 | 123 |
3 | } | 3 | } |
4 | 4 | ||
5 | fn main() { | 5 | fn main() { |
6 | println!("a = {}", 1 + 1); | 6 | println!("a = {}", 11); |
7 | println!("b = {}", 2 * 3 + 1); | 7 | println!("b = {}", 11); |
8 | println!("c = {}", 4 + 5 + 1); | 8 | println!("c = {}", 11); |
9 | println!("d = {}", f() + 1); | 9 | println!("d = {}", 11); |
10 | } | 10 | } |
In these examples, the __x metavariable matches the expressions 1, 2 * 3, 4 + 5, and f(). (Because + is left-associative, 4 + 5 + 1 parses as (4 + 5) + 1, so __x captures 4 + 5.)
Using bindings
When a metavariable matches against some piece of code, the code it matches is bound to the variable for later use. Specifically, rewrite_expr's replacement argument can refer back to those metavariables to substitute in the matched code:
rewrite_expr '__x + 1' '11 * __x'
1 | fn f() -> i32 { | 1 | fn f() -> i32 { |
2 | 123 | 2 | 123 |
3 | } | 3 | } |
4 | 4 | ||
5 | fn main() { | 5 | fn main() { |
6 | println!("a = {}", 1 + 1); | 6 | println!("a = {}", 11 * 1); |
7 | println!("b = {}", 2 * 3 + 1); | 7 | println!("b = {}", 11 * (2 * 3)); |
8 | println!("c = {}", 4 + 5 + 1); | 8 | println!("c = {}", 11 * (4 + 5)); |
9 | println!("d = {}", f() + 1); | 9 | println!("d = {}", 11 * f()); |
10 | } | 10 | } |
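In each case, the expression bound to the __x metavariable is substituted into the right-hand side of the multiplication in the replacement. Parentheses are added where they are needed to preserve the original grouping, as in 11 * (2 * 3) and 11 * (4 + 5).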
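The substitution step can be pictured with the following sketch. The Expr type, substitute, and the precedence-based renderer are hypothetical illustrations and not c2rust's API. In particular, the rule for when to add parentheses is an assumption, chosen to reproduce the output shown above.

```rust
use std::collections::HashMap;

// Toy expression AST (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Var(String),  // a metavariable when the name starts with "__"
    Call(String), // zero-argument call such as `f()`
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Replace every bound metavariable in `repl` with the expression it captured.
fn substitute(repl: &Expr, b: &HashMap<String, Expr>) -> Expr {
    match repl {
        Expr::Var(name) if b.contains_key(name) => b[name].clone(),
        Expr::Add(l, r) => Expr::Add(Box::new(substitute(l, b)), Box::new(substitute(r, b))),
        Expr::Mul(l, r) => Expr::Mul(Box::new(substitute(l, b)), Box::new(substitute(r, b))),
        other => other.clone(),
    }
}

/// Binding strength: `+` binds loosest, then `*`, then atoms.
fn prec(e: &Expr) -> u8 {
    match e {
        Expr::Add(..) => 1,
        Expr::Mul(..) => 2,
        _ => 3,
    }
}

/// Render `e`, parenthesized if it binds more loosely than its position requires.
fn operand(e: &Expr, min: u8) -> String {
    if prec(e) < min {
        format!("({})", render(e))
    } else {
        render(e)
    }
}

/// `+` and `*` are left-associative, so a right operand must bind strictly
/// tighter than its operator; otherwise it gets parentheses.
fn render(e: &Expr) -> String {
    match e {
        Expr::Lit(n) => n.to_string(),
        Expr::Var(v) => v.clone(),
        Expr::Call(f) => format!("{}()", f),
        Expr::Add(l, r) => format!("{} + {}", operand(l, 1), operand(r, 2)),
        Expr::Mul(l, r) => format!("{} * {}", operand(l, 2), operand(r, 3)),
    }
}

fn main() {
    // The replacement of `rewrite_expr '__x + 1' '11 * __x'`.
    let repl = Expr::Mul(Box::new(Expr::Lit(11)), Box::new(Expr::Var("__x".to_string())));
    // What `__x` captured in each of the four matches in the example.
    let captured = [
        Expr::Lit(1),
        Expr::Mul(Box::new(Expr::Lit(2)), Box::new(Expr::Lit(3))),
        Expr::Add(Box::new(Expr::Lit(4)), Box::new(Expr::Lit(5))),
        Expr::Call("f".to_string()),
    ];
    // Prints: 11 * 1, 11 * (2 * 3), 11 * (4 + 5), 11 * f()
    for x in captured {
        let mut b = HashMap::new();
        b.insert("__x".to_string(), x);
        println!("{}", render(&substitute(&repl, &b)));
    }
}
```

Substitution works on trees, so the grouping of the captured expression is preserved automatically. Parentheses only come into play when the tree is rendered back to source text.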
Multiple occurrences
Finally, the same metavariable can appear multiple times in the pattern. In that case, the pattern matches only if every occurrence of the metavariable matches the same expression. For example:
rewrite_expr '__x + __x' '2 * __x'
1 | fn f() -> i32 { | 1 | fn f() -> i32 { |
2 | 123 | 2 | 123 |
3 | } | 3 | } |
4 | 4 | ||
5 | fn main() { | 5 | fn main() { |
6 | let a = 2; | 6 | let a = 2; |
7 | println!("{}", 1 + 1); | 7 | println!("{}", 2 * 1); |
8 | println!("{}", a + a); | 8 | println!("{}", 2 * a); |
9 | println!("{}", f() + f()); | 9 | println!("{}", 2 * f()); |
10 | println!("{}", f() + 1); | 10 | println!("{}", f() + 1); |
11 | } | 11 | } |
Here 1 + 1, a + a, and f() + f() are all replaced. However, f() + 1 is not, because __x cannot match both f() and 1 at the same time.
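A matcher that honors this rule can be sketched as follows. This is a toy model with a hypothetical Expr type and helper functions, not c2rust's code. The first time a metavariable is seen, it is bound. Every later occurrence must be structurally equal to that binding; otherwise the whole match fails.

```rust
use std::collections::HashMap;

// Toy expression AST (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Var(String),  // a metavariable when the name starts with "__"
    Call(String), // zero-argument call such as `f()`
    Add(Box<Expr>, Box<Expr>),
}

type Bindings = HashMap<String, Expr>;

fn add(l: Expr, r: Expr) -> Expr {
    Expr::Add(Box::new(l), Box::new(r))
}
fn var(name: &str) -> Expr {
    Expr::Var(name.to_string())
}
fn call(name: &str) -> Expr {
    Expr::Call(name.to_string())
}

/// Structurally match `e` against `pat`, recording metavariable bindings in `b`.
fn match_expr(pat: &Expr, e: &Expr, b: &mut Bindings) -> bool {
    match (pat, e) {
        (Expr::Var(name), _) if name.starts_with("__") => match b.get(name) {
            // Seen before: this occurrence must equal the earlier binding.
            Some(prev) => prev == e,
            // First occurrence: bind it.
            None => {
                b.insert(name.clone(), e.clone());
                true
            }
        },
        (Expr::Add(pl, pr), Expr::Add(el, er)) => match_expr(pl, el, b) && match_expr(pr, er, b),
        // Anything else must be identical, e.g. literal `1` against literal `1`.
        _ => pat == e,
    }
}

/// Returns the bindings if `e` matches `pat`, or `None` otherwise.
fn try_match(pat: &Expr, e: &Expr) -> Option<Bindings> {
    let mut b = Bindings::new();
    if match_expr(pat, e, &mut b) {
        Some(b)
    } else {
        None
    }
}

fn main() {
    let pat = add(var("__x"), var("__x")); // the pattern `__x + __x`
    let candidates = [
        add(Expr::Lit(1), Expr::Lit(1)),
        add(var("a"), var("a")),
        add(call("f"), call("f")),
        add(call("f"), Expr::Lit(1)),
    ];
    for e in &candidates {
        match try_match(&pat, e) {
            Some(b) => println!("match, __x = {:?}", b["__x"]),
            None => println!("no match: {:?}", e),
        }
    }
}
```

The same function also covers the single-occurrence case from the earlier sections. For example, add(var("__x"), Expr::Lit(1)) plays the role of __x + 1 and binds __x to whatever appears on the left of + 1.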
Example: adding a function argument
Suppose we wish to add an argument to an existing function. All current callers of the function should pass a default value of 0 for this new argument. We can update the existing calls like this:
rewrite_expr 'my_func(__x, __y)' 'my_func(__x, __y, 0)'
1 | fn my_func(x: i32, y: i32) { | 1 | fn my_func(x: i32, y: i32) { |
2 | /* ... */ | 2 | /* ... */ |
3 | } | 3 | } |
4 | 4 | ||
5 | fn main() { | 5 | fn main() { |
6 | my_func(1, 2); | 6 | my_func(1, 2, 0); |
7 | let x = 123; | 7 | let x = 123; |
8 | my_func(x, x); | 8 | my_func(x, x, 0); |
9 | my_func(0, { | 9 | my_func( |
10 | 0, | ||
11 | { | ||
10 | let y = x; | 12 | let y = x; |
11 | y + y | 13 | y + y |
14 | }, | ||
15 | 0, | ||
12 | }); | 16 | ); |
13 | } | 17 | } |
Every call to my_func now passes a third argument, and we can update the definition of my_func to match.
Special matching forms
rewrite_expr supports several special matching forms that can appear in patterns to add extra restrictions to matching.
def!
A pattern such as def!(::foo::f) matches any ident or path expression that resolves to the function whose absolute path is ::foo::f. For example, to replace all expressions referencing the function foo::f with ones referencing foo::g:
rewrite_expr 'def!(::foo::f)' '::foo::g'
1 | mod foo { | 1 | mod foo { |
2 | fn f() { | 2 | fn f() { |
3 | /* ... */ | 3 | /* ... */ |
4 | } | 4 | } |
5 | fn g() { | 5 | fn g() { |
6 | /* ... */ | 6 | /* ... */ |
7 | } | 7 | } |
8 | } | 8 | } |
9 | 9 | ||
10 | fn main() { | 10 | fn main() { |
11 | use self::foo::f; | 11 | use self::foo::f; |
12 | // All these calls get rewritten | 12 | // All these calls get rewritten |
13 | f(); | 13 | f(); |
14 | foo::f(); | ||
15 | ::foo::f(); | 14 | ::foo::g(); |
15 | ::foo::g(); | ||
16 | } | 16 | } |
17 | 17 | ||
18 | mod bar { | 18 | mod bar { |
19 | fn f() {} | 19 | fn f() {} |
20 | 20 | ||
21 | fn f_caller() { | 21 | fn f_caller() { |
22 | // This call does not... | 22 | // This call does not... |
23 | f(); | 23 | f(); |
24 | // But this one still does | 24 | // But this one still does |
25 | super::foo::f(); | 25 | ::foo::g(); |
26 | } | 26 | } |
27 | } | 27 | } |
This works for all direct references to f, whether by relative path (foo::f), absolute path (::foo::f), or imported identifier (just f, with use foo::f in scope). It can even handle imports under a different name (f2 with use foo::f as f2 in scope), since it checks only the path of the referenced definition, not the syntax used to reference it.
Under the hood
When rewrite_expr attempts to match def!(path) against some expression e, it ignores how e itself is written. Instead, it performs these steps:
- Check rustc's name resolution results to find the definition d that e resolves to. (If e doesn't resolve to a definition, then the matching fails.)
- Construct an absolute path dpath referring to d. For definitions in the current crate, this path looks like ::mod1::def1. For definitions in other crates, it looks like ::crate1::mod1::def1.
- Match dpath against the path pattern provided as the argument of def!. Then e matches def!(path) if dpath matches path, and fails to match otherwise.
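These three steps can be modeled in a few lines of Rust. This is a hypothetical sketch, not c2rust's code. The Def struct stands in for the resolution result that the real tool obtains from rustc, and match_def only answers yes or no without recording metavariable bindings.

```rust
/// Hypothetical stand-in for a definition found by rustc's name resolution.
struct Def {
    krate: Option<&'static str>,   // `None` = defined in the current crate
    path: &'static [&'static str], // module path and item name inside that crate
}

/// Step 2: build `::mod1::def1` (current crate) or `::crate1::mod1::def1`.
/// The leading empty segment stands for the leading `::`.
fn absolute_path(d: &Def) -> Vec<&'static str> {
    let mut segs = vec![""];
    segs.extend(d.krate);
    segs.extend_from_slice(d.path);
    segs
}

/// Steps 1-3 for `def!(pat)`. `resolved` plays the role of rustc's resolution
/// result for the expression `e`; how `e` is spelled never enters the picture.
fn match_def(pat: &str, resolved: Option<&Def>) -> bool {
    // Step 1: an expression that resolves to no definition never matches.
    let d = match resolved {
        Some(d) => d,
        None => return false,
    };
    let dpath = absolute_path(d);
    // Step 3: compare segment by segment; a `__` metavariable matches one ident.
    // (This sketch does not record what the metavariables captured.)
    let pat: Vec<&str> = pat.split("::").collect();
    pat.len() == dpath.len() && pat.iter().zip(&dpath).all(|(p, s)| p.starts_with("__") || p == s)
}

fn main() {
    let foo_f = Def { krate: None, path: &["foo", "f"] };
    let bar_f = Def { krate: None, path: &["bar", "f"] };
    let hash_map_new = Def { krate: Some("std"), path: &["collections", "HashMap", "new"] };

    // `f`, `foo::f`, `super::foo::f`, and a renamed import all resolve to `foo_f`.
    println!("{}", match_def("::foo::f", Some(&foo_f))); // true
    println!("{}", match_def("::foo::f", Some(&bar_f))); // false
    println!("{}", match_def("::foo::f", None)); // false, e.g. for `1 + 1`
    println!("{}", match_def("::std::collections::HashMap::new", Some(&hash_map_new))); // true
}
```

Because the comparison runs on the generated path rather than on the source text, a pattern can fail to match even when the code visibly mentions the right item. This is the situation the next section addresses.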
Debugging match failures
Matching with def! can sometimes fail in surprising ways, since the user-provided path is matched against a generated path that may not appear explicitly anywhere in the source code. For example, this attempt to match HashMap::new does not succeed:
rewrite_expr
'def!(::std::collections::hash_map::HashMap::new)()'
'::std::collections::hash_map::HashMap::with_capacity(10)'
1 | use std::collections::hash_map::HashMap; | 1 | use std::collections::hash_map::HashMap; |
2 | 2 | ||
3 | fn main() { | 3 | fn main() { |
4 | let m: HashMap<i32, i32> = HashMap::new(); | 4 | let m: HashMap<i32, i32> = HashMap::new(); |
5 | } | 5 | } |
The debug_match_expr command exists to diagnose such problems. It takes only a pattern, and prints information about attempts to match it at various points in the crate:
debug_match_expr 'def!(::std::collections::hash_map::HashMap::new)()'
Here, its output includes this line:
def!(): trying to match pattern path(::std::collections::hash_map::HashMap::new) against AST path(::std::collections::HashMap::new)
This reveals the problem: the absolute path that def! generates for HashMap::new uses the re-export at std::collections::HashMap, not the canonical definition at std::collections::hash_map::HashMap. Updating the previous rewrite_expr command accordingly allows it to succeed:
rewrite_expr
'def!(::std::collections::HashMap::new)()'
'::std::collections::HashMap::with_capacity(10)'
1 | use std::collections::hash_map::HashMap; | 1 | use std::collections::hash_map::HashMap; |
2 | 2 | ||
3 | fn main() { | 3 | fn main() { |
4 | let m: HashMap<i32, i32> = HashMap::new(); | 4 | let m: HashMap<i32, i32> = ::std::collections::HashMap::with_capacity(10); |
5 | } | 5 | } |
Metavariables
The argument to def! is a path pattern, which can contain metavariables just like the overall expression pattern. For instance, we can rewrite all calls to functions from the foo module:
rewrite_expr 'def!(::foo::__name)()' '123'
1 | mod foo { | 1 | mod foo { |
2 | fn f() { | 2 | fn f() { |
3 | /* ... */ | 3 | /* ... */ |
4 | } | 4 | } |
5 | fn g() { | 5 | fn g() { |
6 | /* ... */ | 6 | /* ... */ |
7 | } | 7 | } |
8 | } | 8 | } |
9 | 9 | ||
10 | mod bar { | 10 | mod bar { |
11 | fn f() { | 11 | fn f() { |
12 | /* ... */ | 12 | /* ... */ |
13 | } | 13 | } |
14 | fn g() { | 14 | fn g() { |
15 | /* ... */ | 15 | /* ... */ |
16 | } | 16 | } |
17 | } | 17 | } |
18 | 18 | ||
19 | fn main() { | 19 | fn main() { |
20 | foo::f(); | 20 | 123; |
21 | foo::g(); | 21 | 123; |
22 | } | 22 | } |
Since every definition in the foo module has an absolute path of the form ::foo::(something), calls to any of them match the expression pattern def!(::foo::__name)().
Like any other metavariable, the ones in a def! path pattern can be used in the replacement expression to substitute in the captured name. For example, we can replace all references to items in the foo module with references to the same-named items in the bar module:
rewrite_expr 'def!(::foo::__name)' '::bar::__name'
1 | mod foo { | 1 | mod foo { |
2 | fn f() { | 2 | fn f() { |
3 | /* ... */ | 3 | /* ... */ |
4 | } | 4 | } |
5 | fn g() { | 5 | fn g() { |
6 | /* ... */ | 6 | /* ... */ |
7 | } | 7 | } |
8 | } | 8 | } |
9 | 9 | ||
10 | mod bar { | 10 | mod bar { |
11 | fn f() { | 11 | fn f() { |
12 | /* ... */ | 12 | /* ... */ |
13 | } | 13 | } |
14 | fn g() { | 14 | fn g() { |
15 | /* ... */ | 15 | /* ... */ |
16 | } | 16 | } |
17 | } | 17 | } |
18 | 18 | ||
19 | fn main() { | 19 | fn main() { |
20 | foo::f(); | 20 | ::bar::f(); |
21 | foo::g(); | 21 | ::bar::g(); |
22 | } | 22 | } |
Note, however, that each metavariable in a path pattern can match only a single ident. This means foo::__name will not match the path to an item in a submodule, such as foo::one::two. Handling these would require an additional rewrite step, such as rewrite_expr 'def!(::foo::__name1::__name2)' '::bar::__name1::__name2'.
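Capturing and refilling path metavariables can be sketched in the same toy style. Paths are plain strings here, and rewrite_path is a hypothetical helper rather than anything c2rust exposes. Each metavariable stands for exactly one segment, so a pattern only applies to paths with the same number of segments. This is why the submodule case needs its own two-metavariable rewrite.

```rust
use std::collections::HashMap;

/// Match the absolute path `path` against `pat`, then fill the captured idents
/// into the replacement path `repl`. Each `__` metavariable captures exactly
/// one path segment.
fn rewrite_path(pat: &str, repl: &str, path: &str) -> Option<String> {
    let p: Vec<&str> = pat.split("::").collect();
    let s: Vec<&str> = path.split("::").collect();
    if p.len() != s.len() {
        return None; // a metavariable never spans several segments
    }
    let mut caps: HashMap<&str, &str> = HashMap::new();
    for (pseg, seg) in p.iter().zip(&s) {
        if pseg.starts_with("__") {
            // A repeated metavariable must capture the same ident each time.
            if *caps.entry(*pseg).or_insert(*seg) != *seg {
                return None;
            }
        } else if pseg != seg {
            return None;
        }
    }
    // Substitute captures into the replacement; other segments are kept as-is.
    let out: Vec<&str> = repl
        .split("::")
        .map(|seg| caps.get(seg).copied().unwrap_or(seg))
        .collect();
    Some(out.join("::"))
}

fn main() {
    // Models `rewrite_expr 'def!(::foo::__name)' '::bar::__name'`.
    println!("{:?}", rewrite_path("::foo::__name", "::bar::__name", "::foo::f"));
    // Some("::bar::f")
    println!("{:?}", rewrite_path("::foo::__name", "::bar::__name", "::foo::one::two"));
    // None
    println!(
        "{:?}",
        rewrite_path("::foo::__name1::__name2", "::bar::__name1::__name2", "::foo::one::two")
    );
    // Some("::bar::one::two")
}
```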
typed!
A pattern of the form typed!(e, ty) matches any expression that matches the pattern e, but only if the type of that expression matches the pattern ty. For example, we can perform a rewrite that only affects i32s:
rewrite_expr 'typed!(__e, i32)' '0'
1 | fn main() { | 1 | fn main() { |
2 | let x = 100_i32; | 2 | let x = 0; |
4 | let z = x + y; | 4 | let z = 0; |
5 | 5 | ||
6 | let a = "hello"; | 6 | let a = "hello"; |
7 | let b = format!("{}, {}", a, "world"); | 7 | let b = format!("{}, {}", a, "world"); |
8 | } | 8 | } |
Every expression matches the metavariable __e, but only expressions of type i32 are affected by the rewrite. This includes literals such as 100_i32, variables of type i32, and compound expressions such as x + y.
Under the hood
Internally, typed! works much like def!. To match an expression e against typed!(e_pat, ty_pat), rewrite_expr follows these steps:
- Consult rustc's typechecking results to get the type of e. Call that type rustc_ty.
- rustc_ty is an internal, abstract representation of the type, which is not suitable for matching. Construct a concrete representation of rustc_ty, and call it ty.
- Match e against e_pat and ty against ty_pat. Then e matches typed!(e_pat, ty_pat) if both matches succeed, and fails to match otherwise.
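The steps above can be modeled with a toy type representation. Everything here is hypothetical: RustcTy stands in for rustc's internal type, and Ty stands in for the concrete syntax built from it. Only the type half of the last step is shown, since the expression half works like ordinary rewrite_expr matching. The sketch also drops lifetimes when building the concrete type, which matches the behavior described in the debugging discussion of this section.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for rustc's internal type, which still knows lifetimes.
enum RustcTy {
    Prim(&'static str),              // `i32`, `str`, ...
    Ref(&'static str, Box<RustcTy>), // lifetime and referent, e.g. `&'static str`
    Adt(&'static str, Vec<RustcTy>), // absolute path and generic arguments
}

/// Concrete, syntax-like type used for matching. In patterns, `Meta` is a
/// metavariable such as `__ty`, and `Ref` may carry a written lifetime.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Prim(String),
    Ref(Option<String>, Box<Ty>),
    Adt(String, Vec<Ty>),
    Meta(String),
}

/// Step 2: build the concrete representation, discarding lifetimes.
fn concrete(t: &RustcTy) -> Ty {
    match t {
        RustcTy::Prim(name) => Ty::Prim(name.to_string()),
        RustcTy::Ref(_lifetime, inner) => Ty::Ref(None, Box::new(concrete(inner))),
        RustcTy::Adt(path, args) => Ty::Adt(path.to_string(), args.iter().map(concrete).collect()),
    }
}

/// Step 3 (type half): match `ty` against `pat`, binding metavariables in `b`.
fn match_ty(pat: &Ty, ty: &Ty, b: &mut HashMap<String, Ty>) -> bool {
    match (pat, ty) {
        (Ty::Meta(name), _) => match b.get(name) {
            Some(prev) => prev == ty,
            None => {
                b.insert(name.clone(), ty.clone());
                true
            }
        },
        // A lifetime written in the pattern can never equal the erased `None`.
        (Ty::Ref(pl, p), Ty::Ref(tl, t)) => pl == tl && match_ty(p, t, b),
        (Ty::Adt(pp, pargs), Ty::Adt(tp, targs)) => {
            if pp != tp || pargs.len() != targs.len() {
                return false;
            }
            for (p, t) in pargs.iter().zip(targs) {
                if !match_ty(p, t, b) {
                    return false;
                }
            }
            true
        }
        _ => pat == ty,
    }
}

fn main() {
    // The type rustc inferred for `let v: Vec<_> = Vec::with_capacity(10)`.
    let inferred = RustcTy::Adt("::std::vec::Vec", vec![RustcTy::Prim("i32")]);
    let pat = Ty::Adt("::std::vec::Vec".to_string(), vec![Ty::Meta("__ty".to_string())]);
    let mut b = HashMap::new();
    if match_ty(&pat, &concrete(&inferred), &mut b) {
        println!("__ty = {:?}", b["__ty"]); // __ty = Prim("i32")
    }

    // A string literal is `&'static str` inside rustc, but the lifetime is dropped.
    let lit = concrete(&RustcTy::Ref("'static", Box::new(RustcTy::Prim("str"))));
    let str_ty = || Box::new(Ty::Prim("str".to_string()));
    let with_lifetime = Ty::Ref(Some("'static".to_string()), str_ty());
    let without_lifetime = Ty::Ref(None, str_ty());
    println!("{}", match_ty(&with_lifetime, &lit, &mut HashMap::new())); // false
    println!("{}", match_ty(&without_lifetime, &lit, &mut HashMap::new())); // true
}
```

The first match shows how a type metavariable can capture an element type that never appears in the source, because it comes from rustc's inference rather than from the written expression.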
Debugging match failures
When matching fails unexpectedly, debug_match_expr is once again useful for understanding the problem. For example, this rewriting command has no effect:
rewrite_expr "typed!(__e, &'static str)" '"hello"'
1 | fn main() { | 1 | fn main() { |
2 | let a = "hello"; | 2 | let a = "hello"; |
3 | let b = format!("{}, {}", a, "world"); | 3 | let b = format!("{}, {}", a, "world"); |
4 | } | 4 | } |
Passing the same pattern to debug_match_expr produces output that includes the following:
typed!(): trying to match pattern type(&'static str) against AST type(&str)
Now the problem is clear: the concrete type representation constructed for matching omits lifetimes. Replacing &'static str with &str in the pattern causes the rewrite to succeed:
rewrite_expr 'typed!(__e, &str)' '"hello"'
1 | fn main() { | 1 | fn main() { |
2 | let a = "hello"; | 2 | let a = "hello"; |
3 | let b = format!("{}, {}", a, "world"); | 3 | let b = format!("{}, {}", "hello", "hello"); |
4 | } | 4 | } |
Metavariables
The expression pattern and type pattern arguments of typed!(e, ty) are handled using the normal rewrite_expr matching engine, which means they can contain metavariables and other special matching forms. For example, metavariables can capture both parts of the expression and parts of its type for use in the replacement:
rewrite_expr
'typed!(Vec::with_capacity(__n), ::std::vec::Vec<__ty>)'
'::std::iter::repeat(<__ty>::default())
.take(__n)
.collect::<Vec<__ty>>()'
1 | fn main() { | 1 | fn main() { |
2 | let v: Vec<&'static str> = Vec::with_capacity(20); | 2 | let v: Vec<&'static str> = ::std::iter::repeat(<&str>::default()) |
3 | .take(20) | ||
4 | .collect::<Vec<&str>>(); | ||
3 | 5 | ||
4 | let v: Vec<_> = Vec::with_capacity(10); | 6 | let v: Vec<_> = ::std::iter::repeat(<i32>::default()) |
7 | .take(10) | ||
8 | .collect::<Vec<i32>>(); | ||
5 | // Allow `v`'s element type to be inferred | 9 | // Allow `v`'s element type to be inferred |
6 | let x: i32 = v[0]; | 10 | let x: i32 = v[0]; |
7 | } | 11 | } |
Notice that the rewritten code has the correct element type in the call to default, even in cases where the type is not written explicitly in the original expression! The matching of typed! obtains the inferred type information from rustc, and those inferred types are captured by metavariables in the type pattern.
Example: transmute to <*const T>::as_ref
This example demonstrates usage of def! and typed!.
Suppose we have some unsafe code that uses transmute to convert a raw pointer that may be null (*const T) into an optional reference (Option<&T>). This conversion is better expressed using the as_ref method of *const T, and we'd like to apply this transformation automatically.
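For reference, the two forms can be compared directly in plain Rust. This uses only the standard library; via_transmute and via_as_ref are illustrative names. The transmute works only because Option<&T> is guaranteed to represent None as a null pointer. In contrast, as_ref performs the null check explicitly, and unlike transmute it cannot be applied to an unrelated pair of types.

```rust
use std::mem;

/// The pattern the rewrite targets: reinterpret a raw pointer as an optional
/// reference. This relies on `Option<&T>` using the null pointer for `None`.
unsafe fn via_transmute<'a>(p: *const u32) -> Option<&'a u32> {
    unsafe { mem::transmute::<*const u32, Option<&'a u32>>(p) }
}

/// The replacement: `<*const T>::as_ref` returns `None` for null and
/// `Some(&*p)` otherwise.
unsafe fn via_as_ref<'a>(p: *const u32) -> Option<&'a u32> {
    unsafe { p.as_ref() }
}

fn main() {
    let x: u32 = 42;
    let p: *const u32 = &x;
    let null: *const u32 = std::ptr::null();
    unsafe {
        // Both forms agree on a valid pointer and on null.
        assert_eq!(via_transmute(p), via_as_ref(p));
        assert_eq!(via_transmute(null), via_as_ref(null));
        println!("{:?} {:?}", via_as_ref(p), via_as_ref(null)); // Some(42) None
    }
}
```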
Initial attempt
Here is a basic first attempt:
rewrite_expr 'transmute(__e)' '__e.as_ref()'
1 | use std::mem; | 1 | use std::mem; |
2 | 2 | ||
3 | unsafe fn foo(ptr: *const u32) { | 3 | unsafe fn foo(ptr: *const u32) { |
4 | let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap(); | 4 | let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap(); |
5 | 5 | ||
6 | let opt_r2: Option<&u32> = mem::transmute(ptr); | 6 | let opt_r2: Option<&u32> = mem::transmute(ptr); |
7 | let r2 = opt_r2.unwrap(); | 7 | let r2 = opt_r2.unwrap(); |
8 | let ptr2: *const u32 = mem::transmute(r2); | 8 | let ptr2: *const u32 = mem::transmute(r2); |
9 | 9 | ||
10 | { | 10 | { |
11 | use std::mem::transmute; | 11 | use std::mem::transmute; |
12 | let opt_r3: Option<&u32> = transmute(ptr); | 12 | let opt_r3: Option<&u32> = ptr.as_ref(); |
13 | let r3 = opt_r2.unwrap(); | 13 | let r3 = opt_r2.unwrap(); |
14 | } | 14 | } |
15 | 15 | ||
16 | /* ... */ | 16 | /* ... */ |
17 | } | 17 | } |
This has two major shortcomings, which we will address in order:
- It works only on code that calls exactly transmute(foo). The instances that import std::mem and call mem::transmute(foo) do not get rewritten.
- It rewrites transmutes between any types, not just *const T to Option<&T>. Only transmutes between those types should be replaced with as_ref.
Identifying transmute calls with def!
We want to rewrite calls to std::mem::transmute, regardless of how those calls are written. This is a perfect use case for def!:
rewrite_expr 'def!(::std::intrinsics::transmute)(__e)' '__e.as_ref()'
1 | use std::mem; | 1 | use std::mem; |
2 | 2 | ||
3 | unsafe fn foo(ptr: *const u32) { | 3 | unsafe fn foo(ptr: *const u32) { |
4 | let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap(); | 4 | let r: &u32 = ptr.as_ref().unwrap(); |
5 | 5 | ||
6 | let opt_r2: Option<&u32> = mem::transmute(ptr); | 6 | let opt_r2: Option<&u32> = ptr.as_ref(); |
7 | let r2 = opt_r2.unwrap(); | 7 | let r2 = opt_r2.unwrap(); |
8 | let ptr2: *const u32 = mem::transmute(r2); | 8 | let ptr2: *const u32 = r2.as_ref(); |
9 | 9 | ||
10 | { | 10 | { |
11 | use std::mem::transmute; | 11 | use std::mem::transmute; |
12 | let opt_r3: Option<&u32> = transmute(ptr); | 12 | let opt_r3: Option<&u32> = ptr.as_ref(); |
13 | let r3 = opt_r2.unwrap(); | 13 | let r3 = opt_r2.unwrap(); |
14 | } | 14 | } |
15 | 15 | ||
16 | /* ... */ | 16 | /* ... */ |
17 | } | 17 | } |
Now our rewrite catches all uses of transmute, whether they're written as transmute(foo), mem::transmute(foo), or even ::std::mem::transmute(foo).
Notice that we refer to transmute as std::intrinsics::transmute: this is the location of its original definition, which is re-exported in std::mem. See the "Debugging match failures" section under def! for an explanation of how we discovered this.
Filtering transmute calls by type
We now have a command for rewriting all transmute calls, but we'd like it to rewrite only transmutes from *const T to Option<&T>. We can achieve this by filtering the input and output types with typed!:
rewrite_expr '
typed!(
def!(::std::intrinsics::transmute)(
typed!(__e, *const __ty)
),
::std::option::Option<&__ty>
)
' '__e.as_ref()'
1 | use std::mem; | 1 | use std::mem; |
2 | 2 | ||
3 | unsafe fn foo(ptr: *const u32) { | 3 | unsafe fn foo(ptr: *const u32) { |
4 | let r: &u32 = mem::transmute::<*const u32, Option<&u32>>(ptr).unwrap(); | 4 | let r: &u32 = ptr.as_ref().unwrap(); |
5 | 5 | ||
6 | let opt_r2: Option<&u32> = mem::transmute(ptr); | 6 | let opt_r2: Option<&u32> = ptr.as_ref(); |
7 | let r2 = opt_r2.unwrap(); | 7 | let r2 = opt_r2.unwrap(); |
8 | let ptr2: *const u32 = mem::transmute(r2); | 8 | let ptr2: *const u32 = mem::transmute(r2); |
9 | 9 | ||
10 | { | 10 | { |
11 | use std::mem::transmute; | 11 | use std::mem::transmute; |
12 | let opt_r3: Option<&u32> = transmute(ptr); | 12 | let opt_r3: Option<&u32> = ptr.as_ref(); |
13 | let r3 = opt_r2.unwrap(); | 13 | let r3 = opt_r2.unwrap(); |
14 | } | 14 | } |
15 | 15 | ||
16 | /* ... */ | 16 | /* ... */ |
17 | } | 17 | } |
Now only those transmutes that turn *const T into Option<&T> are affected by the rewrite. And because typed! has access to the results of type inference, this works even on transmute calls that are not fully annotated (transmute(foo), not just transmute::<*const T, Option<&T>>(foo)).
marked!
The marked! form is simple: marked!(e, label) matches an expression only if the expression matches the pattern e and is marked with the given label. See the documentation on marks and select for more information.
Other commands
Several other refactoring commands use the same pattern-matching engine as rewrite_expr:
- rewrite_ty PAT REPL works like rewrite_expr, except it matches and replaces type annotations instead of expressions.
- abstract SIG PAT replaces expressions matching a pattern with calls to a newly-created function.
- type_fix_rules uses type patterns to find the appropriate rule to fix each type error.
- select's match_expr and similar filters use syntax patterns to identify nodes to mark.