Cross-checking Configuration
In many cases, we can add identical cross-checks to the original C and the transpiled Rust code, e.g., when the C code is naively translated to the perfectly equivalent Rust code, and everything just works. However, this might not always be the case, and we need to handle mismatches such as:
- Type mismatches between C and Rust, e.g., a C const char* (with or without an attached length parameter) being translated to a str. Additionally, if a string+length value pair (with the types const char* and size_t) gets translated to a single str, we may want to omit the cross-check on the length parameter.
- Whole functions added or removed by the transpiler or refactoring tool, e.g., helpers.
Note that this list is not exhaustive, so there may be many more cases of mismatches.
To handle all these cases, we need a language that lets us add new cross-checks, or modify or delete existing ones.
The cross-check language
The cross-check metadata is stored as a YAML encoding of an array of configuration entries. Each configuration entry describes the configuration for that specific check.
An example configuration file for a function foo with 3 arguments a, alen and b looks something like:
main.c:
  - item: defaults
    disable_xchecks: true

  - item: function
    name: foo
    disable_xchecks: false
    args:
      a: default
      alen: none
      b: default
    return: no

main.rs:
  - item: function
    name: foo
    args:
      a: default
      b: default
    return: no
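To make the motivation for alen: none concrete, here is a minimal sketch of what the two sides of foo might look like. The signatures, the return type and the helper name foo_c_style are assumptions made for illustration and do not come from the transpiler; the point is only that the Rust version has no separate length argument left to cross-check.

```rust
use std::os::raw::c_char;

// Rust-style signature (hypothetical): the length travels inside the &str,
// so there is no `alen` argument to cross-check.
fn foo(a: &str, b: u32) -> usize {
    a.len() + b as usize
}

// C-style signature (hypothetical): pointer and length are separate arguments.
// The configuration above disables the cross-check on `alen` on the C side so
// that both sides emit the same sequence of checks.
unsafe fn foo_c_style(a: *const c_char, alen: usize, b: u32) -> usize {
    // SAFETY: the caller must pass a pointer to `alen` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(a as *const u8, alen) };
    let s = std::str::from_utf8(bytes).expect("input must be valid UTF-8");
    foo(s, b)
}
```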
Inline vs external configuration
We can store the cross-check configuration entries in a few places:
- Externally in separate configuration files.
- Inline in the source code, attached to the checked functions and structures.
Each approach has advantages and drawbacks. Inline configuration entries are simpler to maintain, but do not scale as well to larger codebases or more complex cross-check configuration entries. Conversely, external configuration entries are more flexible and can potentially express complex configurations in a cleaner and more elegant way, but can easily get out of sync with their corresponding source code. We currently support both approaches, with external configuration settings taking priority over inline attributes where both are present.
In the current implementation of the Rust cross-checker, inline configuration settings are passed to the enclosing scope's #[cross_check] attribute, e.g.:
#[cross_check(yes, entry(djb2="foo"))]
fn bar() { }

#[cross_check(yes, entry(fixed=0x1234))]
fn baz() { }
Configuration file format
At the top level, each configuration file is a YAML associative array mapping file names to their configuration entries. Each array element maps a file name (represented as a string) to a list of individual items, each item representing a Rust/C scope entity, i.e., function or structure. Each item is encoded in YAML as an associative array. All items have a few common array members:
- item specifies the type of the current item, e.g., function, struct or others.
- name specifies the name of the item, i.e., the name of the function or structure.
Function cross-check configuration
Function cross-checks are configured using entries with item: function.
Function entries support the following fields:
Field | Role |
---|---|
disable_xchecks | Disables all cross-checks for this function and everything in it if set to true. |
entry | Configures the function entry cross-check (see below for information on accepted values). |
exit | Configures the function exit cross-check. |
all_args | Specifies a cross-check override for all of this function's arguments. For example, setting all_args: none disables cross-checks for all arguments. |
args | An associative array that maps argument names to their corresponding cross-checks. This can be used to customize the cross-checks for some of the function arguments individually. This setting overrides both the global default and the one specified in all_args for the current function. |
return | Configures the function return value cross-check. |
ahasher and shasher | Override the default values for the aggregate and simple hasher for this function (see the hashing documentation for the meaning of these fields). |
nested | Recursively configures the items nested inside the current items. Since Rust allows arbitrarily deep function and structure nesting, we use this to recursively configure nested functions. |
entry_extra | Specifies a list of additional custom cross-checks to perform on function entry, after the argument cross-checks. Each cross-check accepts an optional tag parameter that overrides the default UNKNOWN tag. |
exit_extra | Specifies a list of additional custom cross-checks to perform on function return. |
Structure cross-check configuration
Structure entries configure cross-checks for Rust structure, tuple and enumeration types, and are tagged with item: struct.
For a general overview of cross-checking for structures (aggregate types), see the hashing documentation.
Structure entries support the following fields:
Field | Role |
---|---|
disable_xchecks | Disable automatic cross-check emission for this structure (this is generally best left out, unless the default is true and needs to be reset to false). |
field_hasher | Configures the replacement hasher for this structure. The hasher is a Rust object that implements the cross_check_runtime::hash::CrossCheckHasher trait. |
custom_hash | Specifies a function to call to hash objects of this type, instead of the default implementation. This function should have the signature fn foo<XCHA, XCHS>(arg: &T, depth: usize) -> u64 where T is the name of the current type. XCHA and XCHS are template parameters passed by the caller that specify the aggregate and simple hasher to use for this computation (and can be overridden using ahasher and shasher below). |
fields | An associative array that specifies custom hash computations for some or all of the structure's fields. Accepts values in the format of cross-check types. |
ahasher and shasher | Override the aggregate and simple hasher for the default hash implementation for the current type (mainly useful if field_hasher is left out). These are recursively passed to the hash function call for each structure field. |
The field_hasher and custom_hash settings provide two alternative methods of customizing the hashing algorithm for a given structure: users may either provide a custom implementation of CrossCheckHasher and pass that to field_hasher, or implement a hashing function and pass it to custom_hash. The two alternatives are mostly equivalent, and users may use whichever is more convenient. Additionally, users can choose to completely disable the automatic derivation of CrossCheckHash, and manually implement CrossCheckHash for some of the types instead.
Cross-check types
There are several types of cross-check implemented in the compiler:
Check | Value Type | Behavior |
---|---|---|
default | | Lets the compiler perform the default cross-check. |
none or disabled | | Disables cross-checking or hashing for the current value. |
fixed | u64 | Sets the cross-checked value to the given 64-bit integer. |
djb2 | String | Sets the cross-checked value to the djb2 hash of the given string. This is mainly useful for overriding function entry cross-checks, in case the function names don't match between languages. |
as_type | String | Perform the default value cross-check, but after casting the value to the given type, e.g., cast it to a u32 then cross-check it as a u32. |
custom | String | Parses the given string as a C or Rust expression and uses it to compute the cross-checked value. In most cases, the string is inserted verbatim into the cross-check code, e.g., for function argument cross-checks. |
Each cross-check is encoded in YAML as either a single word with the type, e.g., default, or a single-element associative array mapping the type to its argument, e.g., { fixed: 0x1234 }.
More cross-check types may be added as needed.
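For readers unfamiliar with the djb2 check type, the following is a minimal sketch of the classic djb2 string hash (start at 5381, multiply by 33, add each byte). The 64-bit wrapping arithmetic and the function name djb2_hash are assumptions made here; the exact variant and width used by the cross-checker may differ.

```rust
// Classic djb2 recurrence: hash = hash * 33 + byte, starting from 5381.
// Assumption: 64-bit wrapping arithmetic, chosen to match the 64-bit
// cross-check values described above.
fn djb2_hash(s: &str) -> u64 {
    s.bytes().fold(5381u64, |hash, byte| {
        hash.wrapping_mul(33).wrapping_add(u64::from(byte))
    })
}
```

Under this sketch, two functions with different names (say, baz1 in Rust and baz in C) hash to different entry values unless one side is overridden with { djb2: "baz" }.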
Custom hash functions for structures
If custom_hash: { custom: "hash_foo" } is a configuration entry for structure Foo, then the compiler will insert a call to hash_foo to perform the cross-checks. This function should have the following signature:
fn hash_foo<XCHA, XCHS>(foo: &Foo, depth: usize) -> u64 { ... }
The hash function receives a reference to a Foo object and a maximum depth, and should return the 64-bit hash value for the given object.
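As a rough illustration of a function with this signature, the sketch below hashes a made-up Foo by mixing its fields by hand. The fields of Foo and the mixing scheme are assumptions chosen for the example; a real implementation would more likely delegate to the hashers named by XCHA and XCHS and take the depth into account.

```rust
// Hypothetical structure, defined only for this example.
struct Foo {
    id: u32,
    len: u64,
}

// Matches the documented signature. XCHA and XCHS are supplied by the caller;
// this sketch ignores them (and the depth) and mixes the fields directly.
fn hash_foo<XCHA, XCHS>(foo: &Foo, _depth: usize) -> u64 {
    let mut h: u64 = 5381;
    h = h.wrapping_mul(33).wrapping_add(u64::from(foo.id));
    h = h.wrapping_mul(33).wrapping_add(foo.len);
    h
}
```

Because the type parameters do not appear in the arguments, a direct caller has to spell them out, e.g. hash_foo::<(), ()>(&foo, 1) in a test.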
Custom hash functions for structure fields
If bar: { custom: "hash_bar" } is a configuration entry for field bar, then the compiler will insert a call to hash_bar to compute the hash for bar. This function should have the following signature:
fn hash_bar<XCHA, XCHS, S, F>(h: &mut XCHA, foo: &S, bar: &F, depth: usize)
    where XCHA: cross_check_runtime::hash::CrossCheckHasher { ... }
The function receives the following arguments:
- The current aggregate hasher for this structure. The function can call the hasher's write_u64 function as many times as needed.
- The structure containing this field. This argument has generic type S, so the same function can be reused for different structures.
- The field itself, with generic type F. The function may require additional type bounds for F to make it compatible with its callers.
- The maximum hashing depth (explained in the hashing documentation).
- The type parameters XCHA and XCHS bound to the current aggregate and simple value hasher for the current invocation.
This function should not return the hash value of the field. Instead, the function should call the hasher's write_u64 method directly.
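The sketch below shows one way such a field hash function could look. To keep it self-contained, it declares a local stand-in trait with only the write_u64 method mentioned above; the real trait lives in cross_check_runtime::hash and may have more members. RecordingHasher and the AsRef<str> bound on F are likewise assumptions made for the example.

```rust
// Stand-in for cross_check_runtime::hash::CrossCheckHasher (assumption:
// only the write_u64 method described in the text is modeled here).
trait CrossCheckHasher {
    fn write_u64(&mut self, value: u64);
}

// Hypothetical hasher that records every value written to it.
#[derive(Default)]
struct RecordingHasher {
    written: Vec<u64>,
}

impl CrossCheckHasher for RecordingHasher {
    fn write_u64(&mut self, value: u64) {
        self.written.push(value);
    }
}

// Feeds a string-like field into the aggregate hasher: first its length,
// then each byte. Nothing is returned; the hasher accumulates the result.
fn hash_bar<XCHA, XCHS, S, F>(h: &mut XCHA, _foo: &S, bar: &F, _depth: usize)
where
    XCHA: CrossCheckHasher,
    F: AsRef<str>, // extra bound on F, as the text allows
{
    let s = bar.as_ref();
    h.write_u64(s.len() as u64);
    for byte in s.bytes() {
        h.write_u64(u64::from(byte));
    }
}
```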
Per-file default settings
The special defaults item type specifies the default cross-check settings for all items in a file.
We currently support the following entries:
Field | Role |
---|---|
disable_xchecks | Disables all cross-checks for this file. Can be individually overridden per function or structure. |
entry | Configures the default entry cross-check for all functions in this file. |
exit | Similarly configures the function exit cross-check. |
all_args | Specifies a cross-check override for all arguments to all functions in this file. For example, setting all_args: default enables cross-checks for all arguments. |
return | Configures the function return value cross-check. |
More examples
Function example
Example configuration for a function baz1(a, b):
main.rs:
  - item: function
    name: baz1
    entry: { djb2: "baz" }      # Cross-check the function as "baz"
    args:
      a: { custom: "foo(a)" }   # Cross-check a as foo(a)
      b: none                   # Do not cross-check b
    entry_extra:                # Cross-check foo(b) with a FUNCTION_ARG tag
      - { custom: "foo(b)", tag: FUNCTION_ARG }
      - { custom: "a" }         # Cross-check the value "a" with the default UNKNOWN tag
Structure example
Example configuration for a structure Foo (illustrated on an object foo of type Foo):
main.rs:
  - item: struct
    name: Foo
    field_hasher: "FooHasher"   # Use FooHasher as the aggregate hasher
    fields:
      a: { fixed: 0x12345678 }  # Use 0x12345678 as the hash of foo.a
      b: { custom: "hash_b" }   # Hash foo.b by calling hash_b
      c: none                   # Ignore foo.c when hashing foo
Inline cross-check configuration
In addition to the external configuration format, a subset of cross-checks can also be configured inline in the program source code. The compiler plugin provides a custom #[cross_check] attribute used to annotate functions, structures and fields with custom cross-check metadata.
Inline function configuration
The #[cross_check] function attribute currently supports the following arguments:
Argument | Type | Role |
---|---|---|
none or disabled | | Disable cross-checks for this function and all its sub-items (this attribute is inherited). Each sub-item can individually override this with yes or enabled. |
yes or enabled | | Enable cross-checks for this function and its sub-items. Each nested item can also override this setting with none or disabled. |
entry | XCheckType | Cross-check to use on function entry, same as for external configuration. |
exit | XCheckType | Cross-check to use on function exit, same as for external configuration. |
all_args | XCheckType | Enable cross-checks for this function's arguments (disabled by default). Takes the cross-check type as its argument. |
args(...) | | Per-argument cross-check overrides (same as for external configuration). |
return | XCheckType | Cross-check to perform on the function return value, same as for external configuration. |
ahasher and shasher | String | Same as for external configuration. |
entry_extra and exit_extra | | Same as for external configuration. |
Function example
#[cross_check(yes, entry(djb2="foo"))] // Cross-check this function as "foo"
fn foo1() {
    #[cross_check(none)]
    fn bar() { ... }
    bar();

    #[cross_check(yes, all_args(default), args(a(fixed=0x123)))]
    fn baz(a: u8, b: u16, c: u32) { ... }
    baz(1, 2, 3);
}
Inline structure configuration
The compiler plugin also supports a subset of the full external configuration settings as #[cross_check] arguments:
Argument | Type | Role |
---|---|---|
field_hasher | String | Same as for external configuration. |
custom_hash | String | Same as for external configuration. |
ahasher and shasher | String | Same as for external configuration. |
The #[cross_check] attribute can also be attached to structure fields to configure hashing:
Argument | Type | Role |
---|---|---|
none or disabled | | This field is skipped during hashing. |
fixed | u64 | Fixed 64-bit integer to use as the hash value for this field. Identical to the fixed external cross-check type. |
custom_hash | String | Same as for external configuration. |
Structure example
#[cross_check(field_hasher="MyHasher")]
struct Foo {
    #[cross_check(none)]
    foo: u64,

    #[cross_check(fixed=0x1234)]
    bar: String,

    #[cross_check(custom_hash="hash_baz")]
    baz: String,
}
Caveats
Duplicate items
At any level or scope, there may be duplicate items, i.e., multiple items with the same name. It is not clear at this point how best to handle this case, since we have several conflicting requirements. On the one hand, we may wish to allow the configuration for one source file to be spread across multiple configuration files, with entries from later configuration files appended to, or replacing, entries from earlier files. On the other hand, we may have identically-named structures or functions in nested scopes that we want to configure separately. For an example, consider the following code:
fn foo(x: u32) -> u32 {
    if x > 22 {
        fn bar(x: u32) -> u32 {
            x - 22
        }
        bar(x)
    } else {
        fn bar(x: u32) -> u32 {
            x + 34
        }
        bar(x)
    }
}
In this example, there are two distinct foo::bar functions, and we wish to configure them separately.
However, at the top level of a file, there may only be one foo function, so we can merge all entries for foo together. Alternatively, we could check for multiple top-level items with the same name and exit with an error if we encounter any duplicates.
Configuration priority
Currently, if a certain cross-check is configured using both an external entry and an inline #[cross_check(...)] attribute, the external entry takes priority. Alternatively, we may reverse this priority, or exit with an error if both are present.
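The current priority rule can be sketched in a few lines. The XCheckType enum below is a hypothetical model of the cross-check types listed in this document, and falling back to the default check when neither source configures anything is an assumption of the sketch, not documented behavior.

```rust
// Hypothetical model of the cross-check types described in this document.
#[derive(Clone, Debug, PartialEq)]
enum XCheckType {
    Default,
    Disabled,
    Fixed(u64),
    Djb2(String),
    AsType(String),
    Custom(String),
}

// The external entry wins when both are present; otherwise the inline
// attribute is used. Assumption: with neither, the default check applies.
fn effective_xcheck(external: Option<XCheckType>, inline: Option<XCheckType>) -> XCheckType {
    external.or(inline).unwrap_or(XCheckType::Default)
}
```

Reversing the priority would amount to swapping the two arguments of or; rejecting the conflict would mean returning an error when both are Some.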
Scope configuration inheritance
The configuration settings described above apply to the scope of an item. Most settings apply only to the scope itself and not to any of its nested sub-items; for example, args and all_args settings apply only to the current function (e.g., foo above) and not to any of the bar functions. A few settings, however, apply to everything inside the scope. These attributes are internally "inherited" from each scope by its child scopes. Currently, the only inherited attributes are disable_xchecks (so that disabling cross-checks for a module or function disables them for everything inside it), ahasher and shasher.
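A minimal sketch of this inheritance rule follows, assuming each scope stores its settings as optional values. ScopeConfig and inherit_from are hypothetical names, and all_args stands in for the settings that are not inherited.

```rust
// Hypothetical per-scope settings; None means "not set in this scope".
#[derive(Clone, Debug, Default, PartialEq)]
struct ScopeConfig {
    disable_xchecks: Option<bool>, // inherited
    ahasher: Option<String>,       // inherited
    shasher: Option<String>,       // inherited
    all_args: Option<String>,      // applies to this scope only
}

impl ScopeConfig {
    // Computes the effective settings of a child scope: inherited settings
    // fall back to the parent's value, everything else stays local.
    fn inherit_from(&self, parent: &ScopeConfig) -> ScopeConfig {
        ScopeConfig {
            disable_xchecks: self.disable_xchecks.or(parent.disable_xchecks),
            ahasher: self.ahasher.clone().or_else(|| parent.ahasher.clone()),
            shasher: self.shasher.clone().or_else(|| parent.shasher.clone()),
            all_args: self.all_args.clone(),
        }
    }
}
```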
Custom cross-check parameters
Custom cross-check definitions have a different format for each language. The rustc plugin accepts any Rust expression that is valid on function entry as a custom cross-check.
The clang plugin, on the other hand, only accepts a limited subset of C expressions: each cross-check specification contains the name of the function to call, optionally followed by a list of parameters to pass to the function, e.g., function or function(arg1, arg2, ...). Each parameter is the name of a global variable or function argument, and is optionally preceded by & (to pass the parameter by address instead of value) or by * (to dereference the value if it is a pointer).
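To illustrate the restricted grammar, the sketch below parses such a specification into a function name and a list of parameters. It is not the clang plugin's parser: the type and function names are hypothetical, and details such as whitespace handling and accepting an empty parameter list are assumptions.

```rust
#[derive(Debug, PartialEq)]
enum ParamMode {
    ByValue,   // plain name
    ByAddress, // name preceded by &
    Deref,     // name preceded by *
}

#[derive(Debug, PartialEq)]
struct Param {
    mode: ParamMode,
    name: String,
}

#[derive(Debug, PartialEq)]
struct CustomCheck {
    function: String,
    params: Vec<Param>,
}

// Accepts C-style identifiers: a letter or underscore, then letters,
// digits or underscores.
fn is_ident(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

// Parses `function` or `function(arg1, &arg2, *arg3)`; returns None for
// anything outside that subset.
fn parse_custom_check(spec: &str) -> Option<CustomCheck> {
    let spec = spec.trim();
    let (function, arg_list) = match spec.find('(') {
        Some(open) => {
            let inner = spec[open + 1..].strip_suffix(')')?;
            (spec[..open].trim(), Some(inner))
        }
        None => (spec, None),
    };
    if !is_ident(function) {
        return None;
    }
    let mut params = Vec::new();
    if let Some(inner) = arg_list {
        if !inner.trim().is_empty() {
            for raw in inner.split(',') {
                let raw = raw.trim();
                let (mode, name) = if let Some(rest) = raw.strip_prefix('&') {
                    (ParamMode::ByAddress, rest.trim())
                } else if let Some(rest) = raw.strip_prefix('*') {
                    (ParamMode::Deref, rest.trim())
                } else {
                    (ParamMode::ByValue, raw)
                };
                if !is_ident(name) {
                    return None;
                }
                params.push(Param { mode, name: name.to_string() });
            }
        }
    }
    Some(CustomCheck { function: function.to_string(), params })
}
```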
Anonymous structures
C allows developers to define anonymous structures that define the type for a single value, e.g.:
struct {
    int x;
} y;
For a variety of reasons, we need to assign names to these structures ourselves. The most important reason is that we need to identify these structures in the external configuration files. We assign the names using one of the following formats, depending on the context where the anonymous structure is defined:
Assigned name | Meaning |
---|---|
Foo$field$x | This structure defines the type for the field x of the outer structure Foo. Note that Foo itself may also be an anonymous structure that follows the same naming policy. |
foo$arg$x | This structure defines the type for the argument x of function foo (as illustrated below). |
foo$result | This structure defines the return type for function foo. |
Examples
struct Foo {
    struct { // This gets named `Foo$field$x`
        int x;
    } x;
};

struct { int a; } // This gets the `foo$result` name
foo(struct { int b; } x) { // The `x` argument type gets the `foo$arg$x` name
}
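The naming policy from the table can be sketched as a small function. AnonContext and anon_struct_name are hypothetical names introduced for this example; only the three name formats themselves come from the table above.

```rust
// Where an anonymous structure was defined (hypothetical model).
enum AnonContext<'a> {
    Field { outer: &'a str, field: &'a str },
    Arg { function: &'a str, arg: &'a str },
    Result { function: &'a str },
}

// Builds the assigned name using the three formats from the table.
fn anon_struct_name(ctx: &AnonContext) -> String {
    match ctx {
        AnonContext::Field { outer, field } => format!("{}$field${}", outer, field),
        AnonContext::Arg { function, arg } => format!("{}$arg${}", function, arg),
        AnonContext::Result { function } => format!("{}$result", function),
    }
}
```

Since the outer structure may itself be anonymous, the outer name can already be an assigned name, which yields nested names such as Foo$field$x$field$y.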