rust the rust programming language
Getting Start
rustup
: a command line tool for managing Rust versions and associated tools.rustc
: a command line tool for compiling Rust code.rustfmt
: a command line automatic formatter tool to format your code in a particular style.rust-analyzer
: IDE support.cargo
: Rust’s build system and package manager.
# check compiler version
rustc --version
# compile a single rust file. After compiling successfully, Rust outputs a binary executable
rustc [your_file_path].rs
# update Rust to a newly released version
rustup update
# uninstall Rust and rustup
rustup self uninstall
# open local documentation in your browser
rustup doc
# check cargo version
cargo --version
# create a project
cargo new [your_project_name]
# compile the code and create an executable file with debug mode in target/debug/[your_project_name] (or target\debug\[your_project_name].exe on Windows)
cargo build
# compile the code with optimizations for release and create an executable in target/release (compile slower but run faster than debug mode)
cargo build --release
# compile the code and then run the resultant executable all in one command
cargo run
# quickly check your code to make sure it compiles but doesn't produce an executable (much faster than cargo build)
cargo check
# ignore the Cargo.lock file and bump to the latest MINOR SemVer
cargo update
# build documentation provided by all your dependencies locally and open it in your browser
cargo doc --open
Compiling and Running Are Separate Steps
Rust files always end with the .rs extension. If you’re using more than one word in your filename, the convention is to use an underscore to separate them. For example, use hello_world.rs rather than helloworld.rs.
Before running a Rust program, you must compile it using the Rust compiler by entering the rustc
command and passing it the name of your source file, like this:
rustc main.rs
After compiling successfully, Rust outputs a binary executable.
On Linux, macOS, and PowerShell on Windows, you can see the executable by entering the ls
command in your shell:
$ ls
main main.rs
On Linux and macOS, you’ll see two files. With PowerShell on Windows, you’ll see the same three files that you would see using CMD. With CMD on Windows, you would enter the following:
> dir /B %= the /B option says to only show the file names =%
main.exe
main.pdb
main.rs
This shows the source code file with the .rs extension, the executable file (main.exe on Windows, but main on all other platforms), and, when using Windows, a file containing debugging information with the .pdb extension. From here, you run the main or main.exe file, like this:
./main # or .\main.exe on Windows
Anatomy of a Cargo Project
cargo new hello_cargo
cd hello_cargo
Go into the hello_cargo directory and list the files. You’ll see that Cargo has generated two files and one directory for us: a Cargo.toml file and a src directory with a main.rs file inside.
Cargo expects your source files to live inside the src directory. The top-level project directory is just for README files, license information, configuration files, and anything else not related to your code.
It has also initialized a new Git repository along with a .gitignore file. Git files won’t be generated if you run cargo new
within an existing Git repository; you can override this behavior by using cargo new --vcs=git
.
Note: Git is a common version control system. You can change
cargo new
to use a different version control system or no version control system by using the--vcs
flag.
Open Cargo.toml in your text editor of choice.
Filename: Cargo.toml
[package]
name = "hello_cargo"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
This file is in the TOML (Tom’s Obvious, Minimal Language) format, which is Cargo’s configuration format.
The first line, [package]
, is a section heading that indicates that the following statements are configuring a package.
The last line, [dependencies]
, is the start of a section for you to list any of your project’s dependencies. In Rust, packages of code are referred to as crates.
Now open src/main.rs and take a look:
Filename: src/main.rs
fn main() {
println!("Hello, world!");
}
The
main
function is special: it is always the first code that runs in every executable Rust program.Rust style is to indent with four spaces, not a tab.
println!
calls a Rust macro. If it had called a function instead, it would be entered asprintln
(without the!
). For now, you just need to know that using a!
means that you’re calling a macro instead of a normal function and that macros don’t always follow the same rules as functions.you see the
"Hello, world!"
string. We pass this string as an argument toprintln!
, and the string is printed to the screen.we end the line with a semicolon (
;
), which indicates that this expression is over and the next one is ready to begin. Most lines of Rust code end with a semicolon.
Programming a Guessing Game
cargo new guessing_game
cd guessing_game
Filename: Cargo.toml
[package]
name = "guessing_game"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rand = "0.8.5" # include the rand crate as a dependency
Cargo understands Semantic Versioning (sometimes called SemVer), which is a standard for writing version numbers. The specifier 0.8.5
is actually shorthand for ^0.8.5
, which means any version that is at least 0.8.5 but below 0.9.0.
When you build a project for the first time, Cargo figures out all the versions of the dependencies that fit the criteria and then writes them to the Cargo.lock file. When you build your project in the future, Cargo will see that the Cargo.lock file exists and will use the versions specified there rather than doing all the work of figuring out versions again. In other words, your project will remain at 0.8.5 until you explicitly upgrade, thanks to the Cargo.lock file.
When you do want to update a crate, Cargo provides the command update
, which will ignore the Cargo.lock file and figure out all the latest versions that fit your specifications in Cargo.toml. Cargo will then write those versions to the Cargo.lock file. Otherwise, by default, Cargo will only look for versions greater than 0.8.5 and less than 0.9.0. If the rand
crate has released the two new versions 0.8.6 and 0.9.0, you would see the version bumped to 0.8.6 rather than 0.9.0. To use rand
version 0.9.0 or any version in the 0.9.x series, you’d have to update the Cargo.toml file manually.
Note: Another neat feature of Cargo is that running the
cargo doc --open
command will build documentation provided by all your dependencies locally and open it in your browser.
Filename: src/main.rs
// By default, Rust has a set of items defined in the standard library that it brings into the scope of every program. This set is called the prelude. If a type you want to use isn't in the prelude, you have to bring that type into scope explicitly with a use statement.
use rand::Rng;
use std::cmp::Ordering;
use std::io;
fn main() {
println!("Guess the number!");
let secret_number = rand::thread_rng().gen_range(1..=100);
loop {
println!("Please input your guess.");
let mut guess = String::new();
// The stdin function returns an instance of std::io::Stdin, which is a type that represents a handle to the standard input for your terminal.
io::stdin()
.read_line(&mut guess)
.expect("Failed to read line");
// Rust allows us to shadow the previous value of `guess` with a new one. Shadowing lets us reuse the `guess` variable name rather than forcing us to create two unique variables, such as `guess_str` and `guess`, for example.
let guess: u32 = match guess.trim().parse() {
Ok(num) => num,
Err(_) => continue,
};
println!("You guessed: {guess}");
match guess.cmp(&secret_number) {
Ordering::Less => println!("Too small!"),
Ordering::Greater => println!("Too big!"),
Ordering::Equal => {
println!("You win!");
break;
}
}
}
}
Common Programming Concepts
Variables and Mutability
By default, variables are immutable. When a variable is immutable, once a value is bound to a name, you can’t change that value.
Filename: src/main.rs
fn main() {
let x = 5;
println!("The value of x is: {x}");
x = 6; // error! cannot assign twice to immutable variable `x`
println!("The value of x is: {x}");
}
But mutability can be very useful, and can make code more convenient to write. Although variables are immutable by default, you can make them mutable by adding mut
in front of the variable name.
Filename: src/main.rs
fn main() {
let mut x = 5;
println!("The value of x is: {x}");
x = 6;
println!("The value of x is: {x}");
}
Constants
Differences between constants and variables:
you aren’t allowed to use
mut
with constants. You declare constants using theconst
keyword instead of thelet
keyword, and the type of the value must be annotated.constants may be set only to a constant expression, not the result of a value that could only be computed at runtime.
Here’s an example of a constant declaration:
const THREE_HOURS_IN_SECONDS: u32 = 60 * 60 * 3;
Rust’s naming convention for constants is to use all uppercase with underscores between words.
Constants are valid for the entire time a program runs, within the scope in which they were declared.
Shadowing
You can declare a new variable with the same name as a previous variable. In effect, the second variable overshadows the first, taking any uses of the variable name to itself until either it itself is shadowed or the scope ends. We can shadow a variable by using the same variable’s name and repeating the use of the let
keyword as follows:
Filename: src/main.rs
fn main() {
let x = 5;
let x = x + 1;
{
let x = x * 2;
println!("The value of x in the inner scope is: {x}");
// The value of x in the inner scope is: 12
}
println!("The value of x is: {x}");
// The value of x is: 6
}
Differences between shadowing and making a variable:
we’ll get a compile-time error if we accidentally try to reassign to this variable without using the
let
keyword. By usinglet
, we can perform a few transformations on a value but have the variable be immutable after those transformations have been completed.we’re effectively creating a new variable when we use the
let
keyword again, we can change the type of the value but reuse the same name.
let spaces = " "; // string type
let spaces = spaces.len(); // number type
However, if we try to use mut
for this, as shown here, we’ll get a compile-time error:
let mut spaces = " ";
spaces = spaces.len(); // error! mismatched types
Data Types
Keep in mind that Rust is a statically typed language, which means that it must know the types of all variables at compile time. The compiler can usually infer what type we want to use based on the value and how we use it.
In cases when many types are possible, such as when we converted a String
to a numeric type using parse
, we must add a type annotation, like this:
let guess: u32 = "42".parse().expect("Not a number!");
Scalar Types
A scalar type represents a single value. Rust has four primary scalar types: integers, floating-point numbers, Booleans, and characters.
Integer Types
An integer is a number without a fractional component.
Length | Signed | Unsigned |
---|---|---|
8-bit | i8 |
u8 |
16-bit | i16 |
u16 |
32-bit | i32 |
u32 |
64-bit | i64 |
u64 |
128-bit | i128 |
u128 |
arch | isize |
usize |
Each signed variant can store numbers from -2^(n - 1) to 2^(n - 1) - 1 inclusive, where n is the number of bits that variant uses. So an i8
can store numbers from -2^7 to 2^7 - 1, which equals -128 to 127.
Unsigned variants can store numbers from 0 to 2^n - 1. So a u8
can store numbers from 0 to 2^8 - 1, which equals 0 to 255.
Additionally, the isize
and usize
types depend on the architecture of the computer your program is running on, which is denoted in the table as “arch”: 64 bits if you’re on a 64-bit architecture and 32 bits if you’re on a 32-bit architecture.
Note that number literals that can be multiple numeric types allow a type suffix, such as 57u8
, to designate the type. Number literals can also use _
as a visual separator to make the number easier to read, such as 1_000
, which will have the same value as if you had specified 1000
.
Number Literals | Example |
---|---|
Decimal | 98_222 |
Hex | 0xff |
Octal | 0o77 |
Binary | 0b1111_0000 |
Byte (u8 only) |
b'A' |
So how do you know which type of integer to use? If you’re unsure, Rust’s defaults are generally good places to start: integer types default to i32
. The primary situation in which you’d use isize
or usize
is when indexing some sort of collection.
Integer Overflow
If you try to change the variable to a value outside that range, such as
256u8
, integer overflow will occur, which can result in one of two behaviors:
When you’re compiling in debug mode, Rust includes checks for integer overflow that cause your program to panic at runtime if this behavior occurs. Rust uses the term panicking when a program exits with an error.
When you’re compiling in release mode with the
--release
flag, Rust does not include checks for integer overflow that cause panics. Instead, if overflow occurs, Rust performs two’s complement wrapping. In short, values greater than the maximum value the type can hold “wrap around” to the minimum of the values the type can hold. In the case of au8
, the value 256 becomes 0, the value 257 becomes 1, and so on. The program won’t panic, but the variable will have a value that probably isn’t what you were expecting it to have.Relying on integer overflow’s wrapping behavior is considered an error.
To explicitly handle the possibility of overflow, you can use these families of methods provided by the standard library for primitive numeric types:
Wrap in all modes with the
wrapping_*
methods, such aswrapping_add
.Return the
None
value if there is overflow with thechecked_*
methods.Return the value and a boolean indicating whether there was overflow with the
overflowing_*
methods.Saturate at the value’s minimum or maximum values with the
saturating_*
methods.
Floating-Point Types
Rust’s floating-point types are f32
and f64
, which are 32 bits and 64 bits in size, respectively.
The default type is f64
because on modern CPUs, it’s roughly the same speed as f32
but is capable of more precision.
All floating-point types are signed.
Filename: src/main.rs
fn main() {
let x = 2.0; // f64
let y: f32 = 3.0; // f32
}
Floating-point numbers are represented according to the IEEE-754 standard. The f32
type is a single-precision float, and f64
has double precision.
Numeric Operations
Rust supports the basic mathematical operations you’d expect for all the number types: addition, subtraction, multiplication, division, and remainder.
Integer division truncates toward zero to the nearest integer.
Filename: src/main.rs
fn main() {
// addition
let sum = 5 + 10;
// subtraction
let difference = 95.5 - 4.3;
// multiplication
let product = 4 * 30;
// division
let quotient = 56.7 / 32.2;
let truncated = -5 / 3; // Results in -1
// remainder
let remainder = 43 % 5;
}
The Boolean Type
A Boolean type in Rust has two possible values: true
and false
.
Booleans are one byte in size.
The Boolean type in Rust is specified using bool
.
Filename: src/main.rs
fn main() {
let t = true;
let f: bool = false; // with explicit type annotation
}
The Character Type
Rust’s char
type is the language’s most primitive alphabetic type.
Filename: src/main.rs
fn main() {
let c = 'z';
let z: char = 'ℤ'; // with explicit type annotation
let heart_eyed_cat = '😻';
}
Note that we specify char
literals with single quotes, as opposed to string literals, which use double quotes.
Rust’s char
type is four bytes in size and represents a Unicode Scalar Value, which means it can represent a lot more than just ASCII. Accented letters; Chinese, Japanese, and Korean characters; emoji; and zero-width spaces are all valid char
values in Rust. Unicode Scalar Values range from U+0000
to U+D7FF
and U+E000
to U+10FFFF
inclusive.
However, a “character” isn’t really a concept in Unicode, so your human intuition for what a “character” is may not match up with what a char
is in Rust.
Compound Types
Compound types can group multiple values into one type. Rust has two primitive compound types: tuples and arrays.
The Tuple Type
Tuples have a fixed length: once declared, they cannot grow or shrink in size.
Each position in the tuple has a type, and the types of the different values in the tuple don’t have to be the same.
We’ve added optional type annotations in this example:
Filename: src/main.rs
fn main() {
let tup: (i32, f64, u8) = (500, 6.4, 1);
}
To get the individual values out of a tuple, we can use pattern matching to destructure a tuple value(即解构赋值), like this:
Filename: src/main.rs
fn main() {
let tup = (500, 6.4, 1);
let (x, y, z) = tup;
println!("The value of y is: {y}");
}
We can also access a tuple element directly by using a period (.
) followed by the index of the value we want to access. For example:
Filename: src/main.rs
fn main() {
let x: (i32, f64, u8) = (500, 6.4, 1);
let five_hundred = x.0;
let six_point_four = x.1;
let one = x.2;
}
The tuple without any values has a special name, unit. This value and its corresponding type are both written ()
and represent an empty value or an empty return type. Expressions implicitly return the unit value if they don’t return any other value.
The Array Type
Unlike a tuple, every element of an array must have the same type. Unlike arrays in some other languages, arrays in Rust have a fixed length.
Filename: src/main.rs
fn main() {
let a = [1, 2, 3, 4, 5];
}
Arrays are useful when you want your data allocated on the stack rather than the heap or when you want to ensure you always have a fixed number of elements.
An array isn’t as flexible as the vector type, though. A vector is a similar collection type provided by the standard library that is allowed to grow or shrink in size. If you’re unsure whether to use an array or a vector, chances are you should use a vector.
Write an array’s type using square brackets with the type of each element, a semicolon, and then the number of elements in the array, like so:
let a: [i32; 5] = [1, 2, 3, 4, 5];
You can also initialize an array to contain the same value for each element by specifying the initial value, followed by a semicolon, and then the length of the array in square brackets, as shown here:
let a = [3; 5];
// The same as
// let a = [3, 3, 3, 3, 3];
Accessing Array Elements
An array is a single chunk of memory of a known, fixed size that can be allocated on the stack. You can access elements of an array using indexing, like this:
Filename: src/main.rs
fn main() {
let a = [1, 2, 3, 4, 5];
let first = a[0];
let second = a[1];
}
Invalid Array Element Access
Let’s see what happens if you try to access an element of an array that is past the end of the array. Say you run this code to get an array index from the user:
Filename: src/main.rs
use std::io;
fn main() {
let a = [1, 2, 3, 4, 5];
println!("Please enter an array index.");
let mut index = String::new();
io::stdin()
.read_line(&mut index)
.expect("Failed to read line");
let index: usize = index
.trim()
.parse()
.expect("Index entered was not a number");
let element = a[index];
println!("The value of the element at index {index} is: {element}");
}
This code compiles successfully. The program resulted in a runtime error at the point of using an invalid value in the indexing operation.
When you attempt to access an element using indexing, Rust will check that the index you’ve specified is less than the array length. If the index is greater than or equal to the length, Rust will panic.
Functions
Rust code uses snake case as the conventional style for function and variable names, in which all letters are lowercase and underscores separate words. Here’s a program that contains an example function definition:
Filename: src/main.rs
fn main() {
println!("Hello, world!");
another_function();
}
fn another_function() {
println!("Another function.");
}
Rust doesn’t care where you define your functions, only that they’re defined somewhere in a scope that can be seen by the caller.
Parameters
Technically, the concrete values are called arguments, but in casual conversation, people tend to use the words parameter and argument interchangeably for either the variables in a function’s definition or the concrete values passed in when you call a function.
Filename: src/main.rs
fn main() {
print_labeled_measurement(5, 'h');
}
fn print_labeled_measurement(value: i32, unit_label: char) {
println!("The measurement is: {value}{unit_label}");
}
In function signatures, you must declare the type of each parameter.
Statements and Expressions
Function bodies are made up of a series of statements optionally ending in an expression.
Statements are instructions that perform some action and do not return a value.
Expressions evaluate to a resultant value.
Creating a variable and assigning a value to it with the let
keyword is a statement. Function definitions are also statements.
Filename: src/main.rs
fn main() { // Function definitions are also statements
let y = 6; // let y = 6; is a statement. 6 is an expression
}
Statements do not return values. Therefore, you can’t assign a let statement to another variable, as the following code tries to do; you’ll get an error:
Filename: src/main.rs
fn main() {
let x = (let y = 6); // error! expected expression, found `let` statement
}
This is different from what happens in other languages, such as C and Ruby, where the assignment returns the value of the assignment. In those languages, you can write x = y = 6
and have both x
and y
have the value 6
; that is not the case in Rust.
Expressions can be part of statements. The 6
in the statement let y = 6;
is an expression that evaluates to the value 6
.
Calling a function is an expression. Calling a macro is an expression. A new scope block created with curly brackets is an expression, for example:
Filename: src/main.rs
fn main() {
let y = {
let x = 3;
x + 1 // Note that this line doesn't have a semicolon at the end
};
println!("The value of y is: {y}");
}
Expressions do not include ending semicolons. If you add a semicolon to the end of an expression, you turn it into a statement, and it will then not return a value.
Functions with Return Values
Functions can return values to the code that calls them. We don’t name return values, but we must declare their type after an arrow (->
).
In Rust, the return value of the function is synonymous with the value of the final expression in the block of the body of a function. You can return early from a function by using the return
keyword and specifying a value, but most functions return the last expression implicitly. Here’s an example of a function that returns a value:
Filename: src/main.rs
fn five() -> i32 {
5
}
fn main() {
let x = five();
println!("The value of x is: {x}");
}
But if we place a semicolon at the end of the line containing 5
, changing it from an expression to a statement, we’ll get an error:
Filename: src/main.rs
fn five() -> i32 {
5; // error! mismatched types
}
fn main() {
let x = five();
println!("The value of x is: {x}");
}
The definition of the function five
says that it will return an i32
, but statements don’t evaluate to a value, which is expressed by ()
, the unit type. Therefore, nothing is returned, which contradicts the function definition and results in an error.
Comments
In Rust, the idiomatic comment style starts a comment with two slashes, and the comment continues until the end of the line.
For comments that extend beyond a single line, you’ll need to include //
on each line, like this:
// So we're doing something complicated here, long enough that we need
// multiple lines of comments to do it! Whew! Hopefully, this comment will
// explain what's going on.
Comments can also be placed at the end of lines containing code:
Filename: src/main.rs
fn main() {
let lucky_number = 7; // I'm feeling lucky today
}
Rust also has another kind of comment, documentation comments.
Control Flow
if Expressions
An if
expression allows you to branch your code depending on conditions.
Filename: src/main.rs
fn main() {
let number = 3;
if number < 5 {
println!("condition was true");
} else {
println!("condition was false");
}
}
It’s also worth noting that the condition must be a bool
. If the condition isn’t a bool
, we’ll get an error. For example, try running the following code:
Filename: src/main.rs
fn main() {
let number = 3;
if number { // error! mismatched types. expected `bool`, found integer
println!("number was three");
}
}
Unlike languages such as Ruby and JavaScript, Rust will not automatically try to convert non-Boolean types to a Boolean. You must be explicit and always provide if
with a Boolean as its condition.
Handling Multiple Conditions with else if
You can use multiple conditions by combining if
and else
in an else if
expression. For example:
Filename: src/main.rs
fn main() {
let number = 6;
if number % 4 == 0 {
println!("number is divisible by 4");
} else if number % 3 == 0 {
println!("number is divisible by 3");
} else if number % 2 == 0 {
println!("number is divisible by 2");
} else {
println!("number is not divisible by 4, 3, or 2");
}
}
// output
// number is divisible by 3
When this program executes, it checks each if
expression in turn and executes the first body for which the condition evaluates to true
.
Note that even though 6 is divisible by 2, we don’t see the output number is divisible by 2
, nor do we see the number is not divisible by 4, 3, or 2
text from the else
block. That’s because Rust only executes the block for the first true
condition, and once it finds one, it doesn’t even check the rest.
Using too many else if
expressions can clutter your code, so if you have more than one, you might want to refactor your code by match
.
Using if in a let Statement
Because if
is an expression, we can use it on the right side of a let
statement to assign the outcome to a variable.
Filename: src/main.rs
fn main() {
let condition = true;
let number = if condition { 5 } else { 6 };
println!("The value of number is: {number}");
}
The values that have the potential to be results from each arm of the if
must be the same type. If the types are mismatched, as in the following example, we’ll get an error:
Filename: src/main.rs
fn main() {
let condition = true;
let number = if condition { 5 } else { "six" }; // error! `if` and `else` have incompatible types
println!("The value of number is: {number}");
}
Repetition with Loops
Rust has three kinds of loops: loop
, while
, and for
.
Repeating Code with loop
The loop
keyword tells Rust to execute a block of code over and over again forever or until you explicitly tell it to stop.
Filename: src/main.rs
fn main() {
loop {
println!("again!");
}
}
You can place the break
keyword within the loop to tell the program when to stop executing the loop.
We can also use continue
in a loop, which tells the program to skip over any remaining code in this iteration of the loop and go to the next iteration.
Returning Values from Loops
One of the uses of a loop
is to retry an operation you know might fail, such as checking whether a thread has completed its job.
You might also need to pass the result of that operation out of the loop to the rest of your code. To do this, you can add the value you want returned after the break
expression you use to stop the loop; that value will be returned out of the loop so you can use it, as shown here:
fn main() {
let mut counter = 0;
let result = loop {
counter += 1;
if counter == 10 {
break counter * 2;
}
};
println!("The result is {result}");
}
// The result is 20
Loop Labels to Disambiguate Between Multiple Loops
If you have loops within loops, break
and continue
apply to the innermost loop at that point. You can optionally specify a loop label on a loop that you can then use with break
or continue
to specify that those keywords apply to the labeled loop instead of the innermost loop. Loop labels must begin with a single quote. Here’s an example with two nested loops:
fn main() {
let mut count = 0;
'counting_up: loop {
println!("count = {count}");
let mut remaining = 10;
loop {
println!("remaining = {remaining}");
if remaining == 9 {
break;
}
if count == 2 {
break 'counting_up;
}
remaining -= 1;
}
count += 1;
}
println!("End count = {count}");
}
// Output
//
// count = 0
// remaining = 10
// remaining = 9
// count = 1
// remaining = 10
// remaining = 9
// count = 2
// remaining = 10
// End count = 2
Conditional Loops with while
A program will often need to evaluate a condition within a loop. While the condition is true
, the loop runs. When the condition ceases to be true
, the program calls break
, stopping the loop. However, this pattern is so common that Rust has a built-in language construct for it, called a while
loop.
Filename: src/main.rs
fn main() {
let mut number = 3;
while number != 0 {
println!("{number}!");
number -= 1;
}
println!("LIFTOFF!!!");
}
Looping Through a Collection with for
You can use a for
loop and execute some code for each item in a collection.
Filename: src/main.rs
fn main() {
let a = [10, 20, 30, 40, 50];
for element in a {
println!("the value is: {element}");
}
}
Range
can also be used in for loops.
Filename: src/main.rs
fn main() {
for number in (1..4).rev() {
println!("{number}!");
}
println!("LIFTOFF!!!");
}
Understanding Ownership
Ownership enables Rust to make memory safety guarantees without needing a garbage collector, so it’s important to understand how ownership works.
What Is Ownership?
Ownership is a set of rules that govern how a Rust program manages memory.
Memory is managed through a system of ownership with a set of rules that the compiler checks. If any of the rules are violated, the program won’t compile. None of the features of ownership will slow down your program while it’s running.
The Stack and the Heap
Stack
The stack stores values in the order it gets them and removes the values in the opposite order. This is referred to as last in, first out.
Adding data is called pushing onto the stack, and removing data is called popping off the stack.
All data stored on the stack must have a known, fixed size. Data with an unknown size at compile time or a size that might change must be stored on the heap instead.
Heap
The heap is less organized: when you put data on the heap, you request a certain amount of space. The memory allocator finds an empty spot in the heap that is big enough, marks it as being in use, and returns a pointer, which is the address of that location. This process is called allocating on the heap and is sometimes abbreviated as just allocating (pushing values onto the stack is not considered allocating). Because the pointer to the heap is a known, fixed size, you can store the pointer on the stack, but when you want the actual data, you must follow the pointer.
Pushing to the stack is faster than allocating on the heap because the allocator never has to search for a place to store new data; that location is always at the top of the stack. Comparatively, allocating space on the heap requires more work because the allocator must first find a big enough space to hold the data and then perform bookkeeping to prepare for the next allocation.
Accessing data in the heap is slower than accessing data on the stack because you have to follow a pointer to get there. Contemporary processors are faster if they jump around less in memory. A processor can do its job better if it works on data that’s close to other data (as it is on the stack) rather than farther away (as it can be on the heap).
When your code calls a function, the values passed into the function (including, potentially, pointers to data on the heap) and the function’s local variables (just the pointers and references, not the data) get pushed onto the stack. When the function is over, those values get popped off the stack.
Ownership Rules
Each value in Rust has an owner.
There can only be one owner at a time.
When the owner goes out of scope, the value will be dropped.
Variable Scope
A variable and the scope in which it is valid.
{ // s is not valid here, it's not yet declared
let s = "hello"; // s is valid from this point forward
// do stuff with s
} // this scope is now over, and s is no longer valid
The String Type
String type manages data allocated on the heap and as such is able to store an amount of text that is unknown to us at compile time.
You can create a String
from a string literal using the from
function, like so:
let s = String::from("hello");
The double colon ::
operator allows us to namespace this particular from
function under the String
type rather than using some sort of name like string_from
.
This kind of string can be mutated:
let mut s = String::from("hello");
s.push_str(", world!"); // push_str() appends a literal to a String
println!("{}", s); // This will print `hello, world!`
Memory and Allocation
With the String
type, in order to support a mutable, growable piece of text, we need to allocate an amount of memory on the heap, unknown at compile time, to hold the contents. This means:
The memory must be requested from the memory allocator at runtime.
We need a way of returning this memory to the allocator when we’re done with our
String
.
The memory is automatically returned once the variable that owns it goes out of scope. Here’s a version of our scope example using a String
instead of a string literal:
{
let s = String::from("hello"); // s is valid from this point forward
// do stuff with s
} // this scope is now over, and s is no longer valid
There is a natural point at which we can return the memory our String
needs to the allocator: when s
goes out of scope. When a variable goes out of scope, Rust calls a special function for us. This function is called drop
, and it’s where the author of String
can put the code to return the memory. Rust calls drop
automatically at the closing curly bracket.
Variables and Data Interacting with Move
Multiple variables can interact with the same data in different ways in Rust. Let’s look at an example using an integer in the following listing.
let x = 5;
let y = x;
We can probably guess what this is doing: “bind the value 5
to x
; then make a copy of the value in x
and bind it to y
.” We now have two variables, x
and y
, and both equal 5
. This is indeed what is happening, because integers are simple values with a known, fixed size, and these two 5
values are pushed onto the stack.
Now let’s look at the String
version:
let s1 = String::from("hello");
let s2 = s1;
s1
on the stack (not valid after assigning to s2
)
name | value |
---|---|
ptr | (s1 on the heap) |
len | 5 |
capacity | 5 |
s1
on the heap
index | value |
---|---|
0 | h |
1 | e |
2 | l |
3 | l |
4 | o |
s2
on the stack
name | value |
---|---|
ptr | (s1 on the heap) |
len | 5 |
capacity | 5 |
The length is how much memory, in bytes, the contents of the
String
are currently using. The capacity is the total amount of memory, in bytes, that theString
has received from the allocator. The difference between length and capacity matters, but not in this context, so for now, it’s fine to ignore the capacity.
If you’ve heard the terms shallow copy and deep copy while working with other languages, the concept of copying the pointer, length, and capacity without copying the data probably sounds like making a shallow copy. But because Rust also invalidates the first variable, instead of being called a shallow copy, it’s known as a move.
Move helps us to solve the double free error: When a variable goes out of scope, Rust automatically calls the drop
function and cleans up the heap memory for that variable. Freeing memory twice can lead to memory corruption. After the line let s2 = s1
, Rust considers s1
as no longer valid. Therefore, Rust doesn’t need to free anything when s1
goes out of scope.
Check out what happens when you try to use s1
after s2
is created; it won’t work:
let s1 = String::from("hello");
let s2 = s1;
println!("{}, world!", s1);
// You'll get an error because Rust prevents you from using the invalidated reference
Rust will never automatically create “deep” copies of your data. Therefore, any automatic copying can be assumed to be inexpensive in terms of runtime performance.
Variables and Data Interacting with Clone
If we do want to deeply copy the heap data of the String
, not just the stack data, we can use a common method called clone
. Here’s an example of the clone method in action:
let s1 = String::from("hello");
let s2 = s1.clone();
println!("s1 = {}, s2 = {}", s1, s2);
When you see a call to clone
, you know that some arbitrary code is being executed and that code may be expensive. It’s a visual indicator that something different is going on.
Stack-Only Data: Copy
This code using integers works and is valid:
let x = 5;
let y = x;
println!("x = {}, y = {}", x, y);
But this code seems to contradict what we just learned: we don’t have a call to clone
, but x
is still valid and wasn’t moved into y
.
The reason is that types such as integers that have a known size at compile time are stored entirely on the stack, so copies of the actual values are quick to make. There’s no difference between deep and shallow copying here.
Rust has a special annotation called the Copy
trait that we can place on types that are stored on the stack. If a type implements the Copy
trait, variables that use it do not move, but rather are trivially copied, making them still valid after assignment to another variable.
Rust won’t let us annotate a type with Copy
if the type, or any of its parts, has implemented the Drop
trait. If the type needs something special to happen when the value goes out of scope and we add the Copy
annotation to that type, we’ll get a compile-time error.
Drop
trait has no relationship with move mechanism.String
can move and hasDrop
trait. But a struct only contains scalar type can move and doesn’t haveDrop
trait.use std::mem::needs_drop; struct Point { x: i32, y: i32, } fn main() { let p1 = Point { x: 1, y: 2 }; let p2 = p1; // `p1` is moved to `p2` // `p1` can no longer be used, but `p2` can be used normally println!("p2: ({}, {})", p2.x, p2.y); println!("Point has Drop trait: {}", needs_drop::<Point>()); println!("String has Drop trait: {}", needs_drop::<String>()); // Output // p2: (1, 2) // Point has Drop trait: false // String has Drop trait: true }
As a general rule, any group of simple scalar values can implement Copy
, and nothing that requires allocation or is some form of resource can implement Copy
. Here are some of the types that implement Copy
:
All the integer types, such as
u32
.The Boolean type,
bool
, with valuestrue
andfalse
.All the floating-point types, such as
f64
.The character type,
char
.Tuples, if they only contain types that also implement
Copy
. For example,(i32, i32)
implementsCopy
, but(i32, String)
does not.
Drop
trait and drop mechanismWhy scalar types like
u32
doesn’t implementDrop
trait but still be dropped when out of scope?Because scalar types deallocate memory by the drop mechanism. The
Drop
trait just adds custom cleanup logic to the existing drop mechanism.
Drop
trait: Provides a way to define extra custom cleanup logic for a type. When a type implementingDrop
trait goes out of scope, its drop method is called.Drop mechanism: The overarching system that ensures memory and resources are properly cleaned up when values go out of scope. Known as
drop
method (not thedrop
method onDrop
trait).
Ownership and Functions
Passing a variable to a function will move or copy, just as assignment does. The following listing has an example with some annotations showing where variables go into and out of scope.
Filename: src/main.rs
fn main() {
let s = String::from("hello"); // s comes into scope
takes_ownership(s); // s's value moves into the function...
// ... and so is no longer valid here
let x = 5; // x comes into scope
makes_copy(x); // x would move into the function,
// but i32 is Copy, so it's okay to still use x afterward
} // Here, x goes out of scope, then s. But because s's value was moved, nothing special happens.
fn takes_ownership(some_string: String) { // some_string comes into scope
println!("{}", some_string);
} // Here, some_string goes out of scope and `drop` is called. The backing memory is freed.
fn makes_copy(some_integer: i32) { // some_integer comes into scope
println!("{}", some_integer);
} // Here, some_integer goes out of scope. Nothing special happens.
Return Values and Scope
Returning values can also transfer ownership. The following listing shows an example of a function that returns some value.
Filename: src/main.rs
fn main() {
let s1 = gives_ownership(); // gives_ownership moves its return value into s1
let s2 = String::from("hello"); // s2 comes into scope
let s3 = takes_and_gives_back(s2); // s2 is moved into takes_and_gives_back, which also moves its return value into s3
} // Here, s3 goes out of scope and is dropped. s2 was moved, so nothing happens. s1 goes out of scope and is dropped.
fn gives_ownership() -> String { // gives_ownership will move its return value into the function that calls it
let some_string = String::from("yours"); // some_string comes into scope
some_string // some_string is returned and moves out to the calling function
}
// This function takes a String and returns one
fn takes_and_gives_back(a_string: String) -> String { // a_string comes into scope
a_string // a_string is returned and moves out to the calling function
}
The ownership of a variable follows the same pattern every time: assigning a value to another variable moves it. When a variable that includes data on the heap goes out of scope, the value will be cleaned up by drop
unless ownership of the data has been moved to another variable.
References and Borrowing
A reference is like a pointer in that it’s an address we can follow to access the data stored at that address; that data is owned by some other variable. Unlike a pointer, a reference is guaranteed to point to a valid value of a particular type for the life of that reference.
Here is how you would define and use a calculate_length
function that has a reference to an object as a parameter instead of taking ownership of the value:
Filename: src/main.rs
fn main() {
let s1 = String::from("hello");
let len = calculate_length(&s1);
println!("The length of '{}' is {}.", s1, len);
}
fn calculate_length(s: &String) -> usize { // s is a reference to a String
s.len()
} // Here, s goes out of scope. But because it does not have ownership of what it refers to, it is not dropped.
These ampersands(&
) represent references, and they allow you to refer to some value without taking ownership of it.
s on the stack
name | value |
---|---|
ptr | (s1 ptr) |
s1 on the stack
name | value |
---|---|
ptr | (s1 on the heap) |
len | 5 |
capacity | 5 |
s1 on the heap
index | value |
---|---|
0 | h |
1 | e |
2 | l |
3 | l |
4 | o |
Note: The opposite of referencing by using
&
is dereferencing, which is accomplished with the dereference operator,*
.
When functions have references as parameters instead of the actual values, we won’t need to return the values in order to give back ownership, because we never had ownership, the value it points to will not be dropped when the reference stops being used.
We call the action of creating a reference borrowing.
Just as variables are immutable by default, so are references. We’re not allowed to modify something we have a reference to.
Mutable References
Filename: src/main.rs
fn main() {
let mut s = String::from("hello");
change(&mut s);
}
fn change(some_string: &mut String) {
some_string.push_str(", world");
}
We create a mutable reference with &mut
.
Mutable references have one big restriction: if you have a mutable reference to a value, you can have no other references to that value.
The benefit of having this restriction is that Rust can prevent data races at compile time.
Data races cause undefined behavior and can be difficult to diagnose and fix when you’re trying to track them down at runtime; Rust prevents this problem by refusing to compile code with data races.
As always, we can use curly brackets to create a new scope, allowing for multiple mutable references, just not simultaneous ones:
let mut s = String::from("hello");
{
let r1 = &mut s;
} // r1 goes out of scope here, so we can make a new reference with no problems.
let r2 = &mut s;
We also cannot have a mutable reference while we have an immutable one to the same value.
let mut s = String::from("hello");
let r1 = &s; // no problem
let r2 = &s; // no problem
let r3 = &mut s; // BIG PROBLEM
println!("{}, {}, and {}", r1, r2, r3);
Users of an immutable reference don’t expect the value to suddenly change out from under them! However, multiple immutable references are allowed because no one who is just reading the data has the ability to affect anyone else’s reading of the data.
Note that a reference’s scope starts from where it is introduced and continues through the last time that reference is used. For instance, this code will compile because the last usage of the immutable references, the println!
, occurs before the mutable reference is introduced:
let mut s = String::from("hello");
let r1 = &s; // no problem
let r2 = &s; // no problem
println!("{} and {}", r1, r2);
// variables r1 and r2 will be used after this point
let r3 = &mut s; // no problem
println!("{}", r3);
Dangling References
In languages with pointers, it’s easy to erroneously create a dangling pointer—a pointer that references a location in memory that may have been given to someone else—by freeing some memory while preserving a pointer to that memory.
In Rust, by contrast, the compiler guarantees that references will never be dangling references: if you have a reference to some data, the compiler will ensure that the data will not go out of scope before the reference to the data does.
Let’s try to create a dangling reference to see how Rust prevents them with a compile-time error:
Filename: src/main.rs
fn main() {
let reference_to_nothing = dangle();
}
fn dangle() -> &String { // dangle returns a reference to a String
let s = String::from("hello"); // s is a new String
&s // we return a reference to the String, s
} // Here, s goes out of scope, and is dropped. Its memory goes away. Danger!
Because s
is created inside dangle
, when the code of dangle
is finished, s
will be deallocated. But we tried to return a reference to it. That means this reference would be pointing to an invalid String
. That’s no good! Rust won’t let us do this.
The Rules of References
At any given time, you can have either one mutable reference or any number of immutable references.
References must always be valid.
The Slice Type
Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection. A slice is a kind of reference, so it does not have ownership.
String Slices
A string slice is a reference to part of a String
.
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
We create slices using a range within brackets by specifying [starting_index..ending_index]
, where starting_index
is the first position in the slice and ending_index
is one more than the last position in the slice. Internally, the slice data structure stores the starting position and the length of the slice, which corresponds to ending_index
minus starting_index
.
s
name | value |
---|---|
ptr | (s on the heap) at index 0 |
len | 11 |
capacity | 11 |
world
name | value |
---|---|
ptr | (s on the heap) at index 6 |
len | 5 |
s
on the heap
index | value |
---|---|
0 | h |
1 | e |
2 | l |
3 | l |
4 | o |
5 | |
6 | w |
7 | o |
8 | r |
9 | l |
10 | d |
With Rust’s ..
range syntax, if you want to start at index 0, you can drop the value before the two periods. In other words, these are equal:
let s = String::from("hello");
let slice = &s[0..2];
let slice = &s[..2];
By the same token, if your slice includes the last byte of the String
, you can drop the trailing number. That means these are equal:
let s = String::from("hello");
let len = s.len();
let slice = &s[3..len];
let slice = &s[3..];
You can also drop both values to take a slice of the entire string. So these are equal:
let s = String::from("hello");
let len = s.len();
let slice = &s[0..len];
let slice = &s[..];
Note: String slice range indices must occur at valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will exit with an error.
With all this information in mind, let’s rewrite first_word
to return a slice. The type that signifies “string slice” is written as &str
:
Filename: src/main.rs
fn first_word(s: &String) -> &str {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return &s[0..i];
}
}
&s[..]
}
Remember the bug in the program in the previous listing, when we got the index to the end of the first word but then cleared the string so our index was invalid? That code was logically incorrect but didn’t show any immediate errors. Using the slice version of first_word will throw a compile-time error:
Filename: src/main.rs
fn main() {
let mut s = String::from("hello world");
let word = first_word(&s); // immutable borrow occurs here
s.clear(); // error! mutable borrow occurs here
println!("the first word is: {}", word); // immutable borrow later used here
}
Recall from the borrowing rules that if we have an immutable reference to something, we cannot also take a mutable reference. Because clear
needs to truncate the String
, it needs to get a mutable reference.
String Literals as Slices
String literals being stored inside the binary.
let s = "Hello, world!";
The type of s
here is &str
: it’s a slice pointing to that specific point of the binary. This is also why string literals are immutable; &str
is an immutable reference.
String Slices as Parameters
Using &str
instead of &String
as parameters allows us to use the same function on both &String
values and &str
values. This flexibility takes advantage of deref coercions.
fn first_word(s: &String) -> &str {
fn first_word(s: &str) -> &str {
Defining a function to take a string slice instead of a reference to a String
makes our API more general and useful without losing any functionality:
Filename: src/main.rs
fn main() {
let my_string = String::from("hello world");
// `first_word` works on slices of `String`s, whether partial or whole
let word = first_word(&my_string[0..6]);
let word = first_word(&my_string[..]);
// `first_word` also works on references to `String`s, which are equivalent to whole slices of `String`s
let word = first_word(&my_string);
let my_string_literal = "hello world";
// `first_word` works on slices of string literals, whether partial or whole
let word = first_word(&my_string_literal[0..6]);
let word = first_word(&my_string_literal[..]);
// Because string literals *are* string slices already, this works too, without the slice syntax!
let word = first_word(my_string_literal);
}
Other Slices
let a = [1, 2, 3, 4, 5];
let slice = &a[1..3];
assert_eq!(slice, &[2, 3]);
This slice has the type &[i32]
. It works the same way as string slices do, by storing a reference to the first element and a length. You’ll use this kind of slice for all sorts of other collections.
Using Structs to Structure Related Data
A struct, or structure, is a custom data type that lets you package together and name multiple related values that make up a meaningful group.
Defining and Instantiating Struct
For example, the following listing shows a struct that stores information about a user account:
Filename: src/main.rs
struct User {
active: bool,
username: String,
email: String,
sign_in_count: u64,
}
For example, we can declare a particular user as shown in the following listing:
Filename: src/main.rs
fn main() {
let user1 = User {
active: true,
username: String::from("someusername123"),
email: String::from("[email protected]"),
sign_in_count: 1,
};
}
To get a specific value from a struct, we use dot notation. If the instance is mutable, we can change a value by using the dot notation and assigning into a particular field.
Filename: src/main.rs
fn main() {
let mut user1 = User {
active: true,
username: String::from("someusername123"),
email: String::from("[email protected]"),
sign_in_count: 1,
};
user1.email = String::from("[email protected]");
}
Note that the entire instance must be mutable; Rust doesn’t allow us to mark only certain fields as mutable.
As with any expression, we can construct a new instance of the struct as the last expression in the function body to implicitly return that new instance.
Filename: src/main.rs
fn build_user(email: String, username: String) -> User {
User {
active: true,
username: username,
email: email,
sign_in_count: 1,
}
}
Using the Field Init Shorthand
Because the parameter names and the struct field names are exactly the same in previous listing, we can use the field init shorthand syntax to rewrite build_user
so it behaves exactly the same but doesn’t have the repetition of username
and email
, as shown in the following listing:
Filename: src/main.rs
fn build_user(email: String, username: String) -> User {
User {
active: true,
username,
email,
sign_in_count: 1,
}
}
Creating Instances from Other Instances with Struct Update Syntax
The syntax ..
specifies that the remaining fields not explicitly set should have the same value as the fields in the given instance.
Filename: src/main.rs
fn main() {
// --snip--
let user2 = User {
email: String::from("[email protected]"),
..user1
};
}
The syntax ..user1
must come last to specify that any remaining fields should get their values from the corresponding fields in user1
, but we can choose to specify values for as many fields as we want in any order, regardless of the order of the fields in the struct’s definition.
Note that the struct update syntax uses =
like an assignment; this is because it moves the data. In this example, we can no longer use user1
as a whole after creating user2
because the String
in the username
field of user1
was moved into user2
.
If we had given user2
new String
values for both email
and username
, and thus only used the active
and sign_in_count
values from user1
, then user1
would still be valid after creating user2
. Both active
and sign_in_count
are types that implement the Copy
trait, so types that implement the Copy
trait (stack-only data), wouldn’t be moved and still be valid after creating new instance by struct update syntax.
Using Tuple Structs Without Named Fields to Create Different Types
Rust also supports structs that look similar to tuples, called tuple structs. Tuple structs have the added meaning the struct name provides but don’t have names associated with their fields; rather, they just have the types of the fields.
Tuple structs are useful when you want to give the whole tuple a name and make the tuple a different type from other tuples, and when naming each field as in a regular struct would be verbose or redundant.
For example, here we define and use two tuple structs named Color
and Point
:
Filename: src/main.rs
struct Color(i32, i32, i32);
struct Point(i32, i32, i32);
fn main() {
let black = Color(0, 0, 0);
let origin = Point(0, 0, 0);
}
Note that the black
and origin
values are different types because they’re instances of different tuple structs. Each struct you define is its own type, even though the fields within the struct might have the same types.
Otherwise, tuple struct instances are similar to tuples in that you can destructure them into their individual pieces, and you can use a .
followed by the index to access an individual value.
Unit-Like Structs Without Any Fields
You can also define structs that don’t have any fields! These are called unit-like structs because they behave similarly to ()
. Unit-like structs can be useful when you need to implement a trait on some type but don’t have any data that you want to store in the type itself.
Here’s an example of declaring and instantiating a unit struct named AlwaysEqual
:
Filename: src/main.rs
struct AlwaysEqual;
fn main() {
let subject = AlwaysEqual;
}
Ownership of Struct Data
It’s also possible for structs to store references to data owned by something else, but to do so requires the use of lifetimes. Lifetimes ensure that the data referenced by a struct is valid for as long as the struct is. Let’s say you try to store a reference in a struct without specifying lifetimes, like the following; this won’t work:
Filename: src/main.rs
struct User { active: bool, username: &str, email: &str, sign_in_count: u64, } fn main() { let user1 = User { active: true, username: "someusername123", email: "[email protected]", sign_in_count: 1, }; } // error[E0106]: missing lifetime specifier
An Example Program Using Structs
To understand when we might want to use structs, let’s write a program that calculates the area of a rectangle.
Refactoring with Structs: Adding More Meaning
Filename: src/main.rs
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
println!(
"The area of the rectangle is {} square pixels.",
area(&rect1)
);
}
fn area(rectangle: &Rectangle) -> u32 {
rectangle.width * rectangle.height
}
Adding Useful Functionality with Derived Traits
It’d be useful to be able to print an instance of Rectangle
while we’re debugging our program and see the values for all its fields. The following listing tries using the println!
macro. This won’t work, however:
Filename: src/main.rs
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
println!("rect1 is {}", rect1);
}
// error[E0277]: `Rectangle` doesn't implement `std::fmt::Display`
The println!
macro can do many kinds of formatting, and by default, the curly brackets tell println!
to use formatting known as Display
: output intended for direct end user consumption.
The primitive types we’ve seen so far implement Display
by default because there’s only one way you’d want to show a 1
or any other primitive type to a user. But with structs, the way println!
should format the output is less clear because there are more display possibilities: Do you want commas or not? Do you want to print the curly brackets? Should all the fields be shown? Due to this ambiguity, Rust doesn’t try to guess what we want, and structs don’t have a provided implementation of Display
to use with println!
and the {}
placeholder.
Putting the specifier :?
inside the curly brackets tells println!
we want to use an output format called Debug
. The Debug
trait enables us to print our struct in a way that is useful for developers so we can see its value while we’re debugging our code.
Rust does include functionality to print out debugging information, but we have to explicitly opt in to make that functionality available for our struct. To do that, we add the outer attribute #[derive(Debug)]
just before the struct definition as shown in the following list.
Filename: src/main.rs
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
println!("rect1 is {:?}", rect1);
}
When we have larger structs, it’s useful to have output that’s a bit easier to read; in those cases, we can use {:#?}
instead of {:?}
in the println!
string.
Using the {:?}
style will output the following:
rect1 is Rectangle { width: 30, height: 50 }
Using the {:#?}
style will output the following:
rect1 is Rectangle {
width: 30,
height: 50,
}
Another way to print out a value using the Debug
format is to use the dbg!
macro, which takes ownership of an expression (as opposed to println!
, which takes a reference), prints the file and line number of where that dbg!
macro call occurs in your code along with the resultant value of that expression, and returns ownership of the value.
Note: Calling the
dbg!
macro prints to the standard error console stream (stderr
), as opposed toprintln!
, which prints to the standard output console stream (stdout
).
Here’s an example where we’re interested in the value that gets assigned to the width
field, as well as the value of the whole struct in rect1
:
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let scale = 2;
let rect1 = Rectangle {
width: dbg!(30 * scale),
height: 50,
};
dbg!(&rect1);
}
We can put dbg!
around the expression 30 * scale
and, because dbg!
returns ownership of the expression’s value, the width
field will get the same value as if we didn’t have the dbg!
call there. We don’t want dbg!
to take ownership of rect1
, so we use a reference to rect1
in the next call. Here’s what the output of this example looks like:
[src/main.rs:10] 30 * scale = 60
[src/main.rs:14] &rect1 = Rectangle {
width: 60,
height: 50,
}
Method Syntax
Unlike functions, methods are defined within the context of a struct (or an enum or a trait object), and their first parameter is always self
, which represents the instance of the struct the method is being called on.
Defining Methods
Let’s change the area
function that has a Rectangle
instance as a parameter and instead make an area
method defined on the Rectangle
struct, as shown in the following listing.
Filename: src/main.rs
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
impl Rectangle {
fn area(&self) -> u32 {
self.width * self.height
}
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
println!(
"The area of the rectangle is {} square pixels.",
rect1.area()
);
}
The &self
is actually short for self: &Self
. Within an impl
block, the type Self
is an alias for the type that the impl
block is for.
Methods must have a parameter named self
of type Self
for their first parameter, so Rust lets you abbreviate this with only the name self
in the first parameter spot.
Note that we still need to use the &
in front of the self
shorthand to indicate that this method borrows the Self
instance, just as we did in rectangle: &Rectangle
: we don’t want to take ownership, and we just want to read the data in the struct, not write to it.
Methods can take ownership of self
, borrow self
immutably, as we’ve done here, or borrow self
mutably, just as they can any other parameter.
If we wanted to change the instance that we’ve called the method on as part of what the method does, we’d use &mut self
as the first parameter.
Having a method that takes ownership of the instance by using just self
as the first parameter is rare; this technique is usually used when the method transforms self
into something else and you want to prevent the caller from using the original instance after the transformation.
The main reason for using methods instead of functions, in addition to providing method syntax and not having to repeat the type of self
in every method’s signature, is for organization.
Note that we can choose to give a method the same name as one of the struct’s fields. For example, we can define a method on Rectangle
that is also named width
:
Filename: src/main.rs
impl Rectangle {
fn width(&self) -> bool {
self.width > 0
}
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
if rect1.width() {
println!("The rectangle has a nonzero width; it is {}", rect1.width);
}
}
In main, when we follow rect1.width
with parentheses, Rust knows we mean the method width
. When we don’t use parentheses, Rust knows we mean the field width
.
Often, but not always, when we give a method the same name as a field we want it to only return the value in the field and do nothing else. Methods like this are called getters, and Rust does not implement them automatically for struct fields as some other languages do. Getters are useful because you can make the field private but the method public, and thus enable read-only access to that field as part of the type’s public API.
Methods with More Parameters
Let’s practice using methods by implementing a second method on the Rectangle
struct.
Filename: src/main.rs
// --snip--
impl Rectangle {
fn area(&self) -> u32 {
self.width * self.height
}
fn can_hold(&self, other: &Rectangle) -> bool {
self.width > other.width && self.height > other.height
}
}
fn main() {
let rect1 = Rectangle {
width: 30,
height: 50,
};
let rect2 = Rectangle {
width: 10,
height: 40,
};
let rect3 = Rectangle {
width: 60,
height: 45,
};
println!("Can rect1 hold rect2? {}", rect1.can_hold(&rect2));
println!("Can rect1 hold rect3? {}", rect1.can_hold(&rect3));
// The output
// Can rect1 hold rect2? true
// Can rect1 hold rect3? false
}
Methods can take multiple parameters that we add to the signature after the self
parameter, and those parameters work just like parameters in functions.
Associated Functions
All functions defined within an impl
block are called associated functions because they’re associated with the type named after the impl
.
We can define associated functions that don’t have self
as their first parameter (and thus are not methods) because they don’t need an instance of the type to work with.
Associated functions that aren’t methods are often used for constructors that will return a new instance of the struct. These are often called new
, but new
isn’t a special name and isn’t built into the language.
For example, we could choose to provide an associated function named square
that would have one dimension parameter and use that as both width
and height
, thus making it easier to create a square Rectangle
rather than having to specify the same value twice:
Filename: src/main.rs
impl Rectangle {
fn square(size: u32) -> Self {
Self {
width: size,
height: size,
}
}
}
let sq = Rectangle::square(3);
The Self
keywords in the return type and in the body of the function are aliases for the type that appears after the impl
keyword, which in this case is Rectangle
.
To call this associated function, we use the ::
syntax with the struct name. Associated function is namespaced by the struct: the::
syntax is used for both associated functions and namespaces created by modules.
Multiple impl
Blocks
Each struct is allowed to have multiple impl
blocks. For example, the following listing is equivalent to the code shown in the previous listing, which has each method in its own impl
block.
impl Rectangle {
fn area(&self) -> u32 {
self.width * self.height
}
}
impl Rectangle {
fn can_hold(&self, other: &Rectangle) -> bool {
self.width > other.width && self.height > other.height
}
}
Enums and Pattern Matching
Defining an Enum
Any IP address can be either a version four or a version six address, but not both at the same time. That property of IP addresses makes the enum data structure appropriate because an enum value can only be one of its variants.
We can express this concept in code by defining an IpAddrKind
enumeration and listing the possible kinds an IP address can be, V4
and V6
. These are the variants of the enum:
enum IpAddrKind {
V4,
V6,
}
Enum Values
We can create instances of each of the two variants of IpAddrKind
like this:
let four = IpAddrKind::V4;
let six = IpAddrKind::V6;
Note that the variants of the enum are namespaced under its identifier, and we use a double colon to separate the two. This is useful because now both values IpAddrKind::V4
and IpAddrKind::V6
are of the same type: IpAddrKind
. We can then, for instance, define a function that takes any IpAddrKind
:
fn route(ip_kind: IpAddrKind) {}
And we can call this function with either variant:
route(IpAddrKind::V4);
route(IpAddrKind::V6);
Rather than an enum inside a struct, we can put data directly into each enum variant. This new definition of the IpAddr enum says that both V4 and V6 variants will have associated String values:
enum IpAddr {
V4(String),
V6(String),
}
let home = IpAddr::V4(String::from("127.0.0.1"));
let loopback = IpAddr::V6(String::from("::1"));
The name of each enum variant that we define also becomes a function that constructs an instance of the enum.
There’s another advantage to using an enum rather than a struct: each variant can have different types and amounts of associated data.
enum IpAddr {
V4(u8, u8, u8, u8),
V6(String),
}
let home = IpAddr::V4(127, 0, 0, 1);
let loopback = IpAddr::V6(String::from("::1"));
Let’s look at how the standard library defines IpAddr
: it has the exact enum and variants that we’ve defined and used, but it embeds the address data inside the variants in the form of two different structs, which are defined differently for each variant:
struct Ipv4Addr {
// --snip--
}
struct Ipv6Addr {
// --snip--
}
enum IpAddr {
V4(Ipv4Addr),
V6(Ipv6Addr),
}
You can put any kind of data inside an enum variant: strings, numeric types, or structs, for example. You can even include another enum.
Let’s look at another example of an enum in the following listing: this one has a wide variety of types embedded in its variants.
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
Defining an enum with variants is similar to defining different kinds of struct definitions, except the enum doesn’t use the struct
keyword and all the variants are grouped together under the Message
type. The following structs could hold the same data that the preceding enum variants hold:
struct QuitMessage; // unit struct
struct MoveMessage {
x: i32,
y: i32,
}
struct WriteMessage(String); // tuple struct
struct ChangeColorMessage(i32, i32, i32); // tuple struct
But if we used the different structs, each of which has its own type, we couldn’t as easily define a function to take any of these kinds of messages as we could with the Message
enum defined in previous listing, which is a single type.
There is one more similarity between enums and structs: just as we’re able to define methods on structs using impl
, we’re also able to define methods on enums. Here’s a method named call that we could define on our Message
enum:
impl Message {
fn call(&self) {
// method body would be defined here
}
}
let m = Message::Write(String::from("hello"));
m.call();
The Option Enum and Its Advantages Over Null Values
Rust does not have nulls, but it does have an enum that can encode the concept of a value being present or absent. This enum is Option<T>
, and it is defined by the standard library as follows:
enum Option<T> {
None,
Some(T),
}
The Option<T>
enum is so useful that it’s even included in the prelude; you don’t need to bring it into scope explicitly. Its variants are also included in the prelude: you can use Some
and None
directly without the Option::
prefix. The Option<T>
enum is still just a regular enum, and Some(T)
and None
are still variants of type Option<T>
.
<T>
means that the Some
variant of the Option
enum can hold one piece of data of any type, and that each concrete type that gets used in place of T
makes the overall Option<T>
type a different type. Here are some examples of using Option
values to hold number types and string types:
let some_number = Some(5);
let some_char = Some('e');
let absent_number: Option<i32> = None;
Option<T>
and T
(where T
can be any type) are different types, the compiler won’t let us use an Option<T>
value as if it were definitely a valid value.
You have to convert an Option<T>
to a T
before you can perform T
operations with it. Generally, this helps catch one of the most common issues with null: assuming that something isn’t null when it actually is.
In general, in order to use an Option<T>
value, you want to have code that will handle each variant. The match
expression is a control flow construct that does just this when used with enums: it will run different code depending on which variant of the enum it has, and that code can use the data inside the matching value.
The Match Control Flow Construct
Rust has an extremely powerful control flow construct called match
that allows you to compare a value against a series of patterns and then execute code based on which pattern matches. Patterns can be made up of literal values, variable names, wildcards, and many other things.
We can write a function that takes an unknown US coin and, in a similar way as the counting machine, determines which coin it is and returns its value in cents, as shown in following listing:
enum Coin {
Penny,
Nickel,
Dime,
Quarter,
}
fn value_in_cents(coin: Coin) -> u8 {
match coin {
Coin::Penny => {
println!("Lucky penny!");
1
}
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter => 25,
}
}
First we list the match
keyword followed by an expression, which in this case is the value coin
. This seems very similar to a conditional expression used with if
, but there’s a big difference: with if
, the condition needs to evaluate to a Boolean value, but here it can be any type.
Next are the match
arms. An arm has two parts: a pattern and some code.The first arm here has a pattern that is the value Coin::Penny
and then the =>
operator that separates the pattern and the code to run. The code in this case is just the value 1
. Each arm is separated from the next with a comma.
When the match
expression executes, it compares the resultant value against the pattern of each arm, in order. If a pattern matches the value, the code associated with that pattern is executed. If that pattern doesn’t match the value, execution continues to the next arm. We can have as many arms as we need.
The code associated with each arm is an expression, and the resultant value of the expression in the matching arm is the value that gets returned for the entire match
expression.
We don’t typically use curly brackets if the match arm code is short. If you want to run multiple lines of code in a match arm, you must use curly brackets, and the comma following the arm is then optional.
Patterns That Bind to Values
Another useful feature of match arms is that they can bind to the parts of the values that match the pattern. This is how we can extract values out of enum variants.
As an example, let’s change one of our enum variants to hold data inside it. From 1999 through 2008, the United States minted quarters with different designs for each of the 50 states on one side. No other coins got state designs, so only quarters have this extra value. We can add this information to our enum
by changing the Quarter
variant to include a UsState
value stored inside it, which we’ve done in the following listing.
#[derive(Debug)] // so we can inspect the state in a minute
enum UsState {
Alabama,
Alaska,
// --snip--
}
enum Coin {
Penny,
Nickel,
Dime,
Quarter(UsState),
}
In the match expression for this code, we add a variable called state
to the pattern that matches values of the variant Coin::Quarter
. When a Coin::Quarter
matches, the state
variable will bind to the value of that quarter’s state. Then we can use state
in the code for that arm, like so:
fn value_in_cents(coin: Coin) -> u8 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter(state) => {
println!("State quarter from {:?}!", state);
25
}
}
}
Matching with Option<T>
Let’s say we want to write a function that takes an Option<i32>
and, if there’s a value inside, adds 1 to that value. If there isn’t a value inside, the function should return the None
value and not attempt to perform any operations.
This function is very easy to write, thanks to match
, and will look like the following listing.
fn plus_one(x: Option<i32>) -> Option<i32> {
match x {
None => None,
Some(i) => Some(i + 1),
}
}
let five = Some(5);
let six = plus_one(five);
let none = plus_one(None);
Combining match
and enums is useful in many situations. You’ll see this pattern a lot in Rust code: match
against an enum, bind a variable to the data inside, and then execute code based on it.
Matches Are Exhaustive
There’s one other aspect of match
we need to discuss: the arms’ patterns must cover all possibilities. Consider this version of our plus_one
function, which has a bug and won’t compile:
fn plus_one(x: Option<i32>) -> Option<i32> {
match x {
Some(i) => Some(i + 1),
}
}
Rust knows that we didn’t cover every possible case, and even knows which pattern we forgot! Matches in Rust are exhaustive: we must exhaust every last possibility in order for the code to be valid. Especially in the case of Option<T>
, when Rust prevents us from forgetting to explicitly handle the None
case, it protects us from assuming that we have a value when we might have null.
Catch-all Patterns and the _ Placeholder
Using enums, we can also take special actions for a few particular values, but for all other values take one default action.
Imagine we’re implementing a game where, if you roll a 3 on a dice roll, your player doesn’t move, but instead gets a new fancy hat. If you roll a 7, your player loses a fancy hat. For all other values, your player moves that number of spaces on the game board. Here’s a match that implements that logic:
let dice_roll = 9;
match dice_roll {
3 => add_fancy_hat(),
7 => remove_fancy_hat(),
other => move_player(other),
}
fn add_fancy_hat() {}
fn remove_fancy_hat() {}
fn move_player(num_spaces: u8) {}
This code compiles, even though we haven’t listed all the possible values a u8
can have, because the last pattern will match all values not specifically listed. This catch-all pattern meets the requirement that match
must be exhaustive.
Note that we have to put the catch-all arm last because the patterns are evaluated in order. If we put the catch-all arm earlier, the other arms would never run, so Rust will warn us if we add arms after a catch-all.
Rust also has a pattern we can use when we want a catch-all but don’t want to use the value in the catch-all pattern: _
is a special pattern that matches any value and does not bind to that value. This tells Rust we aren’t going to use the value, so Rust won’t warn us about an unused variable.
Let’s change the rules of the game: now, if you roll anything other than a 3 or a 7, you must roll again. We no longer need to use the catch-all value, so we can change our code to use _
instead of the variable named other
:
let dice_roll = 9;
match dice_roll {
3 => add_fancy_hat(),
7 => remove_fancy_hat(),
_ => reroll(),
}
fn add_fancy_hat() {}
fn remove_fancy_hat() {}
fn reroll() {}
Finally, we’ll change the rules of the game one more time so that nothing else happens on your turn if you roll anything other than a 3 or a 7. We can express that by using the unit value (the empty tuple type) as the code that goes with the _
arm:
let dice_roll = 9;
match dice_roll {
3 => add_fancy_hat(),
7 => remove_fancy_hat(),
_ => (),
}
fn add_fancy_hat() {}
fn remove_fancy_hat() {}
Concise Control Flow with if let
The if let
syntax lets you combine if
and let
into a less verbose way to handle values that match one pattern while ignoring the rest.
Consider the program in the following listing that matches on an Option<u8>
value in the config_max
variable but only wants to execute code if the value is the Some
variant.
let config_max = Some(3u8);
match config_max {
Some(max) => println!("The maximum is configured to be {}", max),
_ => (),
}
We don’t want to do anything with the None
value. To satisfy the match
expression, we have to add _ => ()
after processing just one variant, which is annoying boilerplate code to add.
Instead, we could write this in a shorter way using if let
. The following code behaves the same as the match
in the following listing:
let config_max = Some(3u8);
if let Some(max) = config_max {
println!("The maximum is configured to be {}", max);
}
Using if let
means less typing, less indentation, and less boilerplate code. However, you lose the exhaustive checking that match
enforces.
In other words, you can think of if let
as syntax sugar for a match
that runs code when the value matches one pattern and then ignores all other values.
We can include an else
with an if let
. The block of code that goes with the else
is the same as the block of code that would go with the _
case in the match
expression that is equivalent to the if let
and else
.
Recall the Coin
enum definition in the previous listing, where the Quarter
variant also held a UsState
value. If we wanted to count all non-quarter coins we see while also announcing the state of the quarters, we could do that with a match
expression, like this:
let mut count = 0;
match coin {
Coin::Quarter(state) => println!("State quarter from {:?}!", state),
_ => count += 1,
}
Or we could use an if let
and else
expression, like this:
let mut count = 0;
if let Coin::Quarter(state) = coin {
println!("State quarter from {:?}!", state);
} else {
count += 1;
}
Managing Growing Projects with Packages, Crates, and Modules
A package can contain multiple binary crates and optionally one library crate.
You can create scopes and change which names are in or out of scope. You can’t have two items with the same name in the same scope; tools are available to resolve name conflicts.
The Rust module system, include:
Packages: A Cargo feature that lets you build, test, and share crates.
Crates: A tree of modules that produces a library or executable.
Modules and use: Let you control the organization, scope, and privacy of paths.
Paths: A way of naming an item, such as a struct, function, or module.
Packages and Crates
A crate is the smallest amount of code that the Rust compiler considers at a time. Even if you run rustc
rather than cargo
and pass a single source code file, the compiler considers that file to be a crate. Crates can contain modules, and the modules may be defined in other files that get compiled with the crate.
A crate can come in one of two forms: a binary crate or a library crate.
Binary crates are programs you can compile to an executable that you can run, such as a command-line program or a server. Each must have a function called
main
that defines what happens when the executable runs.Library crates don’t have a
main
function, and they don’t compile to an executable. Instead, they define functionality intended to be shared with multiple projects.Most of the time when Rustaceans say “crate”, they mean library crate, and they use “crate” interchangeably with the general programming concept of a “library”.
The crate root is a source file that the Rust compiler starts from and makes up the root module of your crate.
A package is a bundle of one or more crates that provides a set of functionality. A package contains a Cargo.toml file that describes how to build those crates.
A package can contain as many binary crates as you like, but at most only one library crate. A package must contain at least one crate, whether that’s a library or binary crate.
Cargo follows a convention that src/main.rs is the crate root of a binary crate with the same name as the package. Likewise, Cargo knows that if the package directory contains src/lib.rs, the package contains a library crate with the same name as the package, and src/lib.rs is its crate root. Cargo passes the crate root files to rustc
to build the library or binary.
A package that only contains src/main.rs, meaning it only contains a binary crate which has the same name as the package. If a package contains src/main.rs and src/lib.rs, it has two crates: a binary and a library, both with the same name as the package. A package can have multiple binary crates by placing files in the src/bin directory: each file will be a separate binary crate.
Defining Modules to Control Scope and Privacy
Namely paths that allow you to name items; the use
keyword that brings a path into scope; and the pub
keyword to make items public.
Modules Cheat Sheet
Start from the crate root: When compiling a crate, the compiler first looks in the crate root file (usually src/lib.rs for a library crate or src/main.rs for a binary crate) for code to compile.
Declaring modules: In the crate root file, you can declare new modules; say, you declare a “garden” module with
mod garden
;. The compiler will look for the module’s code in these places:Inline, within curly brackets that replace the semicolon following
mod garden
In the file src/garden.rs
In the file src/garden/mod.rs
Declaring submodules: In any file other than the crate root, you can declare submodules. For example, you might declare
mod vegetables
; in src/garden.rs. The compiler will look for the submodule’s code within the directory named for the parent module in these places:Inline, directly following
mod vegetables
, within curly brackets instead of the semicolonIn the file src/garden/vegetables.rs
In the file src/garden/vegetables/mod.rs
Paths to code in modules: Once a module is part of your crate, you can refer to code in that module from anywhere else in that same crate, as long as the privacy rules allow, using the path to the code. For example, an
Asparagus
type in the garden vegetables module would be found atcrate::garden::vegetables::Asparagus
.Private vs public: Code within a module is private from its parent modules by default. To make a module public, declare it with
pub mod
instead ofmod
. To make items within a public module public as well, use pub before their declarations.The
use
keyword: Within a scope, theuse
keyword creates shortcuts to items to reduce repetition of long paths. In any scope that can refer tocrate::garden::vegetables::Asparagus
, you can create a shortcut withuse crate::garden::vegetables::Asparagus
; and from then on you only need to writeAsparagus
to make use of that type in the scope.
Here we create a binary crate named backyard
that illustrates these rules. The crate’s directory, also named backyard
, contains these files and directories:
backyard
├── Cargo.lock
├── Cargo.toml
└── src
├── garden
│ └── vegetables.rs
├── garden.rs
└── main.rs
The crate root file in this case is src/main.rs, and it contains:
Filename: src/main.rs
use crate::garden::vegetables::Asparagus;
pub mod garden;
fn main() {
let plant = Asparagus {};
println!("I'm growing {:?}!", plant);
}
The pub mod garden
; line tells the compiler to include the code it finds in src/garden.rs, which is:
Filename: src/garden.rs
pub mod vegetables;
Here, pub mod vegetables
; means the code in src/garden/vegetables.rs is included too. That code is:
#[derive(Debug)]
pub struct Asparagus {}
Grouping Related Code in Modules
Modules also allow us to control the privacy of items, because code within a module is private by default. Private items are internal implementation details not available for outside use.
As an example, let’s write a library crate that provides the functionality of a restaurant. We’ll define the signatures of functions but leave their bodies empty to concentrate on the organization of the code, rather than the implementation of a restaurant.
In the restaurant industry, some parts of a restaurant are referred to as front of house and others as back of house. Front of house is where customers are; this encompasses where the hosts seat customers, servers take orders and payment, and bartenders make drinks. Back of house is where the chefs and cooks work in the kitchen, dishwashers clean up, and managers do administrative work.
To structure our crate in this way, we can organize its functions into nested modules. Create a new library named restaurant
by running cargo new restaurant --lib
; then enter the following code into src/lib.rs to define some modules and function signatures. Here’s the front of house section:
Filename: src/lib.rs
mod front_of_house {
mod hosting {
fn add_to_waitlist() {}
fn seat_at_table() {}
}
mod serving {
fn take_order() {}
fn serve_order() {}
fn take_payment() {}
}
}
We define a module with the mod
keyword followed by the name of the module (in this case, front_of_house
). The body of the module then goes inside curly brackets. Inside modules, we can place other modules, as in this case with the modules hosting
and serving
. Modules can also hold definitions for other items, such as structs, enums, constants, traits, and functions.
Earlier, we mentioned that src/main.rs and src/lib.rs are called crate roots. The reason for their name is that the contents of either of these two files form a module named crate at the root of the crate’s module structure, known as the module tree.
The following listing shows the module tree for the structure in preceding listing.
crate
└── front_of_house
├── hosting
│ ├── add_to_waitlist
│ └── seat_at_table
└── serving
├── take_order
├── serve_order
└── take_payment
Notice that the entire module tree is rooted under the implicit module named crate.
Paths for Referring to an Item in the Module Tree
A path can take two forms:
An absolute path is the full path starting from a crate root; for code from an external crate, the absolute path begins with the crate name, and for code from the current crate, it starts with the literal
crate
.A relative path starts from the current module and uses
self
,super
, or an identifier in the current module.
Both absolute and relative paths are followed by one or more identifiers separated by double colons (::
).
Filename: src/lib.rs
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
pub fn eat_at_restaurant() {
// Absolute path
crate::front_of_house::hosting::add_to_waitlist();
// Relative path
front_of_house::hosting::add_to_waitlist();
}
Using the crate
name to start from the crate root is like using /
to start from the filesystem root in your shell.
Starting with a module name means that the path is relative.
Choosing whether to use a relative or absolute path is a decision you’ll make based on your project, and depends on whether you’re more likely to move item definition code separately from or together with the code that uses the item. Our preference in general is to specify absolute paths because it’s more likely we’ll want to move code definitions and item calls independently of each other.
In Rust, all items (functions, methods, structs, enums, modules, and constants) are private to parent modules by default. If you want to make an item like a function or struct private, you put it in a module.
Items in a parent module can’t use the private items inside child modules, but items in child modules can use the items in their ancestor modules (by using the super keyword). This is because child modules wrap and hide their implementation details, but the child modules can see the context in which they’re defined.
Rust chose to have the module system function this way so that hiding inner implementation details is the default. However, Rust does give you the option to expose inner parts of child modules’ code to outer ancestor modules by using the pub
keyword to make an item public.
Exposing Paths with the pub Keyword
The pub
keyword on a module only lets code in its ancestor modules refer to it, not access its inner code. Because modules are containers, there’s not much we can do by only making the module public; we need to go further and choose to make one or more of the items within the module public as well.
Best Practices for Packages with a Binary and a Library
The module tree should be defined in src/lib.rs. Then, any public items can be used in the binary crate by starting paths with the name of the package. The binary crate becomes a user of the library crate just like a completely external crate would use the library crate: it can only use the public API.
Starting Relative Paths with super
We can construct relative paths that begin in the parent module, rather than the current module or the crate root, by using super
at the start of the path. This is like starting a filesystem path with the ..
syntax. Using super
allows us to reference an item that we know is in the parent module, which can make rearranging the module tree easier when the module is closely related to the parent, but the parent might be moved elsewhere in the module tree someday.
Filename: src/lib.rs
fn deliver_order() {}
mod back_of_house {
fn fix_incorrect_order() {
cook_order();
super::deliver_order();
}
fn cook_order() {}
}
The fix_incorrect_order
function is in the back_of_house
module, so we can use super
to go to the parent module of back_of_house
, which in this case is crate
, the root.
We think the back_of_house
module and the deliver_order
function are likely to stay in the same relationship to each other and get moved together should we decide to reorganize the crate’s module tree. Therefore, we used super
so we’ll have fewer places to update code in the future if this code gets moved to a different module.
Making Structs and Enums Public
We can also use pub
to designate structs and enums as public, but there are a few details extra to the usage of pub
with structs and enums.
If we use pub
before a struct definition, we make the struct public, but the struct’s fields will still be private. We can make each field public or not on a case-by-case basis.
Filename: src/lib.rs
mod back_of_house {
pub struct Breakfast {
pub toast: String,
seasonal_fruit: String,
}
impl Breakfast {
pub fn summer(toast: &str) -> Breakfast {
Breakfast {
toast: String::from(toast),
seasonal_fruit: String::from("peaches"),
}
}
}
}
pub fn eat_at_restaurant() {
// Order a breakfast in the summer with Rye toast
let mut meal = back_of_house::Breakfast::summer("Rye");
// Change our mind about what bread we'd like
meal.toast = String::from("Wheat");
println!("I'd like {} toast please", meal.toast);
// The next line won't compile if we uncomment it; we're not allowed to see or modify the seasonal fruit that comes with the meal
// meal.seasonal_fruit = String::from("blueberries");
}
Because the toast
field in the back_of_house::Breakfast
struct is public, in eat_at_restaurant
we can write and read to the toast
field using dot notation. Notice that we can’t use the seasonal_fruit
field in eat_at_restaurant
because seasonal_fruit
is private.
Also, note that because back_of_house::Breakfast
has a private field, the struct needs to provide a public associated function that constructs an instance of Breakfast
(we’ve named it summer
here). If Breakfast
didn’t have such a function, we couldn’t create an instance of Breakfast
in eat_at_restaurant
because we couldn’t set the value of the private seasonal_fruit
field in eat_at_restaurant
.
In contrast, if we make an enum public, all of its variants are then public. We only need the pub
before the enum
keyword.
Filename: src/lib.rs
mod back_of_house {
pub enum Appetizer {
Soup,
Salad,
}
}
pub fn eat_at_restaurant() {
let order1 = back_of_house::Appetizer::Soup;
let order2 = back_of_house::Appetizer::Salad;
}
Enums aren’t very useful unless their variants are public; it would be annoying to have to annotate all enum variants with pub
in every case, so the default for enum variants is to be public.
Structs are often useful without their fields being public, so struct fields follow the general rule of everything being private by default unless annotated with pub
.
Bringing Paths into Scope with the use Keyword
We can create a shortcut to a path with the use
keyword once, and then use the shorter name everywhere else in the scope.
Filename: src/lib.rs
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
use crate::front_of_house::hosting;
pub fn eat_at_restaurant() {
hosting::add_to_waitlist();
}
Adding use
and a path in a scope is similar to creating a symbolic link in the filesystem. Paths brought into scope with use
also check privacy, like any other paths.
Note that use
only creates the shortcut for the particular scope in which the use
occurs. The following listing moves the eat_at_restaurant
function into a new child module named customer
, which is then a different scope than the use
statement, so the function body won’t compile:
Filename: src/lib.rs
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
use crate::front_of_house::hosting;
mod customer {
pub fn eat_at_restaurant() {
hosting::add_to_waitlist();
}
}
To fix this problem, move the use
within the customer
module too, or reference the shortcut in the parent module with super::hosting
within the child customer
module.
Creating Idiomatic use Paths
Filename: src/lib.rs
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
use crate::front_of_house::hosting::add_to_waitlist;
pub fn eat_at_restaurant() {
add_to_waitlist();
}
Although this listing accomplish the same task, the preceding one is the idiomatic way to bring a function into scope with use
.
Bringing the function’s parent module into scope with use
means we have to specify the parent module when calling the function. Specifying the parent module when calling the function makes it clear that the function isn’t locally defined while still minimizing repetition of the full path.
On the other hand, when bringing in structs, enums, and other items with use
, it’s idiomatic to specify the full path.
use std::collections::HashMap;
fn main() {
let mut map = HashMap::new();
map.insert(1, 2);
}
There’s no strong reason behind this idiom: it’s just the convention that has emerged, and folks have gotten used to reading and writing Rust code this way.
The exception to this idiom is if we’re bringing two items with the same name into scope with use
statements, because Rust doesn’t allow that.
Filename: src/lib.rs
use std::fmt;
use std::io;
fn function1() -> fmt::Result {
// --snip--
}
fn function2() -> io::Result<()> {
// --snip--
}
As you can see, using the parent modules distinguishes the two Result
types. If instead we specified use std::fmt::Result
and use std::io::Result
, we’d have two Result
types in the same scope and Rust wouldn’t know which one we meant when we used Result
.
Providing New Names with the as Keyword
There’s another solution to the problem of bringing two types of the same name into the same scope with use
: after the path, we can specify as
and a new local name, or alias, for the type.
Filename: src/lib.rs
use std::fmt::Result;
use std::io::Result as IoResult;
fn function1() -> Result {
// --snip--
}
fn function2() -> IoResult<()> {
// --snip--
}
Re-exporting Names with pub use
When we bring a name into scope with the use
keyword, the name available in the new scope is private. To enable the code that calls our code to refer to that name as if it had been defined in that code’s scope, we can combine pub
and use
. This technique is called re-exporting because we’re bringing an item into scope but also making that item available for others to bring into their scope.
Filename: src/lib.rs
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist() {}
}
}
pub use crate::front_of_house::hosting;
pub fn eat_at_restaurant() {
hosting::add_to_waitlist();
}
Before this change, external code would have to call the add_to_waitlist
function by using the path restaurant::front_of_house::hosting::add_to_waitlist()
. Now that this pub use
has re-exported the hosting
module from the root module, external code can now use the path restaurant::hosting::add_to_waitlist()
instead.
Re-exporting is useful when the internal structure of your code is different from how programmers calling your code would think about the domain.
With pub use, we can write our code with one structure but expose a different structure. Doing so makes our library well organized for programmers working on the library and programmers calling the library.
Using External Packages
To use the external package rand
in our project, we added this line to Cargo.toml:
Filename: Cargo.toml
rand = "0.8.5"
Adding rand
as a dependency in Cargo.toml tells Cargo to download the rand
package and any dependencies from crates.io and make rand
available to our project.
Then, to bring rand
definitions into the scope of our package, we added a use line starting with the name of the crate, rand
, and listed the items we wanted to bring into scope. We brought the Rng
trait into scope and called the rand::thread_rng
function:
use rand::Rng;
fn main() {
let secret_number = rand::thread_rng().gen_range(1..=100);
}
Members of the Rust community have made many packages available at crates.io, and pulling any of them into your package involves these same steps: listing them in your package’s Cargo.toml file and using use
to bring items from their crates into scope.
Note that the standard std
library is also a crate that’s external to our package. Because the standard library is shipped with the Rust language, we don’t need to change Cargo.toml to include std
. But we do need to refer to it with use
to bring items from there into our package’s scope.
use std::collections::HashMap;
Using Nested Paths to Clean Up Large use Lists
If we’re using multiple items defined in the same crate or same module, listing each item on its own line can take up a lot of vertical space in our files. Instead, we can use nested paths to bring the same items into scope in one line. We do this by specifying the common part of the path, followed by two colons, and then curly brackets around a list of the parts of the paths that differ.
Filename: src/main.rs
use std::cmp::Ordering;
use std::io;
// Nested
use std::{cmp::Ordering, io};
To merge a complete path and its sub path into one use
statement, we can use self
in the nested path.
Filename: src/lib.rs
use std::io;
use std::io::Write;
// Nested
use std::io::{self, Write};
// This line brings std::io and std::io::Write into scope.
The Glob Operator
If we want to bring all public items defined in a path into scope, we can specify that path followed by the *
glob operator.
use std::collections::*;
This use
statement brings all public items defined in std::collections
into the current scope.
Be careful when using the glob operator! Glob can make it harder to tell what names are in scope and where a name used in your program was defined.
The glob operator is often used when testing to bring everything under test into the tests module. The glob operator is also sometimes used as part of the prelude pattern.
Separating Modules into Different Files
For example, let’s start from the code in previous listing that had multiple restaurant modules. We’ll extract modules into files instead of having all the modules defined in the crate root file.
First, we’ll extract front_of_house
module to its own file. Remove the code inside the curly brackets for the front_of_house
module, leaving only the mod front_of_house
; declaration, so that src/lib.rs contains the code shown in following list.
Filename: src/lib.rs
mod front_of_house;
pub use crate::front_of_house::hosting;
pub fn eat_at_restaurant() {
hosting::add_to_waitlist();
}
Next, place the code that was in the curly brackets into a new file named src/front_of_house.rs, as shown in the following listing. The compiler knows to look in this file because it came across the module declaration in the crate root with the name front_of_house
.
Filename: src/front_of_house.rs
pub mod hosting {
pub fn add_to_waitlist() {}
}
Note that you only need to load a file using a mod
declaration once in your module tree. Once the compiler knows the file is part of the project (and knows where in the module tree the code resides because of where you’ve put the mod statement), other files in your project should refer to the loaded file’s code using a path to where it was declared. In other words, mod
is not an “include” operation that you may have seen in other programming languages.
Next, we’ll extract the hosting
module to its own file. The process is a bit different because hosting
is a child module of front_of_house
, not of the root module. We’ll place the file for hosting
in a new directory that will be named for its ancestors in the module tree, in this case src/front_of_house/.
To start moving hosting
, we change src/front_of_house.rs to contain only the declaration of the hosting
module:
Filename: src/front_of_house.rs
pub mod hosting;
Then we create a src/front_of_house directory and a file hosting.rs to contain the definitions made in the hosting
module:
Filename: src/front_of_house/hosting.rs
pub fn add_to_waitlist() {}
The compiler’s rules for which files to check for which modules’ code means the directories and files more closely match the module tree.
Alternate File Paths
So far we’ve covered the most idiomatic file paths the Rust compiler uses, but Rust also supports an older style of file path. For a module named
front_of_house
declared in the crate root, the compiler will look for the module’s code in:
src/front_of_house.rs (what we covered)
src/front_of_house/mod.rs (older style, still supported path)
For a module named
hosting
that is a submodule offront_of_house
, the compiler will look for the module’s code in:
src/front_of_house/hosting.rs (what we covered)
src/front_of_house/hosting/mod.rs (older style, still supported path)
If you use both styles for the same module, you’ll get a compiler error. Using a mix of both styles for different modules in the same project is allowed, but might be confusing for people navigating your project.
The main downside to the style that uses files named mod.rs is that your project can end up with many files named mod.rs, which can get confusing when you have them open in your editor at the same time.
The mod keyword declares modules, and Rust looks in a file with the same name as the module for the code that goes into that module.
Common Collections
Most other data types represent one specific value, but collections can contain multiple values. Unlike the built-in array and tuple types, the data these collections point to is stored on the heap, which means the amount of data does not need to be known at compile time and can grow or shrink as the program runs.
In this chapter, we’ll discuss three collections that are used very often in Rust programs:
A vector allows you to store a variable number of values next to each other.
A string is a collection of characters. We’ve mentioned the String type previously, but in this chapter we’ll talk about it in depth.
A hash map allows you to associate a value with a particular key. It’s a particular implementation of the more general data structure called a map.
Storing Lists of Values with Vectors
The first collection type we’ll look at is Vec<T>
, also known as a vector. Vectors allow you to store more than one value in a single data structure that puts all the values next to each other in memory. Vectors can only store values of the same type.
Creating a New Vector
To create a new empty vector, we call the Vec::new
function.
let v: Vec<i32> = Vec::new();
Note that we added a type annotation here. Because we aren’t inserting any values into this vector, Rust doesn’t know what kind of elements we intend to store. This is an important point. Vectors are implemented using generics.
More often, you’ll create a Vec<T>
with initial values and Rust will infer the type of value you want to store, so you rarely need to do this type annotation. Rust conveniently provides the vec!
macro, which will create a new vector that holds the values you give it.
let v = vec![1, 2, 3];
Updating a Vector
To create a vector and then add elements to it, we can use the push
method.
let mut v = Vec::new();
v.push(5);
v.push(6);
As with any variable, if we want to be able to change its value, we need to make it mutable using the mut
keyword. The numbers we place inside are all of type i32, and Rust infers this from the data, so we don’t need the Vec<i32>
annotation.
Reading Elements of Vectors
There are two ways to reference a value stored in a vector: via indexing or using the get method.
let v = vec![1, 2, 3, 4, 5];
let third: &i32 = &v[2];
println!("The third element is {third}");
let third: Option<&i32> = v.get(2);
match third {
Some(third) => println!("The third element is {third}"),
None => println!("There is no third element."),
}
Using &
and []
gives us a reference to the element at the index value. When we use the get
method with the index passed as an argument, we get an Option<&T>
that we can use with match
.
The reason Rust provides these two ways to reference an element is so you can choose how the program behaves when you try to use an index value outside the range of existing elements.
let v = vec![1, 2, 3, 4, 5];
let does_not_exist = &v[100];
let does_not_exist = v.get(100);
When we run this code, the first []
method will cause the program to panic because it references a nonexistent element. This method is best used when you want your program to crash if there’s an attempt to access an element past the end of the vector.
When the get
method is passed an index that is outside the vector, it returns None
without panicking. You would use this method if accessing an element beyond the range of the vector may happen occasionally under normal circumstances. Your code will then have logic to handle having either Some(&element)
or None
.
When the program has a valid reference, the borrow checker enforces the ownership and borrowing rules to ensure this reference and any other references to the contents of the vector remain valid.
Recall the rule that states you can’t have mutable and immutable references in the same scope. That rule applies that we hold an immutable reference to the first element in a vector and try to add an element to the end.
let mut v = vec![1, 2, 3, 4, 5];
let first = &v[0];
v.push(6);
println!("The first element is: {first}");
The code might look like it should work: why should a reference to the first element care about changes at the end of the vector? This error is due to the way vectors work: because vectors put the values next to each other in memory, adding a new element onto the end of the vector might require allocating new memory and copying the old elements to the new space, if there isn’t enough room to put all the elements next to each other where the vector is currently stored. In that case, the reference to the first element would be pointing to deallocated memory. The borrowing rules prevent programs from ending up in that situation.
Iterating over the Values in a Vector
To access each element in a vector in turn, we would iterate through all of the elements rather than use indices to access one at a time. You can use a for loop to get immutable references to each element in a vector and print them.
let v = vec![100, 32, 57];
for i in &v {
println!("{i}");
}
We can also iterate over mutable references to each element in a mutable vector in order to make changes to all the elements.
let mut v = vec![100, 32, 57];
for i in &mut v {
*i += 50;
}
To change the value that the mutable reference refers to, we have to use the *
dereference operator to get to the value in i before we can change it.
Iterating over a vector, whether immutably or mutably, is safe because of the borrow checker’s rules. If we attempted to insert or remove items in the for
loop bodies, we would get a compiler error. The reference to the vector that the for
loop holds prevents simultaneous modification of the whole vector.
Using an Enum to Store Multiple Types
Vectors can only store values that are the same type. Fortunately, the variants of an enum are defined under the same enum type, so when we need one type to represent elements of different types, we can define and use an enum.
We can define an enum whose variants will hold the different value types, and all the enum variants will be considered the same type: that of the enum. Then we can create a vector to hold that enum and so, ultimately, holds different types.
enum SpreadsheetCell {
Int(i32),
Float(f64),
Text(String),
}
let row = vec![
SpreadsheetCell::Int(3),
SpreadsheetCell::Text(String::from("blue")),
SpreadsheetCell::Float(10.12),
];
Rust needs to know what types will be in the vector at compile time so it knows exactly how much memory on the heap will be needed to store each element. We must also be explicit about what types are allowed in this vector. If Rust allowed a vector to hold any type, there would be a chance that one or more of the types would cause errors with the operations performed on the elements of the vector. Using an enum plus a match expression means that Rust will ensure at compile time that every possible case is handled.
If you don’t know the exhaustive set of types a program will get at runtime to store in a vector, the enum technique won’t work. Instead, you can use a trait object.
Dropping a Vector Drops Its Elements
Like any other struct, a vector is freed when it goes out of scope.
When the vector gets dropped, all of its contents are also dropped, meaning the integers it holds will be cleaned up. The borrow checker ensures that any references to contents of a vector are only used while the vector itself is valid.
Storing UTF-8 Encoded Text with Strings
New Rustaceans commonly get stuck on strings for a combination of three reasons: Rust’s propensity for exposing possible errors, strings being a more complicated data structure than many programmers give them credit for, and UTF-8.
What Is a String?
Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form &str
.
The String
type, which is provided by Rust’s standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type.
Both String
and string slices are UTF-8 encoded.
Creating a New String
Many of the same operations available with Vec<T>
are available with String
as well, because String
is actually implemented as a wrapper around a vector of bytes with some extra guarantees, restrictions, and capabilities.
let mut s = String::new();
Often, we’ll have some initial data that we want to start the string with. For that, we use the to_string
method, which is available on any type that implements the Display
trait, as string literals do.
let data = "initial contents";
let s = data.to_string();
// the method also works on a literal directly:
let s = "initial contents".to_string();
We can also use the function String::from
to create a String
from a string literal.
let s = String::from("initial contents");
In this case, String::from
and to_string
do the same thing, so which you choose is a matter of style and readability.
Updating a String
A String
can grow in size and its contents can change, just like the contents of a Vec<T>
, if you push more data into it. In addition, you can conveniently use the +
operator or the format!
macro to concatenate String values.
Appending to a String with
push_str
andpush
let mut s = String::from("foo"); s.push_str("bar");
The push_str method takes a string slice because we don’t necessarily want to take ownership of the parameter.
let mut s1 = String::from("foo"); let s2 = "bar"; s1.push_str(s2); println!("s2 is {s2}");
The
push
method takes a single character as a parameter and adds it to theString
.let mut s = String::from("lo"); s.push('l');
Concatenation with the
+
Operator or theformat!
Macrolet s1 = String::from("Hello, "); let s2 = String::from("world!"); let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
The reason
s1
is no longer valid after the addition, and the reason we used a reference tos2
, has to do with the signature of the method that’s called when we use the + operator. The + operator uses the add method, whose signature looks something like this:fn add(self, s: &str) -> String {
In the standard library, you’ll see add defined using generics and associated types. Here, we’ve substituted in concrete types, which is what happens when we call this method with String values. This signature gives us the clues we need to understand the tricky bits of the
+
operator.First,
s2
has an&
, meaning that we’re adding a reference of the second string to the first string. This is because of thes
parameter in theadd
function: we can only add a&str
to aString
; we can’t add twoString
values together. But wait —— the type of&s2
is&String
, not&str
, as specified in the second parameter to add.The reason we’re able to use
&s2
in the call toadd
is that the compiler can coerce the&String
argument into a&str
. When we call theadd
method, Rust uses a deref coercion, which here turns&s2
into&s2[..]
. Becauseadd
does not take ownership of thes
parameter,s2
will still be a validString
after this operation.Second, we can see in the signature that
add
takes ownership ofself
, becauseself
does not have an&
. This meanss1
will be moved into theadd
call and will no longer be valid after that. So althoughlet s3 = s1 + &s2;
looks like it will copy both strings and create a new one, this statement actually takes ownership ofs1
, appends a copy of the contents ofs2
, and then returns ownership of the result. In other words, it looks like it’s making a lot of copies but isn’t; the implementation is more efficient than copying.If we need to concatenate multiple strings, the behavior of the + operator gets unwieldy:
let s1 = String::from("tic"); let s2 = String::from("tac"); let s3 = String::from("toe"); let s = s1 + "-" + &s2 + "-" + &s3;
For more complicated string combining, we can instead use the format! macro:
let s1 = String::from("tic"); let s2 = String::from("tac"); let s3 = String::from("toe"); let s = format!("{s1}-{s2}-{s3}");
The
format!
macro works likeprintln!
, but instead of printing the output to the screen, it returns aString
with the contents. The version of the code usingformat!
is much easier to read, and the code generated by theformat!
macro uses references so that this call doesn’t take ownership of any of its parameters.
Indexing into Strings
In many other programming languages, accessing individual characters in a string by referencing them by index is a valid and common operation. However, if you try to access parts of a String using indexing syntax in Rust, you’ll get an error. Rust strings don’t support indexing.
Internal Representation
A
String
is a wrapper over aVec<u8>
.let hello = String::from("Hola"); // In this case, len will be 4, which means the vector storing the string "Hola" is 4 bytes long. Each of these letters takes 1 byte when encoded in UTF-8.
let hello = String::from("Здравствуйте"); // In this case, len will be 24, not 12. That's the number of bytes it takes to encode "Здравствуйте" in UTF-8, because each Unicode scalar value in that string takes 2 bytes of storage.
Different Unicode scalar value takes different bytes of storage. Therefore, an index into the string’s bytes will not always correlate to a valid Unicode scalar value.
let hello = "hello"; let answer = &hello[0]; // If &"hello"[0] were valid code that returned the byte value, it would return 104, not h.
Another Reason is that when encoded in UTF-8, indexing into a string will return a byte value instead of the character on its own.
The answer, then, is that to avoid returning an unexpected value and causing bugs that might not be discovered immediately, Rust doesn’t compile this code at all and prevents misunderstandings early in the development process.
Bytes and Scalar Values and Grapheme Clusters! Oh My!
Another point about UTF-8 is that there are actually three relevant ways to look at strings from Rust’s perspective: as bytes (how computers ultimately store this data), scalar values (what Rust’s char type is), and grapheme clusters (the closest thing to what we would call letters).
If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is stored as a vector of u8 values that looks like this:
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
That’s 18 bytes and is how computers ultimately store this data. If we look at them as Unicode scalar values, which are what Rust’s char type is, those bytes look like this:
['न', 'म', 'स', '्', 'त', 'े']
There are six char values here, but the fourth and sixth are not letters: they’re diacritics that don’t make sense on their own. Finally, if we look at them as grapheme clusters, we’d get what a person would call the four letters that make up the Hindi word:
["न", "म", "स्", "ते"]
Rust provides different ways of interpreting the raw string data that computers store so that each program can choose the interpretation it needs, no matter what human language the data is in.
A final reason Rust doesn’t allow us to index into a String to get a character is that indexing operations are expected to always take constant time (O(1)). But it isn’t possible to guarantee that performance with a String, because Rust would have to walk through the contents from the beginning to the index to determine how many valid characters there were.
Slicing Strings
Indexing into a string is often a bad idea because it’s not clear what the return type of the string-indexing operation should be: a byte value, a character, a grapheme cluster, or a string slice.
Rather than indexing using []
with a single number, you can use []
with a range to create a string slice containing particular bytes.
let hello = "Здравствуйте";
let s = &hello[0..4];
If we were to try to slice only part of a character’s bytes with something like &hello[0..1]
, Rust would panic at runtime because index 0 is not a char boundary. Each char has 2 bytes in this case.
Methods for Iterating Over Strings
The best way to operate on pieces of strings is to be explicit about whether you want characters or bytes. For individual Unicode scalar values, use the chars
method. Alternatively, the bytes
method returns each raw byte.
for c in "Зд".chars() {
println!("{c}");
}
// This code will print the following:
// З
// д
for b in "Зд".bytes() {
println!("{b}");
}
// This code will print the four bytes that make up this string:
// 208
// 151
// 208
// 180
But be sure to remember that valid Unicode scalar values may be made up of more than 1 byte.
Getting grapheme clusters from strings as with the Devanagari script is complex, so this functionality is not provided by the standard library.
Strings Are Not So Simple
Rust has chosen to make the correct handling of String
data the default behavior for all Rust programs, which means programmers have to put more thought into handling UTF-8 data upfront. This trade-off exposes more of the complexity of strings than is apparent in other programming languages, but it prevents you from having to handle errors involving non-ASCII characters later in your development life cycle.
Storing Keys with Associated Values in Hash Maps
The type HashMap<K, V>
stores a mapping of keys of type K to values of type V using a hashing function, which determines how it places these keys and values into memory.
Many programming languages support this kind of data structure, but they often use a different name, such as hash, map, object, hash table, dictionary, or associative array, just to name a few.
Hash maps are useful when you want to look up data not by using an index, as you can with vectors, but by using a key that can be of any type.
Creating a New Hash Map
One way to create an empty hash map is using new and adding elements with insert.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
Note that we need to first use
the HashMap
from the collections portion of the standard library. Of our three common collections, this one is the least often used, so it’s not included in the features brought into scope automatically in the prelude. Hash maps also have less support from the standard library; there’s no built-in macro to construct them, for example.
Just like vectors, hash maps store their data on the heap. Like vectors, hash maps are homogeneous: all of the keys must have the same type as each other, and all of the values must have the same type.
Accessing Values in a Hash Map
We can get a value out of the hash map by providing its key to the get method.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
let team_name = String::from("Blue");
let score = scores.get(&team_name).copied().unwrap_or(0);
The get
method returns an Option<&V>
; if there’s no value for that key in the hash map, get
will return None
. This program handles the Option
by calling copied
to get an Option<i32>
rather than an Option<&i32>
, then unwrap_or
to set score
to zero if scores
doesn’t have an entry for the key.
We can iterate over each key/value pair in a hash map in a similar manner as we do with vectors, using a for
loop.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
for (key, value) in &scores {
println!("{key}: {value}");
}
Hash Maps and Ownership
For types that implement the Copy
trait, like i32
, the values are copied into the hash map. For owned values like String
, the values will be moved and the hash map will be the owner of those values.
use std::collections::HashMap;
let field_name = String::from("Favorite color");
let field_value = String::from("Blue");
let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point, try using them and see what compiler error you get!
If we insert references to values into the hash map, the values won’t be moved into the hash map. The values that the references point to must be valid for at least as long as the hash map is valid.
Updating a Hash Map
Although the number of key and value pairs is growable, each unique key can only have one value associated with it at a time, but not vice versa.
When you want to change the data in a hash map, you have to decide how to handle the case when a key already has a value assigned. You could replace the old value with the new value, completely disregarding the old value. You could keep the old value and ignore the new value, only adding the new value if the key doesn’t already have a value. Or you could combine the old value and the new value.
Overwriting a Value
If we insert a key and a value into a hash map and then insert that same key with a different value, the value associated with that key will be replaced.
use std::collections::HashMap; let mut scores = HashMap::new(); scores.insert(String::from("Blue"), 10); scores.insert(String::from("Blue"), 25); println!("{:?}", scores); // {"Blue": 25}
Adding a Key and Value Only If a Key Isn’t Present
It’s common to check whether a particular key already exists in the hash map with a value then take the following actions: if the key does exist in the hash map, the existing value should remain the way it is. If the key doesn’t exist, insert it and a value for it.
Hash maps have a special API for this called
entry
that takes the key you want to check as a parameter. The return value of theentry
method is an enum calledEntry
that represents a value that might or might not exist.use std::collections::HashMap; let mut scores = HashMap::new(); scores.insert(String::from("Blue"), 10); scores.entry(String::from("Yellow")).or_insert(50); scores.entry(String::from("Blue")).or_insert(50); println!("{:?}", scores); // {"Yellow": 50, "Blue": 10}
The
or_insert
method onEntry
is defined to return a mutable reference to the value for the correspondingEntry
key if that key exists, and if not, inserts the parameter as the new value for this key and returns a mutable reference to the new value.Updating a Value Based on the Old Value
Another common use case for hash maps is to look up a key’s value and then update it based on the old value.
use std::collections::HashMap; let text = "hello world wonderful world"; let mut map = HashMap::new(); for word in text.split_whitespace() { let count = map.entry(word).or_insert(0); *count += 1; } println!("{:?}", map); // {"world": 2, "hello": 1, "wonderful": 1}
The
split_whitespace
method returns an iterator over sub-slices, separated by whitespace, of the value intext
. Theor_insert
method returns a mutable reference (&mut V
) to the value for the specified key. Here we store that mutable reference in thecount
variable, so in order to assign to that value, we must first dereferencecount
using the asterisk(*).
The mutable reference goes out of scope at the end of thefor
loop, so all of these changes are safe and allowed by the borrowing rules.
Hashing Functions
By default, HashMap uses a hashing function called SipHash that can provide resistance to Denial of Service (DoS) attacks involving hash tables. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it.
If you profile your code and find that the default hash function is too slow for your purposes, you can switch to another function by specifying a different hasher. A hasher is a type that implements the BuildHasher trait.
Error Handling
Rust groups errors into two major categories: recoverable and unrecoverable errors.
For a recoverable error, such as a file not found error, we most likely just want to report the problem to the user and retry the operation.
Unrecoverable errors are always symptoms of bugs, like trying to access a location beyond the end of an array, and so we want to immediately stop the program.
Unrecoverable Errors with panic!
Sometimes, bad things happen in your code, and there’s nothing you can do about it. In these cases, Rust has the panic!
macro.
There are two ways to cause a panic in practice: by taking an action that causes our code to panic (such as accessing an array past the end) or by explicitly calling the panic! macro. In both cases, we cause a panic in our program.
By default, these panics will print a failure message, unwind, clean up the stack, and quit. Via an environment variable, you can also have Rust display the call stack when a panic occurs to make it easier to track down the source of the panic.
The call to panic!
causes the error message contained in the last two lines. The first line shows our panic message and the place in our source code where the panic occurred: src/main.rs:2:5
indicates that it’s the second line, fifth character of our src/main.rs file.
In other cases, the panic!
call might be in code that our code calls, and the filename and line number reported by the error message will be someone else’s code where the panic!
macro is called, not the line of our code that eventually led to the panic!
call. We can use the backtrace of the functions the panic!
call came from to figure out the part of our code that is causing the problem.
Unwinding the Stack or Aborting in Response to a Panic
By default, when a panic occurs, the program starts unwinding, which means Rust walks back up the stack and cleans up the data from each function it encounters. However, this walking back and cleanup is a lot of work. Rust, therefore, allows you to choose the alternative of immediately aborting, which ends the program without cleaning up.
Memory that the program was using will then need to be cleaned up by the operating system. If in your project you need to make the resulting binary as small as possible, you can switch from unwinding to aborting upon a panic by adding panic = 'abort'
to the appropriate [profile]
sections in your Cargo.toml file. For example, if you want to abort on panic in release mode, add this:
[profile.release]
panic = 'abort'
Using a panic!
Backtrace
In C, attempting to read beyond the end of a data structure is undefined behavior. You might get whatever is at the location in memory that would correspond to that element in the data structure, even though the memory doesn’t belong to that structure. This is called a buffer overread and can lead to security vulnerabilities if an attacker is able to manipulate the index in such a way as to read data they shouldn’t be allowed to that is stored after the data structure.
To protect your program from this sort of vulnerability, if you try to read an element at an index that doesn’t exist, Rust will stop execution and refuse to continue.
We can set the RUST_BACKTRACE
environment variable to any value except 0 to get a backtrace of exactly what happened to cause the error. A backtrace is a list of all the functions that have been called to get to this point. Backtraces in Rust work as they do in other languages: the key to reading the backtrace is to start from the top and read until you see files you wrote. That’s the spot where the problem originated. The lines above that spot are code that your code has called; the lines below are code that called your code. These before-and-after lines might include core Rust code, standard library code, or crates that you’re using.
In order to get backtraces with this information, debug symbols must be enabled. Debug symbols are enabled by default when using cargo build
or cargo run
without the --release
flag, as we have here.
Recoverable Errors with Result
The Result
enum is defined as having two variants, Ok
and Err
, as follows:
enum Result<T, E> {
Ok(T),
Err(E),
}
T
represents the type of the value that will be returned in a success case within the Ok
variant, and E
represents the type of the error that will be returned in a failure case within the Err
variant. Because Result
has these generic type parameters, we can use the Result
type and the functions defined on it in many different situations where the successful value and error value we want to return may differ.
use std::fs::File;
fn main() {
let greeting_file_result = File::open("hello.txt");
}
The return type of File::open
is a Result<T, E>
. The generic parameter T
has been filled in by the implementation of File::open
with the type of the success value, std::fs::File
, which is a file handle. The type of E
used in the error value is std::io::Error
.
We need to add to the code to take different actions depending on the value File::open
returns.
use std::fs::File;
fn main() {
let greeting_file_result = File::open("hello.txt");
let greeting_file = match greeting_file_result {
Ok(file) => file,
Err(error) => panic!("Problem opening the file: {:?}", error),
};
}
Note that, like the Option
enum, the Result
enum and its variants have been brought into scope by the prelude, so we don’t need to specify Result::
before the Ok
and Err
variants in the match arms.
Matching on Different Errors
The preceding code will panic!
no matter why File::open
failed. However, we want to take different actions for different failure reasons: if File::open
failed because the file doesn’t exist, we want to create the file and return the handle to the new file. If File::open
failed for any other reason—for example, because we didn’t have permission to open the file - we still want the code to panic!
.
use std::fs::File;
use std::io::ErrorKind;
fn main() {
let greeting_file_result = File::open("hello.txt");
let greeting_file = match greeting_file_result {
Ok(file) => file,
Err(error) => match error.kind() {
ErrorKind::NotFound => match File::create("hello.txt") {
Ok(fc) => fc,
Err(e) => panic!("Problem creating the file: {:?}", e),
},
other_error => {
panic!("Problem opening the file: {:?}", other_error);
}
},
};
}
The type of the value that File::open
returns inside the Err
variant is io::Error
, which is a struct provided by the standard library. This struct has a method kind that we can call to get an io::ErrorKind
value. The enum io::ErrorKind
is provided by the standard library and has variants representing the different kinds of errors that might result from an io
operation. The variant we want to use is ErrorKind::NotFound
, which indicates the file we’re trying to open doesn’t exist yet. So we match on greeting_file_result
, but we also have an inner match on error.kind()
.
The condition we want to check in the inner match is whether the value returned by error.kind()
is the NotFound
variant of the ErrorKind
enum. If it is, we try to create the file with File::create
. However, because File::create
could also fail, we need a second arm in the inner match expression. When the file can’t be created, a different error message is printed. The second arm of the outer match stays the same, so the program panics on any error besides the missing file error.
Alternatives to Using match
with Result<T, E>
That’s a lot of match
! The match
expression is very useful but also very much a primitive. Closures are used with many of the methods defined on Result<T, E>
. These methods can be more concise than using match
when handling Result<T, E>
values in your code.
use std::fs::File;
use std::io::ErrorKind;
fn main() {
let greeting_file = File::open("hello.txt").unwrap_or_else(|error| {
if error.kind() == ErrorKind::NotFound {
File::create("hello.txt").unwrap_or_else(|error| {
panic!("Problem creating the file: {:?}", error);
})
} else {
panic!("Problem opening the file: {:?}", error);
}
});
}
Many more of these methods can clean up huge nested match
expressions when you’re dealing with errors.
Shortcuts for Panic on Error: unwrap and expect
The unwrap
method is a shortcut method implemented just like the match
expression. If the Result
value is the Ok
variant, unwrap
will return the value inside the Ok
. If the Result
is the Err
variant, unwrap
will call the panic!
macro for us.
use std::fs::File;
fn main() {
let greeting_file = File::open("hello.txt").unwrap();
}
Similarly, the expect
method lets us also choose the panic!
error message. Using expect
instead of unwrap
and providing good error messages can convey your intent and make tracking down the source of a panic easier.
use std::fs::File;
fn main() {
let greeting_file = File::open("hello.txt")
.expect("hello.txt should be included in this project");
}
The error message used by expect
in its call to panic!
will be the parameter that we pass to expect
, rather than the default panic!
message that unwrap
uses.
Propagating Errors
When a function’s implementation calls something that might fail, instead of handling the error within the function itself, you can return the error to the calling code so that it can decide what to do. This is known as propagating the error and gives more control to the calling code, where there might be more information or logic that dictates how the error should be handled than what you have available in the context of your code.
use std::fs::File;
use std::io::{self, Read};
fn read_username_from_file() -> Result<String, io::Error> {
let username_file_result = File::open("hello.txt");
let mut username_file = match username_file_result {
Ok(file) => file,
Err(e) => return Err(e),
};
let mut username = String::new();
match username_file.read_to_string(&mut username) {
Ok(_) => Ok(username),
Err(e) => Err(e),
}
}
This function can be written in a much shorter way, but we’re going to start by doing a lot of it manually in order to explore error handling; at the end, we’ll show the shorter way. Let’s look at the return type of the function first: Result<String, io::Error>
. This means the function is returning a value of the type Result<T, E>
where the generic parameter T
has been filled in with the concrete type String
, and the generic type E
has been filled in with the concrete type io::Error
.
We chose io::Error
as the return type of this function because that happens to be the type of the error value returned from both of the operations we’re calling in this function’s body that might fail: the File::open
function and the read_to_string
method.
If File::open
succeeds, the file handle in the pattern variable file becomes the value in the mutable variable username_file
and the function continues. In the Err
case, instead of calling panic!
, we use the return
keyword to return early out of the function entirely and pass the error value from File::open
, now in the pattern variable e
, back to the calling code as this function’s error value.
The read_to_string
method also returns a Result
because it might fail, even though File::open
succeeded. So we need another match
to handle that Result
: if read_to_string
succeeds, then our function has succeeded, and we return the username
from the file that’s now in username
wrapped in an Ok
. If read_to_string
fails, we return the error value in the same way that we returned the error value in the match
that handled the return value of File::open
. However, we don’t need to explicitly say return, because this is the last expression in the function.
This pattern of propagating errors is so common in Rust that Rust provides the question mark operator ?
to make this easier.
A Shortcut for Propagating Errors: the ?
Operator
use std::fs::File;
use std::io::{self, Read};
fn read_username_from_file() -> Result<String, io::Error> {
let mut username_file = File::open("hello.txt")?;
let mut username = String::new();
username_file.read_to_string(&mut username)?;
Ok(username)
}
The ?
placed after a Result
value is defined to work in almost the same way as the match
expressions we defined to handle the Result
values. If the value of the Result
is an Ok
, the value inside the Ok
will get returned from this expression, and the program will continue. If the value is an Err
, the Err
will be returned from the whole function as if we had used the return
keyword so the error value gets propagated to the calling code.
There is a difference between what the match
expression does and what the ?
operator does: error values that have the ?
operator called on them go through the from
function, defined in the From
trait in the standard library, which is used to convert values from one type into another. When the ?
operator calls the from
function, the error type received is converted into the error type defined in the return type of the current function. This is useful when a function returns one error type to represent all the ways a function might fail, even if parts might fail for many different reasons.
For example, we could change the read_username_from_file
function to return a custom error type named OurError
that we define. If we also define impl From<io::Error> for OurError
to construct an instance of OurError
from an io::Error
, then the ?
operator calls in the body of read_username_from_file
will call from
and convert the error types without needing to add any more code to the function.
The ?
at the end of the File::open
call will return the value inside an Ok
to the variable username_file
. If an error occurs, the ?
operator will return early out of the whole function and give any Err
value to the calling code. The same thing applies to the ?
at the end of the read_to_string
call.
The ?
operator eliminates a lot of boilerplate and makes this function’s implementation simpler. We could even shorten this code further by chaining method calls immediately after the ?
.
use std::fs::File;
use std::io::{self, Read};
fn read_username_from_file() -> Result<String, io::Error> {
let mut username = String::new();
File::open("hello.txt")?.read_to_string(&mut username)?;
Ok(username)
}
Instead of creating a variable username_file
, we’ve chained the call to read_to_string
directly onto the result of File::open("hello.txt")?
. We still have a ?
at the end of the read_to_string
call, and we still return an Ok
value containing username
when both File::open
and read_to_string
succeed rather than returning errors.
use std::fs;
use std::io;
fn read_username_from_file() -> Result<String, io::Error> {
fs::read_to_string("hello.txt")
}
Reading a file into a string is a fairly common operation, so the standard library provides the convenient fs::read_to_string
function that opens the file, creates a new String
, reads the contents of the file, puts the contents into that String
, and returns it. Of course, using fs::read_to_string
doesn’t give us the opportunity to explain all the error handling, so we did it the longer way first.
Where The ?
Operator Can Be Used
The ?
operator can only be used in functions whose return type is compatible with the value the ?
is used on. This is because the ?
operator is defined to perform an early return of a value out of the function.
Let’s look at the error we’ll get if we use the ?
operator in a main function with a return type incompatible with the type of the value we use ?
on:
use std::fs::File;
fn main() {
let greeting_file = File::open("hello.txt")?;
}
This code opens a file, which might fail. The ?
operator follows the Result
value returned by File::open
, but this main function has the return type of ()
, not Result
.
This error points out that we’re only allowed to use the ?
operator in a function that returns Result
, Option
, or another type that implements FromResidual
.
To fix the error, you have two choices. One choice is to change the return type of your function to be compatible with the value you’re using the ?
operator on as long as you have no restrictions preventing that. The other technique is to use a match or one of the Result<T, E>
methods to handle the Result<T, E>
in whatever way is appropriate.
The error message also mentioned that ?
can be used with Option<T>
values as well. As with using ?
on Result
, you can only use ?
on Option
in a function that returns an Option
. If the value is None
, the None
will be returned early from the function at that point. If the value is Some
, the value inside the Some
is the resulting value of the expression and the function continues.
fn last_char_of_first_line(text: &str) -> Option<char> {
text.lines().next()?.chars().last()
}
This function returns Option<char>
because it’s possible that there is a character there, but it’s also possible that there isn’t. This code takes the text
string slice argument and calls the lines
method on it, which returns an iterator over the lines in the string. Because this function wants to examine the first line, it calls next
on the iterator to get the first value from the iterator. If text
is the empty string, this call to next
will return None
, in which case we use ?
to stop and return None
from last_char_of_first_line
. If text
is not the empty string, next
will return a Some
value containing a string slice of the first line in text
.
The ?
extracts the string slice, and we can call chars
on that string slice to get an iterator of its characters. We’re interested in the last character in this first line, so we call last
to return the last item in the iterator. This is an Option
because it’s possible that the first line is the empty string, for example if text starts with a blank line but has characters on other lines, as in “\nhi”. However, if there is a last character on the first line, it will be returned in the Some
variant.
Note that you can use the ?
operator on a Result
in a function that returns Result
, and you can use the ?
operator on an Option
in a function that returns Option
, but you can’t mix and match. The ?
operator won’t automatically convert a Result
to an Option
or vice versa; in those cases, you can use methods like the ok
method on Result
or the ok_or
method on Option
to do the conversion explicitly.
So far, all the main
functions we’ve used return ()
. The main
function is special because it’s the entry and exit point of executable programs, and there are restrictions on what its return type can be for the programs to behave as expected.
Luckily, main
can also return a Result<(), E>
. We’ve changed the return type of main
to be Result<(), Box<dyn Error>>
and added a return value Ok(())
to the end. This code will now compile:
use std::error::Error;
use std::fs::File;
fn main() -> Result<(), Box<dyn Error>> {
let greeting_file = File::open("hello.txt")?;
Ok(())
}
The Box<dyn Error>
type is a trait object, which we’ll talk about in Chapter 17. For now, you can read Box<dyn Error>
to mean “any kind of error.” Using ?
on a Result
value in a main
function with the error type Box<dyn Error>
is allowed, because it allows any Err
value to be returned early.
When a main
function returns a Result<(), E>
, the executable will exit with a value of 0
if main
returns Ok(())
and will exit with a nonzero value if main
returns an Err
value.
The main
function may return any types that implement the std::process::Termination
trait, which contains a function report
that returns an ExitCode
.
To panic!
or Not to panic!
When you choose to return a Result
value, you give the calling code options. The calling code could choose to attempt to recover in a way that’s appropriate for its situation, or it could decide that an Err
value in this case is unrecoverable, so it can call panic! and turn your recoverable error into an unrecoverable one. Therefore, returning Result
is a good default choice when you’re defining a function that might fail.
In situations such as examples, prototype code, and tests, it’s more appropriate to write code that panics instead of returning a Result
.
Examples, Prototype Code, and Tests
When you’re writing an example to illustrate some concept, also including robust error-handling code can make the example less clear. In examples, it’s understood that a call to a method like unwrap
that could panic is meant as a placeholder for the way you’d want your application to handle errors, which can differ based on what the rest of your code is doing.
Similarly, the unwrap
and expect
methods are very handy when prototyping, before you’re ready to decide how to handle errors. They leave clear markers in your code for when you’re ready to make your program more robust.
If a method call fails in a test, you’d want the whole test to fail, even if that method isn’t the functionality under test. Because panic!
is how a test is marked as a failure, calling unwrap
or expect
is exactly what should happen.
Cases in Which You Have More Information Than the Compiler
It would also be appropriate to call unwrap
or expect
when you have some other logic that ensures the Result
will have an Ok
value, but the logic isn’t something the compiler understands. If you can ensure by manually inspecting the code that you’ll never have an Err
variant, it’s perfectly acceptable to call unwrap
, and even better to document the reason you think you’ll never have an Err
variant in the expect
text. Here’s an example:
use std::net::IpAddr;
let home: IpAddr = "127.0.0.1"
.parse()
.expect("Hardcoded IP address should be valid");
However, having a hardcoded, valid string doesn’t change the return type of the parse
method: we still get a Result
value, and the compiler will still make us handle the Result
as if the Err
variant is a possibility because the compiler isn’t smart enough to see that this string is always a valid IP address. If the IP address string came from a user rather than being hardcoded into the program and therefore did have a possibility of failure, we’d definitely want to handle the Result
in a more robust way instead. Mentioning the assumption that this IP address is hardcoded will prompt us to change expect
to better error handling code if in the future, we need to get the IP address from some other source instead.
Guidelines for Error Handling
It’s advisable to have your code panic when it’s possible that your code could end up in a bad state. In this context, a bad state is when some assumption, guarantee, contract, or invariant has been broken, such as when invalid values, contradictory values, or missing values are passed to your code—plus one or more of the following:
The bad state is something that is unexpected, as opposed to something that will likely happen occasionally, like a user entering data in the wrong format.
Your code after this point needs to rely on not being in this bad state, rather than checking for the problem at every step.
There’s not a good way to encode this information in the types you use.
If someone calls your code and passes in values that don’t make sense, it’s best to return an error if you can so the user of the library can decide what they want to do in that case. However, in cases where continuing could be insecure or harmful, the best choice might be to call panic!
and alert the person using your library to the bug in their code so they can fix it during development. Similarly, panic!
is often appropriate if you’re calling external code that is out of your control and it returns an invalid state that you have no way of fixing.
However, when failure is expected, it’s more appropriate to return a Result
than to make a panic!
call.
When your code performs an operation that could put a user at risk if it’s called using invalid values, your code should verify the values are valid first and panic if the values aren’t valid. This is mostly for safety reasons: attempting to operate on invalid data can expose your code to vulnerabilities. This is the main reason the standard library will call panic!
if you attempt an out-of-bounds memory access: trying to access memory that doesn’t belong to the current data structure is a common security problem. Functions often have contracts: their behavior is only guaranteed if the inputs meet particular requirements. Panicking when the contract is violated makes sense because a contract violation always indicates a caller-side bug and it’s not a kind of error you want the calling code to have to explicitly handle. In fact, there’s no reasonable way for calling code to recover; the calling programmers need to fix the code. Contracts for a function, especially when a violation will cause a panic, should be explained in the API documentation for the function.
However, having lots of error checks in all of your functions would be verbose and annoying. Fortunately, you can use Rust’s type system (and thus the type checking done by the compiler) to do many of the checks for you. If your function has a particular type as a parameter, you can proceed with your code’s logic knowing that the compiler has already ensured you have a valid value. For example, if you have a type rather than an Option
, your program expects to have something rather than nothing. Your code then doesn’t have to handle two cases for the Some
and None
variants: it will only have one case for definitely having a value. Code trying to pass nothing to your function won’t even compile, so your function doesn’t have to check for that case at runtime. Another example is using an unsigned integer type such as u32
, which ensures the parameter is never negative.
Creating Custom Types for Validation
Recall the guessing game in Chapter 2 in which our code asked the user to guess a number between 1 and 100. We never validated that the user’s guess was between those numbers before checking it against our secret number; we only validated that the guess was positive. But it would be a useful enhancement to guide the user toward valid guesses and have different behavior when a user guesses a number that’s out of range versus when a user types, for example, letters instead.
One way to do this would be to parse the guess as an i32 instead of only a u32 to allow potentially negative numbers, and then add a check for the number being in range, like so:
loop {
// --snip--
let guess: i32 = match guess.trim().parse() {
Ok(num) => num,
Err(_) => continue,
};
if guess < 1 || guess > 100 {
println!("The secret number will be between 1 and 100.");
continue;
}
match guess.cmp(&secret_number) {
// --snip--
}
However, this is not an ideal solution: if it was absolutely critical that the program only operated on values between 1 and 100, and it had many functions with this requirement, having a check like this in every function would be tedious (and might impact performance).
Instead, we can make a new type and put the validations in a function to create an instance of the type rather than repeating the validations everywhere. That way, it’s safe for functions to use the new type in their signatures and confidently use the values they receive. The following example shows one way to define a Guess type that will only create an instance of Guess if the new function receives a value between 1 and 100.
pub struct Guess {
value: i32,
}
impl Guess {
pub fn new(value: i32) -> Guess {
if value < 1 || value > 100 {
panic!("Guess value must be between 1 and 100, got {}.", value);
}
Guess { value }
}
pub fn value(&self) -> i32 {
self.value
}
}
The code in the body of the new
function tests value
to make sure it’s between 1 and 100.
The public method value
is necessary because the value
field of the Guess
struct is private. It’s important that the value
field be private so code using the Guess
struct is not allowed to set value
directly: code outside the module must use the Guess::new
function to create an instance of Guess
, thereby ensuring there’s no way for a Guess
to have a value
that hasn’t been checked by the conditions in the Guess::new
function.
Generic Types, Traits, and Lifetimes
Generics: abstract stand-ins for concrete types or other properties. We can express the behavior of generics or how they relate to other generics without knowing what will be in their place when compiling and running the code.
Generics allow us to replace specific types with a placeholder that represents multiple types to remove code duplication.
Generic Data Types
In Function Definitions
To parameterize the types in a new single function, we need to name the type parameter, just as we do for the value parameters to a function. You can use any identifier as a type parameter name. But we’ll use T
because, by convention, type parameter names in Rust are short, often just a letter, and Rust’s type-naming convention is UpperCamelCase. Short for “type”, T
is the default choice of most Rust programmers.
When we use a type parameter name in a function signature, we have to declare the type parameter name before we use it. To define the generic largest
function, place type name declarations inside angle brackets, <>
, between the name of the function and the parameter list, like this:
fn largest<T>(list: &[T]) -> &T {
The listing also shows how we can call the function with either a slice of i32
values or char
values. Note that this code won’t compile yet, but we’ll fix it later.
Filename: src/main.rs
fn largest<T>(list: &[T]) -> &T {
let mut largest = &list[0];
for item in list {
if item > largest {
largest = item;
}
}
largest
}
fn main() {
let number_list = vec![34, 50, 25, 100, 65];
let result = largest(&number_list);
println!("The largest number is {}", result);
let char_list = vec!['y', 'm', 'a', 'q'];
let result = largest(&char_list);
println!("The largest char is {}", result);
}
// If we compile this code right now, we'll get this error:
// binary operation `>` cannot be applied to type `&T`
// consider restricting type prarameter `T`: fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> &T {
The help text mentions std::cmp::PartialOrd
, which is a trait, and we’re going to talk about traits in the next section. For now, know that this error states that the body of largest won’t work for all possible types that T
could be. Because we want to compare values of type T
in the body, we can only use types whose values can be ordered. To enable comparisons, the standard library has the std::cmp::PartialOrd
trait that you can implement on types. By following the help text’s suggestion, we restrict the types valid for T
to only those that implement PartialOrd
and this example will compile, because the standard library implements PartialOrd
on both i32
and char
.
In struct Definitions
We can also define structs to use a generic type parameter in one or more fields using the <>
syntax.
Filename: src/main.rs
struct Point<T> {
x: T,
y: T,
}
fn main() {
let integer = Point { x: 5, y: 10 };
let float = Point { x: 1.0, y: 4.0 };
// won't compile, `x` and `y` must be the same type
let wont_work = Point { x: 5, y: 4.0 };
}
To define a Point struct where x
and y
are both generics but could have different types, we can use multiple generic type parameters.
Filename: src/main.rs
struct Point<T, U> {
x: T,
y: U,
}
fn main() {
let both_integer = Point { x: 5, y: 10 };
let both_float = Point { x: 1.0, y: 4.0 };
let integer_and_float = Point { x: 5, y: 4.0 };
}
If you’re finding you need lots of generic types in your code, it could indicate that your code needs restructuring into smaller pieces.
In Enum Definitions
As we did with structs, we can define enums to hold generic data types in their variants.
enum Option<T> {
Some(T),
None,
}
By using the Option<T>
enum, we can express the abstract concept of an optional value, and because Option<T>
is generic, we can use this abstraction no matter what the type of the optional value is.
Enums can use multiple generic types as well. The definition of the Result
enum is one example:
enum Result<T, E> {
Ok(T),
Err(E),
}
This definition makes it convenient to use the Result
enum anywhere we have an operation that might succeed (return a value of some type T
) or fail (return an error of some type E
).
In Method Definitions
We can implement methods on structs and enums and use generic types in their definitions, too.
Filename: src/main.rs
struct Point<T> {
x: T,
y: T,
}
impl<T> Point<T> {
fn x(&self) -> &T {
&self.x
}
}
fn main() {
let p = Point { x: 5, y: 10 };
println!("p.x = {}", p.x());
}
Note that we have to declare T
just after impl
so we can use T
to specify that we’re implementing methods on the type Point<T>
. By declaring T
as a generic type after impl
, Rust can identify that the type in the angle brackets in Point
is a generic type rather than a concrete type. We could have chosen a different name for this generic parameter than the generic parameter declared in the struct definition, but using the same name is conventional. Methods written within an impl
that declares the generic type will be defined on any instance of the type, no matter what concrete type ends up substituting for the generic type.
We can also specify constraints on generic types when defining methods on the type. We could, for example, implement methods only on Point<f32>
instances rather than on Point<T>
instances with any generic type.
Filename: src/main.rs
impl Point<f32> {
fn distance_from_origin(&self) -> f32 {
(self.x.powi(2) + self.y.powi(2)).sqrt()
}
}
This code means the type Point<f32>
will have a distance_from_origin
method; other instances of Point<T>
where T
is not of type f32
will not have this method defined. The method measures how far our point is from the point at coordinates (0.0, 0.0) and uses mathematical operations that are available only for floating point types.
Generic type parameters in a struct definition aren’t always the same as those you use in that same struct’s method signatures.
Filename: src/main.rs
struct Point<X1, Y1> {
x: X1,
y: Y1,
}
impl<X1, Y1> Point<X1, Y1> {
fn mixup<X2, Y2>(self, other: Point<X2, Y2>) -> Point<X1, Y2> {
Point {
x: self.x,
y: other.y,
}
}
}
fn main() {
let p1 = Point { x: 5, y: 10.4 };
let p2 = Point { x: "Hello", y: 'c' };
let p3 = p1.mixup(p2);
println!("p3.x = {}, p3.y = {}", p3.x, p3.y);
}
// p3.x = 5, p3.y = c
Performance of Code Using Generics
You might be wondering whether there is a runtime cost when using generic type parameters. The good news is that using generic types won’t make your program run any slower than it would with concrete types.
Rust accomplishes this by performing monomorphization of the code using generics at compile time. Monomorphization is the process of turning generic code into specific code by filling in the concrete types that are used when compiled. In this process, the compiler does the opposite of the steps we used to create the generic function: the compiler looks at all the places where generic code is called and generates code for the concrete types the generic code is called with.
Let’s look at how this works by using the standard library’s generic Option<T>
enum:
let integer = Some(5);
let float = Some(5.0);
The monomorphized version of the code looks similar to the following (the compiler uses different names than what we’re using here for illustration):
Filename: src/main.rs
enum Option_i32 {
Some(i32),
None,
}
enum Option_f64 {
Some(f64),
None,
}
fn main() {
let integer = Option_i32::Some(5);
let float = Option_f64::Some(5.0);
}
The generic Option<T>
is replaced with the specific definitions created by the compiler. Because Rust compiles generic code into code that specifies the type in each instance, we pay no runtime cost for using generics. When the code runs, it performs just as it would if we had duplicated each definition by hand. The process of monomorphization makes Rust’s generics extremely efficient at runtime.
Traits: Defining Shared Behavior
A trait defines functionality a particular type has and can share with other types. We can use traits to define shared behavior in an abstract way. We can use trait bounds to specify that a generic type can be any type that has certain behavior.
Note: Traits are similar to a feature often called interfaces in other languages, although with some differences.
Defining a Trait
Trait definitions are a way to group method signatures together to define a set of behaviors necessary to accomplish some purpose.
For example, let’s say we have multiple structs that hold various kinds and amounts of text: a NewsArticle
struct that holds a news story filed in a particular location and a Tweet
that can have at most 280 characters along with metadata that indicates whether it was a new tweet, a retweet, or a reply to another tweet.
We want to make a media aggregator library crate named aggregator
that can display summaries of data that might be stored in a NewsArticle
or Tweet
instance. To do this, we need a summary from each type, and we’ll request that summary by calling a summarize
method on an instance. The following listing shows the definition of a public Summary
trait that expresses this behavior.
Filename: src/lib.rs
pub trait Summary {
fn summarize(&self) -> String;
}
After the method signature, instead of providing an implementation within curly brackets, we use a semicolon. Each type implementing this trait must provide its own custom behavior for the body of the method. The compiler will enforce that any type that has the Summary
trait will have the method summarize
defined with this signature exactly.
A trait can have multiple methods in its body: the method signatures are listed one per line and each line ends in a semicolon.
Implementing a Trait on a Type
Now that we’ve defined the desired signatures of the Summary
trait’s methods, we can implement it on the types in our media aggregator.
Filename: src/lib.rs
pub struct NewsArticle {
pub headline: String,
pub location: String,
pub author: String,
pub content: String,
}
impl Summary for NewsArticle {
fn summarize(&self) -> String {
format!("{}, by {} ({})", self.headline, self.author, self.location)
}
}
pub struct Tweet {
pub username: String,
pub content: String,
pub reply: bool,
pub retweet: bool,
}
impl Summary for Tweet {
fn summarize(&self) -> String {
format!("{}: {}", self.username, self.content)
}
}
Now that the library has implemented the Summary
trait on NewsArticle
and Tweet
, users of the crate can call the trait methods on instances of NewsArticle
and Tweet
in the same way we call regular methods. The only difference is that the user must bring the trait into scope as well as the types. Here’s an example of how a binary crate could use our aggregator
library crate:
use aggregator::{Summary, Tweet};
fn main() {
let tweet = Tweet {
username: String::from("horse_ebooks"),
content: String::from(
"of course, as you probably already know, people",
),
reply: false,
retweet: false,
};
println!("1 new tweet: {}", tweet.summarize());
}
// 1 new tweet: horse_ebooks: of course, as you probably already know, people
Other crates that depend on the aggregator
crate can also bring the Summary
trait into scope to implement Summary
on their own types. One restriction to note is that we can implement a trait on a type only if at least one of the trait or the type is local to our crate. For example, we can implement standard library traits like Display
on a custom type like Tweet
as part of our aggregator
crate functionality, because the type Tweet is local to our aggregator
crate. We can also implement Summary
on Vec<T>
in our aggregator
crate, because the trait Summary
is local to our aggregator
crate.
But we can’t implement external traits on external types. For example, we can’t implement the Display
trait on Vec<T>
within our aggregator
crate, because Display
and Vec<T>
are both defined in the standard library and aren’t local to our aggregator
crate. This restriction is part of a property called coherence, and more specifically the orphan rule, so named because the parent type is not present. This rule ensures that other people’s code can’t break your code and vice versa. Without the rule, two crates could implement the same trait for the same type, and Rust wouldn’t know which implementation to use.
Default Implementations
Sometimes it’s useful to have default behavior for some or all of the methods in a trait instead of requiring implementations for all methods on every type. Then, as we implement the trait on a particular type, we can keep or override each method’s default behavior.
Filename: src/lib.rs
pub trait Summary {
fn summarize(&self) -> String {
String::from("(Read more...)")
}
}
To use a default implementation to summarize instances of NewsArticle
, we specify an empty impl
block with impl Summary for NewsArticle {}
.
let article = NewsArticle {
headline: String::from("Penguins win the Stanley Cup Championship!"),
location: String::from("Pittsburgh, PA, USA"),
author: String::from("Iceburgh"),
content: String::from(
"The Pittsburgh Penguins once again are the best \
hockey team in the NHL.",
),
};
println!("New article available! {}", article.summarize());
// New article available! (Read more...)
The syntax for overriding a default implementation is the same as the syntax for implementing a trait method that doesn’t have a default implementation.
Default implementations can call other methods in the same trait, even if those other methods don’t have a default implementation.
pub trait Summary {
fn summarize_author(&self) -> String;
fn summarize(&self) -> String {
format!("(Read more from {}...)", self.summarize_author())
}
}
To use this version of Summary, we only need to define summarize_author when we implement the trait on a type:
impl Summary for Tweet {
fn summarize_author(&self) -> String {
format!("@{}", self.username)
}
}
let tweet = Tweet {
username: String::from("horse_ebooks"),
content: String::from(
"of course, as you probably already know, people",
),
reply: false,
retweet: false,
};
println!("1 new tweet: {}", tweet.summarize());
// 1 new tweet: (Read more from @horse_ebooks...)
Note that it isn’t possible to call the default implementation from an overriding implementation of that same method.
Traits as Parameters
Now that you know how to define and implement traits, we can explore how to use traits to define functions that accept many different types. To do this, we use the impl Trait
syntax, like this:
pub fn notify(item: &impl Summary) {
println!("Breaking news! {}", item.summarize());
}
Instead of a concrete type for the item
parameter, we specify the impl
keyword and the trait name. This parameter accepts any type that implements the specified trait.
Trait Bound Syntax
The impl Trait
syntax works for straightforward cases but is actually syntax sugar for a longer form known as a trait bound; it looks like this:
pub fn notify<T: Summary>(item: &T) {
println!("Breaking news! {}", item.summarize());
}
The impl Trait
syntax is convenient and makes for more concise code in simple cases, while the fuller trait bound syntax can express more complexity in other cases. For example, we can have two parameters that implement Summary
. Doing so with the impl Trait
syntax looks like this:
pub fn notify(item1: &impl Summary, item2: &impl Summary) {
Using impl Trait
is appropriate if we want this function to allow item1
and item2
to have different types (as long as both types implement Summary
). If we want to force both parameters to have the same type, however, we must use a trait bound, like this:
pub fn notify<T: Summary>(item1: &T, item2: &T) {
Specifying Multiple Trait Bounds with the + Syntax
We can also specify more than one trait bound. We specify in the notify
definition that item must implement both Display
and Summary
. We can do so using the +
syntax:
pub fn notify(item: &(impl Summary + Display)) {
The +
syntax is also valid with trait bounds on generic types:
pub fn notify<T: Summary + Display>(item: &T) {
Clearer Trait Bounds with where Clauses
Using too many trait bounds has its downsides. Each generic has its own trait bounds, so functions with multiple generic type parameters can contain lots of trait bound information between the function’s name and its parameter list, making the function signature hard to read. For this reason, Rust has alternate syntax for specifying trait bounds inside a where
clause after the function signature. So instead of writing this:
fn some_function<T: Display + Clone, U: Clone + Debug>(t: &T, u: &U) -> i32 {
we can use a where
clause, like this:
fn some_function<T, U>(t: &T, u: &U) -> i32
where
T: Display + Clone,
U: Clone + Debug,
{
Returning Types that Implement Traits
We can also use the impl Trait
syntax in the return position to return a value of some type that implements a trait, as shown here:
fn returns_summarizable() -> impl Summary {
Tweet {
username: String::from("horse_ebooks"),
content: String::from(
"of course, as you probably already know, people",
),
reply: false,
retweet: false,
}
}
The ability to specify a return type only by the trait it implements is especially useful in the context of closures and iterators. Closures and iterators create types that only the compiler knows or types that are very long to specify. The impl Trait
syntax lets you concisely specify that a function returns some type that implements the Iterator
trait without needing to write out a very long type.
However, you can only use impl Trait if you’re returning a single type. For example, this code that returns either a NewsArticle
or a Tweet
with the return type specified as impl Summary
wouldn’t work:
// wouldn't work
fn returns_summarizable(switch: bool) -> impl Summary {
if switch {
NewsArticle {
headline: String::from(
"Penguins win the Stanley Cup Championship!",
),
location: String::from("Pittsburgh, PA, USA"),
author: String::from("Iceburgh"),
content: String::from(
"The Pittsburgh Penguins once again are the best \
hockey team in the NHL.",
),
}
} else {
Tweet {
username: String::from("horse_ebooks"),
content: String::from(
"of course, as you probably already know, people",
),
reply: false,
retweet: false,
}
}
}
Using Trait Bounds to Conditionally Implement Methods
By using a trait bound with an impl block that uses generic type parameters, we can implement methods conditionally for types that implement the specified traits. For example, the type Pair<T>
in the following listing always implements the new
function to return a new instance of Pair<T>
. But in the next impl
block, Pair<T>
only implements the cmp_display
method if its inner type T
implements the PartialOrd
trait that enables comparison and the Display
trait that enables printing.
Filename: src/lib.rs
use std::fmt::Display;
struct Pair<T> {
x: T,
y: T,
}
impl<T> Pair<T> {
fn new(x: T, y: T) -> Self {
Self { x, y }
}
}
impl<T: Display + PartialOrd> Pair<T> {
fn cmp_display(&self) {
if self.x >= self.y {
println!("The largest member is x = {}", self.x);
} else {
println!("The largest member is y = {}", self.y);
}
}
}
We can also conditionally implement a trait for any type that implements another trait. Implementations of a trait on any type that satisfies the trait bounds are called blanket implementations and are extensively used in the Rust standard library. For example, the standard library implements the ToString
trait on any type that implements the Display
trait. The impl
block in the standard library looks similar to this code:
impl<T: Display> ToString for T {
// --snip--
}
Because the standard library has this blanket implementation, we can call the to_string
method defined by the ToString
trait on any type that implements the Display
trait. For example, we can turn integers into their corresponding String
values like this because integers implement Display
:
let s = 3.to_string();
Validating References with Lifetimes
Rather than ensuring that a type has the behavior we want, lifetimes ensure that references are valid as long as we need them to be.
Most of the time, lifetimes are implicit and inferred, just like most of the time, types are inferred. We must only annotate types when multiple types are possible. In a similar way, we must annotate lifetimes when the lifetimes of references could be related in a few different ways. Rust requires us to annotate the relationships using generic lifetime parameters to ensure the actual references used at runtime will definitely be valid.
Preventing Dangling References with Lifetimes
The main aim of lifetimes is to prevent dangling references, which cause a program to reference data other than the data it’s intended to reference. Consider the program in the following listing, which has an outer scope and an inner scope.
fn main() {
let r;
{
let x = 5;
r = &x; // error! borrowed value does not live long enough
} // `x` dropped here while still borrowed
println!("r: {}", r); // borrow later used here
}
The Borrow Checker
The Rust compiler has a borrow checker that compares scopes to determine whether all borrows are valid.
fn main() {
let r; // ---------+-- 'a
// |
{ // |
let x = 5; // -+-- 'b |
r = &x; // | |
} // -+ |
// |
println!("r: {}", r); // |
} // ---------+
Here, we’ve annotated the lifetime of r
with 'a
and the lifetime of x
with 'b
. As you can see, the inner 'b
block is much smaller than the outer 'a
lifetime block. At compile time, Rust compares the size of the two lifetimes and sees that r
has a lifetime of 'a
but that it refers to memory with a lifetime of 'b
. The program is rejected because 'b
is shorter than 'a
: the subject of the reference doesn’t live as long as the reference.
The following listing fixes the code so it doesn’t have a dangling reference and compiles without any errors.
fn main() {
let x = 5; // ----------+-- 'b
// |
let r = &x; // --+-- 'a |
// | |
println!("r: {}", r); // | |
// --+ |
} // ----------+
Here, x
has the lifetime 'b
, which in this case is larger than 'a
. This means r
can reference x
because Rust knows that the reference in r
will always be valid while x
is valid.
Generic Lifetimes in Functions
We’ll write a function that returns the longer of two string slices. This function will take two string slices and return a single string slice. After we’ve implemented the longest
function, the code in the following listing should print The longest string is abcd
.
Filename: src/main.rs
fn main() {
let string1 = String::from("abcd");
let string2 = "xyz";
let result = longest(string1.as_str(), string2);
println!("The longest string is {}", result);
}
Note that we want the function to take string slices, which are references, rather than strings, because we don’t want the longest
function to take ownership of its parameters.
If we try to implement the longest function as shown in the following listing, it won’t compile.
Filename: src/main.rs
fn longest(x: &str, y: &str) -> &str { // error! expected named lifetime parameter on return type
if x.len() > y.len() {
x
} else {
y
}
}
The help text reveals that the return type needs a generic lifetime parameter on it because Rust can’t tell whether the reference being returned refers to x
or y
. Actually, we don’t know either, because the if
block in the body of this function returns a reference to x
and the else
block returns a reference to y
!
Lifetime Annotation Syntax
Lifetime annotations don’t change how long any of the references live. Rather, they describe the relationships of the lifetimes of multiple references to each other without affecting the lifetimes. Just as functions can accept any type when the signature specifies a generic type parameter, functions can accept references with any lifetime by specifying a generic lifetime parameter.
Lifetime annotations have a slightly unusual syntax: the names of lifetime parameters must start with an apostrophe ('
) and are usually all lowercase and very short, like generic types. Most people use the name 'a
for the first lifetime annotation. We place lifetime parameter annotations after the &
of a reference, using a space to separate the annotation from the reference’s type.
Here are some examples:
&i32 // a reference
&'a i32 // a reference with an explicit lifetime
&'a mut i32 // a mutable reference with an explicit lifetime
Lifetime Annotations in Function Signatures
We want the signature to express the following constraint: the returned reference will be valid as long as both the parameters are valid. We’ll name the lifetime 'a
and then add it to each reference, as shown in the following listing.
Filename: src/main.rs
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
In practice, it means that the lifetime of the reference returned by the longest
function is the same as the smaller of the lifetimes of the values referred to by the function arguments.
Remember, when we specify the lifetime parameters in this function signature, we’re not changing the lifetimes of any values passed in or returned. Rather, we’re specifying that the borrow checker should reject any values that don’t adhere to these constraints. Note that the longest
function doesn’t need to know exactly how long x
and y
will live, only that some scope can be substituted for 'a
that will satisfy this signature.
When we pass concrete references to longest
, the concrete lifetime that is substituted for 'a
is the part of the scope of x
that overlaps with the scope of y
. In other words, the generic lifetime 'a
will get the concrete lifetime that is equal to the smaller of the lifetimes of x
and y
.
Let’s look at how the lifetime annotations restrict the longest
function by passing in references that have different concrete lifetimes. The following listing is a straightforward example.
Filename: src/main.rs
fn main() {
let string1 = String::from("long string is long");
{
let string2 = String::from("xyz");
let result = longest(string1.as_str(), string2.as_str());
println!("The longest string is {}", result);
}
}
Next, let’s try an example that shows that the lifetime of the reference in result
must be the smaller lifetime of the two arguments. The code in the following listing will not compile.
Filename: src/main.rs
fn main() {
let string1 = String::from("long string is long");
let result;
{
let string2 = String::from("xyz");
result = longest(string1.as_str(), string2.as_str()); // error! `string2` does not live long enough
} // `string2` dropped here while still borrowed
println!("The longest string is {}", result); // borrow later used here
}
Thinking in Terms of Lifetimes
The way in which you need to specify lifetime parameters depends on what your function is doing. For example, if we changed the implementation of the longest
function to always return the first parameter rather than the longest string slice, we wouldn’t need to specify a lifetime on the y
parameter. The following code will compile:
Filename: src/main.rs
fn longest<'a>(x: &'a str, y: &str) -> &'a str {
x
}
When returning a reference from a function, the lifetime parameter for the return type needs to match the lifetime parameter for one of the parameters. If the reference returned does not refer to one of the parameters, it must refer to a value created within this function. However, this would be a dangling reference because the value will go out of scope at the end of the function. Consider this attempted implementation of the longest
function that won’t compile:
Filename: src/main.rs
fn longest<'a>(x: &str, y: &str) -> &'a str {
let result = String::from("really long string");
result.as_str() // error! cannot return reference to local variable `result`
}
Ultimately, lifetime syntax is about connecting the lifetimes of various parameters and return values of functions. Once they’re connected, Rust has enough information to allow memory-safe operations and disallow operations that would create dangling pointers or otherwise violate memory safety.
Lifetime Annotations in Struct Definitions
We can define structs to hold references, but in that case we would need to add a lifetime annotation on every reference in the struct’s definition. The following listing has a struct named ImportantExcerpt
that holds a string slice.
Filename: src/main.rs
struct ImportantExcerpt<'a> {
part: &'a str,
}
fn main() {
let novel = String::from("Call me Ishmael. Some years ago...");
let first_sentence = novel.split('.').next().expect("Could not find a '.'");
let i = ImportantExcerpt {
part: first_sentence,
};
}
This annotation means an instance of ImportantExcerpt
can’t outlive the reference it holds in its part
field.
Lifetime Elision
You’ve learned that every reference has a lifetime and that you need to specify lifetime parameters for functions or structs that use references. However, in earlier chapter we had a function shown again in the following listing, that compiled without lifetime annotations.
Filename: src/lib.rs
fn first_word(s: &str) -> &str {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return &s[0..i];
}
}
&s[..]
}
The patterns programmed into Rust’s analysis of references are called the lifetime elision rules. These aren’t rules for programmers to follow; they’re a set of particular cases that the compiler will consider, and if your code fits these cases, you don’t need to write the lifetimes explicitly.
Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.
The compiler uses three rules to figure out the lifetimes of the references when there aren’t explicit annotations. The first rule applies to input lifetimes, and the second and third rules apply to output lifetimes. If the compiler gets to the end of the three rules and there are still references for which it can’t figure out lifetimes, the compiler will stop with an error. These rules apply to fn
definitions as well as impl
blocks.
The first rule is that the compiler assigns a lifetime parameter to each parameter that’s a reference. In other words, a function with one parameter gets one lifetime parameter: fn foo<'a>(x: &'a i32)
; a function with two parameters gets two separate lifetime parameters: fn foo<'a, 'b>(x: &'a i32, y: &'b i32)
; and so on.
The second rule is that, if there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters: fn foo<'a>(x: &'a i32) -> &'a i32
.
The third rule is that, if there are multiple input lifetime parameters, but one of them is &self
or &mut self
because this is a method, the lifetime of self
is assigned to all output lifetime parameters. This third rule makes methods much nicer to read and write because fewer symbols are necessary.
Lifetime Annotations in Method Definitions
impl<'a> ImportantExcerpt<'a> {
fn level(&self) -> i32 {
3
}
}
The lifetime parameter declaration after impl
and its use after the type name are required, but we’re not required to annotate the lifetime of the reference to self
because of the first elision rule.
Here is an example where the first and the third lifetime elision rule applies:
impl<'a> ImportantExcerpt<'a> {
fn announce_and_return_part(&self, announcement: &str) -> &str {
println!("Attention please: {}", announcement);
self.part
}
}
The Static Lifetime
One special lifetime we need to discuss is 'static
, which denotes that the affected reference can live for the entire duration of the program. All string literals have the 'static
lifetime, which we can annotate as follows:
let s: &'static str = "I have a static lifetime.";
The text of this string is stored directly in the program’s binary, which is always available. Therefore, the lifetime of all string literals is 'static
.
Generic Type Parameters, Trait Bounds, and Lifetimes Together
use std::fmt::Display;
fn longest_with_an_announcement<'a, T>(
x: &'a str,
y: &'a str,
ann: T,
) -> &'a str
where
T: Display,
{
println!("Announcement! {}", ann);
if x.len() > y.len() {
x
} else {
y
}
}
Writing Automated Tests
How to Write Tests
Tests are Rust functions that verify that the non-test code is functioning in the expected manner. The bodies of test functions typically perform these three actions:
Set up any needed data or state.
Run the code you want to test.
Assert the results are what you expect.
The Anatomy of a Test Function
Attributes are metadata about pieces of Rust code. A Test in Rust is a function that’s annotated with the test
attribute.
To change a function into a test function, add #[test]
on the line before fn
. When you run your tests with the cargo test
command, Rust builds a test runner binary that runs the annotated functions and reports on whether each test function passes or fails.
Whenever we make a new library project with Cargo, a test module with a test function in it is automatically generated for us.
Let’s create a new library project called adder
that will add two numbers:
$ cargo new adder --lib
Created library `adder` project
$ cd adder
The contents of the src/lib.rs file in your adder
library should look like the following listing.
Filename: src/lib.rs
#[cfg(test)]
mod tests {
#[test]
fn it_works() {
let result = 2 + 2;
assert_eq!(result, 4);
}
}
For now, let’s ignore the top two lines and focus on the function. Note the #[test]
annotation: this attribute indicates this is a test function, so the test runner knows to treat this function as a test. We might also have non-test functions in the tests module to help set up common scenarios or perform common operations, so we always need to indicate which functions are tests.
The cargo test
command runs all tests in our project, as shown in the following listing.
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.57s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Cargo compiled and ran the test. We see the line running 1 test
. The next line shows the name of the generated test function, called it_works
, and that the result of running that test is ok
. The overall summary test result: ok.
means that all the tests passed, and the portion that reads 1 passed; 0 failed
totals the number of tests that passed or failed.
It’s possible to mark a test as ignored so it doesn’t run in a particular instance. Because we haven’t done that here, the summary shows 0 ignored
. We can also pass an argument to the cargo test
command to run only tests whose name matches a string; this is called filtering. We also haven’t filtered the tests being run, so the end of the summary shows 0 filtered out
.
The 0 measured
statistic is for benchmark tests that measure performance. Benchmark tests are, as of this writing, only available in nightly Rust.
The next part of the test output starting at Doc-tests adder
is for the results of any documentation tests. We don’t have any documentation tests yet, but Rust can compile any code examples that appear in our API documentation. This feature helps keep your docs and your code in sync! For now, we’ll ignore the Doc-tests
output.
Let’s start to customize the test to our own needs. First change the name of the it_works
function to a different name, such as exploration
. Then we’ll add another test, but this time we’ll make a test that fails! Tests fail when something in the test function panics. Each test is run in a new thread, and when the main thread sees that a test thread has died, the test is marked as failed. Enter the new test as a function named another
, so your src/lib.rs file looks like the following listing.
Filename: src/lib.rs
#[cfg(test)]
mod tests {
#[test]
fn exploration() {
assert_eq!(2 + 2, 4);
}
#[test]
fn another() {
panic!("Make this test fail");
}
}
Run the tests again using cargo test
. The output should look like the following listing, which shows that our exploration
test passed and another
failed.
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.72s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 2 tests
test tests::another ... FAILED
test tests::exploration ... ok
failures:
---- tests::another stdout ----
thread 'tests::another' panicked at 'Make this test fail', src/lib.rs:10:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::another
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Instead of ok
, the line test tests::another
shows FAILED
. Two new sections appear between the individual results and the summary: the first displays the detailed reason for each test failure. In this case, we get the details that another
failed because it panicked at 'Make this test fail'
on line 10 in the src/lib.rs file. The next section lists just the names of all the failing tests, which is useful when there are lots of tests and lots of detailed failing test output. We can use the name of a failing test to run just that test to more easily debug it.
The summary line displays at the end: overall, our test result is FAILED
. We had one test pass and one test fail.
Checking Results with the assert! Macro
The assert!
macro, provided by the standard library, is useful when you want to ensure that some condition in a test evaluates to true
. We give the assert!
macro an argument that evaluates to a Boolean
.
In the preceding listing, we used a Rectangle
struct and a can_hold
method, which are repeated here. Let’s put this code in the src/lib.rs file, then write some tests for it using the assert!
macro.
Filename: src/lib.rs
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
impl Rectangle {
fn can_hold(&self, other: &Rectangle) -> bool {
self.width > other.width && self.height > other.height
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn larger_can_hold_smaller() {
let larger = Rectangle {
width: 8,
height: 7,
};
let smaller = Rectangle {
width: 5,
height: 1,
};
assert!(larger.can_hold(&smaller));
}
#[test]
fn smaller_cannot_hold_larger() {
let larger = Rectangle {
width: 8,
height: 7,
};
let smaller = Rectangle {
width: 5,
height: 1,
};
assert!(!smaller.can_hold(&larger));
}
}
Note that we’ve added a new line inside the tests module: use super::*;
. The tests module is a regular module that follows the usual visibility rules. Because the tests
module is an inner module, we need to bring the code under test in the outer module into the scope of the inner module. We use a glob here so anything we define in the outer module is available to this tests
module.
$ cargo test
Compiling rectangle v0.1.0 (file:///projects/rectangle)
Finished test [unoptimized + debuginfo] target(s) in 0.66s
Running unittests src/lib.rs (target/debug/deps/rectangle-6584c4561e48942e)
running 2 tests
test tests::larger_can_hold_smaller ... ok
test tests::smaller_cannot_hold_larger ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests rectangle
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Two tests that pass! Now let’s see what happens to our test results when we introduce a bug in our code. We’ll change the implementation of the can_hold
method by replacing the greater-than sign with a less-than sign when it compares the widths:
// --snip--
impl Rectangle {
fn can_hold(&self, other: &Rectangle) -> bool {
self.width < other.width && self.height > other.height
}
}
Running the tests now produces the following:
$ cargo test
Compiling rectangle v0.1.0 (file:///projects/rectangle)
Finished test [unoptimized + debuginfo] target(s) in 0.66s
Running unittests src/lib.rs (target/debug/deps/rectangle-6584c4561e48942e)
running 2 tests
test tests::larger_can_hold_smaller ... FAILED
test tests::smaller_cannot_hold_larger ... ok
failures:
---- tests::larger_can_hold_smaller stdout ----
thread 'tests::larger_can_hold_smaller' panicked at 'assertion failed: larger.can_hold(&smaller)', src/lib.rs:28:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::larger_can_hold_smaller
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Our tests caught the bug!
Testing Equality with the assert_eq! and assert_ne! Macros
A common way to verify functionality is to test for equality between the result of the code under test and the value you expect the code to return. However, this is such a common test that the standard library provides a pair of macros—assert_eq!
and assert_ne!
—to perform this test more conveniently. These macros compare two arguments for equality or inequality, respectively. They’ll also print the two values if the assertion fails, which makes it easier to see why the test failed; conversely, the assert!
macro only indicates that it got a false
value for the ==
expression, without printing the values that led to the false
value.
Filename: src/lib.rs
pub fn add_two(a: i32) -> i32 {
a + 3
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_adds_two() {
assert_eq!(4, add_two(2));
}
}
Run the tests:
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.61s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 1 test
test tests::it_adds_two ... FAILED
failures:
---- tests::it_adds_two stdout ----
thread 'tests::it_adds_two' panicked at 'assertion failed: `(left == right)`
left: `4`,
right: `5`', src/lib.rs:11:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::it_adds_two
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Note that in some languages and test frameworks, the parameters to equality assertion functions are called expected
and actual
, and the order in which we specify the arguments matters. However, in Rust, they’re called left
and right
, and the order in which we specify the value we expect and the value the code produces doesn’t matter.
The assert_ne!
macro will pass if the two values we give it are not equal and fail if they’re equal. This macro is most useful for cases when we’re not sure what a value will be, but we know what the value definitely shouldn’t be.
Under the surface, the assert_eq!
and assert_ne!
macros use the operators ==
and !=
, respectively. When the assertions fail, these macros print their arguments using debug formatting, which means the values being compared must implement the PartialEq
and Debug
traits. All primitive types and most of the standard library types implement these traits. For structs and enums that you define yourself, you’ll need to implement PartialEq
to assert equality of those types. You’ll also need to implement Debug
to print the values when the assertion fails. Because both traits are derivable traits, this is usually as straightforward as adding the #[derive(PartialEq, Debug)]
annotation to your struct or enum definition.
Adding Custom Failure Messages
You can also add a custom message to be printed with the failure message as optional arguments to the assert!
, assert_eq!
, and assert_ne!
macros. Any arguments specified after the required arguments are passed along to the format!
macro, so you can pass a format string that contains {}
placeholders and values to go in those placeholders. Custom messages are useful for documenting what an assertion means; when a test fails, you’ll have a better idea of what the problem is with the code.
For example, let’s say we have a function that greets people by name and we want to test that the name we pass into the function appears in the output:
Filename: src/lib.rs
pub fn greeting(name: &str) -> String {
format!("Hello {}!", name)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn greeting_contains_name() {
let result = greeting("Carol");
assert!(result.contains("Carol"));
}
}
Now let’s introduce a bug into this code by changing greeting to exclude name to see what the default test failure looks like:
pub fn greeting(name: &str) -> String {
String::from("Hello!")
}
Running this test produces the following:
$ cargo test
Compiling greeter v0.1.0 (file:///projects/greeter)
Finished test [unoptimized + debuginfo] target(s) in 0.91s
Running unittests src/lib.rs (target/debug/deps/greeter-170b942eb5bf5e3a)
running 1 test
test tests::greeting_contains_name ... FAILED
failures:
---- tests::greeting_contains_name stdout ----
thread 'tests::greeting_contains_name' panicked at 'assertion failed: result.contains(\"Carol\")', src/lib.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::greeting_contains_name
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
This result just indicates that the assertion failed and which line the assertion is on. A more useful failure message would print the value from the greeting
function. Let’s add a custom failure message composed of a format string with a placeholder filled in with the actual value we got from the greeting
function:
#[test]
fn greeting_contains_name() {
let result = greeting("Carol");
assert!(
result.contains("Carol"),
"Greeting did not contain name, value was `{}`",
result
);
}
Now when we run the test, we’ll get a more informative error message:
$ cargo test
Compiling greeter v0.1.0 (file:///projects/greeter)
Finished test [unoptimized + debuginfo] target(s) in 0.93s
Running unittests src/lib.rs (target/debug/deps/greeter-170b942eb5bf5e3a)
running 1 test
test tests::greeting_contains_name ... FAILED
failures:
---- tests::greeting_contains_name stdout ----
thread 'tests::greeting_contains_name' panicked at 'Greeting did not contain name, value was `Hello!`', src/lib.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::greeting_contains_name
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Checking for Panics with should_panic
In addition to checking return values, it’s important to check that our code handles error conditions as we expect.
Filename: src/lib.rs
pub struct Guess {
value: i32,
}
impl Guess {
pub fn new(value: i32) -> Guess {
if value < 1 || value > 100 {
panic!("Guess value must be between 1 and 100, got {}.", value);
}
Guess { value }
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
#[should_panic]
fn greater_than_100() {
Guess::new(200);
}
}
We place the #[should_panic]
attribute after the #[test]
attribute and before the test function it applies to. Let’s look at the result when this test passes:
$ cargo test
Compiling guessing_game v0.1.0 (file:///projects/guessing_game)
Finished test [unoptimized + debuginfo] target(s) in 0.58s
Running unittests src/lib.rs (target/debug/deps/guessing_game-57d70c3acb738f4d)
running 1 test
test tests::greater_than_100 - should panic ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests guessing_game
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Looks good! Now let’s introduce a bug in our code by removing the condition that the new
function will panic if the value is greater than 100:
// --snip--
impl Guess {
pub fn new(value: i32) -> Guess {
if value < 1 {
panic!("Guess value must be between 1 and 100, got {}.", value);
}
Guess { value }
}
}
When we run the test, it will fail:
$ cargo test
Compiling guessing_game v0.1.0 (file:///projects/guessing_game)
Finished test [unoptimized + debuginfo] target(s) in 0.62s
Running unittests src/lib.rs (target/debug/deps/guessing_game-57d70c3acb738f4d)
running 1 test
test tests::greater_than_100 - should panic ... FAILED
failures:
---- tests::greater_than_100 stdout ----
note: test did not panic as expected
failures:
tests::greater_than_100
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
We don’t get a very helpful message in this case, but when we look at the test function, we see that it’s annotated with #[should_panic]
. The failure we got means that the code in the test function did not cause a panic.
Tests that use should_panic
can be imprecise. A should_panic
test would pass even if the test panics for a different reason from the one we were expecting. To make should_panic
tests more precise, we can add an optional expected
parameter to the should_panic
attribute. The test harness will make sure that the failure message contains the provided text.
Filename: src/lib.rs
// --snip--
impl Guess {
pub fn new(value: i32) -> Guess {
if value < 1 {
panic!(
"Guess value must be greater than or equal to 1, got {}.",
value
);
} else if value > 100 {
panic!(
"Guess value must be less than or equal to 100, got {}.",
value
);
}
Guess { value }
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
#[should_panic(expected = "less than or equal to 100")]
fn greater_than_100() {
Guess::new(200);
}
}
This test will pass because the value we put in the should_panic
attribute’s expected
parameter is a substring of the message that the Guess::new
function panics with.
To see what happens when a should_panic
test with an expected
message fails, let’s again introduce a bug into our code by swapping the bodies of the if value < 1
and the else if value > 100
blocks:
if value < 1 {
panic!(
"Guess value must be less than or equal to 100, got {}.",
value
);
} else if value > 100 {
panic!(
"Guess value must be greater than or equal to 1, got {}.",
value
);
}
This time when we run the should_panic
test, it will fail:
$ cargo test
Compiling guessing_game v0.1.0 (file:///projects/guessing_game)
Finished test [unoptimized + debuginfo] target(s) in 0.66s
Running unittests src/lib.rs (target/debug/deps/guessing_game-57d70c3acb738f4d)
running 1 test
test tests::greater_than_100 - should panic ... FAILED
failures:
---- tests::greater_than_100 stdout ----
thread 'tests::greater_than_100' panicked at 'Guess value must be greater than or equal to 1, got 200.', src/lib.rs:13:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
note: panic did not contain expected string
panic message: `"Guess value must be greater than or equal to 1, got 200."`,
expected substring: `"less than or equal to 100"`
failures:
tests::greater_than_100
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
The failure message indicates that this test did indeed panic as we expected, but the panic message did not include the expected string 'Guess value must be less than or equal to 100'
. The panic message that we did get in this case was Guess value must be greater than or equal to 1, got 200.
. Now we can start figuring out where our bug is!
Using Result<T, E>
in Tests
We can also write tests that use Result<T, E>
!
#[cfg(test)]
mod tests {
#[test]
fn it_works() -> Result<(), String> {
if 2 + 2 == 4 {
Ok(())
} else {
Err(String::from("two plus two does not equal four"))
}
}
}
Writing tests so they return a Result<T, E>
enables you to use the question mark operator in the body of tests, which can be a convenient way to write tests that should fail if any operation within them returns an Err
variant.
You can’t use the #[should_panic]
annotation on tests that use Result<T, E>
. To assert that an operation returns an Err
variant, don’t use the question mark operator on the Result<T, E>
value. Instead, use assert!(value.is_err())
.
Controlling How Tests Are Run
cargo test
compiles your code in test mode and runs the resulting test binary. The default behavior of the binary produced by cargo test is to run all the tests in parallel and capture output generated during test runs, preventing the output from being displayed and making it easier to read the output related to the test results. You can, however, specify command line options to change this default behavior.
Some command line options go to cargo test
, and some go to the resulting test binary. To separate these two types of arguments, you list the arguments that go to cargo test
followed by the separator --
and then the ones that go to the test binary. Running cargo test --help
displays the options you can use with cargo test
, and running cargo test -- --help
displays the options you can use after the separator.
Running Tests in Parallel or Consecutively
When you run multiple tests, by default they run in parallel using threads, meaning they finish running faster and you get feedback quicker. Because the tests are running at the same time, you must make sure your tests don’t depend on each other or on any shared state, including a shared environment, such as the current working directory or environment variables.
For example, say each of your tests runs some code that creates a file on disk named test-output.txt and writes some data to that file. Then each test reads the data in that file and asserts that the file contains a particular value, which is different in each test. Because the tests run at the same time, one test might overwrite the file in the time between another test writing and reading the file. The second test will then fail, not because the code is incorrect but because the tests have interfered with each other while running in parallel. One solution is to make sure each test writes to a different file; another solution is to run the tests one at a time.
If you don’t want to run the tests in parallel or if you want more fine-grained control over the number of threads used, you can send the --test-threads
flag and the number of threads you want to use to the test binary. Take a look at the following example:
cargo test -- --test-threads=1
Showing Function Output
By default, if a test passes, Rust’s test library captures anything printed to standard output. For example, if we call println!
in a test and the test passes, we won’t see the println!
output in the terminal; we’ll see only the line that indicates the test passed. If a test fails, we’ll see whatever was printed to standard output with the rest of the failure message.
Filename: src/lib.rs
fn prints_and_returns_10(a: i32) -> i32 {
println!("I got the value {}", a);
10
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn this_test_will_pass() {
let value = prints_and_returns_10(4);
assert_eq!(10, value);
}
#[test]
fn this_test_will_fail() {
let value = prints_and_returns_10(8);
assert_eq!(5, value);
}
}
When we run these tests with cargo test
, we’ll see the following output:
$ cargo test
Compiling silly-function v0.1.0 (file:///projects/silly-function)
Finished test [unoptimized + debuginfo] target(s) in 0.58s
Running unittests src/lib.rs (target/debug/deps/silly_function-160869f38cff9166)
running 2 tests
test tests::this_test_will_fail ... FAILED
test tests::this_test_will_pass ... ok
failures:
---- tests::this_test_will_fail stdout ----
I got the value 8
thread 'tests::this_test_will_fail' panicked at 'assertion failed: `(left == right)`
left: `5`,
right: `10`', src/lib.rs:19:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::this_test_will_fail
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Note that nowhere in this output do we see I got the value 4
, which is what is printed when the test that passes runs. That output has been captured.
If we want to see printed values for passing tests as well, we can tell Rust to also show the output of successful tests with --show-output
.
cargo test -- --show-output
$ cargo test -- --show-output
Compiling silly-function v0.1.0 (file:///projects/silly-function)
Finished test [unoptimized + debuginfo] target(s) in 0.60s
Running unittests src/lib.rs (target/debug/deps/silly_function-160869f38cff9166)
running 2 tests
test tests::this_test_will_fail ... FAILED
test tests::this_test_will_pass ... ok
successes:
---- tests::this_test_will_pass stdout ----
I got the value 4
successes:
tests::this_test_will_pass
failures:
---- tests::this_test_will_fail stdout ----
I got the value 8
thread 'tests::this_test_will_fail' panicked at 'assertion failed: `(left == right)`
left: `5`,
right: `10`', src/lib.rs:19:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::this_test_will_fail
test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Running a Subset of Tests by Name
Sometimes, running a full test suite can take a long time. If you’re working on code in a particular area, you might want to run only the tests pertaining to that code. You can choose which tests to run by passing cargo test
the name or names of the test(s) you want to run as an argument.
Filename: src/lib.rs
pub fn add_two(a: i32) -> i32 {
a + 2
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn add_two_and_two() {
assert_eq!(4, add_two(2));
}
#[test]
fn add_three_and_two() {
assert_eq!(5, add_two(3));
}
#[test]
fn one_hundred() {
assert_eq!(102, add_two(100));
}
}
If we run the tests without passing any arguments, as we saw earlier, all the tests will run in parallel:
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.62s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 3 tests
test tests::add_three_and_two ... ok
test tests::add_two_and_two ... ok
test tests::one_hundred ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running Single Tests
We can pass the name of any test function to cargo test
to run only that test:
$ cargo test one_hundred
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.69s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 1 test
test tests::one_hundred ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.00s
The test output lets us know we had more tests that didn’t run by displaying 2 filtered
out at the end.
We can’t specify the names of multiple tests in this way; only the first value given to cargo test
will be used. But there is a way to run multiple tests.
Filtering to Run Multiple Tests
We can specify part of a test name, and any test whose name matches that value will be run. For example, because two of our tests’ names contain add
, we can run those two by running cargo test add
:
$ cargo test add
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.61s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 2 tests
test tests::add_three_and_two ... ok
test tests::add_two_and_two ... ok
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.00s
Also note that the module in which a test appears becomes part of the test’s name, so we can run all the tests in a module by filtering on the module’s name.
Ignoring Some Tests Unless Specifically Requested
Sometimes a few specific tests can be very time-consuming to execute, so you might want to exclude them during most runs of cargo test
. Rather than listing as arguments all tests you do want to run, you can instead annotate the time-consuming tests using the ignore
attribute to exclude them, as shown here:
Filename: src/lib.rs
#[test]
fn it_works() {
assert_eq!(2 + 2, 4);
}
#[test]
#[ignore]
fn expensive_test() {
// code that takes an hour to run
}
After #[test]
we add the #[ignore]
line to the test we want to exclude. Now when we run our tests, it_works
runs, but expensive_test
doesn’t:
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.60s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 2 tests
test expensive_test ... ignored
test it_works ... ok
test result: ok. 1 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
The expensive_test
function is listed as ignored
. If we want to run only the ignored tests, we can use cargo test -- --ignored
:
$ cargo test -- --ignored
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.61s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 1 test
test expensive_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.00s
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
By controlling which tests run, you can make sure your cargo test
results will be fast. When you’re at a point where it makes sense to check the results of the ignored
tests and you have time to wait for the results, you can run cargo test -- --ignored
instead. If you want to run all tests whether they’re ignored or not, you can run cargo test -- --include-ignored
.
Test Organization
The Rust community thinks about tests in terms of two main categories: unit tests and integration tests. Unit tests are small and more focused, testing one module in isolation at a time, and can test private interfaces. Integration tests are entirely external to your library and use your code in the same way any other external code would, using only the public interface and potentially exercising multiple modules per test.
Unit Tests
The purpose of unit tests is to test each unit of code in isolation from the rest of the code to quickly pinpoint where code is and isn’t working as expected. You’ll put unit tests in the src directory in each file with the code that they’re testing. The convention is to create a module named tests
in each file to contain the test functions and to annotate the module with cfg(test)
.
The Tests Module and #[cfg(test)]
The #[cfg(test)]
annotation on the tests module tells Rust to compile and run the test code only when you run cargo test
, not when you run cargo build
. This saves compile time when you only want to build the library and saves space in the resulting compiled artifact because the tests are not included. You’ll see that because integration tests go in a different directory, they don’t need the #[cfg(test)]
annotation. However, because unit tests go in the same files as the code, you’ll use #[cfg(test)]
to specify that they shouldn’t be included in the compiled result.
The attribute cfg
stands for configuration and tells Rust that the following item should only be included given a certain configuration option. In this case, the configuration option is test
, which is provided by Rust for compiling and running tests. By using the cfg
attribute, Cargo compiles our test code only if we actively run the tests with cargo test
. This includes any helper functions that might be within this module, in addition to the functions annotated with #[test]
.
Testing Private Functions
Rust’s privacy rules do allow you to test private functions.
Filename: src/lib.rs
pub fn add_two(a: i32) -> i32 {
internal_adder(a, 2)
}
fn internal_adder(a: i32, b: i32) -> i32 {
a + b
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn internal() {
assert_eq!(4, internal_adder(2, 2));
}
}
Note that the internal_adder
function is not marked as pub
. Tests are just Rust code, and the tests
module is just another module. Items in child modules can use the items in their ancestor modules. In this test, we bring all of the test
module’s parent’s items into scope with use super::*
, and then the test can call internal_adder
.
Integration Tests
In Rust, integration tests are entirely external to your library. They use your library in the same way any other code would, which means they can only call functions that are part of your library’s public API. Their purpose is to test whether many parts of your library work together correctly. Units of code that work correctly on their own could have problems when integrated, so test coverage of the integrated code is important as well. To create integration tests, you first need a tests directory.
The tests Directory
We create a tests directory at the top level of our project directory, next to src. Cargo knows to look for integration test files in this directory. We can then make as many test files as we want, and Cargo will compile each of the files as an individual crate.
Let’s create an integration test. Your directory structure should look like this:
adder
├── Cargo.lock
├── Cargo.toml
├── src
│ └── lib.rs
└── tests
└── integration_test.rs
Filename: tests/integration_test.rs
use adder;
#[test]
fn it_adds_two() {
assert_eq!(4, adder::add_two(2));
}
Each file in the tests
directory is a separate crate, so we need to bring our library into each test crate’s scope. For that reason we add use adder
at the top of the code, which we didn’t need in the unit tests.
We don’t need to annotate any code in tests/integration_test.rs with #[cfg(test)]
. Cargo treats the tests
directory specially and compiles files in this directory only when we run cargo test
. Run cargo test
now:
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 1.31s
Running unittests src/lib.rs (target/debug/deps/adder-1082c4b063a8fbe6)
running 1 test
test tests::internal ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/integration_test.rs (target/debug/deps/integration_test-1082c4b063a8fbe6)
running 1 test
test it_adds_two ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
The three sections of output include the unit tests, the integration test, and the doc tests. Note that if any test in a section fails, the following sections will not be run. For example, if a unit test fails, there won’t be any output for integration and doc tests because those tests will only be run if all unit tests are passing.
The first section for the unit tests is the same as we’ve been seeing: one line for each unit test and then a summary line for the unit tests.
The integration tests section starts with the line Running tests/integration_test.rs
. Next, there is a line for each test function in that integration test and a summary line for the results of the integration test just before the Doc-tests adder
section starts.
Each integration test file has its own section, so if we add more files in the tests directory, there will be more integration test sections.
We can still run a particular integration test function by specifying the test function’s name as an argument to cargo test
. To run all the tests in a particular integration test file, use the --test
argument of cargo test
followed by the name of the file:
$ cargo test --test integration_test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.64s
Running tests/integration_test.rs (target/debug/deps/integration_test-82e7799c1bc62298)
running 1 test
test it_adds_two ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Submodules in Integration Tests
As mentioned earlier, each file in the tests directory is compiled as its own separate crate, which is useful for creating separate scopes to more closely imitate the way end users will be using your crate. However, this means files in the tests directory don’t share the same behavior as files in src do, as you learned in previous chapter regarding how to separate code into modules and files.
The different behavior of tests directory files is most noticeable when you have a set of helper functions to use in multiple integration test files and you try to follow the steps in previous chapter to extract them into a common module. For example, if we create tests/common.rs and place a function named setup
in it, we can add some code to setup
that we want to call from multiple test functions in multiple test files:
Filename: tests/common.rs
pub fn setup() {
// setup code specific to your library's tests would go here
}
When we run the tests again, we’ll see a new section in the test output for the common.rs file, even though this file doesn’t contain any test functions nor did we call the setup
function from anywhere:
$ cargo test
Compiling adder v0.1.0 (file:///projects/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.89s
Running unittests src/lib.rs (target/debug/deps/adder-92948b65e88960b4)
running 1 test
test tests::internal ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/common.rs (target/debug/deps/common-92948b65e88960b4)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running tests/integration_test.rs (target/debug/deps/integration_test-92948b65e88960b4)
running 1 test
test it_adds_two ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Having common
appear in the test results with running 0 tests
displayed for it is not what we wanted. We just wanted to share some code with the other integration test files.
To avoid having common
appear in the test output, instead of creating tests/common.rs, we’ll create tests/common/mod.rs. The project directory now looks like this:
├── Cargo.lock
├── Cargo.toml
├── src
│ └── lib.rs
└── tests
├── common
│ └── mod.rs
└── integration_test.rs
This is the older naming convention that Rust also understands. Naming the file this way tells Rust not to treat the common
module as an integration test file. Files in subdirectories of the tests directory don’t get compiled as separate crates or have sections in the test output.
After we’ve created tests/common/mod.rs, we can use it from any of the integration test files as a module.
Filename: tests/integration_test.rs
use adder;
mod common;
#[test]
fn it_adds_two() {
common::setup();
assert_eq!(4, adder::add_two(2));
}
Integration Tests for Binary Crates
If our project is a binary crate that only contains a src/main.rs file and doesn’t have a src/lib.rs file, we can’t create integration tests in the tests directory and bring functions defined in the src/main.rs file into scope with a use
statement. Only library crates expose functions that other crates can use; binary crates are meant to be run on their own.
An I/O Project: Building a Command Line Program
Accepting Command Line Arguments
$ cargo new minigrep
Created binary (application) `minigrep` project
$ cd minigrep
The first task is to make minigrep
accept its two command line arguments: the file path and a string to search for. That is, we want to be able to run our program with cargo run
, two hyphens to indicate the following arguments are for our program rather than for cargo
, a string to search for, and a path to a file to search in, like so:
cargo run -- searchstring example-filename.txt
Right now, the program generated by cargo new
cannot process arguments we give it. Some existing libraries on crates.io can help with writing a program that accepts command line arguments, but because you’re just learning this concept, let’s implement this capability ourselves.
Reading the Argument Values
To enable minigrep
to read the values of command line arguments we pass to it, we’ll need the std::env::args
function provided in Rust’s standard library. This function returns an iterator of the command line arguments passed to minigrep
. For now, you only need to know two details about iterators: iterators produce a series of values, and we can call the collect
method on an iterator to turn it into a collection, such as a vector, that contains all the elements the iterator produces.
Filename: src/main.rs
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
dbg!(args);
}
The args Function and Invalid Unicode
Note that
std::env::args
will panic if any argument contains invalid Unicode. If your program needs to accept arguments containing invalid Unicode, usestd::env::args_os
instead. That function returns an iterator that producesOsString
values instead ofString
values. We’ve chosen to usestd::env::args
here for simplicity, becauseOsString
values differ per platform and are more complex to work with thanString
values.
Although we very rarely need to annotate types in Rust, collect
is one function you do often need to annotate because Rust isn’t able to infer the kind of collection you want.
Let’s try running the code first with no arguments and then with two arguments:
$ cargo run
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.61s
Running `target/debug/minigrep`
[src/main.rs:5] args = [
"target/debug/minigrep",
]
$ cargo run -- needle haystack
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 1.57s
Running `target/debug/minigrep needle haystack`
[src/main.rs:5] args = [
"target/debug/minigrep",
"needle",
"haystack",
]
Notice that the first value in the vector is "target/debug/minigrep"
, which is the name of our binary. This matches the behavior of the arguments list in C, letting programs use the name by which they were invoked in their execution. It’s often convenient to have access to the program name in case you want to print it in messages or change behavior of the program based on what command line alias was used to invoke the program.
Saving the Argument Values in Variables
Now we need to save the values of the two arguments in variables so we can use the values throughout the rest of the program.
Filename: src/main.rs
use std::env;
fn main() {
let args: Vec<String> = env::args().collect();
let query = &args[1];
let file_path = &args[2];
println!("Searching for {}", query);
println!("In file {}", file_path);
}
We temporarily print the values of these variables to prove that the code is working as we intend. Let’s run this program again with the arguments test
and sample.txt
:
$ cargo run -- test sample.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep test sample.txt`
Searching for test
In file sample.txt
Reading a File
Now we’ll add functionality to read the file specified in the file_path
argument. First, we need a sample file to test it with: we’ll use a file with a small amount of text over multiple lines with some repeated words. Create a file called poem.txt at the root level of your project, and enter the poem “I’m Nobody! Who are you?”
Filename: poem.txt
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.
How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
With the text in place, edit src/main.rs and add code to read the file.
Filename: src/main.rs
use std::env;
use std::fs;
fn main() {
// --snip--
println!("In file {}", file_path);
let contents = fs::read_to_string(file_path)
.expect("Should have been able to read the file");
println!("With text:\n{contents}");
}
Let’s run this code with any string as the first command line argument (because we haven’t implemented the searching part yet) and the poem.txt file as the second argument:
$ cargo run -- the poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.
How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
At the moment, the main
function has multiple responsibilities: generally, functions are clearer and easier to maintain if each function is responsible for only one idea. The other problem is that we’re not handling errors as well as we could. The program is still small, so these flaws aren’t a big problem, but as the program grows, it will be harder to fix them cleanly. It’s good practice to begin refactoring early on when developing a program, because it’s much easier to refactor smaller amounts of code.
Refactoring to Improve Modularity and Error Handling
As a function gains responsibilities, it becomes more difficult to reason about, harder to test, and harder to change without breaking one of its parts. It’s best to separate functionality so each function is responsible for one task.
The longer main becomes, the more variables we’ll need to bring into scope; the more variables we have in scope, the harder it will be to keep track of the purpose of each. It’s best to group the configuration variables into one structure to make their purpose clear.
Reading a file can fail in a number of ways: for example, the file could be missing, or we might not have permission to open it. Right now, regardless of the situation, we’d print the same error message for everything, which wouldn’t give the user any information!
We use expect
repeatedly to handle different errors, and if the user runs our program without specifying enough arguments, they’ll get an index out of bounds
error from Rust that doesn’t clearly explain the problem. It would be best if all the error-handling code were in one place so future maintainers had only one place to consult the code if the error-handling logic needed to change. Having all the error-handling code in one place will also ensure that we’re printing messages that will be meaningful to our end users.
Separation of Concerns for Binary Projects
The Rust community has developed guidelines for splitting the separate concerns of a binary program when main
starts getting large. This process has the following steps:
Split your program into a main.rs and a lib.rs and move your program’s logic to lib.rs.
As long as your command line parsing logic is small, it can remain in main.rs.
When the command line parsing logic starts getting complicated, extract it from main.rs and move it to lib.rs.
The responsibilities that remain in the main
function after this process should be limited to the following:
Calling the command line parsing logic with the argument values
Setting up any other configuration
Calling a
run
function in lib.rsHandling the error if
run
returns an error
This pattern is about separating concerns: main.rs handles running the program, and lib.rs handles all the logic of the task at hand. Because you can’t test the main
function directly, this structure lets you test all of your program’s logic by moving it into functions in lib.rs. The code that remains in main.rs will be small enough to verify its correctness by reading it.
Extracting the Argument Parser
We’ll extract the functionality for parsing arguments into a function that main
will call to prepare for moving the command line parsing logic to src/lib.rs.
Filename: src/main.rs
fn main() {
let args: Vec<String> = env::args().collect();
let (query, file_path) = parse_config(&args);
// --snip--
}
fn parse_config(args: &[String]) -> (&str, &str) {
let query = &args[1];
let file_path = &args[2];
(query, file_path)
}
Grouping Configuration Values
At the moment, we’re returning a tuple, but then we immediately break that tuple into individual parts again. This is a sign that perhaps we don’t have the right abstraction yet.
We’ll instead put the two values into one struct and give each of the struct fields a meaningful name. Doing so will make it easier for future maintainers of this code to understand how the different values relate to each other and what their purpose is.
Filename: src/main.rs
fn main() {
let args: Vec<String> = env::args().collect();
let config = parse_config(&args);
println!("Searching for {}", config.query);
println!("In file {}", config.file_path);
let contents = fs::read_to_string(config.file_path)
.expect("Should have been able to read the file");
// --snip--
}
struct Config {
query: String,
file_path: String,
}
fn parse_config(args: &[String]) -> Config {
let query = args[1].clone();
let file_path = args[2].clone();
Config { query, file_path }
}
The args
variable in main
is the owner of the argument values and is only letting the parse_config
function borrow them, which means we’d violate Rust’s borrowing rules if Config
tried to take ownership of the values in args
.
The Trade-Offs of Using clone
There’s a tendency among many Rustaceans to avoid using
clone
to fix ownership problems because of its runtime cost. But for now, it’s better to have a working program that’s a bit inefficient than to try to hyperoptimize code on your first pass.
Creating a Constructor for Config
So now that the purpose of the parse_config
function is to create a Config
instance, we can change parse_config
from a plain function to a function named new
that is associated with the Config
struct. Making this change will make the code more idiomatic.
Filename: src/main.rs
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::new(&args);
// --snip--
}
// --snip--
impl Config {
fn new(args: &[String]) -> Config {
let query = args[1].clone();
let file_path = args[2].clone();
Config { query, file_path }
}
}
Fixing the Error Handling
Recall that attempting to access the values in the args
vector at index 1 or index 2 will cause the program to panic if the vector contains fewer than three items.
Try running the program without any arguments. The line index out of bounds: the len is 1 but the index is 1
is an error message intended for programmers. It won’t help our end users understand what they should do instead. Let’s fix that now.
Improving the Error Message
We add a check in the new
function that will verify that the slice is long enough before accessing index 1 and 2.
Filename: src/main.rs
// --snip--
fn new(args: &[String]) -> Config {
if args.len() < 3 {
panic!("not enough arguments");
}
// --snip--
A call to panic!
is more appropriate for a programming problem than a usage problem.
Returning a Result Instead of Calling panic!
We can instead return a Result
value that will contain a Config
instance in the successful case and will describe the problem in the error case. We’re also going to change the function name from new
to build
because many programmers expect new
functions to never fail.
Note that this won’t compile until we update main
as well, which we’ll do in the next listing.
Filename: src/main.rs
impl Config {
fn build(args: &[String]) -> Result<Config, &'static str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let file_path = args[2].clone();
Ok(Config { query, file_path })
}
}
Calling Config::build and Handling Errors
To handle the error case and print a user-friendly message, we need to update main
to handle the Result
being returned by Config::build
. We’ll also take the responsibility of exiting the command line tool with a nonzero error code away from panic!
and instead implement it by hand. A nonzero exit status is a convention to signal to the process that called our program that the program exited with an error state.
Filename: src/main.rs
use std::process;
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::build(&args).unwrap_or_else(|err| {
println!("Problem parsing arguments: {err}");
process::exit(1);
});
// --snip--
Using unwrap_or_else
allows us to define some custom, non-panic!
error handling. If the Result
is an Ok
value, this method’s behavior is similar to unwrap
: it returns the inner value Ok
is wrapping. However, if the value is an Err
value, this method calls the code in the closure, which is an anonymous function we define and pass as an argument to unwrap_or_else
.
The process::exit
function will stop the program immediately and return the number that was passed as the exit status code. This is similar to the panic!
-based handling we used, but we no longer get all the extra output. Let’s try it:
$ cargo run
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.48s
Running `target/debug/minigrep`
Problem parsing arguments: not enough arguments
Extracting Logic from main
We’ll extract a function named run
that will hold all the logic currently in the main
function that isn’t involved with setting up configuration or handling errors. When we’re done, main
will be concise and easy to verify by inspection, and we’ll be able to write tests for all the other logic.
For now, we’re just making the small, incremental improvement of extracting the function. We’re still defining the function in src/main.rs.
Filename: src/main.rs
fn main() {
// --snip--
println!("Searching for {}", config.query);
println!("In file {}", config.file_path);
run(config);
}
fn run(config: Config) {
let contents = fs::read_to_string(config.file_path)
.expect("Should have been able to read the file");
println!("With text:\n{contents}");
}
// --snip--
Returning Errors from the run Function
Instead of allowing the program to panic by calling expect
, the run
function will return a Result<T, E>
when something goes wrong.
Filename: src/main.rs
use std::error::Error;
// --snip--
fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.file_path)?;
println!("With text:\n{contents}");
Ok(())
}
For now, just know that Box<dyn Error>
means the function will return a type that implements the Error
trait, but we don’t have to specify what particular type the return value will be. This gives us flexibility to return error values that may be of different types in different error cases. The dyn
keyword is short for “dynamic.”
We’ve removed the call to expect
in favor of the ?
operator. Rather than panic!
on an error, ?
will return the error value from the current function for the caller to handle.
This Ok(())
syntax might look a bit strange at first, but using ()
like this is the idiomatic way to indicate that we’re calling run
for its side effects only; it doesn’t return a value we need.
When you run this code, it will compile but will display a warning:
$ cargo run the poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
warning: unused `Result` that must be used
--> src/main.rs:19:5
|
19 | run(config);
| ^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
= note: `#[warn(unused_must_use)]` on by default
warning: `minigrep` (bin "minigrep") generated 1 warning
Finished dev [unoptimized + debuginfo] target(s) in 0.71s
Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.
How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!
Rust tells us that our code ignored the Result value and the Result value might indicate that an error occurred. But we’re not checking to see whether or not there was an error, and the compiler reminds us that we probably meant to have some error-handling code here!
Handling Errors Returned from run in main
Filename: src/main.rs
fn main() {
// --snip--
println!("Searching for {}", config.query);
println!("In file {}", config.file_path);
if let Err(e) = run(config) {
println!("Application error: {e}");
process::exit(1);
}
}
We use if let
rather than unwrap_or_else
to check whether run
returns an Err
value and call process::exit(1)
if it does. The run
function doesn’t return a value that we want to unwrap
in the same way that Config::build
returns the Config
instance. Because run
returns ()
in the success case, we only care about detecting an error, so we don’t need unwrap_or_else
to return the unwrapped value, which would only be ()
.
Splitting Code into a Library Crate
Let’s move all the code that isn’t the main
function from src/main.rs to src/lib.rs:
The
run
function definitionThe relevant
use
statementsThe definition of
Config
The
Config::build
function definition
Filename: src/lib.rs
use std::error::Error;
use std::fs;
pub struct Config {
pub query: String,
pub file_path: String,
}
impl Config {
pub fn build(args: &[String]) -> Result<Config, &'static str> {
// --snip--
}
}
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
// --snip--
}
Now we need to bring the code we moved to src/lib.rs into the scope of the binary crate in src/main.rs.
Filename: src/main.rs
use std::env;
use std::process;
use minigrep::Config;
fn main() {
// --snip--
if let Err(e) = minigrep::run(config) {
// --snip--
}
}
Developing the Library’s Functionality with Test-Driven Development
In this section, we’ll add the searching logic to the minigrep
program using the test-driven development (TDD) process with the following steps:
Write a test that fails and run it to make sure it fails for the reason you expect.
Write or modify just enough code to make the new test pass.
Refactor the code you just added or changed and make sure the tests continue to pass.
Repeat from step 1!
Writing a Failing Test
Because we don’t need them anymore, let’s remove the println!
statements from src/lib.rs and src/main.rs that we used to check the program’s behavior. Then, in src/lib.rs, add a tests
module with a test function. The test function specifies the behavior we want the search
function to have: it will take a query and the text to search, and it will return only the lines from the text that contain the query. The following listing shows this test, which won’t compile yet.
Filename: src/lib.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn one_result() {
let query = "duct";
let contents = "\
Rust:
safe, fast, productive.
Pick three.";
assert_eq!(vec!["safe, fast, productive."], search(query, contents));
}
}
Note that the backslash after the opening double quote tells Rust not to put a newline character at the beginning of the contents of this string literal.
In accordance with TDD principles, we’ll add just enough code to get the test to compile and run by adding a definition of the search
function that always returns an empty vector. Then the test should compile and fail because an empty vector doesn’t match a vector containing the line "safe, fast, productive."
Filename: src/lib.rs
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
vec![]
}
Notice that we need to define an explicit lifetime 'a
in the signature of search
and use that lifetime with the contents
argument and the return value.
In other words, we tell Rust that the data returned by the search function will live as long as the data passed into the search function in the contents argument.
Now let’s run the test:
$ cargo test
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished test [unoptimized + debuginfo] target(s) in 0.97s
Running unittests src/lib.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)
running 1 test
test tests::one_result ... FAILED
failures:
---- tests::one_result stdout ----
thread 'tests::one_result' panicked at 'assertion failed: `(left == right)`
left: `["safe, fast, productive."]`,
right: `[]`', src/lib.rs:44:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
tests::one_result
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
error: test failed, to rerun pass `--lib`
Great, the test fails, exactly as we expected. Let’s get the test to pass!
Writing Code to Pass the Test
Currently, our test is failing because we always return an empty vector. To fix that and implement search
, our program needs to follow these steps:
Iterate through each line of the contents.
Check whether the line contains our query string.
If it does, add it to the list of values we’re returning.
If it doesn’t, do nothing.
Return the list of results that match.
Iterating Through Lines with the lines Method
Rust has a helpful method to handle line-by-line iteration of strings, conveniently named lines
. Note this won’t compile yet.
Filename: src/lib.rs
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
for line in contents.lines() {
// do something with line
}
}
Searching Each Line for the Query
Fortunately, strings have a helpful method named contains
that check whether the current line contains our query string for us! Note this still won’t compile yet.
Filename: src/lib.rs
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
for line in contents.lines() {
if line.contains(query) {
// do something with line
}
}
}
Storing Matching Lines
To finish this function, we need a way to store the matching lines that we want to return. For that, we can make a mutable vector before the for
loop and call the push
method to store a line
in the vector. After the for
loop, we return the vector, as shown in the following listing.
Filename: src/lib.rs
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
Now the search
function should return only the lines that contain query
, and our test should pass. Let’s run the test:
$ cargo test
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished test [unoptimized + debuginfo] target(s) in 1.22s
Running unittests src/lib.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)
running 1 test
test tests::one_result ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests minigrep
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
At this point, we could consider opportunities for refactoring the implementation of the search function while keeping the tests passing to maintain the same functionality.
Using the search Function in the run Function
We need to pass the config.query
value and the contents
that run
reads from the file to the search
function. Then run
will print each line returned from search
:
Filename: src/lib.rs
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.file_path)?;
for line in search(&config.query, &contents) {
println!("{line}");
}
Ok(())
}
Now the entire program should work! Let’s try it out:
$ cargo run -- frog poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.38s
Running `target/debug/minigrep frog poem.txt`
How public, like a frog
$ cargo run -- body poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep body poem.txt`
I'm nobody! Who are you?
Are you nobody, too?
How dreary to be somebody!
And finally, let’s make sure that we don’t get any lines when we search for a word that isn’t anywhere in the poem:
$ cargo run -- monomorphization poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep monomorphization poem.txt`
Working with Environment Variables
We’ll improve minigrep
by adding an extra feature: an option for case-insensitive searching that the user can turn on via an environment variable.
Writing a Failing Test for the Case-Insensitive search Function
We first add a new search_case_insensitive
function that will be called when the environment variable has a value. We’ll continue to follow the TDD process, so the first step is again to write a failing test. We’ll add a new test for the new search_case_insensitive
function and rename our old test from one_result
to case_sensitive
to clarify the differences between the two tests.
Filename: src/lib.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn case_sensitive() {
let query = "duct";
let contents = "\
Rust:
safe, fast, productive.
Pick three.
Duct tape.";
assert_eq!(vec!["safe, fast, productive."], search(query, contents));
}
#[test]
fn case_insensitive() {
let query = "rUsT";
let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";
assert_eq!(
vec!["Rust:", "Trust me."],
search_case_insensitive(query, contents)
);
}
}
Implementing the search_case_insensitive Function
The search_case_insensitive
function will be almost the same as the search
function. The only difference is that we’ll lowercase the query
and each line
so whatever the case of the input arguments, they’ll be the same case when we check whether the line contains the query.
Filename: src/lib.rs
pub fn search_case_insensitive<'a>(
query: &str,
contents: &'a str,
) -> Vec<&'a str> {
let query = query.to_lowercase();
let mut results = Vec::new();
for line in contents.lines() {
if line.to_lowercase().contains(&query) {
results.push(line);
}
}
results
}
Note that query
is now a String
rather than a string slice, because calling to_lowercase
creates new data rather than referencing existing data.
First, we’ll add a configuration option to the Config
struct to switch between case-sensitive and case-insensitive search. Adding this field will cause compiler errors because we aren’t initializing this field anywhere yet:
Filename: src/lib.rs
pub struct Config {
pub query: String,
pub file_path: String,
pub ignore_case: bool,
}
Next, we need the run
function to check the ignore_case
field’s value and use that to decide whether to call the search
function or the search_case_insensitive
function. This still won’t compile yet.
Filename: src/lib.rs
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
let contents = fs::read_to_string(config.file_path)?;
let results = if config.ignore_case {
search_case_insensitive(&config.query, &contents)
} else {
search(&config.query, &contents)
};
for line in results {
println!("{line}");
}
Ok(())
}
Finally, we need to check for the environment variable. The functions for working with environment variables are in the env
module in the standard library, so we bring that module into scope at the top of src/lib.rs. Then we’ll use the var
function from the env
module to check to see if any value has been set for an environment variable named IGNORE_CASE
.
Filename: src/lib.rs
use std::env;
// --snip--
impl Config {
pub fn build(args: &[String]) -> Result<Config, &'static str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let file_path = args[2].clone();
let ignore_case = env::var("IGNORE_CASE").is_ok();
Ok(Config {
query,
file_path,
ignore_case,
})
}
}
The env::var
function returns a Result
that will be the successful Ok
variant that contains the value of the environment variable if the environment variable is set to any value. It will return the Err
variant if the environment variable is not set.
We don’t care about the value of the environment variable, just whether it’s set or unset, so we’re checking is_ok
rather than using unwrap
, expect
, or any of the other methods we’ve seen on Result
.
Let’s give it a try! First, we’ll run our program without the environment variable set and with the query to
, which should match any line that contains the word “to” in all lowercase:
$ cargo run -- to poem.txt
Compiling minigrep v0.1.0 (file:///projects/minigrep)
Finished dev [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/minigrep to poem.txt`
Are you nobody, too?
How dreary to be somebody!
Now, let’s run the program with IGNORE_CASE
set to 1but with the same query
to`.
IGNORE_CASE=1 cargo run -- to poem.txt
If you’re using PowerShell, you will need to set the environment variable and run the program as separate commands:
PS> $Env:IGNORE_CASE=1; cargo run -- to poem.txt
This will make IGNORE_CASE
persist for the remainder of your shell session. It can be unset with the Remove
-Item cmdlet:
PS> Remove-Item Env:IGNORE_CASE
We should get lines that contain “to” that might have uppercase letters:
Are you nobody, too?
How dreary to be somebody!
To tell your name the livelong day
To an admiring bog!
Writing Error Messages to Standard Error Instead of Standard Output
In most terminals, there are two kinds of output: standard output (stdout
) for general information and standard error (stderr
) for error messages. This distinction enables users to choose to direct the successful output of a program to a file but still print error messages to the screen.
The println!
macro is only capable of printing to standard output, so we have to use something else to print to standard error.
Checking Where Errors Are Written
Command line programs are expected to send error messages to the standard error stream so we can still see error messages on the screen even if we redirect the standard output stream to a file.
To demonstrate this behavior, we’ll run the program with >
and the file path, output.txt, that we want to redirect the standard output stream to. We won’t pass any arguments, which should cause an error:
cargo run > output.txt
The >
syntax tells the shell to write the contents of standard output to output.txt instead of the screen. We didn’t see the error message we were expecting printed to the screen, so that means it must have ended up in the file. This is what output.txt contains:
Problem parsing arguments: not enough arguments
Printing Errors to Standard Error
The standard library provides the eprintln!
macro that prints to the standard error stream, so let’s change the two places we were calling println!
to print errors to use eprintln!
instead.
Filename: src/main.rs
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::build(&args).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {err}");
process::exit(1);
});
if let Err(e) = minigrep::run(config) {
eprintln!("Application error: {e}");
process::exit(1);
}
}
Let’s now run the program again in the same way, without any arguments and redirecting standard output with >
:
$ cargo run > output.txt
Problem parsing arguments: not enough arguments
Now we see the error onscreen and output.txt contains nothing, which is the behavior we expect of command line programs.
Let’s run the program again with arguments that don’t cause an error but still redirect standard output to a file, like so:
cargo run -- to poem.txt > output.txt
We won’t see any output to the terminal, and output.txt will contain our results:
Filename: output.txt
Are you nobody, too?
How dreary to be somebody!
Functional Language Features: Iterators and Closures
Closures: Anonymous Functions that Capture Their Environment
Rust’s closures are anonymous functions you can save in a variable or pass as arguments to other functions. You can create the closure in one place and then call the closure elsewhere to evaluate it in a different context. Unlike functions, closures can capture values from the scope in which they’re defined.
Capturing the Environment with Closures
We’ll first examine how we can use closures to capture values from the environment they’re defined in for later use. Here’s the scenario: Every so often, our t-shirt company gives away an exclusive, limited-edition shirt to someone on our mailing list as a promotion. People on the mailing list can optionally add their favorite color to their profile. If the person chosen for a free shirt has their favorite color set, they get that color shirt. If the person hasn’t specified a favorite color, they get whatever color the company currently has the most of.
Filename: src/main.rs
#[derive(Debug, PartialEq, Copy, Clone)]
enum ShirtColor {
Red,
Blue,
}
struct Inventory {
shirts: Vec<ShirtColor>,
}
impl Inventory {
fn giveaway(&self, user_preference: Option<ShirtColor>) -> ShirtColor {
user_preference.unwrap_or_else(|| self.most_stocked())
}
fn most_stocked(&self) -> ShirtColor {
let mut num_red = 0;
let mut num_blue = 0;
for color in &self.shirts {
match color {
ShirtColor::Red => num_red += 1,
ShirtColor::Blue => num_blue += 1,
}
}
if num_red > num_blue {
ShirtColor::Red
} else {
ShirtColor::Blue
}
}
}
fn main() {
let store = Inventory {
shirts: vec![ShirtColor::Blue, ShirtColor::Red, ShirtColor::Blue],
};
let user_pref1 = Some(ShirtColor::Red);
let giveaway1 = store.giveaway(user_pref1);
println!(
"The user with preference {:?} gets {:?}",
user_pref1, giveaway1
);
let user_pref2 = None;
let giveaway2 = store.giveaway(user_pref2);
println!(
"The user with preference {:?} gets {:?}",
user_pref2, giveaway2
);
}
// output
// The user with preference Some(Red) gets Red
// The user with preference None gets Blue
We specify the closure expression || self.most_stocked()
as the argument to unwrap_or_else
. This is a closure that takes no parameters itself (if the closure had parameters, they would appear between the two vertical bars). The body of the closure calls self.most_stocked()
.
Closure Type Inference and Annotation
There are more differences between functions and closures. Closures don’t usually require you to annotate the types of the parameters or the return value like fn
functions do. Type annotations are required on functions because the types are part of an explicit interface exposed to your users.
Closures are typically short and relevant only within a narrow context rather than in any arbitrary scenario. Within these limited contexts, the compiler can infer the types of the parameters and the return type, similar to how it’s able to infer the types of most variables (there are rare cases where the compiler needs closure type annotations too).
As with variables, we can add type annotations if we want to increase explicitness and clarity at the cost of being more verbose than is strictly necessary.
Filename: src/main.rs
let expensive_closure = |num: u32| -> u32 {
println!("calculating slowly...");
thread::sleep(Duration::from_secs(2));
num
};
With type annotations added, the syntax of closures looks more similar to the syntax of functions. Here we define a function that adds 1 to its parameter and a closure that has the same behavior, for comparison.
fn add_one_v1 (x: u32) -> u32 { x + 1 }
let add_one_v2 = |x: u32| -> u32 { x + 1 };
let add_one_v3 = |x| { x + 1 };
let add_one_v4 = |x| x + 1 ;
The add_one_v3
and add_one_v4
lines require the closures to be evaluated to be able to compile because the types will be inferred from their usage. This is similar to let v = Vec::new()
; needing either type annotations or values of some type to be inserted into the Vec
for Rust to be able to infer the type.
We can call the closure with any type, which we’ve done here with String
the first time. If we then try to call example_closure
with an integer, we’ll get an error.
Filename: src/main.rs
let example_closure = |x| x;
let s = example_closure(String::from("hello"));
let n = example_closure(5); // error! mismatched types
The first time we call example_closure
with the String
value, the compiler infers the type of x
and the return type of the closure to be String
. Those types are then locked into the closure in example_closure
, and we get a type error when we next try to use a different type with the same closure.
Capturing References or Moving Ownership
Closures can capture values from their environment in three ways, which directly map to the three ways a function can take a parameter: borrowing immutably, borrowing mutably, and taking ownership. The closure will decide which of these to use based on what the body of the function does with the captured values.
In the following listing, we define a closure that captures an immutable reference to the vector named list because it only needs an immutable reference to print the value:
Filename: src/main.rs
fn main() {
let list = vec![1, 2, 3];
println!("Before defining closure: {:?}", list);
let only_borrows = || println!("From closure: {:?}", list);
println!("Before calling closure: {:?}", list);
only_borrows();
println!("After calling closure: {:?}", list);
}
// output
// Before defining closure: [1, 2, 3]
// Before calling closure: [1, 2, 3]
// From closure: [1, 2, 3]
// After calling closure: [1, 2, 3]
This example also illustrates that a variable can bind to a closure definition, and we can later call the closure by using the variable name and parentheses as if the variable name were a function name.
Next, in the following listing, we change the closure body so that it adds an element to the list
vector. The closure now captures a mutable reference:
Filename: src/main.rs
fn main() {
let mut list = vec![1, 2, 3];
println!("Before defining closure: {:?}", list);
let mut borrows_mutably = || list.push(7);
borrows_mutably();
println!("After calling closure: {:?}", list);
}
// output
// Before defining closure: [1, 2, 3]
// After calling closure: [1, 2, 3, 7]
Note that there’s no longer a println!
between the definition and the call of the borrows_mutably
closure: when borrows_mutably
is defined, it captures a mutable reference to list
. Between the closure definition and the closure call, an immutable borrow to print isn’t allowed because no other borrows are allowed when there’s a mutable borrow.
If you want to force the closure to take ownership of the values it uses in the environment even though the body of the closure doesn’t strictly need ownership, you can use the move
keyword before the parameter list.
This technique is mostly useful when passing a closure to a new thread to move the data so that it’s owned by the new thread.
Filename: src/main.rs
use std::thread;
fn main() {
let list = vec![1, 2, 3];
println!("Before defining closure: {:?}", list);
thread::spawn(move || println!("From thread: {:?}", list))
.join()
.unwrap();
}
The new thread might finish before the rest of the main thread finishes, or the main thread might finish first. If the main thread maintained ownership of list
but ended before the new thread did and dropped list
, the immutable reference in the thread would be invalid. Therefore, the compiler requires that list
be moved into the closure given to the new thread so the reference will be valid.
Moving Captured Values Out of Closures and the Fn Traits
A closure body can do any of the following: move a captured value out of the closure, mutate the captured value, neither move nor mutate the value, or capture nothing from the environment to begin with.
Closures will automatically implement one, two, or all three of these Fn
traits, in an additive fashion, depending on how the closure’s body handles the values:
FnOnce
applies to closures that can be called once. All closures implement at least this trait, because all closures can be called. A closure that moves captured values out of its body will only implementFnOnce
and none of the otherFn
traits, because it can only be called once.FnMut
applies to closures that don’t move captured values out of their body, but that might mutate the captured values. These closures can be called more than once.Fn
applies to closures that don’t move captured values out of their body and that don’t mutate captured values, as well as closures that capture nothing from their environment. These closures can be called more than once without mutating their environment, which is important in cases such as calling a closure multiple times concurrently.
Let’s look at the definition of the unwrap_or_else
method on Option<T>
:
impl<T> Option<T> {
pub fn unwrap_or_else<F>(self, f: F) -> T
where
F: FnOnce() -> T
{
match self {
Some(x) => x,
None => f(),
}
}
}
Using FnOnce
in the trait bound expresses the constraint that unwrap_or_else
is only going to call f
at most one time.
Note: Functions can implement all three of the
Fn
traits too. If what we want to do doesn’t require capturing a value from the environment, we can use the name of a function rather than a closure where we need something that implements one of theFn
traits. For example, on anOption<Vec<T>>
value, we could callunwrap_or_else(Vec::new)
to get a new, empty vector if the value isNone
.
Now let’s look at the standard library method sort_by_key
defined on slices, to see how that differs from unwrap_or_else
and why sort_by_key
uses FnMut
instead of FnOnce
for the trait bound.
Filename: src/main.rs
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let mut list = [
Rectangle { width: 10, height: 1 },
Rectangle { width: 3, height: 5 },
Rectangle { width: 7, height: 12 },
];
list.sort_by_key(|r| r.width);
println!("{:#?}", list);
}
// output
// [
// Rectangle {
// width: 3,
// height: 5,
// },
// Rectangle {
// width: 7,
// height: 12,
// },
// Rectangle {
// width: 10,
// height: 1,
// },
// ]
The reason sort_by_key
is defined to take an FnMut
closure is that it calls the closure multiple times: once for each item in the slice. The closure |r| r.width
doesn’t capture, mutate, or move out anything from its environment, so it meets the trait bound requirements.
In contrast, the following listing shows an example of a closure that implements just the FnOnce
trait, because it moves a value out of the environment. The compiler won’t let us use this closure with sort_by_key
:
Filename: src/main.rs
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let mut list = [
Rectangle { width: 10, height: 1 },
Rectangle { width: 3, height: 5 },
Rectangle { width: 7, height: 12 },
];
let mut sort_operations = vec![];
let value = String::from("by key called");
list.sort_by_key(|r| {
sort_operations.push(value); // error! cannot move out of `value`, a captured variable in an `FnMut` closure
r.width
});
println!("{:#?}", list);
}
This code attempts to do this counting by pushing value
—a String
from the closure’s environment—into the sort_operations
vector. The closure captures value
then moves value
out of the closure by transferring ownership of value
to the sort_operations
vector. This closure can be called once; trying to call it a second time wouldn’t work because value
would no longer be in the environment to be pushed into sort_operations
again! Therefore, this closure only implements FnOnce
.
The closure in the following listing works with sort_by_key
because it is only capturing a mutable reference to the num_sort_operations
counter and can therefore be called more than once:
Filename: src/main.rs
#[derive(Debug)]
struct Rectangle {
width: u32,
height: u32,
}
fn main() {
let mut list = [
Rectangle { width: 10, height: 1 },
Rectangle { width: 3, height: 5 },
Rectangle { width: 7, height: 12 },
];
let mut num_sort_operations = 0;
list.sort_by_key(|r| {
num_sort_operations += 1;
r.width
});
println!("{:#?}, sorted in {num_sort_operations} operations", list);
}
The Fn
traits are important when defining or using functions or types that make use of closures.
Processing a Series of Items with Iterators
In Rust, iterators are lazy, meaning they have no effect until you call methods that consume the iterator to use it up. For example, the code in the following listing creates an iterator over the items in the vector v1
by calling the iter
method defined on Vec<T>
. This code by itself doesn’t do anything useful. When the for
loop is called using the iterator in v1_iter
, each element in the iterator is used in one iteration of the loop, which prints out each value.
let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();
for val in v1_iter {
println!("Got: {}", val);
}
The Iterator Trait and the next Method
All iterators implement a trait named Iterator that is defined in the standard library. The definition of the trait looks like this:
pub trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
// methods with default implementations elided
}
Notice this definition uses some new syntax: type Item
and Self::Item
, which are defining an associated type with this trait. For now, all you need to know is that this code says implementing the Iterator
trait requires that you also define an Item
type, and this Item
type is used in the return type of the next
method. In other words, the Item
type will be the type returned from the iterator.
The Iterator
trait only requires implementors to define one method: the next
method, which returns one item of the iterator at a time wrapped in Some
and, when iteration is over, returns None
.
We can call the next method on iterators directly:
Filename: src/lib.rs
#[test]
fn iterator_demonstration() {
let v1 = vec![1, 2, 3];
let mut v1_iter = v1.iter();
assert_eq!(v1_iter.next(), Some(&1));
assert_eq!(v1_iter.next(), Some(&2));
assert_eq!(v1_iter.next(), Some(&3));
assert_eq!(v1_iter.next(), None);
}
Note that we needed to make v1_iter
mutable: calling the next
method on an iterator changes internal state that the iterator uses to keep track of where it is in the sequence. In other words, this code consumes, or uses up, the iterator. Each call to next
eats up an item from the iterator. We didn’t need to make v1_iter
mutable when we used a for
loop because the loop took ownership of v1_iter
and made it mutable behind the scenes.
Also note that the values we get from the calls to next
are immutable references to the values in the vector. The iter
method produces an iterator over immutable references. If we want to create an iterator that takes ownership of v1
and returns owned values, we can call into_iter
instead of iter
. Similarly, if we want to iterate over mutable references, we can call iter_mut
instead of iter
.
Methods that Consume the Iterator
Methods that call next
are called consuming adaptors, because calling them uses up the iterator. One example is the sum
method, which takes ownership of the iterator and iterates through the items by repeatedly calling next
, thus consuming the iterator. As it iterates through, it adds each item to a running total and returns the total when iteration is complete. The following listing has a test illustrating a use of the sum
method:
Filename: src/lib.rs
#[test]
fn iterator_sum() {
let v1 = vec![1, 2, 3];
let v1_iter = v1.iter();
let total: i32 = v1_iter.sum();
assert_eq!(total, 6);
}
We aren’t allowed to use v1_iter
after the call to sum
because sum
takes ownership of the iterator we call it on.
Methods that Produce Other Iterators
Iterator adaptors are methods defined on the Iterator
trait that don’t consume the iterator. Instead, they produce different iterators by changing some aspect of the original iterator.
The following listing shows an example of calling the iterator adaptor method map
, which takes a closure to call on each item as the items are iterated through. The map
method returns a new iterator that produces the modified items.
Filename: src/main.rs
let v1: Vec<i32> = vec![1, 2, 3];
v1.iter().map(|x| x + 1); // error! unused `Map` that must be used
To fix this warning and consume the iterator, we’ll use the collect
method. This method consumes the iterator and collects the resulting values into a collection data type.
Filename: src/main.rs
let v1: Vec<i32> = vec![1, 2, 3];
let v2: Vec<_> = v1.iter().map(|x| x + 1).collect();
assert_eq!(v2, vec![2, 3, 4]);
Using Closures that Capture Their Environment
For this example, we’ll use the filter
method that takes a closure. The closure gets an item from the iterator and returns a bool
. If the closure returns true
, the value will be included in the iteration produced by filter
. If the closure returns false
, the value won’t be included.
Filename: src/lib.rs
#[derive(PartialEq, Debug)]
struct Shoe {
size: u32,
style: String,
}
fn shoes_in_size(shoes: Vec<Shoe>, shoe_size: u32) -> Vec<Shoe> {
shoes.into_iter().filter(|s| s.size == shoe_size).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn filters_by_size() {
let shoes = vec![
Shoe {
size: 10,
style: String::from("sneaker"),
},
Shoe {
size: 13,
style: String::from("sandal"),
},
Shoe {
size: 10,
style: String::from("boot"),
},
];
let in_my_size = shoes_in_size(shoes, 10);
assert_eq!(
in_my_size,
vec![
Shoe {
size: 10,
style: String::from("sneaker")
},
Shoe {
size: 10,
style: String::from("boot")
},
]
);
}
}
In the body of shoes_in_size
, we call into_iter
to create an iterator that takes ownership of the vector. Then we call filter
to adapt that iterator into a new iterator that only contains elements for which the closure returns true
.
The closure captures the shoe_size
parameter from the environment and compares the value with each shoe’s size, keeping only shoes of the size specified.
Finally, calling collect
gathers the values returned by the adapted iterator into a vector that’s returned by the function.
Improving Our I/O Project
Removing a clone Using an Iterator
Filename: src/lib.rs
impl Config {
pub fn build(args: &[String]) -> Result<Config, &'static str> {
if args.len() < 3 {
return Err("not enough arguments");
}
let query = args[1].clone();
let file_path = args[2].clone();
let ignore_case = env::var("IGNORE_CASE").is_ok();
Ok(Config {
query,
file_path,
ignore_case,
})
}
}
With our new knowledge about iterators, we can change the build
function to take ownership of an iterator as its argument instead of borrowing a slice.
Using the Returned Iterator Directly
Filename: src/main.rs
fn main() {
let args: Vec<String> = env::args().collect();
let config = Config::build(&args).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {err}");
process::exit(1);
});
// --snip--
}
We’ll first change the start of the main
function, which this time uses an iterator. This won’t compile until we update Config::build
as well.
Filename: src/main.rs
fn main() {
let config = Config::build(env::args()).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {err}");
process::exit(1);
});
// --snip--
}
The env::args
function returns an iterator! Rather than collecting the iterator values into a vector and then passing a slice to Config::build
, now we’re passing ownership of the iterator returned from env::args
to Config::build
directly.
In your I/O project’s src/lib.rs file, let’s change the signature of Config::build to look like the following listing. This still won’t compile because we need to update the function body.
Filename: src/lib.rs
impl Config {
pub fn build(
mut args: impl Iterator<Item = String>,
) -> Result<Config, &'static str> {
// --snip--
The standard library documentation for the env::args
function shows that the type of the iterator it returns is std::env::Args
, and that type implements the Iterator
trait and returns String
values. (I think we can use std::env::Args
directly)
Using Iterator Trait Methods Instead of Indexing
Filename: src/lib.rs
impl Config {
pub fn build(
mut args: impl Iterator<Item = String>,
) -> Result<Config, &'static str> {
args.next();
let query = match args.next() {
Some(arg) => arg,
None => return Err("Didn't get a query string"),
};
let file_path = match args.next() {
Some(arg) => arg,
None => return Err("Didn't get a file path"),
};
let ignore_case = env::var("IGNORE_CASE").is_ok();
Ok(Config {
query,
file_path,
ignore_case,
})
}
}
Remember that the first value in the return value of env::args
is the name of the program.
Making Code Clearer with Iterator Adaptors
We can also take advantage of iterators in the search
function in our I/O project:
Filename: src/lib.rs
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
We can write this code in a more concise way using iterator adaptor methods. Doing so also lets us avoid having a mutable intermediate results
vector. The functional programming style prefers to minimize the amount of mutable state to make code clearer. Removing the mutable state might enable a future enhancement to make searching happen in parallel, because we wouldn’t have to manage concurrent access to the results
vector. The following listing shows this change:
Filename: src/lib.rs
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
contents
.lines()
.filter(|line| line.contains(query))
.collect()
}
Choosing Between Loops or Iterators
Most Rust programmers prefer to use the iterator style. It’s a bit tougher to get the hang of at first, but once you get a feel for the various iterator adaptors and what they do, iterators can be easier to understand. Instead of fiddling with the various bits of looping and building new vectors, the code focuses on the high-level objective of the loop.
Comparing Performance: Loops vs. Iterators
Iterators, although a high-level abstraction, get compiled down to roughly the same code as if you’d written the lower-level code yourself. Iterators are one of Rust’s zero-cost abstractions, by which we mean using the abstraction imposes no additional runtime overhead.
In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
As another example, the following code is taken from an audio decoder. The decoding algorithm uses the linear prediction mathematical operation to estimate future values based on a linear function of the previous samples. This code uses an iterator chain to do some math on three variables in scope: a buffer
slice of data, an array of 12 coefficients
, and an amount by which to shift data in qlp_shift
.
let buffer: &mut [i32];
let coefficients: [i64; 12];
let qlp_shift: i16;
for i in 12..buffer.len() {
let prediction = coefficients.iter()
.zip(&buffer[i - 12..i])
.map(|(&c, &s)| c * s as i64)
.sum::<i64>() >> qlp_shift;
let delta = buffer[i];
buffer[i] = prediction as i32 + delta;
}
To calculate the value of prediction
, this code iterates through each of the 12 values in coefficients
and uses the zip
method to pair the coefficient values with the previous 12 values in buffer
. Then, for each pair, we multiply the values together, sum all the results, and shift the bits in the sum qlp_shift
bits to the right.
Calculations in applications like audio decoders often prioritize performance most highly. There’s no loop at all corresponding to the iteration over the values in coefficients
: Rust knows that there are 12 iterations, so it “unrolls” the loop. Unrolling is an optimization that removes the overhead of the loop controlling code and instead generates repetitive code for each iteration of the loop.
All of the coefficients get stored in registers, which means accessing the values is very fast. There are no bounds checks on the array access at runtime. All these optimizations that Rust is able to apply make the resulting code extremely efficient. Now that you know this, you can use iterators and closures without fear!
More About Cargo and Crates.io
Customizing Builds with Release Profiles
Cargo has two main profiles: the dev
profile Cargo uses when you run cargo build
and the release
profile Cargo uses when you run cargo build --release
.
Cargo has default settings for each of the profiles that apply when you haven’t explicitly added any [profile.*]
sections in the project’s Cargo.toml file. By adding [profile.*]
sections for any profile you want to customize, you override any subset of the default settings. For example, here are the default values for the opt-level
setting for the dev
and release
profiles:
Filename: Cargo.toml
[profile.dev]
opt-level = 0
[profile.release]
opt-level = 3
The opt-level
setting controls the number of optimizations Rust will apply to your code, with a range of 0 to 3. Applying more optimizations extends compiling time, so if you’re in development and compiling your code often, you’ll want fewer optimizations to compile faster even if the resulting code runs slower. The default opt-level
for dev is therefore 0
. When you’re ready to release your code, it’s best to spend more time compiling. You’ll only compile in release mode once, but you’ll run the compiled program many times, so release mode trades longer compile time for code that runs faster. That is why the default opt-level
for the release
profile is 3
.
You can override a default setting by adding a different value for it in Cargo.toml.
Filename: Cargo.toml
[profile.dev]
opt-level = 1
Because we set opt-level
to 1
, Cargo will apply more optimizations than the default, but not as many as in a release build.
Publishing a Crate to Crates.io
Making Useful Documentation Comments
In previous chapter, we discussed how to comment Rust code using two slashes, //
. Rust also has a particular kind of comment for documentation, known conveniently as a documentation comment, that will generate HTML documentation.
Documentation comments use three slashes, ///
, instead of two and support Markdown notation for formatting the text. Place documentation comments just before the item they’re documenting. The following listing shows documentation comments for an add_one
function in a crate named my_crate
.
Filename: src/lib.rs
/// Adds one to the number given.
///
/// # Examples
///
/// ```
/// let arg = 5;
/// let answer = my_crate::add_one(arg);
///
/// assert_eq!(6, answer);
/// ```
pub fn add_one(x: i32) -> i32 {
x + 1
}
We can generate the HTML documentation from this documentation comment by running cargo doc
. This command runs the rustdoc
tool distributed with Rust and puts the generated HTML documentation in the target/doc directory.
For convenience, running cargo doc --open
will build the HTML for your current crate’s documentation (as well as the documentation for all of your crate’s dependencies) and open the result in a web browser.
Commonly Used Sections
We used the # Examples
Markdown heading to create a section in the HTML with the title “Examples.” Here are some other sections that crate authors commonly use in their documentation:
Panics: The scenarios in which the function being documented could panic. Callers of the function who don’t want their programs to panic should make sure they don’t call the function in these situations.
Errors: If the function returns a
Result
, describing the kinds of errors that might occur and what conditions might cause those errors to be returned can be helpful to callers so they can write code to handle the different kinds of errors in different ways.Safety: If the function is
unsafe
to call (we discuss unsafety in later chapter), there should be a section explaining why the function is unsafe and covering the invariants that the function expects callers to uphold.
Most documentation comments don’t need all of these sections, but this is a good checklist to remind you of the aspects of your code users will be interested in knowing about.
Documentation Comments as Tests
Adding example code blocks in your documentation comments can help demonstrate how to use your library, and doing so has an additional bonus: running cargo test
will run the code examples in your documentation as tests! If we run cargo test
with the documentation for the add_one
function from the preceding listing, we will see a section in the test results like this:
Doc-tests my_crate
running 1 test
test src/lib.rs - add_one (line 5) ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.27s
Commenting Contained Items
The style of doc comment //!
adds documentation to the item that contains the comments rather than to the items following the comments. We typically use these doc comments inside the crate root file (src/lib.rs by convention) or inside a module to document the crate or the module as a whole.
For example, to add documentation that describes the purpose of the my_crate
crate that contains the add_one
function, we add documentation comments that start with //!
to the beginning of the src/lib.rs file, as shown in the following listing:
Filename: src/lib.rs
//! # My Crate
//!
//! `my_crate` is a collection of utilities to make performing certain
//! calculations more convenient.
/// Adds one to the number given.
// --snip--
Notice there isn’t any code after the last line that begins with //!
. Because we started the comments with //!
instead of ///
, we’re documenting the item that contains this comment rather than an item that follows this comment. In this case, that item is the src/lib.rs file, which is the crate root. These comments describe the entire crate.
When we run cargo doc --open
, these comments will display on the front page of the documentation for my_crate
above the list of public items in the crate
Documentation comments within items are useful for describing crates and modules especially. Use them to explain the overall purpose of the container to help your users understand the crate’s organization.
Exporting a Convenient Public API with pub use
The structure of your public API is a major consideration when publishing a crate.
However, the structure that makes sense to you while you’re developing a crate might not be very convenient for your users. You might want to organize your structs in a hierarchy containing multiple levels, but then people who want to use a type you’ve defined deep in the hierarchy might have trouble finding out that type exists. They might also be annoyed at having to enter use my_crate::some_module::another_module::UsefulType
; rather than use my_crate::UsefulType
;.
The good news is that if the structure isn’t convenient for others to use from another library, you don’t have to rearrange your internal organization: instead, you can re-export items to make a public structure that’s different from your private structure by using pub use
. Re-exporting takes a public item in one location and makes it public in another location, as if it were defined in the other location instead.
For example, say we made a library named art
for modeling artistic concepts. Within this library are two modules: a kinds
module containing two enums named PrimaryColor
and SecondaryColor
and a utils
module containing a function named mix
, as shown in the following listing:
Filename: src/lib.rs
//! # Art
//!
//! A library for modeling artistic concepts.
pub mod kinds {
/// The primary colors according to the RYB color model.
pub enum PrimaryColor {
Red,
Yellow,
Blue,
}
/// The secondary colors according to the RYB color model.
pub enum SecondaryColor {
Orange,
Green,
Purple,
}
}
pub mod utils {
use crate::kinds::*;
/// Combines two primary colors in equal amounts to create
/// a secondary color.
pub fn mix(c1: PrimaryColor, c2: PrimaryColor) -> SecondaryColor {
// --snip--
}
}
Note that the PrimaryColor
and SecondaryColor
types aren’t listed on the front page by cargo doc --open
, nor is the mix
function. We have to click kinds
and utils
to see them.
Another crate that depends on this library would need use
statements that bring the items from art
into scope, specifying the module structure that’s currently defined:
Filename: src/main.rs
use art::kinds::PrimaryColor;
use art::utils::mix;
fn main() {
let red = PrimaryColor::Red;
let yellow = PrimaryColor::Yellow;
mix(red, yellow);
}
The module structure of the art
crate is more relevant to developers working on the art
crate than to those using it. To remove the internal organization from the public API, we can modify the art
crate code to add pub use
statements to re-export the items at the top level, as shown in the following listing:
Filename: src/lib.rs
//! # Art
//!
//! A library for modeling artistic concepts.
pub use self::kinds::PrimaryColor;
pub use self::kinds::SecondaryColor;
pub use self::utils::mix;
pub mod kinds {
// --snip--
}
pub mod utils {
// --snip--
}
The API documentation that cargo doc
generates for this crate will now list and link re-exports on the front page, making the PrimaryColor
and SecondaryColor
types and the mix
function easier to find.
The art
crate users can still see and use the internal structure, or they can use the more convenient structure:
Filename: src/main.rs
use art::mix;
use art::PrimaryColor;
fn main() {
// --snip--
}
In cases where there are many nested modules, re-exporting the types at the top level with pub use
can make a significant difference in the experience of people who use the crate. Another common use of pub use
is to re-export definitions of a dependency in the current crate to make that crate’s definitions part of your crate’s public API.
Setting Up a Crates.io Account
Before you can publish any crates, you need to create an account on crates.io and get an API token. To do so, visit the home page at crates.io and log in via a GitHub account. (The GitHub account is currently a requirement, but the site might support other ways of creating an account in the future.) Once you’re logged in, visit your account settings at https://crates.io/me/ and retrieve your API key. Then run the cargo login
command with your API key, like this:
cargo login abcdefghijklmnopqrstuvwxyz012345
This command will inform Cargo of your API token and store it locally in ~/.cargo/credentials. Note that this token is a secret: do not share it with anyone else. If you do share it with anyone for any reason, you should revoke it and generate a new token on crates.io.
Adding Metadata to a New Crate
Before publishing, you’ll need to add some metadata in the [package]
section of the crate’s Cargo.toml file.
Your crate will need a unique name. While you’re working on a crate locally, you can name a crate whatever you’d like. However, crate names on crates.io are allocated on a first-come, first-served basis. Once a crate name is taken, no one else can publish a crate with that name. Before attempting to publish a crate, search for the name you want to use. If the name has been used, you will need to find another name and edit the name
field in the Cargo.toml file under the [package]
section to use the new name for publishing, like so:
Filename: Cargo.toml
[package]
name = "guessing_game"
Even if you’ve chosen a unique name, when you run cargo publish
to publish the crate at this point, you’ll get a warning and then an error:
$ cargo publish
Updating crates.io index
warning: manifest has no description, license, license-file, documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
--snip--
error: failed to publish to registry at https://crates.io
Caused by:
the remote server responded with an error: missing or empty metadata fields: description, license. Please see https://doc.rust-lang.org/cargo/reference/manifest.html for how to upload metadata
This errors because you’re missing some crucial information: a description and license are required so people will know what your crate does and under what terms they can use it. In Cargo.toml, add a description that’s just a sentence or two, because it will appear with your crate in search results. For the license
field, you need to give a license identifier value. The Linux Foundation’s Software Package Data Exchange (SPDX) lists the identifiers you can use for this value. For example, to specify that you’ve licensed your crate using the MIT License, add the MIT
identifier:
Filename: Cargo.toml
[package]
name = "guessing_game"
license = "MIT"
If you want to use a license that doesn’t appear in the SPDX, you need to place the text of that license in a file, include the file in your project, and then use license-file
to specify the name of that file instead of using the license
key.
Many people in the Rust community license their projects in the same way as Rust by using a dual license of MIT OR Apache-2.0
. This practice demonstrates that you can also specify multiple license identifiers separated by OR
to have multiple licenses for your project.
With a unique name, the version, your description, and a license added, the Cargo.toml file for a project that is ready to publish might look like this:
Filename: Cargo.toml
[package]
name = "guessing_game"
version = "0.1.0"
edition = "2021"
description = "A fun game where you guess what number the computer has chosen."
license = "MIT OR Apache-2.0"
[dependencies]
Cargo’s documentation describes other metadata you can specify to ensure others can discover and use your crate more easily.
Publishing to Crates.io
Publishing a crate uploads a specific version to crates.io for others to use.
Be careful, because a publish is permanent. The version can never be overwritten, and the code cannot be deleted. One major goal of crates.io is to act as a permanent archive of code so that builds of all projects that depend on crates from crates.io will continue to work. Allowing version deletions would make fulfilling that goal impossible. However, there is no limit to the number of crate versions you can publish.
Run the cargo publish
command again. It should succeed now:
$ cargo publish
Updating crates.io index
Packaging guessing_game v0.1.0 (file:///projects/guessing_game)
Verifying guessing_game v0.1.0 (file:///projects/guessing_game)
Compiling guessing_game v0.1.0
(file:///projects/guessing_game/target/package/guessing_game-0.1.0)
Finished dev [unoptimized + debuginfo] target(s) in 0.19s
Uploading guessing_game v0.1.0 (file:///projects/guessing_game)
Publishing a New Version of an Existing Crate
When you’ve made changes to your crate and are ready to release a new version, you change the version
value specified in your Cargo.toml file and republish. Use the Semantic Versioning rules to decide what an appropriate next version number is based on the kinds of changes you’ve made. Then run cargo publish
to upload the new version.
Deprecating Versions from Crates.io with cargo yank
Although you can’t remove previous versions of a crate, you can prevent any future projects from adding them as a new dependency. This is useful when a crate version is broken for one reason or another. In such situations, Cargo supports yanking a crate version.
Yanking a version prevents new projects from depending on that version while allowing all existing projects that depend on it to continue. Essentially, a yank means that all projects with a Cargo.lock will not break, and any future Cargo.lock files generated will not use the yanked version.
To yank a version of a crate, in the directory of the crate that you’ve previously published, run cargo yank
and specify which version you want to yank. For example, if we’ve published a crate named guessing_game
version 1.0.1 and we want to yank it, in the project directory for guessing_game
we’d run:
$ cargo yank --vers 1.0.1
Updating crates.io index
Yank [email protected]
By adding --undo
to the command, you can also undo a yank and allow projects to start depending on a version again:
$ cargo yank --vers 1.0.1 --undo
Updating crates.io index
Unyank [email protected]
A yank does not delete any code. It cannot, for example, delete accidentally uploaded secrets. If that happens, you must reset those secrets immediately.
Cargo Workspaces
As your project develops, you might find that the library crate continues to get bigger and you want to split your package further into multiple library crates. Cargo offers a feature called workspaces that can help manage multiple related packages that are developed in tandem.
Creating a Workspace
A workspace is a set of packages that share the same Cargo.lock and output directory. We’ll have a workspace containing a binary and two libraries. The binary, which will provide the main functionality, will depend on the two libraries. We’ll start by creating a new directory for the workspace:
mkdir add
cd add
Next, in the add directory, we create the Cargo.toml file that will configure the entire workspace. This file won’t have a [package]
section. Instead, it will start with a [workspace]
section that will allow us to add members to the workspace by specifying the path to the package with our binary crate; in this case, that path is adder:
Filename: Cargo.toml
[workspace]
members = [
"adder",
]
Next, we’ll create the adder
binary crate by running cargo new
within the add directory:
$ cargo new adder
Created binary (application) `adder` package
At this point, we can build the workspace by running cargo build
. The files in your add directory should look like this:
├── Cargo.lock
├── Cargo.toml
├── adder
│ ├── Cargo.toml
│ └── src
│ └── main.rs
└── target
The workspace has one target directory at the top level that the compiled artifacts will be placed into; the adder
package doesn’t have its own target directory. Even if we were to run cargo build
from inside the adder directory, the compiled artifacts would still end up in add/target rather than add/adder/target. Cargo structures the target directory in a workspace like this because the crates in a workspace are meant to depend on each other. If each crate had its own target directory, each crate would have to recompile each of the other crates in the workspace to place the artifacts in its own target directory. By sharing one target directory, the crates can avoid unnecessary rebuilding.
Creating the Second Package in the Workspace
Next, let’s create another member package in the workspace and call it add_one
. Change the top-level Cargo.toml to specify the add_one path in the members
list:
Filename: Cargo.toml
[workspace]
members = [
"adder",
"add_one",
]
Then generate a new library crate named add_one
:
$ cargo new add_one --lib
Created library `add_one` package
Your add directory should now have these directories and files:
├── Cargo.lock
├── Cargo.toml
├── add_one
│ ├── Cargo.toml
│ └── src
│ └── lib.rs
├── adder
│ ├── Cargo.toml
│ └── src
│ └── main.rs
└── target
In the add_one/src/lib.rs file, let’s add an add_one
function:
Filename: add_one/src/lib.rs
pub fn add_one(x: i32) -> i32 {
x + 1
}
Now we can have the adder
package with our binary depend on the add_one
package that has our library. First, we’ll need to add a path dependency on add_one
to adder/Cargo.toml.
Filename: adder/Cargo.toml
[dependencies]
add_one = { path = "../add_one" }
Cargo doesn’t assume that crates in a workspace will depend on each other, so we need to be explicit about the dependency relationships.
Next, let’s use the add_one
function (from the add_one
crate) in the adder
crate. Open the adder/src/main.rs file and add a use
line at the top to bring the new add_one
library crate into scope. Then change the main
function to call the add_one
function.
Filename: adder/src/main.rs
use add_one;
fn main() {
let num = 10;
println!("Hello, world! {num} plus one is {}!", add_one::add_one(num));
}
Let’s build the workspace by running cargo build
in the top-level add directory!
$ cargo build
Compiling add_one v0.1.0 (file:///projects/add/add_one)
Compiling adder v0.1.0 (file:///projects/add/adder)
Finished dev [unoptimized + debuginfo] target(s) in 0.68s
To run the binary crate from the add directory, we can specify which package in the workspace we want to run by using the -p
argument and the package name with cargo run
:
$ cargo run -p adder
Finished dev [unoptimized + debuginfo] target(s) in 0.0s
Running `target/debug/adder`
Hello, world! 10 plus one is 11!
This runs the code in adder/src/main.rs, which depends on the add_one
crate.
Depending on an External Package in a Workspace
Notice that the workspace has only one Cargo.lock file at the top level, rather than having a Cargo.lock in each crate’s directory. This ensures that all crates are using the same version of all dependencies. If we add the rand
package to the adder/Cargo.toml and add_one/Cargo.toml files, Cargo will resolve both of those to one version of rand
and record that in the one Cargo.lock. Making all crates in the workspace use the same dependencies means the crates will always be compatible with each other. Let’s add the rand
crate to the [dependencies]
section in the add_one/Cargo.toml file so we can use the rand
crate in the add_one
crate:
Filename: add_one/Cargo.toml
[dependencies]
rand = "0.8.5"
We can now add use rand
; to the add_one/src/lib.rs file, and building the whole workspace by running cargo build
in the add directory will bring in and compile the rand
crate. We will get one warning because we aren’t referring to the rand
we brought into scope:
$ cargo build
Updating crates.io index
Downloaded rand v0.8.5
--snip--
Compiling rand v0.8.5
Compiling add_one v0.1.0 (file:///projects/add/add_one)
warning: unused import: `rand`
--> add_one/src/lib.rs:1:5
|
1 | use rand;
| ^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: `add_one` (lib) generated 1 warning
Compiling adder v0.1.0 (file:///projects/add/adder)
Finished dev [unoptimized + debuginfo] target(s) in 10.18s
The top-level Cargo.lock now contains information about the dependency of add_one
on rand
. However, even though rand
is used somewhere in the workspace, we can’t use it in other crates in the workspace unless we add rand
to their Cargo.toml files as well. For example, if we add use rand
; to the adder/src/main.rs file for the adder
package, we’ll get an error:
$ cargo build
--snip--
Compiling adder v0.1.0 (file:///projects/add/adder)
error[E0432]: unresolved import `rand`
--> adder/src/main.rs:2:5
|
2 | use rand;
| ^^^^ no external crate `rand`
To fix this, edit the Cargo.toml file for the adder
package and indicate that rand
is a dependency for it as well. Building the adder
package will add rand
to the list of dependencies for adder
in Cargo.lock, but no additional copies of rand
will be downloaded. Cargo has ensured that every crate in every package in the workspace using the rand
package will be using the same version, saving us space and ensuring that the crates in the workspace will be compatible with each other.
Adding a Test to a Workspace
For another enhancement, let’s add a test of the add_one::add_one
function within the add_one
crate:
Filename: add_one/src/lib.rs
pub fn add_one(x: i32) -> i32 {
x + 1
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
assert_eq!(3, add_one(2));
}
}
Now run cargo test
in the top-level add directory. Running cargo test
in a workspace structured like this one will run the tests for all the crates in the workspace:
$ cargo test
Compiling add_one v0.1.0 (file:///projects/add/add_one)
Compiling adder v0.1.0 (file:///projects/add/adder)
Finished test [unoptimized + debuginfo] target(s) in 0.27s
Running unittests src/lib.rs (target/debug/deps/add_one-f0253159197f7841)
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/debug/deps/adder-49979ff40686fa8e)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests add_one
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
We can also run tests for one particular crate in a workspace from the top-level directory by using the -p
flag and specifying the name of the crate we want to test:
$ cargo test -p add_one
Finished test [unoptimized + debuginfo] target(s) in 0.00s
Running unittests src/lib.rs (target/debug/deps/add_one-b3235fea9a156f74)
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Doc-tests add_one
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
If you publish the crates in the workspace to crates.io, each crate in the workspace will need to be published separately. Like cargo test
, we can publish a particular crate in our workspace by using the -p
flag and specifying the name of the crate we want to publish.
As your project grows, consider using a workspace: it’s easier to understand smaller, individual components than one big blob of code. Furthermore, keeping the crates in a workspace can make coordination between crates easier if they are often changed at the same time.
Installing Binaries with cargo install
The cargo install
command allows you to install and use binary crates locally. Note that you can only install packages that have binary targets. A binary target is the runnable program that is created if the crate has a src/main.rs file or another file specified as a binary, as opposed to a library target that isn’t runnable on its own but is suitable for including within other programs. Usually, crates have information in the README file about whether a crate is a library, has a binary target, or both.
All binaries installed with cargo install
are stored in the installation root’s bin folder. If you installed Rust using rustup.rs and don’t have any custom configurations, this directory will be $HOME/.cargo/bin. Ensure that directory is in your $PATH
to be able to run programs you’ve installed with cargo install
.
For example, there’s a Rust implementation of the grep
tool called ripgrep
for searching files. To install ripgrep
, we can run the following:
$ cargo install ripgrep
Updating crates.io index
Downloaded ripgrep v13.0.0
Downloaded 1 crate (243.3 KB) in 0.88s
Installing ripgrep v13.0.0
--snip--
Compiling ripgrep v13.0.0
Finished release [optimized + debuginfo] target(s) in 3m 10s
Installing ~/.cargo/bin/rg
Installed package `ripgrep v13.0.0` (executable `rg`)
The second-to-last line of the output shows the location and the name of the installed binary, which in the case of ripgrep
is rg
. As long as the installation directory is in your $PATH
, as mentioned previously, you can then run rg --help
and start using a faster, rustier tool for searching files!
Extending Cargo with Custom Commands
Cargo is designed so you can extend it with new subcommands without having to modify Cargo. If a binary in your $PATH
is named cargo-something
, you can run it as if it was a Cargo subcommand by running cargo something
. Custom commands like this are also listed when you run cargo --list
. Being able to use cargo install
to install extensions and then run them just like the built-in Cargo tools is a super convenient benefit of Cargo’s design!
Smart Pointers
A pointer is a general concept for a variable that contains an address in memory. This address refers to, or “points at,” some other data. The most common kind of pointer in Rust is a reference. References are indicated by the &
symbol and borrow the value they point to. They don’t have any special capabilities other than referring to data, and have no overhead.
Smart pointers, on the other hand, are data structures that act like a pointer but also have additional metadata and capabilities. Rust has a variety of smart pointers defined in the standard library that provide functionality beyond that provided by references. To explore the general concept, we’ll look at a couple of different examples of smart pointers, including a reference counting smart pointer type. This pointer enables you to allow data to have multiple owners by keeping track of the number of owners and, when no owners remain, cleaning up the data.
Rust, with its concept of ownership and borrowing, has an additional difference between references and smart pointers: while references only borrow data, in many cases, smart pointers own the data they point to.
Though we didn’t call them as such at the time, we’ve already encountered a few smart pointers in this book, including String
and Vec<T>
. Both these types count as smart pointers because they own some memory and allow you to manipulate it. They also have metadata and extra capabilities or guarantees. String
, for example, stores its capacity as metadata and has the extra ability to ensure its data will always be valid UTF-8.
Smart pointers are usually implemented using structs. Unlike an ordinary struct, smart pointers implement the Deref
and Drop
traits. The Deref
trait allows an instance of the smart pointer struct to behave like a reference so you can write your code to work with either references or smart pointers. The Drop
trait allows you to customize the code that’s run when an instance of the smart pointer goes out of scope.
We’ll cover the most common smart pointers in the standard library:
Box<T>
for allocating values on the heapRc<T>
, a reference counting type that enables multiple ownershipRef<T>
andRefMut<T>
, accessed throughRefCell<T>
, a type that enforces the borrowing rules at runtime instead of compile time
In addition, we’ll cover the interior mutability pattern where an immutable type exposes an API for mutating an interior value. We’ll also discuss reference cycles: how they can leak memory and how to prevent them.
Using Box<T>
to Point to Data on the Heap
The most straightforward smart pointer is a box, whose type is written Box<T>
. Boxes allow you to store data on the heap rather than the stack. What remains on the stack is the pointer to the heap data.
Boxes don’t have performance overhead, other than storing their data on the heap instead of on the stack. But they don’t have many extra capabilities either. You’ll use them most often in these situations:
When you have a type whose size can’t be known at compile time and you want to use a value of that type in a context that requires an exact size
When you have a large amount of data and you want to transfer ownership but ensure the data won’t be copied when you do so
When you want to own a value and you care only that it’s a type that implements a particular trait rather than being of a specific type
We’ll demonstrate the first situation in the “Enabling Recursive Types with Boxes” section. In the second case, transferring ownership of a large amount of data can take a long time because the data is copied around on the stack. To improve performance in this situation, we can store the large amount of data on the heap in a box. Then, only the small amount of pointer data is copied around on the stack, while the data it references stays in one place on the heap. The third case is known as a trait object, and the later chapter devotes an entire section, Using Trait Objects That Allow for Values of Different Types, just to that topic. So what you learn here you’ll apply again in the later Chapter!
Using a Box<T>
to Store Data on the Heap
The following listing shows how to use a box to store an i32
value on the heap:
Filename: src/main.rs
fn main() {
let b = Box::new(5);
println!("b = {}", b);
}
In this case, we can access the data in the box similar to how we would if this data were on the stack. Just like any owned value, when a box goes out of scope, as b
does at the end of main
, it will be deallocated. The deallocation happens both for the box (stored on the stack) and the data it points to (stored on the heap).
Enabling Recursive Types with Boxes
A value of recursive type can have another value of the same type as part of itself. Recursive types pose an issue because at compile time Rust needs to know how much space a type takes up. However, the nesting of values of recursive types could theoretically continue infinitely, so Rust can’t know how much space the value needs. Because boxes have a known size, we can enable recursive types by inserting a box in the recursive type definition.
As an example of a recursive type, let’s explore the cons list. This is a data type commonly found in functional programming languages. The cons list type we’ll define is straightforward except for the recursion; therefore, the concepts in the example we’ll work with will be useful any time you get into more complex situations involving recursive types.
More Information About the Cons List
A cons list is a data structure that comes from the Lisp programming language and its dialects and is made up of nested pairs, and is the Lisp version of a linked list. Its name comes from the cons
function (short for “construct function”) in Lisp that constructs a new pair from its two arguments. By calling cons
on a pair consisting of a value and another pair, we can construct cons lists made up of recursive pairs.
For example, here’s a pseudocode representation of a cons list containing the list 1, 2, 3 with each pair in parentheses:
(1, (2, (3, Nil)))
Each item in a cons list contains two elements: the value of the current item and the next item. The last item in the list contains only a value called Nil
without a next item. A cons list is produced by recursively calling the cons
function. The canonical name to denote the base case of the recursion is Nil
. Note that this is not the same as the “null” or “nil” concept in the previous chapter, which is an invalid or absent value.
The cons list isn’t a commonly used data structure in Rust. Most of the time when you have a list of items in Rust, Vec<T>
is a better choice to use. Other, more complex recursive data types are useful in various situations, but by starting with the cons list in this chapter, we can explore how boxes let us define a recursive data type without much distraction.
The following listing contains an enum definition for a cons list. Note that this code won’t compile yet because the List
type doesn’t have a known size, which we’ll demonstrate.
Filename: src/main.rs
enum List {
Cons(i32, List),
Nil,
}
Using the List
type to store the list 1, 2, 3
would look like the code in the following listing:
Filename: src/main.rs
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(1, Cons(2, Cons(3, Nil)));
}
If we try to compile the code in the previous listing, we get the error shown in the following listing:
$ cargo run
Compiling cons-list v0.1.0 (file:///projects/cons-list)
error[E0072]: recursive type `List` has infinite size
--> src/main.rs:1:1
|
1 | enum List {
| ^^^^^^^^^
2 | Cons(i32, List),
| ---- recursive without indirection
|
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
|
2 | Cons(i32, Box<List>),
| ++++ +
For more information about this error, try `rustc --explain E0072`.
error: could not compile `cons-list` due to previous error
The error shows this type “has infinite size.” The reason is that we’ve defined List
with a variant that is recursive: it holds another value of itself directly. As a result, Rust can’t figure out how much space it needs to store a List
value.
Computing the Size of a Non-Recursive Type
Recall the Message
enum we defined in the previous listing when we discussed enum definitions in the previous chapter:
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
To determine how much space to allocate for a Message
value, Rust goes through each of the variants to see which variant needs the most space. Rust sees that Message::Quit
doesn’t need any space, Message::Move
needs enough space to store two i32
values, and so forth. Because only one variant will be used, the most space a Message
value will need is the space it would take to store the largest of its variants.
Contrast this with what happens when Rust tries to determine how much space a recursive type like the List
enum in the previous listing needs. The compiler starts by looking at the Cons
variant, which holds a value of type i32
and a value of type List
. Therefore, Cons
needs an amount of space equal to the size of an i32
plus the size of a List
. To figure out how much memory the List
type needs, the compiler looks at the variants, starting with the Cons
variant. The Cons
variant holds a value of type i32
and a value of type List
, and this process continues infinitely.
Using Box<T>
to Get a Recursive Type with a Known Size
Because Rust can’t figure out how much space to allocate for recursively defined types, the compiler gives an error with this helpful suggestion:
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `List` representable
|
2 | Cons(i32, Box<List>),
| ++++ +
In this suggestion, “indirection” means that instead of storing a value directly, we should change the data structure to store the value indirectly by storing a pointer to the value instead.
Because a Box<T>
is a pointer, Rust always knows how much space a Box<T>
needs: a pointer’s size doesn’t change based on the amount of data it’s pointing to. This means we can put a Box<T>
inside the Cons
variant instead of another List
value directly. The Box<T>
will point to the next List
value that will be on the heap rather than inside the Cons
variant. Conceptually, we still have a list, created with lists holding other lists, but this implementation is now more like placing the items next to one another rather than inside one another.
We can change the definition of the List
enum and the usage of the List
in previous listings to the code in the following listing, which will compile:
Filename: src/main.rs
enum List {
Cons(i32, Box<List>),
Nil,
}
use crate::List::{Cons, Nil};
fn main() {
let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
}
The Cons
variant needs the size of an i32
plus the space to store the box’s pointer data. The Nil
variant stores no values, so it needs less space than the Cons
variant. We now know that any List
value will take up the size of an i32
plus the size of a box’s pointer data. By using a box, we’ve broken the infinite, recursive chain, so the compiler can figure out the size it needs to store a List
value.
Boxes provide only the indirection and heap allocation; they don’t have any other special capabilities, like those we’ll see with the other smart pointer types. They also don’t have the performance overhead that these special capabilities incur, so they can be useful in cases like the cons list where the indirection is the only feature we need.
The Box<T>
type is a smart pointer because it implements the Deref
trait, which allows Box<T>
values to be treated like references. When a Box<T>
value goes out of scope, the heap data that the box is pointing to is cleaned up as well because of the Drop
trait implementation.