Rust in pieces: the first 15 minutes

With the Rust language nearing its 1.0 release and many (but not all) APIs starting to stabilise, now seemed like a good time for someone with little time to play around with it.

Coming from a C++ background, the promise of powerful zero-cost abstractions and compiler-enforced lifetime management is certainly appealing. In practise, is it too much effort to be useful for the average performance-sensitive application? I hope we can extrapolate from some toy examples, because that’s all I’ll be able to manage just now!

If you want a tl;dr of my thoughts so far:

  • Compiler warnings are on a par with modern compilers for other languages (gcc, clang/xcode)

  • Type inference is great and cuts down on a lot of boilerplate.

  • The borrow checker is annoyingly merciless.

  • Type inference without garbage allocation makes it really easy to incur undefined behaviour everywhere.

  • The borrow checker fixes this.

  • What one hand giveth, the other taketh away, but there is a net positive: you’re safe from use-after-free bugs and data races.

If you are interested in how I arrived at these conclusions, then read on.

First, define a problem: Read a file and print back the unique lines in that file together with the number of times each line occurred. Should be easy, right? Let’s get some sample input, fruit.txt:

apple
banana
cherry
banana
apple
apple
damson
cherry
banana

And our program should output some permutation of the following:

apple: 3 time(s)
banana: 3 time(s)
cherry: 2 time(s)
damson: 1 time(s)

Let’s see how quickly we can get such a program together in Rust.

Rust programs using the Cargo package manager (as opposed to compiling everything with rustc in a makefile) should define a Cargo.toml, so let’s do that:

[package]
name = "fruitcount"
version = "0.0.1"
authors = [ "Chris Branch " ]

[[bin]]
name = "fruitcount"

Then our source should appear in a src directory. Let’s write a simple main function in src/main.rs that repeats stdin.

fn main() {
    let stdin = std::old_io::stdin();
    for maybe_line in stdin.lock().lines() {
        println!("{}", maybe_line.unwrap());
    }
}

Time to try our first cargo build!

~/rust/fruitcount  cargo build
   Compiling fruitcount v0.0.1 (file:///Users/chrisb/rust/fruitcount)
src/main.rs:3:23: 3:28 error: cannot borrow immutable local variable `stdin` as mutable
src/main.rs:3     for maybe_line in stdin.lock().lines() {
                                    ^~~~~
error: aborting due to previous error
Could not compile `fruitcount`.

Boo, I forgot that mutability is explicit. Changing let stdin to let mut stdin fixes the error. I can see myself accidentally typing ‘mut’ in my Swift code a few weeks from now… Anyway, rerunning cargo build and I get a bunch of lines like this:

src/main.rs:2:21: 2:39 warning: use of unstable library feature 'old_io'
src/main.rs:2     let mut stdin = std::old_io::stdin();
                                  ^~~~~~~~~~~~~~~~~~
src/main.rs:2:21: 2:39 help: add #![feature(old_io)] to the crate attributes to silence this warning
src/main.rs:2     let mut stdin = std::old_io::stdin();
                                  ^~~~~~~~~~~~~~~~~~

It’s nice to see the compiler suggest how to deal with that warning. In this case it’s not a temporary fix, but that’s the nature of an alpha release. The important thing is it compiles and it repeats anything sent to stdin – so far, so good.

Now we need a data structure to store unique values with an associated count. HashMap is the logical choice for this. First, let’s find unique lines:

#![feature(old_io)]

use std::collections::HashMap;

fn main() {
    let mut counted_words = HashMap::new();
    let mut stdin = std::old_io::stdin();
    for maybe_line in stdin.lock().lines() {
        let line = maybe_line.unwrap();
        counted_words.insert(line, 1);
    }
    for (line, count) in counted_words.iter() {
        println!("{}: {} time(s)", *line, *count);
    }
}

Do a little dance:

~/rust/fruitcount  cargo build
   Compiling fruitcount v0.0.1 (file:///Users/chrisb/rust/fruitcount)
~/rust/fruitcount  target/fruitcount <fruit.txt
apple
: 1 time(s)
cherry
: 1 time(s)
damson
: 1 time(s)
banana
: 1 time(s)

Ah, we didn’t want to include the newlines in the lines we read, but we can deal with that later. Let’s focus on getting counting to work. What we really want in that first loop is to insert 1 if it’s the first time we saw the line, or add 1 to an existing count.

I remember seeing some kind of find_or_insert_with method when I last saw some Rust, but it’s not there anymore. In Racket, I’d use the hash-update! function which accepts a default value if it’s not in the hash, or a function that accepts the existing value and returns a new one. However, C++ insert gives you an iterator if the key already exists. Rust might do things differently still. I could always check if the key exists, get and set a new value if so, or insert the key if not, but that would be unsatisfying given Rust’s raison d’etre.

Looking through the HashMap API reference I see this potentially useful method:

fn entry(&mut self, key: K) -> Entry

Gets the given key’s corresponding entry in the map for in-place manipulation.

WTF is an Entry anyway? It can be occupied or vacant, but I can only get values? I must be missing something. Maybe I should find out what happened to that find_or_insert_with function.

A bit of googling later proved my hunch right: that function and others like it were replaced by pattern matching on the more general Entry abstraction. It’s more elegant, though unfortunately more verbose, too: we have to import the Vacant and Occupied enum variants to use them. We can probably import the whole thing with a * but it’d be easier for developers to have that included when you import HashMap.

An example on the documentation for Entry wouldn’t hurt, though. Perhaps the API is aimed at people more experienced to Rust, and thus realising they should match on the enum in Entry; maybe I’m asking for the equivalent of what square brackets mean in an array. I’m just throwing it out there.

At least we know how to do this now:

#![feature(old_io)]

use std::collections::HashMap;
use std::collections::hash_map::Entry::Vacant;
use std::collections::hash_map::Entry::Occupied;

fn main() {
    let mut counted_words = HashMap::new();
    let mut stdin = std::old_io::stdin();
    for maybe_line in stdin.lock().lines() {
        let line = maybe_line.unwrap();
        match counted_words.entry(line) {
            Vacant(entry) => entry.insert(1),
            Occupied(entry) => *entry.get_mut() += 1,
        }
    }
    for (line, count) in counted_words.iter() {
        println!("{}: {} time(s)", *line, *count);
    }
}

Or maybe not:

src/main.rs:12:9: 15:10 error: match arms have incompatible types:
 expected `&mut _`,
    found `()`
(expected &-ptr,
    found ()) [E0308]
src/main.rs:12         match counted_words.entry(line) {
src/main.rs:13             Vacant(entry) => entry.insert(1),
src/main.rs:14             Occupied(entry) => *entry.get_mut() += 1,
src/main.rs:15         }
src/main.rs:14:32: 14:53 note: match arm with an incompatible type
src/main.rs:14             Occupied(entry) => *entry.get_mut() += 1,
                                              ^~~~~~~~~~~~~~~~~~~~~

Okay, match is an expression you can use to return things. We want to return void here, and the semicolon effectively makes an expression return nil. Therefore, add a semicolon!

    match counted_words.entry(line) {
        Vacant(entry) => entry.insert(1);,
        Occupied(entry) => *entry.get_mut() += 1;,
    }
src/main.rs:13:45: 13:46 error: expected one of `,`, `.`, `}`, or an operator, found `;`
src/main.rs:13             Vacant(entry) => entry.insert(1); ,
                                                           ^

Drat. But we can wrap it up in a block, so that should surely work?

        match counted_words.entry(line) {
            Vacant(entry) => { entry.insert(1); },
            Occupied(entry) => { *entry.get_mut() += 1; },
        }
src/main.rs:14:35: 14:40 error: cannot borrow immutable local variable `entry` as mutable
src/main.rs:14             Occupied(entry) => { *entry.get_mut() += 1; },
                                                 ^~~~~

Okay, well this looks like we’re making progress. This is probably the same reason I need to put let mut rather than let. Let’s fix it and build again:

        match counted_words.entry(line) {
            Vacant(entry) => { entry.insert(1); },
            Occupied(mut entry) => { *entry.get_mut() += 1; },
        }
src/main.rs:4:5: 4:46 warning: use of unstable library feature 'std_misc': precise API still being fleshed out
src/main.rs:4 use std::collections::hash_map::Entry::Vacant;
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

…multiplied by four, for each use of those keywords. Alpha release, whatever. Let’s run it now:

~/rust/fruitcount  target/fruitcount <fruit.txt
banana
: 3 time(s)
damson
: 1 time(s)
cherry
: 2 time(s)
apple
: 3 time(s)

Okay, not bad. That’s enough HashMap stuff. Let’s try removing those line breaks now. Isn’t it odd that there isn’t any kind of trim function in this list of string methods? But I forget there is also a primitive string type &str, and indeed there’s an implementation of a promisingly-named trim function. Let’s give it a shot:

    for maybe_line in stdin.lock().lines() {
        let line = maybe_line.unwrap();
        let trimmed_line = line.trim();
        match counted_words.entry(trimmed_line) {
            Vacant(entry) => { entry.insert(1); },
            Occupied(mut entry) => { *entry.get_mut() += 1; },
        }
    }
src/main.rs:12:28: 12:32 error: `line` does not live long enough
src/main.rs:12         let trimmed_line = line.trim();
                                          ^~~~
src/main.rs:8:43: 21:2 note: reference must be valid for the block suffix following statement 0 at 8:42...
src/main.rs:8     let mut counted_words = HashMap::new();
src/main.rs:9     let mut stdin = std::old_io::stdin();
src/main.rs:10     for maybe_line in stdin.lock().lines() {
src/main.rs:11         let line = maybe_line.unwrap();
src/main.rs:12         let trimmed_line = line.trim();
src/main.rs:13         match counted_words.entry(trimmed_line) {
               ...
src/main.rs:11:39: 17:6 note: ...but borrowed value is only valid for the block suffix following statement 0 at 11:38
src/main.rs:11         let line = maybe_line.unwrap();
src/main.rs:12         let trimmed_line = line.trim();
src/main.rs:13         match counted_words.entry(trimmed_line) {
src/main.rs:14             Vacant(entry) => { entry.insert(1); },
src/main.rs:15             Occupied(mut entry) => { *entry.get_mut() += 1; },
src/main.rs:16         }
               ...

Say what? You had no problems with line living long enough before. What exactly is trim doing here? Its function signature is fn trim(&self) -> &str. Rust uses & to indicate a borrowed object, so the hash map is trying to borrow those strings too? What type is counted_words, anyway?

Rust is a statically typed language, which means that we specify our types up front. So why does our first example compile? Well, Rust has this thing called type inference. If it can figure out what the type of something is, Rust doesn’t require you to actually type it out. – Rust Book – Variable Bindings

So if we’re trying to insert a &str into a map, the compiler would infer the HashMap has &str-typed keys. Since &str is really an array slice, we actually want the keys to be of the owned type String.

Okay, so if you think this is annoying, consider the alternative. Imagine if C++ had this kind of type inference. Not auto – we’d still have to define the type of the HashMap fully – but if we defined the container based on what we tried to put into it. We would have string_view inferred as the key type and segfaulted at runtime. Unless we have a fully garbage collected language, we rely on the borrow checker to make sure we don’t do anything this stupid.

Back to the problem at hand. There ought to be a way to convert &str to String. Looking for functions that return String in the documentation for &str turns up a likely candidate, fn to_owned(&self) -> String, of the trait ToOwned. This looks handy to remember, since I’m sure I’m going to make the same mistake with other types…

Could this be it?

    for maybe_line in stdin.lock().lines() {
        let line = maybe_line.unwrap();
        let trimmed_line = line.trim().to_owned();
        match counted_words.entry(trimmed_line) {
            Vacant(entry) => { entry.insert(1); },
            Occupied(mut entry) => { *entry.get_mut() += 1; },
        }
    }
src/main.rs:12:40: 12:50 error: type `&str` does not implement any method in scope named `to_owned`
src/main.rs:12         let trimmed_line = line.trim().to_owned();
                                                      ^~~~~~~~~~
src/main.rs:12:50: 12:50 help: methods from traits can only be called if the trait is in scope; the following trait is implemented but not in scope, perhaps add a `use` for it:
src/main.rs:12:50: 12:50 help: candidate #1: use `collections::borrow::ToOwned`
src/main.rs:19:36: 19:41 error: the type of this value must be known in this context
src/main.rs:19         println!("{}: {} time(s)", *line, *count);
                                                  ^~~~~
note: in expansion of format_args!
:2:43: 2:76 note: expansion site
:1:1: 2:78 note: in expansion of println!
src/main.rs:19:9: 19:51 note: expansion site
src/main.rs:19:43: 19:49 error: the type of this value must be known in this context
src/main.rs:19         println!("{}: {} time(s)", *line, *count);
                                                         ^~~~~~
note: in expansion of format_args!
:2:43: 2:76 note: expansion site
:1:1: 2:78 note: in expansion of println!
src/main.rs:19:9: 19:51 note: expansion site

Ha ha, just kidding. I have no idea what happened with that println! macro, probably the type inference got confused when it didn’t know the result of to_owned(). At least the warning is helpful again.

src/main.rs:3:5: 3:16 error: unresolved import `collections::borrow::ToOwned`. Maybe a missing `extern crate collections`?
src/main.rs:3 use collections::borrow::ToOwned;
                  ^~~~~~~~~~~

Ha ha, got you again. Good thing I investigated that ToOwned trait and saw it lived in std::borrow; I’ll chalk this up to a compiler bug. Here’s the fixed code:

#![feature(old_io)]
#![feature(std_misc)]

use std::borrow::ToOwned;
use std::collections::HashMap;
use std::collections::hash_map::Entry::Vacant;
use std::collections::hash_map::Entry::Occupied;

fn main() {
    let mut counted_words = HashMap::new();
    let mut stdin = std::old_io::stdin();
    for maybe_line in stdin.lock().lines() {
        let line = maybe_line.unwrap();
        let trimmed_line = line.trim().to_owned();
        match counted_words.entry(trimmed_line) {
            Vacant(entry) => { entry.insert(1); },
            Occupied(mut entry) => { *entry.get_mut() += 1; },
        }
    }
    for (line, count) in counted_words.iter() {
        println!("{}: {} time(s)", *line, *count);
    }
}

And here’s the result:

~/rust/fruitcount  cargo build
   Compiling fruitcount v0.0.1 (file:///Users/chrisb/rust/fruitcount)
~/rust/fruitcount  target/fruitcount <fruit.txt
damson: 1 time(s)
banana: 3 time(s)
cherry: 2 time(s)
apple: 3 time(s)

Not so bad. We could trivially sort the results by using a BTreeMap instead of HashMap, as explained in the collections reference. All it takes is a search and replace – give it a try yourself.

One thought on “Rust in pieces: the first 15 minutes”

  1. Great article! I am new to google analytics, on my GA page there are sales – but its showing $0 for assisted conversions.  We are very active in social media but it appears that there have been no “assisted sales” by social … even looking over the span of a year. I find this very hard to believe. Is this because we have no goals set in GA? I saw in your post that “You need to have a goal active in order to see this. If you don’t have an active goal, you won’t be able to see this info.” …HELP! Thank you for this great post.

    http://allin1panel.com/blog/find-social-media-marketing-online/

Leave a Reply

Your email address will not be published. Required fields are marked *