Why You Can't Index Strings in Rust

Have you ever tried accessing a Rust String with s[0] only to be greeted by a compiler error? If you’re coming from Python or C++, this might feel unexpectedly restrictive.

Consider the following snippet:

fn main() {
    let s = String::from("hello");
    let c = s[0];
}

Attempting to compile this results in the following error:

error[E0277]: the type `str` cannot be indexed by `{integer}`
 --> src/main.rs:3:15
  |
3 |     let c = s[0];
  |               ^ string indices are ranges of `usize`
  |
  = note: you can use `.chars().nth()` or `.bytes().nth()`
          for more information, see chapter 8 in The Book: <https://doc.rust-lang.org/book/ch08-02-strings.html#indexing-into-strings>
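
The compiler's two suggestions can be applied to the original snippet. Here is a minimal sketch; the helper names char_at and byte_at are illustrative and not part of the standard library. Both wrap the standard .chars().nth() and .bytes().nth() calls and return an Option, because the string may be shorter than the requested position.

```rust
// Hypothetical helpers applying the compiler's two suggestions.
// Both return None when the string has fewer than n + 1 chars/bytes.
fn char_at(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n) // walks the string char by char: O(n)
}

fn byte_at(s: &str, n: usize) -> Option<u8> {
    s.bytes().nth(n) // a raw UTF-8 byte, not a character
}

fn main() {
    let s = String::from("hello");
    let c = char_at(&s, 0); // &String coerces to &str
    let b = byte_at(&s, 0);
    println!("{:?} {:?}", c, b); // Some('h') Some(104)
}
```

For ASCII text the two answers agree. For multibyte text they diverge, which is exactly the ambiguity the compiler refuses to resolve for you.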

The Core Reason: Rust Strings Are UTF-8

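In Rust, strings are stored as UTF-8 encoded byte sequences. UTF-8 is a variable-width encoding: a single character takes anywhere from one to four bytes, so one character is not the same as one byte.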

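As a result, a byte index does not identify a character. For multibyte characters (e.g. Japanese text or emoji), s[0] could only return the first byte of a character, a fragment that means nothing on its own. Indexing is also expected to run in constant time, which is impossible for “the n-th character” in a variable-width encoding. Rather than silently return a byte where you expected a character, the compiler rejects the operation.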
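
To see the variable width directly, one can encode a few characters and print their bytes. In this sketch 'a' takes one byte, 'é' two, 'こ' three, and '🦀' four. The helper name utf8_bytes is illustrative; it relies only on the standard char::encode_utf8 method.

```rust
// Hypothetical helper: encodes one char as UTF-8 and returns its bytes.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // no char needs more than 4 bytes in UTF-8
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    for c in ['a', 'é', 'こ', '🦀'] {
        let bytes = utf8_bytes(c);
        // {:02X?} prints each byte as two-digit uppercase hex.
        println!("{:?}: {} byte(s) {:02X?}", c, bytes.len(), bytes);
    }
}
```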

String vs. &str

Rust primarily provides two string types:

  • String: heap-allocated, growable, and owns its contents
  • &str: string slice — a view into a sequence of UTF-8 bytes

Example:

fn main() {
    let mut owned = String::from("Hello");
    owned.push_str(", world!");

    let slice = &owned[0..5]; // "Hello"
    let literal = "こんにちは"; // &'static str

    println!("{}", owned);  // Hello, world!
    println!("{}", slice);  // Hello
    println!("{}", literal); // こんにちは
}

The same rules apply to String and &str alike: neither can be indexed with an integer, and both can be sliced only at character boundaries.

Consider:

fn main() {
    let s = "こんにちは";
    println!("Bytes: {}", s.len());          // 15 bytes
    println!("Characters: {}", s.chars().count()); // 5 chars
}

Each of these hiragana characters takes 3 bytes, so the characters start at byte offsets 0, 3, 6, 9, and 12. A byte-based s[1] would land in the middle of こ rather than on the second character, ん. That’s why Rust disallows this access entirely.
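
You can still inspect what s[1] would have pointed at by opting into byte access explicitly with as_bytes(), and ask the standard str::is_char_boundary method where characters begin. A small sketch; char_boundaries is an illustrative helper, not a library function.

```rust
// Hypothetical helper: every byte offset at which a char starts,
// plus s.len() (the end of the string also counts as a boundary).
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

fn main() {
    let s = "こんにちは";
    // Explicit byte access: byte 1 is the middle byte of 'こ' (E3 81 93).
    println!("byte 1 = {:#04X}", s.as_bytes()[1]); // 0x81
    println!("boundaries = {:?}", char_boundaries(s)); // [0, 3, 6, 9, 12, 15]
}
```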

Safe Byte Slice (Only at Char Boundaries)

fn main() {
    let s = "こんにちは";
    let slice = &s[0..3]; // "こ"
    println!("{}", slice);

    // let invalid = &s[0..2]; // This panics at runtime
}
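
If the range comes from user input or arithmetic and might not fall on a boundary, the standard str::get method returns an Option instead of panicking. The wrapper name try_slice below is illustrative only; it simply forwards to s.get.

```rust
// Hypothetical wrapper: returns the byte range as a &str, or None if the range
// is out of bounds or does not fall on char boundaries. Never panics.
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "こんにちは";
    println!("{:?}", try_slice(s, 0, 3));  // Some("こ")
    println!("{:?}", try_slice(s, 0, 2));  // None: byte 2 is inside 'こ'
    println!("{:?}", try_slice(s, 0, 99)); // None: past the end
}
```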

Character Access and Performance Trade-offs

You can safely access characters using:

fn main() {
    let s = "こんにちは";

    // Method 1: .chars().nth(n) — O(n) time complexity
    let third = s.chars().nth(2).unwrap();
    println!("Third character: {}", third);

    // Method 2: Iterate over chars
    for (i, c) in s.chars().enumerate() {
        println!("Char {}: {}", i, c);
    }

    // Method 3: Iterate over raw bytes (these are not characters)
    for b in s.bytes() {
        println!("Byte: {}", b);
    }

    // Char-based slicing
    let sub: String = s.chars().skip(1).take(2).collect();
    println!("Char slice: {}", sub); // "んに"
}

.chars().nth(n) has to decode every character before position n, so it runs in O(n) time, whereas indexing into a fixed-width encoding such as UTF-32 is O(1).
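
Two standard-library workarounds are sketched below, under the assumption that positions are counted in chars (not graphemes). The helper char_substr is hypothetical: it uses char_indices to translate char positions into byte offsets, so it borrows a &str instead of allocating a String as the collect() example above does. When many random accesses are needed, collecting into a Vec<char> pays the O(n) decoding cost once and then allows O(1) indexing.

```rust
use std::iter;

// Hypothetical helper: the substring covering `len` chars starting at char
// position `start`, or None if that range runs past the end. It borrows
// from `s` instead of allocating a new String.
fn char_substr(s: &str, start: usize, len: usize) -> Option<&str> {
    // Byte offset where each char starts, then s.len() as the final end offset.
    let mut offsets = s.char_indices().map(|(i, _)| i).chain(iter::once(s.len()));
    let begin = offsets.nth(start)?;
    // nth consumes what it skips, so the end offset is (len - 1) steps further on.
    let end = if len == 0 { begin } else { offsets.nth(len - 1)? };
    Some(&s[begin..end])
}

fn main() {
    let s = "こんにちは";
    println!("{:?}", char_substr(s, 1, 2)); // Some("んに")

    // For many random accesses, decode once: O(n) up front, then O(1) per
    // lookup, at 4 bytes per char (vs. 3 bytes per char in the str here).
    let chars: Vec<char> = s.chars().collect();
    println!("{} {}", chars[0], chars[4]); // こ は
}
```

The Vec<char> approach trades memory for speed, so it is worth it only when the same string is indexed repeatedly.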

A Note on char in Rust

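Rust’s char represents a Unicode scalar value (any code point except the surrogates U+D800 to U+DFFF), not a “grapheme cluster”. A char value is always 4 bytes (32 bits) in memory and can store any valid Unicode scalar value, even though the same character inside a str occupies only 1 to 4 bytes of UTF-8.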
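
The distinction can be checked with the standard library: std::mem::size_of reports the in-memory size, char::len_utf8 the encoded size, and char::from_u32 rejects anything that is not a scalar value. The helper is_scalar_value is an illustrative name.

```rust
use std::mem::size_of;

// Hypothetical helper: true if `n` is a Unicode scalar value, i.e. a valid char.
fn is_scalar_value(n: u32) -> bool {
    char::from_u32(n).is_some()
}

fn main() {
    // A char value always occupies 4 bytes in memory...
    println!("size_of::<char>() = {}", size_of::<char>()); // 4
    // ...while its UTF-8 encoding inside a str takes 1 to 4 bytes.
    println!("'a' in UTF-8: {} byte(s)", 'a'.len_utf8()); // 1

    // Surrogates (U+D800..=U+DFFF) and values above U+10FFFF are not scalar values.
    println!("{}", is_scalar_value(0xD800));   // false
    println!("{}", is_scalar_value(0x110000)); // false
    println!("{}", is_scalar_value(0x3053));   // true ('こ')
}
```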

Comparison: C++ and Python

In contrast:

C++:

#include <iostream>
#include <string>

int main() {
    std::string s = "hello";
    char c = s[0]; // OK, but this is the first byte, not the first character
    std::cout << c << '\n';
}

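std::string is a sequence of char values with no knowledge of encoding, so s[i] always returns a single byte. For UTF-8 text that byte may be a fragment of a multibyte character, and nothing stops code from splitting a character in half unless it handles UTF-8 explicitly, which can lead to subtle bugs or even vulnerabilities.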
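
Rust offers the same byte-level access as the C++ snippet, but only through an explicit as_bytes() call, so the unit being indexed is visible in the code. A minimal sketch; first_byte is an illustrative helper name.

```rust
// Hypothetical helper: the first raw byte of a string, or None if it is empty.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

fn main() {
    let s = String::from("hello");
    let b = s.as_bytes()[0]; // explicit byte indexing, like C++'s s[0]
    println!("{} ({})", b, b as char); // 104 (h)

    // On non-ASCII text the same access yields only a fragment of a character:
    // 227 is 0xE3, the first of the three bytes of 'こ'.
    println!("{:?}", first_byte("こんにちは")); // Some(227)
}
```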

Python:

s = "こんにちは"
print(s[0])  # "こ"
print(s[1])  # "ん"

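Python strings are sequences of Unicode code points, so s[i] returns the i-th code point, much like chars().nth(i) in Rust. Unlike chars().nth(i), it takes constant time, because CPython stores each string with a fixed width per code point (1, 2, or 4 bytes, chosen per string as described in PEP 393). It still indexes code points, though, not the user-perceived characters discussed below.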

Unicode Is Hard: Emojis & Combining Characters

fn main() {
    let emoji = "Hello 🦀 Rust!";
    println!("Bytes: {}", emoji.len());  // 16
    println!("Chars: {}", emoji.chars().count());  // 13

    let combined = "e\u{301}"; // 'e' + combining acute
    println!("Bytes: {}", combined.len());  // 3
    println!("Chars: {}", combined.chars().count());  // 2

    let family = "👨‍👩‍👧‍👦";
    println!("Bytes: {}", family.len());  // 25
    println!("Chars: {}", family.chars().count());  // 7

    // Requires the `unicode-segmentation` crate as a dependency in Cargo.toml
    use unicode_segmentation::UnicodeSegmentation;
    println!("Graphemes: {}", family.graphemes(true).count()); // 1
}

The family emoji consists of four person emoji (man, woman, girl, boy) joined by three zero-width joiners (U+200D). It renders as a single glyph, but .chars() sees seven scalar values.
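
Listing the scalar values makes the structure visible. This sketch writes the emoji with escapes so the exact code points are explicit, assuming the example above uses the usual man, woman, girl, boy sequence. code_points is an illustrative helper name.

```rust
// Hypothetical helper: formats each Unicode scalar value in `s` as U+XXXX.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // man, ZWJ, woman, ZWJ, girl, ZWJ, boy
    let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";
    println!("{}", code_points(family).join(" "));
    // U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466

    // 4 emoji x 4 bytes + 3 joiners x 3 bytes = 25 bytes in total.
    println!("{}", family.len()); // 25
}
```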

To handle these properly, use the unicode-segmentation crate to work with grapheme clusters, the user-perceived “characters”.

Conclusion

Rust does not allow s[0] on strings because it chooses correctness over convenience. UTF-8 is a variable-width encoding, so a byte index does not identify a character, and slicing at a byte offset that is not a character boundary panics at runtime.

Rust’s strictness may feel annoying at first, but it protects you from subtle bugs that other languages let slip through.

If you’re dealing with strings in Rust: ✨ Embrace chars(), slice only at char boundaries, and respect Unicode.

© 2025 Daisuke Kuriyama