Why You Can't Index Strings in Rust
Have you ever tried accessing a Rust String with s[0], only to be greeted by a compiler error?
If you’re coming from Python or C++, this might feel unexpectedly restrictive.
Consider the following snippet:
fn main() {
    let s = String::from("hello");
    let c = s[0];
}
Attempting to compile this results in the following error:
error[E0277]: the type `str` cannot be indexed by `{integer}`
 --> src/main.rs:3:15
  |
3 |     let c = s[0];
  |               ^ string indices are ranges of `usize`
  |
  = note: you can use `.chars().nth()` or `.bytes().nth()`
          for more information, see chapter 8 in The Book: <https://doc.rust-lang.org/book/ch08-02-strings.html#indexing-into-strings>
The Core Reason: Rust Strings Are UTF-8
In Rust, strings are stored as UTF-8 encoded byte sequences. This means “one character ≠ one byte”.
Multibyte characters (e.g. Japanese text, emoji) span several bytes, so a single byte index can land in the middle of a character and yield a fragment that is not valid UTF-8 on its own. Rather than guess what an integer index should mean, the compiler rejects it.
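To make the variable width concrete, here is a minimal sketch that reports how many bytes each character needs in UTF-8. The helper name utf8_widths is hypothetical and exists only for illustration; char::len_utf8 is the standard library method doing the work.

```rust
/// Returns the number of bytes each character of `s` occupies in UTF-8.
/// (`utf8_widths` is an illustrative helper, not a standard library function.)
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

fn main() {
    // 'a' is ASCII, '\u{e9}' is 'é', 'こ' is hiragana, '🦀' is an emoji.
    let sample = "a\u{e9}こ🦀";
    for c in sample.chars() {
        println!("{} -> {} byte(s)", c, c.len_utf8());
    }
    println!("{:?}", utf8_widths(sample)); // [1, 2, 3, 4]
}
```

Because the widths differ, the byte offset of the n-th character cannot be computed without scanning the characters before it.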
String vs. &str
Rust primarily provides two string types:
String: heap-allocated, growable, and owns its contents
&str: a string slice, a view into a sequence of UTF-8 bytes
Example:
fn main() {
    let mut owned = String::from("Hello");
    owned.push_str(", world!");
    let slice = &owned[0..5]; // "Hello"
    let literal = "こんにちは"; // &'static str
    println!("{}", owned); // Hello, world!
    println!("{}", slice); // Hello
    println!("{}", literal); // こんにちは
}
Both String and &str enforce the same safety guarantees against invalid UTF-8 indexing.
Consider:
fn main() {
    let s = "こんにちは";
    println!("Bytes: {}", s.len()); // 15 bytes
    println!("Characters: {}", s.chars().count()); // 5 chars
}
Each of these Japanese characters occupies 3 bytes, so s[1] would refer to the second byte of こ, not to the second character. That’s why Rust disallows this access entirely.
Safe Byte Slice (Only at Char Boundaries)
fn main() {
    let s = "こんにちは";
    let slice = &s[0..3]; // "こ"
    println!("{}", slice);
    // let invalid = &s[0..2]; // This panics at runtime
}
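If a panic is not acceptable, the standard library's str::get returns an Option instead, and str::is_char_boundary lets you check an offset up front. The sketch below wraps get in a hypothetical helper named try_slice, introduced here only for illustration.

```rust
/// Returns the byte range `start..end` of `s`, or `None` if the range is
/// out of bounds or cuts through a multibyte character.
/// (`try_slice` is an illustrative wrapper around `str::get`.)
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "こんにちは";
    println!("{:?}", try_slice(s, 0, 3)); // Some("こ")
    println!("{:?}", try_slice(s, 0, 2)); // None
    println!("{}", s.is_char_boundary(3)); // true
    println!("{}", s.is_char_boundary(2)); // false
}
```

This keeps byte-range slicing available while turning the failure case into a value the caller has to handle.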
Character Access and Performance Trade-offs
You can safely access characters using:
fn main() {
    let s = "こんにちは";
    // Method 1: .chars().nth(n), O(n) time complexity
    let third = s.chars().nth(2).unwrap();
    println!("Third character: {}", third);
    // Method 2: Iterate over chars
    for (i, c) in s.chars().enumerate() {
        println!("Char {}: {}", i, c);
    }
    // Method 3: Iterate bytes
    for b in s.bytes() {
        println!("Byte: {}", b);
    }
    // Char-based slicing
    let sub: String = s.chars().skip(1).take(2).collect();
    println!("Char slice: {}", sub); // "んに"
}
.chars().nth(n) must iterate up to n, making it O(n), which is less efficient than direct indexing in fixed-width encodings.
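When the same string is accessed by character position many times, there are two common ways to soften that cost, sketched below under the assumption that char-level (not grapheme-level) positions are what you want. Collecting into a Vec<char> pays one O(n) pass and then allows O(1) indexing. The hypothetical helper char_slice instead uses char_indices to turn character positions into byte offsets, so it can return a borrowed slice without allocating.

```rust
/// Returns the substring covering `count` characters starting at character
/// position `start`, borrowing from `s` instead of allocating a new String.
/// (`char_slice` is an illustrative helper, not a standard library function.)
fn char_slice(s: &str, start: usize, count: usize) -> &str {
    // Byte offset where the `start`-th character begins (or the end of `s`).
    let begin = s.char_indices().nth(start).map_or(s.len(), |(i, _)| i);
    // Byte offset just past the last requested character.
    let end = s[begin..]
        .char_indices()
        .nth(count)
        .map_or(s.len(), |(i, _)| begin + i);
    &s[begin..end]
}

fn main() {
    let s = "こんにちは";
    // One O(n) pass up front, then O(1) indexing by character position.
    let chars: Vec<char> = s.chars().collect();
    println!("{}", chars[2]); // に
    // Still O(n) per call, but no allocation.
    println!("{}", char_slice(s, 1, 2)); // んに
}
```

The Vec<char> approach uses 4 bytes per character, so it trades memory for speed; the char_indices approach keeps the compact UTF-8 representation.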
A Note on char in Rust
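Rust’s char represents a Unicode scalar value, not a “grapheme cluster”. As a standalone value it is always 4 bytes (32 bits) and can store any valid Unicode scalar value; inside a String or &str, the same character is encoded as 1 to 4 UTF-8 bytes.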
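The difference between the in-memory size of a char and its encoded size inside a string can be checked directly. This is a small sketch; char_sizes is a hypothetical helper that pairs std::mem::size_of with char::len_utf8.

```rust
use std::mem::size_of;

/// Returns (bytes occupied by a `char` value, bytes of its UTF-8 encoding).
/// (`char_sizes` is an illustrative helper, not a standard library function.)
fn char_sizes(c: char) -> (usize, usize) {
    (size_of::<char>(), c.len_utf8())
}

fn main() {
    for c in "aこ🦀".chars() {
        let (as_char, in_utf8) = char_sizes(c);
        println!("{}: {} bytes as char, {} byte(s) in a str", c, as_char, in_utf8);
    }
}
```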
Comparison: C++ and Python
In contrast:
C++:
#include <iostream>
#include <string>
int main() {
    std::string s = "hello";
    char c = s[0]; // OK, but byte-based
}
C++ treats strings as raw byte arrays. UTF-8 semantics are ignored unless explicitly handled — leading to subtle bugs or vulnerabilities.
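For comparison, Rust also offers byte-level access, but you have to ask for it explicitly through as_bytes, which makes it clear that the result is a u8 and not a character. The helper byte_at below is hypothetical and only illustrates the idea.

```rust
/// Returns the byte at `index`, or `None` if `index` is out of range.
/// (`byte_at` is an illustrative helper, not a standard library function.)
fn byte_at(s: &str, index: usize) -> Option<u8> {
    s.as_bytes().get(index).copied()
}

fn main() {
    println!("{:?}", byte_at("hello", 0)); // Some(104), the byte for 'h'
    // For multibyte text this is only the first byte of こ, not the character.
    println!("{:?}", byte_at("こんにちは", 0)); // Some(227)
}
```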
Python:
s = "こんにちは"
print(s[0]) # "こ"
print(s[1]) # "ん"
Python strings are sequences of Unicode code points, so indexing behaves intuitively, similar to chars().nth(n) in Rust but with native support.
Unicode Is Hard: Emojis & Combining Characters
// Requires the `unicode-segmentation` crate
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let emoji = "Hello 🦀 Rust!";
    println!("Bytes: {}", emoji.len()); // 16
    println!("Chars: {}", emoji.chars().count()); // 13
    let combined = "e\u{301}"; // 'e' + combining acute
    println!("Bytes: {}", combined.len()); // 3
    println!("Chars: {}", combined.chars().count()); // 2
    // Four emoji joined by zero-width joiners (U+200D), written out
    // explicitly because the joiners are invisible in source code
    let family = "👨\u{200D}👩\u{200D}👧\u{200D}👦";
    println!("Bytes: {}", family.len()); // 25
    println!("Chars: {}", family.chars().count()); // 7
    println!("Graphemes: {}", family.graphemes(true).count()); // 1
}
The family emoji consists of multiple Unicode code points connected via zero-width joiners. Visually it’s one glyph, but .chars() sees seven.
To handle these properly, use unicode-segmentation to work with grapheme clusters, the user-perceived “characters”.
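A standard-library-only sketch shows what can go wrong when code treats each char as a user-perceived character. The hypothetical helper reverse_chars reverses a string one char at a time, which detaches a combining accent from its base letter.

```rust
/// Reverses `s` one `char` at a time. This is NOT grapheme-aware.
/// (`reverse_chars` is an illustrative helper, not a standard library function.)
fn reverse_chars(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    // "café" spelled with 'e' followed by a combining acute accent (U+0301).
    let word = "cafe\u{301}";
    let reversed = reverse_chars(word);
    // The accent now comes first, no longer attached to the 'e'.
    println!("{:?}", reversed);
}
```

Reversing by grapheme cluster instead would keep the accent attached to its letter.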
Conclusion
Rust does not allow s[0] on strings because it chooses safety and correctness over convenience.
UTF-8 is a variable-width encoding. Indexing by byte without context can yield invalid slices or crash your program.
Rust’s strictness may feel annoying at first, but it protects you from subtle bugs that other languages let slip through.
If you’re dealing with strings in Rust:
✨ Embrace chars(), avoid [], and respect Unicode.