
I was tinkering with C libraries in Rust and found that the following code:

extern crate libc;
use libc::{c_char, c_int, size_t};


extern "C" {

    fn printf(fmt: *const c_char, ...) -> c_int;
    
    fn strlen(arr: *const c_char) -> size_t;
}

fn main() {
    unsafe {
        printf("This uses C's standard lib printf".as_ptr() as *const i8);
        print!("\n");
        let s = "Useless thing again";
        print!("Length of {}: ", s);
        let x = strlen(s.as_ptr() as *const i8);
        print!("{}", &x);
    }
}

Will produce this:

This uses C's standard lib printf

Length of Useless thing again: 31  

strlen() also counted the string slice inside the print! macro. But if I do this:

extern crate libc;
use libc::{c_char, c_int, size_t};


extern "C" {

    fn printf(fmt: *const c_char, ...) -> c_int;
    
    fn strlen(arr: *const c_char) -> size_t;
}

fn main() {
    unsafe {
        printf("This uses C's standard lib printf".as_ptr() as *const i8);
        print!("\n");
        print!("blah blah blah\n");
        let s = "Useless thing again";
        let x = strlen(s.as_ptr() as *const i8);
        print!("{}", &x);
    }
}

It will produce this:

This uses C's standard lib printf

blah blah blah
19

It counted "Useless thing again" correctly and did not count anything above the s variable. I know it probably has something to do with memory, but I am quite new to low-level programming. Can I have a detailed explanation?

  • As you may know, C uses a null character ('\0' = 0) to indicate the end of a string (char*). Because of this, all string literals in C have an implicit null character. However, since Rust stores the length of a string at runtime, like C++'s std::string, the null character is no longer needed. Once compiled, these strings are stored in the text section of the binary file. To save space, the compiler omits the null character and likely packs all of the string literals together. Commented Feb 11, 2021 at 18:44
  • While I can't say for sure, I imagine if you were to printf the character pointer from the first example (or look at the text section of the binary file) it would look something like this: "Useless thing againLength of : " Commented Feb 11, 2021 at 18:46
  • As for why the first string didn't come out looking malformed, I imagine the compiler saw that you immediately converted the string literal into a pointer and added the null terminator for safety. Commented Feb 11, 2021 at 18:51
  • @Locke The reason it sort of appears to work in the first program is almost certainly because of line buffering. C's printf and Rust's print! use different buffers. The string sent to printf might be This uses C's standard lib printf\nUseless thing againLength of : (just all the literals joined together), but only up to the \n will be printed by C, and Rust doesn't know about C's stdout buffer, so the rest will just be left there. The Rust compiler is almost certainly not injecting null bytes "for safety" (although it's anyone's guess why the second program seems to work properly). Commented Feb 11, 2021 at 20:18
  • You can tell that printf doesn't stop at the end of the string because there's an empty line in the output before Length of Useless thing again -- that's the same \n being printed twice, first by C and then by Rust. Commented Feb 11, 2021 at 20:21

1 Answer

This boils down to the difference between C strings, fat pointers, and how string literals are stored in an executable.

C Strings

As you may already know, C represents a string as a char *. Since a bare pointer carries no length, there is no way to know when to stop reading the string from memory, so a null terminator (a byte with a value of 0) is added to the end.

strlen simply counts the number of bytes until it finds a byte with a value of 0. printf does something similar, except that it writes what it finds to stdout.

// This string occupies 5 bytes of memory due to the implicit null terminator
char *string_literal = "test";
// ['t', 'e', 's', 't', 0]
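
To make the counting rule concrete, here is a minimal sketch in safe Rust of what strlen does. The helper name count_until_nul is made up for this illustration and is not a libc function. Unlike the real strlen, it stops at the end of the slice instead of reading past it.

```rust
/// Counts the bytes before the first zero byte, which is the rule strlen follows.
/// Returns None when the slice holds no zero byte. The real strlen has no
/// slice bounds to stop at, so it would keep reading whatever memory follows.
fn count_until_nul(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b == 0)
}

fn main() {
    // A byte string with an explicit terminator, laid out like a C literal.
    let terminated = b"test\0";
    println!("{:?}", count_until_nul(terminated)); // prints Some(4)

    // A Rust string slice has no terminator among its own bytes.
    let unterminated = "test".as_bytes();
    println!("{:?}", count_until_nul(unterminated)); // prints None
}
```

The None case is the interesting one: that is the situation in the question, where C keeps scanning into whatever bytes happen to follow the literal.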

Fat Pointers

However, the C string approach has drawbacks. If you want to take a substring, you need to either modify the original string to add a new null terminator or copy the desired section to a new part of memory. The solution is to store the length of the string alongside the pointer.

// This isn't technically correct, but it is easier to think of this way
pub struct string {
    ptr: *const i8,
    length: usize,
}

The same idea of carrying the length alongside the data shows up in C++'s std::string and in Rust's slices. Since Rust uses fat pointers by default, its string literals do not need a null terminator, and the compiler does not add one.
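
As a rough illustration of the pointer-plus-length idea, the sketch below checks the size of a &str and takes a substring without copying. The substring helper is a hypothetical name used only here.

```rust
use std::mem::size_of;

/// Borrows part of `s`. No bytes are copied and no terminator is written;
/// the result is just a new (pointer, length) pair into the same memory.
fn substring(s: &str, start: usize, end: usize) -> &str {
    &s[start..end]
}

fn main() {
    let s = "Useless thing again";

    // A &str is two machine words wide (pointer and length),
    // while a plain raw pointer is one word.
    println!("&str: {} bytes", size_of::<&str>());
    println!("*const u8: {} bytes", size_of::<*const u8>());

    let sub = substring(s, 8, 13);
    println!("{} (length {})", sub, sub.len()); // prints "thing (length 5)"

    // The substring points 8 bytes into the original data.
    println!("{}", sub.as_ptr() == s.as_ptr().wrapping_add(8)); // prints "true"
}
```

Because the length travels with the pointer, nothing in the slice itself marks where it ends, which is why handing s.as_ptr() to C loses that information.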

Memory

In a Linux executable (ELF format), the string literals and constants used in your code are placed, at the compiler's discretion, in a read-only data section of the binary (typically .rodata).

Without knowing too much, I'm going to guess what that section looks like for the first code sample:

This uses C's standard lib printf\0\nUseless thing againLength of : \0

I got this approximation by concatenating the string literals in the order they appear in the code and removing the parts that are consumed at compile time, such as the {} in Rust's print! statements. With this naïve estimate, there are exactly 31 characters from the start of "Useless thing again" to the next null byte, matching the output of the first code sample. You can check this yourself with objdump -s -j .rodata executable_file (assuming I got that command right).
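
The arithmetic can be checked with a small simulation. The sketch below does not inspect a real binary: GUESSED_LAYOUT is just the guess from above copied into a byte string, and scanned_len is a hypothetical helper that reads from the start of a literal up to the next zero byte.

```rust
use std::ffi::CStr;

// Assumed layout, copied from the guess above. The real section contents
// depend on the compiler version and flags.
const GUESSED_LAYOUT: &[u8] =
    b"This uses C's standard lib printf\0\nUseless thing againLength of : \0";

/// Length a C-style scan would report if it started at `needle` inside `layout`.
fn scanned_len(layout: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None;
    }
    // Locate the literal inside the simulated section.
    let start = layout.windows(needle.len()).position(|w| w == needle)?;
    // Read up to the first zero byte, as strlen would.
    let c_str = CStr::from_bytes_until_nul(&layout[start..]).ok()?;
    Some(c_str.to_bytes().len())
}

fn main() {
    let s = "Useless thing again";
    println!("Rust length: {}", s.len()); // prints "Rust length: 19"
    println!(
        "Scanned length: {:?}",
        scanned_len(GUESSED_LAYOUT, s.as_bytes())
    ); // prints "Scanned length: Some(31)"
}
```

The 12 extra bytes are "Length of " (10) plus ": " (2), the two pieces of the print! format string that sit before the terminator in this guessed layout.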

Exceptions

One thing I would like to point out is that the length of a character is not fixed. For example, a single Unicode character can take up to 4 bytes in UTF-8. So if you plan on passing a string to C, it is recommended that you use a byte string literal instead, to be more explicit about the data type, and add the null terminator yourself rather than relying on one being present.

// The b prefix makes this a &[u8; N], and \0 is the explicit null terminator.
let example = b"test 123\0";
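
For strings that are not literals, the standard library's std::ffi::CString and CStr types handle the terminator for you. The following is a minimal sketch rather than the only way to do it; to_c_string is a hypothetical wrapper name, and the example stops short of calling into C.

```rust
use std::ffi::{CStr, CString};

/// Copies `s` into a new buffer with a trailing zero byte.
/// Returns None if `s` already contains a zero byte, since C would
/// treat that as the end of the string.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    let owned = to_c_string("Useless thing again").expect("no interior zero byte");
    // owned.as_ptr() yields a *const c_char that C functions such as strlen
    // can read safely for as long as `owned` is alive.
    println!("{}", owned.as_bytes().len()); // prints 19
    println!("{}", owned.as_bytes_with_nul().len()); // prints 20

    // A byte string literal with an explicit terminator can be checked
    // and borrowed as a CStr without copying.
    let literal = CStr::from_bytes_with_nul(b"test 123\0").expect("terminated");
    println!("{}", literal.to_bytes().len()); // prints 8

    // Byte length and character count differ for non-ASCII text.
    let accented = "h\u{e9}llo";
    println!("{} chars, {} bytes", accented.chars().count(), accented.len());
    // prints "5 chars, 6 bytes"
}
```

In the question's code, passing owned.as_ptr() to strlen instead of s.as_ptr() should give 19, because the scan then stops at a terminator that belongs to the string itself.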

2 Comments

Thank you so much for the explanation! I know how strlen works now and yes, using the null terminator solved my problem!
As can be seen here, the compiler does not "add an implicit null terminator for safety" (notice the use of .ascii versus .asciz in the generated assembly).
