This question comes mainly from curiosity, and I’m not quite sure how to phrase it best, especially in a title. Say you have one thread writing to a variable of an essentially primitive type and another thread reading it at the same time: is there any likelihood of the read happening while the variable is half written, causing either weird values or undefined behavior?

Take something like an 8-bit value being changed from 00010101 to 11101000.

I’m imagining that if, say, only 4 of the bits have been written by the time we read it, the result could be something like

11100101

To play around, I made this small Rust sample. It ran without producing garbage: it printed a bunch of lines saying “String = Hello!” first and “String = Hi!” afterwards, without weirdness or issues. I half-expected something like “String = #æé¼¨A” or a segfault.

use std::thread::{self, JoinHandle, sleep};
use std::time;

const HELLO: &str = "Hello!";
const HI: &str = "Hi!";

// Newtype wrapper so we can (unsoundly) promise the compiler that the inner
// &str is safe to share between threads without synchronization.
struct ExemptSyncStringSlice<'a>(&'a str);

unsafe impl Sync for ExemptSyncStringSlice<'_> {}

fn print_ptr(pointer: *const ExemptSyncStringSlice)
{
	for _ in 1..500
	{
		unsafe
		{
			println!("String = {}", (*pointer).0);
		}
	}
}

fn main()
{
	
	static mut DESYNC_POINTER: ExemptSyncStringSlice = ExemptSyncStringSlice(HELLO);

	let join_handle: JoinHandle<()> = thread::spawn
	(
		|| {
			print_ptr(&raw const DESYNC_POINTER);
		}
	);
	sleep(time::Duration::from_millis(1));
	unsafe { DESYNC_POINTER.0 = HI; } // Racy write while the other thread is still reading.
	
	join_handle.join().unwrap();
}
  • ZILtoid1991@lemmy.world · 3 days ago

    I have written a custom synthesizer for my game engine, with the audio running on its own thread. In this case, I managed to get away without any audible issues.

    Same with my evdev event readers, which needed their own threads or else they blocked all the others. No mutexes.

    Your mileage may vary, though, and with extensive testing I could probably find hazard cases.

  • e0qdk@reddthat.com · 8 days ago

    In general, yes, it’s possible to end up with half-written variables – mutexes (and/or other synchronization primitives) are used to avoid that kind of problem.

    Whether you can encounter it in practice depends on the specific programming language, CPU, compiler, and actual instructions involved. Some operations that seem like they should be atomic from the perspective of a high level language are actually implemented as multiple machine code instructions and the thread could be interrupted between them, causing problems, unless steps are deliberately taken to avoid issues with concurrency.

    I have minimal experience with Rust, so I’m not sure how bad the footguns you can run into with unsafe are there specifically, but you can definitely blow your leg off in C/C++…

    • Killercat103@slrpnk.netOP · 8 days ago

      Hm. Good to know. As far as I know, Rust’s unsafe is kind of like sudo: more tools are granted to you, with fewer checks and balances — those checks being the borrow checker. So I think unsafe is more like saying “I don’t want the extra protection against bad memory use” (like, say, using memory you already freed). Now, I’m not familiar with what makes a variable atomic or not, but assuming a variable can be read or written with a single instruction, I’m interpreting that as making it safe?

      • e0qdk@reddthat.com · 8 days ago

        Whether something’s atomic or not depends on the language you use – and if the language was vague about it (like old C) then also how the CPU works.

        At the CPU instruction level, there are other factors like how an instruction interacts with memory. Go look up CMPXCHG (from x86) for example, if you want to go down the rabbit hole. There’s a StackOverflow answer here that you might find interesting about using that instruction in combination with the LOCK prefix.
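For a rough feel of what LOCK CMPXCHG buys you (my own sketch, not from the comment): Rust exposes it through `compare_exchange`, and the classic pattern is a retry loop — “if the value is still what I last saw, swap in my new value; otherwise reload and try again.”

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// CAS loop: add `delta` to `counter` without ever losing a concurrent update.
// On x86 the compare_exchange typically compiles down to LOCK CMPXCHG.
fn cas_add(counter: &AtomicU64, delta: u64) -> u64 {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        let new = current + delta;
        // "If the value is still `current`, replace it with `new`" — performed
        // as one indivisible hardware operation.
        match counter.compare_exchange(current, new, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return new,
            Err(observed) => current = observed, // another thread beat us; retry
        }
    }
}

fn main() {
    let counter = AtomicU64::new(40);
    let result = cas_add(&counter, 2);
    println!("cas_add returned {result}, counter = {}", counter.load(Ordering::Relaxed));
}
```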

        At the language level, there are usually either guarantees (or a lack of guarantees…) about what is safe to do. C++11 (and later) have std::atomic for defining variables that are accessible from multiple threads safely without manually using a mutex, for example. You generally cannot assume that writing to a variable will be safe between threads otherwise (even if you think the operation should compile to a single CPU instruction) without using a supported concurrency mechanism that the compiler knows how to deal with. Consider the case where the compiler chooses to store a value in a register during a loop as an optimization and only write the value back to RAM at the end of the loop – while that value is changed in RAM by another thread! If you use an atomic variable or protect access with a mutex, then the program will behave coherently. If not, you can end up with inconsistent state between threads and then who the fuck knows what will happen. This SA answer might also be interesting to you.

        In Python (specifically the cpython implementation), there’s the Global Interpreter Lock (GIL). Some things are safe to do there in that language implementation that aren’t safe to do in C because of the GIL. (You still generally shouldn’t depend on them though since people are trying to remove the GIL to get better performance out of Python…) Basically, cpython guarantees that only one thread can run Python byte code at a time so interactions are serialized at that level even if the OS swaps threads in the middle of cpython computing the behavior of an instruction.

        Hope that helps a bit.

  • hendrik@palaver.p3x.de · 8 days ago

    You’re probably doing this on a 32-bit or 64-bit processor. It always writes 32 bits (or 64) at a time, using one instruction. There is no time in between.

    • Killercat103@slrpnk.netOP · 8 days ago

      Thank you for replying so quickly. Very interesting that it writes 64 bits at a time (at least on the x86_64 platform I’m on). So there’s no tangible risk of a CPU processing a read and a write instruction in parallel and messing up the data that was read?

      • hendrik@palaver.p3x.de · 8 days ago (edited)

        Well, as long as you’re doing single machine instructions, I think. But you might be doing something that takes multiple instructions, and you don’t really know what the compiler does or what machine instructions your code translates to… And there will be other issues. If you allow your code to access things in random order, you might end up reading before the write, or after it. So your variable might be set, or undefined… Depending on the programming language and type, and whether it’s on the heap or the stack, it could be zero, or whatever happened to be in memory before… I don’t have a clue about Rust. I just think the half-set value isn’t really how it works with primitive types; if it’s that short, it will be one of the two. You might be able to do something like it with longer data structures, though. Like do a loop to set a very long string / array, and read while the other thread is in the middle of writing. That’d be possible.

        • Killercat103@slrpnk.netOP · 8 days ago

          Well, worst case, to cope with language design I could always learn assembly ;) Half joking, but thank you very much for your answer. As for the example, the only reason it’s Rust is that I figured it would be the easiest language to get the logic right, even if I technically have more hours in C++. Arrays seemed obvious enough that they would break, but I was unsure about things like pointers and integers. I just find the “lower” levels kinda fun, ngl.

          • hendrik@palaver.p3x.de · 8 days ago (edited)

            Hehe, me too. I love microcontroller programming. That kind of forces you (at times) to think about the low-level stuff. And maybe have a look at the CPU datasheet once you go deep down. Something like an ESP32 or RP2040 has 2 CPU cores. And it’s way easier to tell what happens compared to a computer with a complicated operating system in between, and an x86-64 CPU that’s massively complicated and more or less just pretends to execute your machine instructions, but in reality it does all kinds of arcane magic to subdivide them, reorder things and optimize.

            (Edit: And with C++ you get to learn all the dirty stuff… How it sometimes initializes variables to zero, sometimes it doesn’t… It’s your job to address memory correctly… Maybe one day I’ll learn Rust instead of all the peculiarities of C++ 😆 And Rust support on microcontrollers is coming along, these days.)

  • lad@programming.dev · 8 days ago

    I would expect unsynchronized access to the same memory to be Undefined Behaviour. I checked that in C++ it is (https://stackoverflow.com/a/79698067/1122720), and that in Rust it also is (https://doc.rust-lang.org/reference/behavior-considered-undefined.html).

    Considering that, I wouldn’t expect coherent results. In simple cases the compiler may optimise away a lot of code based on the assumption that UB never happens, see also: https://predr.ag/blog/falsehoods-programmers-believe-about-undefined-behavior/

  • Killercat103@slrpnk.netOP · 7 days ago

    For anyone interested, I made a second version of the same logic. Even though I kind of had atomic operations in mind, I didn’t know I was thinking of atomics, or what atomic meant for that matter; in a broad sense I assumed raw pointers were atomic. Also worth noting: the fake Sync impl for &str turned out to be unnecessary to get the first example to run, even though I got the impression while writing it that it was needed. Just pretend “ExemptSyncStringSlice” in the first example is plain &str if you want to compare them for some reason.

    I don’t know the overhead of AtomicPtr compared to raw pointers, and it depends on the platform supporting atomic load and store operations (whatever that means). But this version seems more “correct” at least.

    use std::sync::atomic::{AtomicPtr, Ordering};
    use std::thread::{self, sleep};
    use std::time::Duration;
    
    
    fn main()
    {
    	let mut value1: &'static str = "Hello!"; // Named so the pointee outlives this statement.
    	let pointer: AtomicPtr<&'static str> = AtomicPtr::new(&mut value1);
    
    	let mut value2: &'static str = "Hi!"; // Place outside scope due to lifetime.
    
    	thread::scope(|scope|
    	{
    		scope.spawn
    		(|| {
    			for _ in 1..1000
    			{
    				unsafe { println!("String = {}",  *pointer.load(Ordering::Relaxed)); }
    			}
    		});
    
    		scope.spawn
    		(|| {
    			sleep(Duration::from_millis(1));
    			pointer.store(&mut value2, Ordering::Relaxed)
    		});
    	});
    }
    

    Thank you all who responded btw.