Fixing Incorrect Tracing Parent Spans with Futures and JoinSet in Rust
I recently had a client who experienced issues with their tracing
events not being connected with the correct parent events.
I immediately understood the problem on the call, but there were a few details that still confused me.
So I thought it would be great to fully understand those details.
And to share the knowledge gained with the wider Rust community.
This post assumes intermediate Rust knowledge, including familiarity with tracing, async programming, and tokio. It focuses on solving a specific issue with tracing spans in asynchronous contexts.
The following code snippet is a minimal example of the issue.
use std::future::Future;
use tokio::task::JoinSet;
use tracing::{info, instrument};
use tracing_subscriber::fmt;
#[tokio::main]
async fn main() {
// A custom tracing setup to aid in the explanations for this post
fmt()
.pretty()
.with_thread_ids(true)
.with_target(false)
.with_line_number(true)
.init();
info!("in main");
work().await
}
#[instrument]
async fn work() {
info!("starting work");
// Using a [JoinSet] for a dynamic number of subtasks
let mut set = JoinSet::new();
set.spawn(sub_task());
// Wait for all the tasks to be done
while let Some(_) = set.join_next().await {}
}
#[instrument]
fn sub_task() -> impl Future<Output = ()> {
info!("making sub task");
async {
info!("performing task");
}
}
2024-08-20T12:37:48.092480Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:37:48.092622Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:37:48.092641Z INFO making sub task at src/main.rs:37 on ThreadId(1) in work 2024-08-20T12:37:48.092780Z INFO performing task at src/main.rs:40 on ThreadId(17) in sub_task
In main()
we are logging "in main" which is correctly not associated with any tracing span.
Then there is a "starting work" log which is also correctly associated with the work
span.
But then the issues start.
The next "making sub task" is not associated with the work -> sub_task
span hierarchy.
Nor is "performing task" associated with the work -> sub_task
span hierarchy.
Understanding the Causes
So what is happening here? Well, let's first focus on the "making sub task" log.
#[instrument] issues
From the instrument
documentation we see the macro also works with async functions which uses async-trait
.
The documentation for async-trait
explains how it transforms async functions to return a pinned boxed future.
But wait, our sub_task
also returns a future. 🤔
By looking at the tracing::instrument
code, I see it eventually calls this function to determine whether the macro annotated function implements a manual async or not.
According to the function's documentation, it should only detect pin boxed futures.
But it will also check if the last statement in the function is an async expression.
So it seems the instrument
macro is messing up the expanded block.
Let's confirm this by changing the function to not have an async statement as the last item in the function.
#[instrument]
fn sub_task() -> impl Future<Output = ()> {
info!("making sub task");
let fut = async {
info!("performing task");
};
fut
}
2024-08-20T12:39:12.663662Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:39:12.663753Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:39:12.663786Z INFO making sub task at src/main.rs:37 on ThreadId(1) in sub_task in work 2024-08-20T12:39:12.663851Z INFO performing task at src/main.rs:40 on ThreadId(17)
Nice, the "making sub task" is now correctly in the work -> sub_task
hierarchy.
The "performing task" is also technically in the correct hierarchy of nothing.
But I'm getting ahead of myself.
This is also not the fix for this problem. We just confirmed what the problem is.
The correct way to do this will be to create a manual span and enter it like so:
// The [instrument] macro is removed in favour of a manual span
fn sub_task() -> impl Future<Output = ()> {
let _span = tracing::info_span!("sub_task").entered();
info!("making sub task");
async {
info!("performing task");
}
}
2024-08-20T12:39:46.292346Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:39:46.292396Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:39:46.292409Z INFO making sub task at src/main.rs:38 on ThreadId(1) in sub_task in work 2024-08-20T12:39:46.292460Z INFO performing task at src/main.rs:41 on ThreadId(17)
Why no parent span?
Now we can get to the issue I want to address in this article.
The issue my client called me in to address.
Why does "performing task" not have any parent spans while it is in the work -> sub_task
hierarchy?
Well, that question makes an incorrect assumption.
Since async functions & futures in Rust are not started immediately, only the creation of the "performing task" future is in the work -> sub_task
hierarchy.
For all we know that future can be awaited in the work()
function.
Or the work()
function might return it to main()
where it is awaited.
Let's await it in work to better show what I mean:
#[instrument]
async fn work() {
info!("starting work");
// Remember `sub_task` is not marked as async
// But it does return a future we can await
sub_task().await;
}
2024-08-20T12:40:16.513251Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:40:16.513299Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:40:16.513312Z INFO making sub task at src/main.rs:32 on ThreadId(1) in sub_task in work 2024-08-20T12:40:16.513333Z INFO performing task at src/main.rs:35 on ThreadId(1) in work
Great, the log is now correctly linked with the work
parent where it "executed".
So why did it not have the correct parent when a JoinSet
was used?
Well, did you notice that the "performing task" log had a different thread id than the other logs this whole time? It is only with this last code change that it no longer does. Scroll up and check if you missed it.
Manual spawn
While I've never used JoinSet
s before, I knew from a recent YouTube video of Jon Gjengset decrusting the tokio crate that the JoinSet
will start tokio spawns under the hood.
So we can create the same effect by also starting our future on a spawn.
#[instrument]
async fn work() {
info!("starting work");
let _result = tokio::spawn(sub_task()).await;
}
2024-08-20T12:40:55.952528Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:40:55.952591Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:40:55.952603Z INFO making sub task at src/main.rs:32 on ThreadId(1) in sub_task in work 2024-08-20T12:40:55.952674Z INFO performing task at src/main.rs:35 on ThreadId(16)
Now it is "incorrect" again and has a different thread ID just like the JoinSet
.
This confirms starting a manual spawn does the same as a JoinSet
.
If you think about it, then "performing task" not having a parent is actually correct.
Since we spawned it on a new thread, that thread no longer has the work -> sub_task
span hierarchy.
In fact, the "performing task" future is at the root of that thread and therefore correctly does not have any span parents in the logs.
The span documentation already mentions the core of the issue when talking about an entered span being held across an .await
point.
The documentation explains that an issue might arise when another future is started / makes progress at an .await
point.
So it's clear that every future has its own "span context" - for lack of a better term.
And when our future executes on a new thread, it simply does not have any "span context" for its parent.
The Solution
The core of the issue lies in how futures and spawned tasks handle tracing contexts. When a future is spawned on a new thread, it loses its original tracing context. To maintain the correct span hierarchy across thread boundaries, we need to explicitly instrument our futures.
The span documentation also states that an .instrument()
is the way to fix this instead.
#[instrument]
async fn work() {
info!("starting work");
let _result = tokio::spawn(
sub_task().instrument(tracing::info_span!("spawned sub_task"))
).await;
}
2024-08-20T12:41:43.699116Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:41:43.699189Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:41:43.699201Z INFO making sub task at src/main.rs:34 on ThreadId(1) in sub_task in work 2024-08-20T12:41:43.699251Z INFO performing task at src/main.rs:37 on ThreadId(17) in spawned sub_task in work
The work -> spawned sub_task
span hierarchy might seem unexpected, but it's correct.
Here's why:
- We instrument the future with the "spawned sub_task" span, making it the direct parent of any logs within the future.
- This instrumentation happens before the future is spawned on a new thread, where it would have a new context, so it captures
work
as its parent "span context". - When the future executes, it carries this constructed span information with it, maintaining the correct hierarchy even on a different thread.
This approach effectively bridges the span context across thread boundaries.
Just to more clearly show the spawned sub_task
span is created on thread ID 1, the code can be refactored as follow:
#[instrument]
async fn work() {
info!("starting work");
let future = sub_task().instrument(tracing::info_span!("spawned sub_task"));
let _result = tokio::spawn(future).await;
}
The instrumented future is clearly created in the work
context.
JoinSet
again
If we now switch the code back to a JoinSet
then everything is still fine.
#[instrument]
async fn work() {
info!("starting work");
let mut set = JoinSet::new();
let future = sub_task().instrument(tracing::info_span!("spawned sub_task"));
set.spawn(future);
while let Some(_) = set.join_next().await {}
}
2024-08-20T12:42:47.327459Z INFO in main at src/main.rs:17 on ThreadId(1) 2024-08-20T12:42:47.327669Z INFO starting work at src/main.rs:24 on ThreadId(1) in work 2024-08-20T12:42:47.327713Z INFO making sub task at src/main.rs:37 on ThreadId(1) in sub_task in work 2024-08-20T12:42:47.327852Z INFO performing task at src/main.rs:40 on ThreadId(17) in spawned sub_task in work
Conclusion
This article addressed a specific issue where tracing events were not correctly associated with their parent spans when using futures and JoinSet in Rust. We explored the root cause: the loss of tracing context when futures are executed on different threads.
The solution lies in using the .instrument()
method to explicitly attach spans to futures before they're spawned.
This ensures that the correct span hierarchy is maintained across thread boundaries, resulting in accurate and meaningful tracing logs.
When working with asynchronous code in Rust that uses spawn
or JoinSet
, remember to manually instrument your futures to maintain proper tracing context.
About the Author
Pieter Engelbrecht
Rust Consultant & Software Craftsman
A Rust Programming Language Consultant with over 3 years of exclusive Rust experience. I specialize in architecting Rust projects and helping teams adopt Rust effectively.