Virtual Threads
Preparing Your Code
Structured Concurrency
Project Loom

A simple web request

Imagine a hypothetical HTTP request:

  1. interpret request

  2. query database (blocks)

  3. process data for response

Resource utilization:

  • good for 1. and 3.

  • really bad for 2.

How to implement that request?


Align application’s unit of concurrency (request)
with Java’s unit of concurrency (thread):

  • use thread per request

  • simple to write, debug, profile

  • blocks threads on certain calls

  • limited number of platform threads
    ⇝ bad resource utilization
    ⇝ low throughput
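A minimal sketch of the thread-per-request model. It assumes a hypothetical `handleRequest` whose database query is simulated with `Thread.sleep`. The fixed pool of 4 platform threads stands in for the limited number of OS threads. While a request waits in step 2, its thread can do nothing else, so 8 requests take about two rounds.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadPerRequest {

	// hypothetical request handler: all three steps run on the same thread
	static String handleRequest(int id) throws InterruptedException {
		String query = "request-" + id;          // 1. interpret request
		Thread.sleep(Duration.ofMillis(100));    // 2. query database (blocks this thread)
		return query.toUpperCase();              // 3. process data for response
	}

	static List<String> serve(int requests) throws Exception {
		// a small, fixed pool of platform threads caps concurrency at 4
		try (ExecutorService pool = Executors.newFixedThreadPool(4)) {
			List<Future<String>> responses = new ArrayList<>();
			for (int i = 0; i < requests; i++) {
				int id = i;
				responses.add(pool.submit(() -> handleRequest(id)));
			}
			List<String> result = new ArrayList<>();
			for (Future<String> response : responses)
				result.add(response.get());
			return result;
		}
	}

	public static void main(String[] args) throws Exception {
		long start = System.nanoTime();
		List<String> responses = serve(8);
		long millis = (System.nanoTime() - start) / 1_000_000;
		// 8 requests on 4 threads: roughly two rounds of 100 ms each
		System.out.println(responses.size() + " responses in ~" + millis + " ms");
	}
}
```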


Only use threads for actual computations:

  • use non-blocking APIs (futures / reactive streams)

  • harder to write, challenging to debug/profile

  • incompatible with synchronous code

  • shares platform threads
    ⇝ great resource utilization
    ⇝ high throughput
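For contrast, here is a sketch of the same request in the non-blocking style using `CompletableFuture`. The `queryDatabaseAsync` method is hypothetical. It simulates a non-blocking driver with `CompletableFuture.delayedExecutor`, so no thread is parked while the query is pending.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class AsyncRequest {

	// hypothetical non-blocking query: completes after ~100 ms without parking a thread per call
	static CompletableFuture<String> queryDatabaseAsync(String query) {
		return CompletableFuture.supplyAsync(
				() -> "rows for " + query,
				CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS));
	}

	static CompletableFuture<String> handleRequest(int id) {
		// each step is a callback; no thread is held while the query is pending
		return CompletableFuture
				.supplyAsync(() -> "request-" + id)            // 1. interpret request
				.thenCompose(AsyncRequest::queryDatabaseAsync) // 2. query database
				.thenApply(String::toUpperCase);               // 3. process data
	}

	public static void main(String[] args) {
		System.out.println(handleRequest(7).join());
	}
}
```

The three steps become a callback chain. A stack trace taken inside `thenApply` shows pool internals rather than the request that started the chain, which is one reason this style is harder to debug and profile.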


There’s a conflict between:

  • simplicity

  • throughput

Nota Bene

There are other conflicts:

  • design vs performance (⇝ Valhalla)

  • explicitness vs succinctness (⇝ Amber)

  • flexibility vs safety (⇝ Panama)

  • optimization vs specification (⇝ Leyden)

Enter virtual threads!

A virtual thread:

  • is a regular Thread

  • low memory footprint ([k]bytes)

  • small switching cost

  • scheduled by the Java runtime

  • requires no OS thread when waiting
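Here is a minimal sketch with the `Thread.ofVirtual()` builder (final since JDK 21). The thread name is arbitrary. The virtual thread is created, started, and joined through the ordinary `Thread` API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class VirtualThreadBasics {

	// starts a virtual thread and reports whether the code inside ran virtually
	static boolean runsVirtually() throws InterruptedException {
		AtomicBoolean virtual = new AtomicBoolean();
		// Thread.ofVirtual() returns a builder; start() creates and schedules the thread
		Thread thread = Thread.ofVirtual()
				.name("virtual-demo")
				.start(() -> virtual.set(Thread.currentThread().isVirtual()));
		// a virtual thread is a java.lang.Thread, so the usual API applies
		thread.join();
		return virtual.get();
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println("ran in virtual thread: " + runsVirtually());
		System.out.println("main is virtual: " + Thread.currentThread().isVirtual());
	}
}
```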

Virtual things

Virtual memory:

  • maps large virtual address space
    to limited physical memory

  • gives illusion of plentiful memory

Virtual threads:

  • map large number of virtual threads
    to a small number of OS threads

  • give the illusion of plentiful threads


Programs rarely care about virtual vs physical memory.

Programs rarely need to care about virtual vs platform threads.


  • write straightforward (blocking) code

  • runtime shares available OS threads

  • reduces the cost of blocking to near zero


try (var executor = Executors
		.newVirtualThreadPerTaskExecutor()) {
	IntStream
		.range(0, 1_000_000)
		.forEach(number -> {
			executor.submit(() -> {
				// simulate a blocking call
				Thread.sleep(Duration.ofSeconds(1));
				return number;
			});
		});
} // executor.close() is called implicitly, and waits
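This is a scaled-down, runnable variant of the example above. It uses 10,000 tasks sleeping 100 ms instead of a million sleeping one second (both sizes are arbitrary). It also collects each task's result once `close()` has waited for them.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManyVirtualThreads {

	// one virtual thread per task; each blocks briefly, then returns its number
	static long sumOfTasks(int taskCount) throws Exception {
		List<Future<Integer>> futures = new ArrayList<>();
		try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
			for (int i = 0; i < taskCount; i++) {
				int number = i;
				futures.add(executor.submit(() -> {
					Thread.sleep(Duration.ofMillis(100)); // blocking call, unmounts the virtual thread
					return number;
				}));
			}
		} // close() waits for all tasks to finish
		long sum = 0;
		for (Future<Integer> future : futures)
			sum += future.resultNow(); // safe: every task has completed successfully
		return sum;
	}

	public static void main(String[] args) throws Exception {
		long start = System.nanoTime();
		long sum = sumOfTasks(10_000);
		long millis = (System.nanoTime() - start) / 1_000_000;
		System.out.println("sum = " + sum + " after " + millis + " ms");
	}
}
```

On typical hardware this should finish in a fraction of a second, even though there are far more blocked threads than CPU cores.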


Virtual threads:

  • remove "number of threads" as bottleneck

  • match app’s unit of concurrency to Java’s

simplicity && throughput


Virtual threads aren’t "faster threads":

  • same number of CPU cycles

  • each task takes the same time (same latency)

So why bother?

Parallelism vs concurrency


(Table: parallelism vs concurrency, compared by task origin, resource use, and # of threads: # of CPU cores for parallelism vs # of tasks for concurrency.)


When workload is not CPU-bound:

  • start waiting as early as possible

  • for as many tasks as possible

⇝ Virtual threads increase throughput:

  • when workload is not CPU-bound

  • when number of concurrent tasks is high
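Little's law makes the throughput argument precise. It relates three averages: the number of requests in flight $L$, the throughput $\lambda$, and the latency $W$. The numbers below are illustrative assumptions, not measurements. They use a 100 ms request that mostly waits on a database.

```latex
L = \lambda W \;\Longrightarrow\; \lambda = \frac{L}{W}

W = 0.1\,\mathrm{s}:\qquad
\lambda_{\text{200 platform threads}} \le \frac{200}{0.1\,\mathrm{s}} = 2\,000\ \mathrm{req/s},
\qquad
\lambda_{\text{10\,000 virtual threads}} \le \frac{10\,000}{0.1\,\mathrm{s}} = 100\,000\ \mathrm{req/s}
```

Latency $W$ is the same in both cases. Virtual threads raise throughput by raising $L$, not by making tasks faster. The bound holds only as long as the CPU, database, and other resources keep up.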

Server How-To

For servers:

  • request handling threads are started by web framework

  • frameworks will offer (easy) configuration options

We’re not there yet.
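Until frameworks offer such switches, the JDK's built-in `com.sun.net.httpserver.HttpServer` shows what the configuration boils down to: give the server an executor that starts one virtual thread per request. This is a minimal sketch. The `/hello` path and the response text are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

class VirtualThreadServer {

	// starts the JDK's built-in HTTP server on a free port; each request runs on a new virtual thread
	static HttpServer start() throws IOException {
		HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
		server.createContext("/hello", exchange -> {
			byte[] body = ("virtual: " + Thread.currentThread().isVirtual())
					.getBytes(StandardCharsets.UTF_8);
			exchange.sendResponseHeaders(200, body.length);
			try (OutputStream out = exchange.getResponseBody()) {
				out.write(body); // closing the stream completes the exchange
			}
		});
		// the one line that matters: request handling threads are now virtual
		server.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
		server.start();
		return server;
	}

	// sends one GET request to the running server and returns the response body
	static String get(HttpServer server) throws IOException, InterruptedException {
		URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/hello");
		HttpClient client = HttpClient.newHttpClient();
		HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
		return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
	}

	public static void main(String[] args) throws Exception {
		HttpServer server = start();
		try {
			System.out.println(get(server));
		} finally {
			server.stop(0);
		}
	}
}
```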

Spring Boot

Replace executors:

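@Bean
public TomcatProtocolHandlerCustomizer<?>
		createExecutorForSyncCalls() {
	return handler -> handler.setExecutor(
		Executors.newVirtualThreadPerTaskExecutor());
}

@Bean
public AsyncTaskExecutor
		createExecutorForAsyncCalls() {
	return new TaskExecutorAdapter(
		Executors.newVirtualThreadPerTaskExecutor());
}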


Annotate request handling method:

public String handle() {
	// ...
}

(Requires --add-opens java.base/java.lang=ALL-UNNAMED.)

Virtual Threads

Go forth and multiply (your threads)

Scaling Simply

Preparing your code

Virtual threads:

  • always work correctly

  • may not scale perfectly

Code changes can improve scalability
(and maintainability, debuggability, observability).

Avoid thread pools

Only pool expensive resources,
but virtual threads are cheap.

⇝ Replace thread pools used to limit concurrency
with virtual threads plus, e.g., semaphores.

With thread pools

// limits concurrent queries but pools 👎🏾
private static final ExecutorService DB_POOL =
	Executors.newFixedThreadPool(16);

public <T> Future<T> queryDatabase(Callable<T> query) {
	return DB_POOL.submit(query);
}

With semaphore

// limits concurrent queries without pool 👍🏾
private static final Semaphore DB_SEMAPHORE =
	new Semaphore(16);

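public <T> T queryDatabase(Callable<T> query)
		throws Exception {
	// waiting for a permit unmounts a virtual thread
	DB_SEMAPHORE.acquire();
	try {
		return query.call();
	} finally {
		DB_SEMAPHORE.release();
	}
}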

Where are the virtual threads? ⇝ Later.
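This runnable sketch previews how the two pieces combine. Every request gets its own virtual thread, and only the semaphore bounds how many of them reach the simulated database at once. `slowQuery` and the two counters are hypothetical demo scaffolding, not part of the pattern.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLimitedQueries {

	private static final Semaphore DB_SEMAPHORE = new Semaphore(16);

	// demo only: track how many queries run at the same time
	private static final AtomicInteger running = new AtomicInteger();
	private static final AtomicInteger maxRunning = new AtomicInteger();

	static <T> T queryDatabase(Callable<T> query) throws Exception {
		DB_SEMAPHORE.acquire(); // waiting virtual threads unmount here
		try {
			return query.call();
		} finally {
			DB_SEMAPHORE.release();
		}
	}

	// hypothetical query: records concurrency, then simulates a slow database call
	static String slowQuery() throws InterruptedException {
		int now = running.incrementAndGet();
		maxRunning.accumulateAndGet(now, Math::max);
		try {
			Thread.sleep(Duration.ofMillis(20));
			return "row";
		} finally {
			running.decrementAndGet();
		}
	}

	static int maxConcurrentQueries(int requests) {
		// one virtual thread per request, no pool
		try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
			for (int i = 0; i < requests; i++)
				executor.submit(() -> queryDatabase(SemaphoreLimitedQueries::slowQuery));
		} // close() waits for all requests
		return maxRunning.get();
	}

	public static void main(String[] args) {
		System.out.println("max concurrent queries: " + maxConcurrentQueries(1_000));
	}
}
```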


To understand virtual thread caveats
we need to understand how they work.

(Also, it’s very interesting.)

Under the hood

The Java runtime manages virtual threads:

  • runs them on a pool of carrier threads

  • on blocking call:

    • internally calls non-blocking operation

    • unmounts from carrier thread!

  • when call returns:

    • mounts onto (possibly another) carrier thread

    • continues

Stack chunks

A virtual thread stack:

  • when waiting, is stored on heap (stack chunk objects)

  • when continuing, is lazily streamed to stack

This keeps switching cheap.

The simple web request

Remember the hypothetical request:

  1. interpret request

  2. query database (blocks)

  3. process data for response

In a virtual thread:

  • runtime submits task to carrier thread pool

  • when 2. blocks, virtual thread unmounts

  • runtime hands carrier thread back to pool

  • when 2. unblocks, runtime resubmits task

  • virtual thread mounts and continues with 3.
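This sketch makes unmounting observable. Suppose each blocked virtual thread held on to its carrier: 10,000 tasks sleeping 200 ms on a handful of carriers would then take minutes. Because they unmount, the batch should finish in roughly one sleep period plus overhead. The task count and sleep time are arbitrary.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class UnmountDemo {

	// runs `tasks` virtual threads that each block for `sleep`; returns elapsed milliseconds
	static long elapsedMillis(int tasks, Duration sleep) {
		long start = System.nanoTime();
		try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
			for (int i = 0; i < tasks; i++)
				executor.submit(() -> {
					// step 2: blocks, so the virtual thread unmounts and frees its carrier
					Thread.sleep(sleep);
					return null;
				});
		} // close() waits until every task is done
		return (System.nanoTime() - start) / 1_000_000;
	}

	public static void main(String[] args) {
		// by default, the carrier pool has about one thread per available processor
		int cores = Runtime.getRuntime().availableProcessors();
		System.out.println(cores + " cores, 10000 tasks: "
				+ elapsedMillis(10_000, Duration.ofMillis(200)) + " ms");
	}
}
```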


Virtual threads work correctly with everything:

  • all blocking operations

  • synchronized

  • Thread, currentThread, etc.

  • thread interruption

  • thread-locals

  • native code

But not all scale perfectly.

Caveat #1: capture

Despite lots of internal rework (e.g., JEPs 353 and 373),
not all blocking operations unmount.

Some capture the platform thread:

  • Object::wait

  • file I/O (⇝ io_uring)

⇝ Compensated by temporarily growing carrier pool.

⚠️ Problematic when capturing operations dominate.

Caveat #2: pinning

Some operations pin (the virtual thread can't unmount):

  • native method call (JNI)

  • foreign function call (FFM)

  • synchronized block (for now)

⇝ No compensation

⚠️ Problematic when:

  • pinning is frequent

  • pinned sections contain blocking operations

Avoid long-running pins

If possible:

  • avoid pinning operations

  • remove blocking operations
    from pinning code sections.
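This sketch shows the second point. It assumes a hypothetical price cache whose `fetchPrice` stands in for slow I/O. The blocking call moves out of the synchronized section, so the thread is pinned only for short map accesses.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

class PriceCache {

	private final Map<String, Integer> prices = new HashMap<>();

	// hypothetical slow lookup standing in for network or database I/O
	private static int fetchPrice(String item) throws InterruptedException {
		Thread.sleep(Duration.ofMillis(50));
		return item.length() * 10;
	}

	// 👎🏾 blocks while holding the monitor, so a virtual thread stays pinned (for now)
	synchronized int priceBlockingInside(String item) throws InterruptedException {
		Integer cached = prices.get(item);
		if (cached == null) {
			cached = fetchPrice(item);
			prices.put(item, cached);
		}
		return cached;
	}

	// 👍🏾 only the map accesses are synchronized; the slow call happens outside
	int price(String item) throws InterruptedException {
		synchronized (this) {
			Integer cached = prices.get(item);
			if (cached != null)
				return cached;
		}
		int fetched = fetchPrice(item); // not pinned while waiting here
		synchronized (this) {
			prices.putIfAbsent(item, fetched);
			return prices.get(item);
		}
	}

	public static void main(String[] args) throws InterruptedException {
		PriceCache cache = new PriceCache();
		System.out.println(cache.price("apple") + " " + cache.priceBlockingInside("apple"));
	}
}
```

The trade-off is that two threads may fetch the same item concurrently. `putIfAbsent` keeps whichever result arrives first.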

With synchronization

// guarantees sequential access, but pins (for now) 👎🏾
public synchronized String accessResource() {
	return access();
}

With lock

// guarantees sequential access without pinning 👍🏾
private static final ReentrantLock LOCK =
	new ReentrantLock();

public String accessResource() {
	// lock guarantees sequential access
	LOCK.lock();
	try {
		return access();
	} finally {
		LOCK.unlock();
	}
}
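Locks also help with `Object::wait` from caveat #1. A `Condition` obtained from a `ReentrantLock` replaces `wait`/`notify`. A virtual thread that awaits it parks and unmounts instead of capturing its carrier. Here is a minimal sketch of a hypothetical one-slot `Mailbox`:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// a one-slot mailbox: the lock-based counterpart of synchronized + wait/notify
class Mailbox<T> {

	private final ReentrantLock lock = new ReentrantLock();
	private final Condition filled = lock.newCondition();
	private final Condition emptied = lock.newCondition();
	private T message;

	void put(T value) throws InterruptedException {
		lock.lock();
		try {
			while (message != null)
				emptied.await(); // instead of Object::wait
			message = value;
			filled.signal();     // instead of Object::notify
		} finally {
			lock.unlock();
		}
	}

	T take() throws InterruptedException {
		lock.lock();
		try {
			while (message == null)
				filled.await();
			T value = message;
			message = null;
			emptied.signal();
			return value;
		} finally {
			lock.unlock();
		}
	}

	public static void main(String[] args) throws InterruptedException {
		Mailbox<String> mailbox = new Mailbox<>();
		Thread producer = Thread.ofVirtual().start(() -> {
			try {
				for (String word : new String[] { "virtual", "threads" })
					mailbox.put(word);
			} catch (InterruptedException ex) {
				Thread.currentThread().interrupt();
			}
		});
		System.out.println(mailbox.take() + " " + mailbox.take());
		producer.join();
	}
}
```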

Caveat #3: thread locals

Thread-locals can hinder scalability:

  • can be inherited

  • to keep them thread-local,
    values are copied

  • can occupy lots of memory
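This small sketch shows inheritance and copying with a hypothetical `USER` variable. The child thread starts with the parent's value in an entry of its own. It can change that entry without affecting the parent. With very many (virtual) threads, such per-thread entries add up.

```java
class InheritedThreadLocals {

	static final InheritableThreadLocal<String> USER = new InheritableThreadLocal<>();

	// returns what a child thread sees before and after it changes its own entry
	static String[] childView() throws InterruptedException {
		String[] seen = new String[2];
		USER.set("alice");
		// virtual threads inherit inheritable thread-locals by default
		Thread child = Thread.ofVirtual().start(() -> {
			seen[0] = USER.get();   // value inherited from the parent at creation time
			USER.set("mallory");    // changes only the child's entry
			seen[1] = USER.get();
		});
		child.join(); // join() makes the child's writes to `seen` visible
		return seen;
	}

	public static void main(String[] args) throws InterruptedException {
		String[] seen = childView();
		System.out.println("child saw " + seen[0] + ", then " + seen[1]
				+ "; parent still has " + USER.get());
	}
}
```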

(There are also API shortcomings.)

⇝ Refactor to scoped values (JEP 446).

With thread-local

// copies value for each inheriting thread 👎🏾
static final InheritableThreadLocal<Principal> PRINCIPAL =
	new InheritableThreadLocal<>();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	PRINCIPAL.set(principal);
	Application.handle(request, response);
}

With scoped value

// immutable, so no copies needed 👍🏾
static final ScopedValue<Principal> PRINCIPAL =
	ScopedValue.newInstance();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	ScopedValue
		.where(PRINCIPAL, principal)
		.run(() -> Application
			.handle(request, response));
}

Preparing your code:

Most importantly:

  1. replace thread pools with semaphores

Also helpful:

  1. remove long-running I/O from pinned sections

  2. replace thread-locals with scoped values

  3. replace synchronized with locks


Unlocking the full potential

Virtual threads are cheap and plentiful:

  • no pooling necessary

  • allows thread per task

  • allows liberal creation
    of threads for subtasks

⇝ Enables new concurrency programming model.

A first step

Whenever you need concurrent subtasks,
spawn virtual threads for each:

private static final ExecutorService VIRTUAL =
	Executors.newVirtualThreadPerTaskExecutor();

void handle(Request request, Response response)
		throws InterruptedException, ExecutionException {
	var taskA = VIRTUAL.submit(this::doA);
	var taskB = VIRTUAL.submit(this::doB);
	response.send(taskA.get() + taskB.get());
}

But we can do (much) better!

Structured programming

Emerged when the sea of statements and GOTOs
became unmaintainable:

  • prescribes control structures

  • prescribes single entry point
    and clearly defined exit points

  • influenced languages and runtimes

The stricter approach made code (much) clearer!

Unstructured concurrency

private static final ExecutorService VIRTUAL =
	Executors.newVirtualThreadPerTaskExecutor();

void handle(Request request, Response response)
		throws InterruptedException, ExecutionException {
	// what's the relationship between
	// this and the two spawned threads?
	// what happens when one of them fails?
	var taskA = VIRTUAL.submit(this::doA);
	var taskB = VIRTUAL.submit(this::doB);
	// what if we only need the faster one?
	response.send(taskA.get() + taskB.get());
}
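This sketch shows what it costs to answer the questions in the comments by hand, using only `ExecutorService` APIs. An `ExecutorCompletionService` surfaces the first failure so the sibling can be cancelled. `invokeAny` keeps only the faster result. `task` is a hypothetical stand-in for `doA`/`doB`.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class UnstructuredPolicies {

	// "all or nothing": the first failure cancels the sibling and is rethrown
	static String both(Callable<String> a, Callable<String> b)
			throws InterruptedException, ExecutionException {
		try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
			var completion = new ExecutorCompletionService<String>(executor);
			Future<String> taskA = completion.submit(a);
			Future<String> taskB = completion.submit(b);
			try {
				// take() hands out subtasks in completion order,
				// so a fast failure is seen without waiting for the slow sibling
				completion.take().get();
				completion.take().get();
				return taskA.resultNow() + taskB.resultNow();
			} catch (ExecutionException ex) {
				// without this, the surviving subtask keeps running and close() waits for it
				taskA.cancel(true);
				taskB.cancel(true);
				throw ex;
			}
		} // close() waits until both virtual threads have ended
	}

	// "first one wins": invokeAny returns the first successful result
	// and cancels the tasks that have not completed
	static String fastest(Callable<String> a, Callable<String> b)
			throws InterruptedException, ExecutionException {
		try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
			return executor.invokeAny(List.of(a, b));
		}
	}

	// hypothetical subtask standing in for doA/doB: sleeps, then returns its name or fails
	static Callable<String> task(String name, long millis, boolean fail) {
		return () -> {
			Thread.sleep(Duration.ofMillis(millis));
			if (fail)
				throw new IllegalStateException(name + " failed");
			return name;
		};
	}

	public static void main(String[] args) throws Exception {
		System.out.println(both(task("A", 50, false), task("B", 100, false)));
		System.out.println(fastest(task("slow", 500, false), task("fast", 50, false)));
		try {
			both(task("A", 2_000, false), task("B", 50, true));
		} catch (ExecutionException ex) {
			System.out.println("failed fast: " + ex.getCause().getMessage());
		}
	}
}
```

Each policy needs its own plumbing, and nothing ties the subtasks to `handle` for observability. Structured concurrency addresses exactly this.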

Structured concurrency

When the flow of execution splits
into multiple concurrent flows,
they rejoin in the same code block.

⇝ Thread lifecycle is simple:

  • starts when task begins

  • ends on completion

⇝ Enables parent-child/sibling relationships
and logical grouping of threads.

Structured concurrency

void handle(Request request, Response response)
		throws InterruptedException {
	// define explicit success/error handling
	try (var scope = new StructuredTaskScope
			.ShutdownOnFailure()) {
		var taskA = scope.fork(this::doA);
		var taskB = scope.fork(this::doB);
		// wait explicitly until success criteria met
		scope.join().throwIfFailed();
		response.send(taskA.get() + taskB.get());
	} catch (ExecutionException ex) {
		// a subtask failed; the scope has already
		// shut down and interrupted the other one
	}
}


  • success/failure policy can be defined
    across all children

  • create your own (explicit) policies

  • forked tasks are children of the scope

  • creates relationship between threads


Every task scope thread knows its parent!

During debugging/analyzing, you can:

  1. navigate that thread’s stack

  2. navigate to its parent thread

  3. GOTO 1

A task’s entire context is visible.


Project Loom

JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models



Virtual threads:

  • code is simple to write, debug, profile

  • better scalability / higher throughput

Structured concurrency:

  • clearer concurrency code

  • simpler failure/success policies

  • better debugging

Deeper Dives

