Scaling Simply

With Virtual Threads

Nicolai Parlog

nipafx.dev / @nipafx

Java

youtube.com/java / @java

Developer Advocate

Java Team at Oracle

Lots to talk about!

Virtual Threads

Preparing Your Code

Structured Concurrency

Project Loom

Slides: slides.nipafx.dev/virtual-threads
Code: github.com/nipafx/loom-lab

Scaling Simply

Virtual Threads

Preparing Your Code

Structured Concurrency

Project Loom

A simple web request

Imagine a hypothetical HTTP request:

1. interpret request
2. query database (blocks)
3. process data for response

Resource utilization:

good for 1. and 3.
really bad for 2.

How to implement that request?

Synchronous

Align application’s unit of concurrency (request)
with Java’s unit of concurrency (thread):

use thread per request
simple to write, debug, profile
blocks threads on certain calls
limited number of platform threads
⇝ bad resource utilization
⇝ low throughput

Asynchronous

Only use threads for actual computations:

use non-blocking APIs (futures / reactive streams)
harder to write, challenging to debug/profile
incompatible with synchronous code
shares platform threads
⇝ great resource utilization
⇝ high throughput
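
To make the contrast concrete, here is a minimal sketch of the hypothetical request in both styles. The class and method names (`RequestStyles`, `interpret`, `queryDatabase`, `queryDatabaseAsync`, `process`) are made up for illustration, and the database is only simulated with a delay.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class RequestStyles {

	// hypothetical step 1
	static String interpret(String request) {
		return request.trim();
	}

	// hypothetical step 2, blocking variant: the calling thread waits
	static String queryDatabase(String query) {
		try {
			Thread.sleep(50); // simulates a blocking database call
		} catch (InterruptedException ex) {
			Thread.currentThread().interrupt();
		}
		return "data for " + query;
	}

	// hypothetical step 2, non-blocking variant: completes after 50 ms
	// without a thread waiting in this method
	static CompletableFuture<String> queryDatabaseAsync(String query) {
		var delayed = CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS);
		return CompletableFuture.supplyAsync(() -> "data for " + query, delayed);
	}

	// hypothetical step 3
	static String process(String data) {
		return data.toUpperCase();
	}

	// synchronous: reads top to bottom, but blocks its thread in step 2
	static String handleSync(String request) {
		var query = interpret(request);
		var data = queryDatabase(query);
		return process(data);
	}

	// asynchronous: the same steps, expressed as a pipeline of callbacks
	static CompletableFuture<String> handleAsync(String request) {
		return CompletableFuture
			.supplyAsync(() -> interpret(request))
			.thenCompose(RequestStyles::queryDatabaseAsync)
			.thenApply(RequestStyles::process);
	}

	public static void main(String[] args) {
		System.out.println(handleSync(" users "));
		System.out.println(handleAsync(" users ").join());
	}
}
```

Both variants compute the same result; the difference is in how the code reads and in what a stack trace or debugger shows while step 2 is pending.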

Conflict!

There’s a conflict between:

simplicity
throughput

Nota Bene

There are other conflicts:

design vs performance (⇝ Valhalla)
explicitness vs succinctness (⇝ Amber)
flexibility vs safety (⇝ Panama)
optimization vs specification (⇝ Leyden)

Enter virtual threads!

A virtual thread:

is a regular Thread
has a low memory footprint ([k]bytes)
has a small switching cost
is scheduled by the Java runtime
requires no OS thread when waiting
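
A small sketch of the first point, assuming JDK 21 or later: a virtual thread is created through the regular `Thread` API and behaves like any other `Thread` instance. The class name `VirtualThreadBasics` and the thread name `demo-virtual` are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class VirtualThreadBasics {

	// starts one virtual thread and reports whether it saw itself as virtual
	static boolean runsAsVirtual() throws InterruptedException {
		var sawVirtual = new AtomicBoolean();
		Thread thread = Thread.ofVirtual()
			.name("demo-virtual")
			.start(() -> sawVirtual.set(Thread.currentThread().isVirtual()));
		// join works exactly as for platform threads
		thread.join();
		return sawVirtual.get();
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println("spawned thread is virtual: " + runsAsVirtual());
		System.out.println("main thread is virtual: "
			+ Thread.currentThread().isVirtual());
	}
}
```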

Virtual things

Virtual memory:

maps large virtual address space
to limited physical memory
gives illusion of plentiful memory

Virtual threads:

map large number of virtual threads
to a small number of OS threads
give the illusion of plentiful threads

Virtual things

Programs rarely care about virtual vs physical memory.

Programs rarely need to care about virtual vs platform threads.

Instead:

write straightforward (blocking) code
runtime shares available OS threads
reduces the cost of blocking to near zero

Example

try (var executor = Executors
		.newVirtualThreadPerTaskExecutor()) {
	IntStream
		.range(0, 1_000_000)
		.forEach(number -> {
			executor.submit(() -> {
				Thread.sleep(Duration.ofSeconds(1));
				return number;
			});
		});
} // executor.close() is called implicitly, and waits

Effects

Virtual threads:

remove "number of threads" as bottleneck
match app’s unit of concurrency to Java’s

⇝ simplicity && throughput

Performance

Virtual threads aren’t "faster threads":

same number of CPU cycles
each task takes the same time (same latency)

So why bother?

Parallelism vs concurrency

	Parallelism	Concurrency
Task origin	solution	problem
Control	developer	environment
Resource use	coordinated	competitive
Metric	latency	throughput
Abstraction	CPU cores	tasks
# of threads	# of cores	# of tasks

Performance

When workload is not CPU-bound:

start waiting as early as possible
for as many tasks as possible

⇝ Virtual threads increase throughput:

when workload is not CPU-bound
when number of concurrent tasks is high
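
A rough way to see this effect, as a sketch rather than a benchmark: run the same set of waiting tasks once on a small platform thread pool and once on virtual threads, and compare wall-clock time. The task count (200), pool size (10), and wait time (50 ms) are arbitrary assumptions, and `ThroughputSketch` is a made-up name.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThroughputSketch {

	// submits `tasks` waiting tasks and returns the elapsed wall-clock milliseconds
	static long runAll(ExecutorService executor, int tasks, Duration wait) {
		long start = System.nanoTime();
		try (executor) {
			for (int i = 0; i < tasks; i++) {
				executor.submit(() -> {
					Thread.sleep(wait); // stands in for a blocking call
					return null;
				});
			}
		} // close() waits for all submitted tasks
		return Duration.ofNanos(System.nanoTime() - start).toMillis();
	}

	// 200 tasks share 10 platform threads, so they wait in turns
	static long pooledMillis() {
		return runAll(Executors.newFixedThreadPool(10), 200, Duration.ofMillis(50));
	}

	// 200 tasks get one virtual thread each, so they all wait at once
	static long virtualMillis() {
		return runAll(Executors.newVirtualThreadPerTaskExecutor(), 200, Duration.ofMillis(50));
	}

	public static void main(String[] args) {
		System.out.println("fixed pool:      " + pooledMillis() + " ms");
		System.out.println("virtual threads: " + virtualMillis() + " ms");
	}
}
```

Each individual task still takes about 50 ms in both runs; only the number of tasks that can wait at the same time differs.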

Server How-To

For servers:

request handling threads are started by web framework
frameworks will offer (easy) configuration options

We’re not there yet.

Spring Boot

Replace executors:

@Bean
public TomcatProtocolHandlerCustomizer<?>
		createExecutorForSyncCalls() {
	return handler -> handler.setExecutor(
			Executors.newVirtualThreadPerTaskExecutor());
}

@Bean
public AsyncTaskExecutor
		createExecutorForAsyncCalls() {
	return new TaskExecutorAdapter(
			Executors.newVirtualThreadPerTaskExecutor());
}

Quarkus

Annotate request handling method:

@GET
@Path("api")
@RunOnVirtualThread
public String handle() {
	// ...
}

(Requires --add-opens java.base/java.lang=ALL-UNNAMED.)

Virtual Threads

Go forth and multiply (your threads)

Scaling Simply

Virtual Threads

Preparing Your Code

Structured Concurrency

Project Loom

Preparing your code

Virtual threads:

always work correctly
may not scale perfectly

Code changes can improve scalability
(and maintainability, debuggability, observability).

Avoid thread pools

Only pool expensive resources
but virtual threads are cheap.

⇝ Replace thread pools (for concurrency),
with virtual threads plus, e.g., semaphores.

With thread pools

// limits concurrent queries but pools 👎🏾
private static final ExecutorService DB_POOL =
	Executors.newFixedThreadPool(16);

public <T> Future<T> queryDatabase(Callable<T> query) {
	return DB_POOL.submit(query);
}

With semaphore

// limits concurrent queries without pool 👍🏾
private static final Semaphore DB_SEMAPHORE =
	new Semaphore(16);

public <T> T queryDatabase(Callable<T> query)
		throws Exception {
	DB_SEMAPHORE.acquire();
	try {
		return query.call();
	} finally {
		DB_SEMAPHORE.release();
	}
}

Where are the virtual threads? ⇝ Later.
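
One possible shape of the answer, as a hedged sketch: each caller runs in its own virtual thread and the semaphore alone limits how many queries are in flight. The class name, the simulated query, and the counters used to observe concurrency are assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreLimit {

	private static final int LIMIT = 16;
	private static final Semaphore DB_SEMAPHORE = new Semaphore(LIMIT);

	static <T> T queryDatabase(Callable<T> query) throws Exception {
		DB_SEMAPHORE.acquire();
		try {
			return query.call();
		} finally {
			DB_SEMAPHORE.release();
		}
	}

	// runs `requests` simulated queries, one virtual thread each, and
	// returns the highest number of queries observed in flight at once
	static int maxConcurrentQueries(int requests) {
		var inFlight = new AtomicInteger();
		var maxSeen = new AtomicInteger();
		Callable<Integer> request = () -> queryDatabase(() -> {
			int now = inFlight.incrementAndGet();
			maxSeen.accumulateAndGet(now, Math::max);
			Thread.sleep(20); // simulated query time
			inFlight.decrementAndGet();
			return now;
		});
		try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
			for (int i = 0; i < requests; i++)
				executor.submit(request);
		} // close() waits for all requests
		return maxSeen.get();
	}

	public static void main(String[] args) {
		System.out.println("max concurrent queries: " + maxConcurrentQueries(200));
	}
}
```

Virtual threads that cannot acquire a permit simply wait in `acquire()`, which is cheap for them, so no pool is needed to cap the load on the database.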

Caveats

To understand virtual thread caveats
we need to understand how they work.

(Also, it’s very interesting.)

Under the hood

The Java runtime manages virtual threads:

runs them on a pool of carrier threads
on blocking call:
- internally calls non-blocking operation
- unmounts from carrier thread!
when call returns:
- mounts to (other) carrier thread
- continues
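
Mounting can be made visible with a small experiment. The sketch below records `Thread::toString` before and after a blocking call; in current JDK builds that string happens to include the carrier thread's name for a mounted virtual thread, but this is an implementation detail and an assumption of this example, not a specified format. The carrier may or may not differ between the two lines.

```java
class CarrierSketch {

	// returns the virtual thread's description before and after a blocking call
	static String[] describeAroundSleep() throws InterruptedException {
		var descriptions = new String[2];
		Thread thread = Thread.ofVirtual().start(() -> {
			descriptions[0] = Thread.currentThread().toString();
			try {
				Thread.sleep(10); // blocking call, during which the thread can unmount
			} catch (InterruptedException ex) {
				Thread.currentThread().interrupt();
			}
			descriptions[1] = Thread.currentThread().toString();
		});
		// join makes the writes to `descriptions` visible to the caller
		thread.join();
		return descriptions;
	}

	public static void main(String[] args) throws InterruptedException {
		var descriptions = describeAroundSleep();
		System.out.println("before: " + descriptions[0]);
		System.out.println("after:  " + descriptions[1]);
	}
}
```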

Stack chunks

A virtual thread stack:

when waiting, is stored on heap (stack chunk objects)
when continuing, is lazily streamed to stack

This keeps switching cheap.

The simple web request

Remember the hypothetical request:

1. interpret request
2. query database (blocks)
3. process data for response

In a virtual thread:

runtime submits task to carrier thread pool
when 2. blocks, virtual thread unmounts
runtime hands carrier thread back to pool
when 2. unblocks, runtime resubmits task
virtual thread mounts and continues with 3.

Compatibility

Virtual threads work correctly with everything:

all blocking operations
synchronized
Thread, currentThread, etc.
thread interruption
thread-locals
native code

But not all scale perfectly.

Caveat #1: capture

Despite lots of internal rework (e.g. JEPs 353, 373)
not all blocking operations unmount.

Some capture platform thread:

Object::wait
file I/O (⇝ io_uring)

⇝ Compensated by temporarily growing carrier pool.

⚠️ Problematic when capturing operations dominate.

Caveat #2: pinning

Some operations pin the virtual thread to its carrier (it can’t unmount):

native method call (JNI)
foreign function call (FFM)
synchronized block (for now)

⇝ No compensation

⚠️ Problematic when:

pinning is frequent
the pinned section contains blocking operations

Avoid long-running pins

If possible:

avoid pinning operations
remove blocking operations
from pinning code sections.
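
A minimal sketch of the second option, assuming a hypothetical cache in front of a slow lookup (`PinningSketch`, `fetch`, and the 20 ms delay are made up): the blocking call moves out of the `synchronized` section, which then only guards short, non-blocking map accesses.

```java
import java.util.HashMap;
import java.util.Map;

class PinningSketch {

	private final Map<String, String> cache = new HashMap<>();

	// hypothetical slow, blocking call, e.g. a remote lookup
	private String fetch(String key) throws InterruptedException {
		Thread.sleep(20);
		return "value-of-" + key;
	}

	// before: blocks inside synchronized, so the wait happens while pinned
	synchronized String lookupPinned(String key) throws InterruptedException {
		if (!cache.containsKey(key))
			cache.put(key, fetch(key));
		return cache.get(key);
	}

	// after: only the map accesses are synchronized,
	// the blocking call happens outside the monitor
	String lookup(String key) throws InterruptedException {
		synchronized (this) {
			var cached = cache.get(key);
			if (cached != null)
				return cached;
		}
		// trade-off: under contention, the same key may be fetched more than once
		var fetched = fetch(key);
		synchronized (this) {
			return cache.computeIfAbsent(key, k -> fetched);
		}
	}

	public static void main(String[] args) throws InterruptedException {
		var sketch = new PinningSketch();
		System.out.println(sketch.lookup("a"));
	}
}
```

The price of the shorter critical section is a possible duplicate fetch; whether that is acceptable depends on the operation.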

With synchronization

// guarantees sequential access, but pins (for now) 👎🏾
public synchronized String accessResource() {
	return access();
}

With lock

// guarantees sequential access without pinning 👍🏾
private static final ReentrantLock LOCK =
	new ReentrantLock();

public String accessResource() {
	// lock guarantees sequential access
	LOCK.lock();
	try {
		return access();
	} finally {
		LOCK.unlock();
	}
}

Caveat #3: thread locals

Thread-locals can hinder scalability:

can be inherited
to keep them thread-local,
values are copied
can occupy lots of memory

(There are also API shortcomings.)

⇝ Refactor to scoped values (JEP 446).

With thread-local

// copies value for each inheriting thread 👎🏾
static final ThreadLocal<Principal> PRINCIPAL =
	new ThreadLocal<>();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	PRINCIPAL.set(principal);
	Application.handle(request, response);
}

With scoped value

// immutable, so no copies needed 👍🏾
static final ScopedValue<Principal> PRINCIPAL =
	ScopedValue.newInstance();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	ScopedValue
		.where(PRINCIPAL, principal)
		.run(() -> Application
			.handle(request, response));
}

Preparing your code:

Most importantly:

replace thread pools with semaphores

Also helpful:

remove long-running I/O from pinned sections
replace thread-locals with scoped values
replace synchronized with locks

Scaling Simply

Virtual Threads

Preparing Your Code

Structured Concurrency

Project Loom

Unlocking the full potential

Virtual threads are cheap and plentiful:

no pooling necessary
allows thread per task
allows liberal creation
of threads for subtasks

⇝ Enables new concurrency programming model.

A first step

Whenever you need concurrent subtasks,
spawn virtual threads for each:

private static final ExecutorService VIRTUAL =
	Executors.newVirtualThreadPerTaskExecutor();

void handle(Request request, Response response)
		throws InterruptedException, ExecutionException {
	var taskA = VIRTUAL.submit(this::doA);
	var taskB = VIRTUAL.submit(this::doB);
	response.send(taskA.get() + taskB.get());
}

But we can do (much) better!

Structured programming

Emerged when the sea of statements and GOTOs
became unmaintainable:

prescribes control structures
prescribes single entry point
and clearly defined exit points
influenced languages and runtimes

The stricter approach made code (much) clearer!

Unstructured concurrency

private static final ExecutorService VIRTUAL =
	Executors.newVirtualThreadPerTaskExecutor();

void handle(Request request, Response response)
		throws InterruptedException, ExecutionException {
	// what's the relationship between
	// this and the two spawned threads?
	// what happens when one of them fails?
	var taskA = VIRTUAL.submit(this::doA);
	var taskB = VIRTUAL.submit(this::doB);
	// what if we only need the faster one?
	response.send(taskA.get() + taskB.get());
}

Structured concurrency

When the flow of execution splits
into multiple concurrent flows,
they rejoin in the same code block.

⇝ Thread lifecycle is simple:

starts when task begins
ends on completion

⇝ Enables parent-child/sibling relationships
and logical grouping of threads.

Structured concurrency

void handle(Request request, Response response)
		throws InterruptedException {
	// define explicit success/error handling
	try (var scope = new StructuredTaskScope
			.ShutdownOnFailure()) {
		var taskA = scope.fork(this::doA);
		var taskB = scope.fork(this::doB);
		// wait explicitly until success criteria met
		scope.join();
		scope.throwIfFailed();
		response.send(taskA.get() + taskB.get());
	} catch (ExecutionException ex) {
		response.fail(ex);
	}
}

Benefits

success/failure policy can be defined
across all children
create your own (explicit) policies
forked tasks are children of the scope
creates relationship between threads
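
To illustrate what a policy across all children means, here is a sketch of a "first success wins" policy. It deliberately does not use `StructuredTaskScope`; it swaps in `ExecutorService::invokeAny` on a virtual-thread executor, so it only approximates the idea and does not create the parent-child relationship between threads. The class and method names are made up.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

class FirstResultWins {

	// hypothetical slow subtask
	static String slow() throws InterruptedException {
		Thread.sleep(500);
		return "slow";
	}

	// hypothetical fast subtask
	static String fast() throws InterruptedException {
		Thread.sleep(20);
		return "fast";
	}

	// runs each task in its own virtual thread and returns the first
	// successful result; tasks that have not completed are cancelled
	static String firstOf(List<Callable<String>> tasks)
			throws InterruptedException, ExecutionException {
		try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
			return executor.invokeAny(tasks);
		}
	}

	static String demo() throws InterruptedException, ExecutionException {
		List<Callable<String>> tasks =
			List.of(FirstResultWins::slow, FirstResultWins::fast);
		return firstOf(tasks);
	}

	public static void main(String[] args) throws Exception {
		System.out.println(demo());
	}
}
```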

Observability

Every task scope thread knows its parent!

During debugging/analyzing, you can:

1. navigate that thread’s stack
2. navigate to its parent thread
3. GOTO 1

A task’s entire context is visible.

Scaling Simply

Virtual Threads

Preparing Your Code

Structured Concurrency

Project Loom

JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models

Profile:

project / wiki / mailing list / early access builds
launched January 2018
led by Ron Pressler

Project Loom

Virtual threads:

code is simple to write, debug, profile
better scalability / higher throughput

Structured concurrency:

clearer concurrency code
simpler failure/success policies
better debugging

Deeper Dives

📝 On Parallelism and Concurrency
📝 Structured Concurrency
📝 Notes on structured concurrency […]
🎥 Modern, Scalable Concurrency for the Java Platform
(Sep 2021)
🎥 State of Project Loom with Ron Pressler (Jun 2021)
🎥 Java 19 Virtual Threads - JEP Café #11 (Jun 2022)

So long…

37% off with
code fccparlog

bit.ly/the-jms

Slides at slides.nipafx.dev
⇜ Get my book!

Follow Nicolai

nipafx.dev
/nipafx

Follow Java

inside.java // dev.java
/java // /openjdk
