Scaling Simply

With Virtual Threads

Developer Advocate

Java Team at Oracle

Lots to talk about!

Virtual Threads
Preparing Your Code
Structured Concurrency
Project Loom

A simple web request

Imagine a hypothetical HTTP request:

  1. interpret request

  2. query database (blocks)

  3. process data for response

Resource utilization:

  • good for 1. and 3.

  • really bad for 2.

How to implement that request?

Synchronous

Align application’s unit of concurrency (request)
with Java’s unit of concurrency (thread):

  • use thread per request

  • simple to write, debug, profile

  • blocks threads on certain calls

  • limited number of platform threads
    ⇝ bad resource utilization
    ⇝ low throughput

Asynchronous

Only use threads for actual computations:

  • use non-blocking APIs (futures / reactive streams)

  • harder to write, challenging to debug/profile

  • incompatible with synchronous code

  • shares platform threads
    ⇝ great resource utilization
    ⇝ high throughput
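
To make the contrast concrete, here is a minimal sketch of the hypothetical request in both styles. All names (`RequestStyles`, `interpret`, `queryBlocking`, `queryAsync`, `process`) are made up for illustration, and the "database" is only simulated with a 50 ms delay:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// hypothetical example class, not from the talk
final class RequestStyles {

	// stand-ins for steps 1 and 3 (pure computation)
	static String interpret(String request) { return request.trim(); }
	static String process(String data) { return data.toUpperCase(); }

	// step 2, blocking flavor: the calling thread waits for the "database"
	static String queryBlocking(String query) {
		try {
			Thread.sleep(50);
		} catch (InterruptedException ex) {
			Thread.currentThread().interrupt();
		}
		return "data for " + query;
	}

	// step 2, non-blocking flavor: the result arrives later, no thread waits
	static CompletableFuture<String> queryAsync(String query) {
		var delayed = CompletableFuture.delayedExecutor(50, TimeUnit.MILLISECONDS);
		return CompletableFuture.supplyAsync(() -> "data for " + query, delayed);
	}

	// synchronous: reads top to bottom, but blocks its thread during step 2
	static String handleSync(String request) {
		var query = interpret(request);
		var data = queryBlocking(query);
		return process(data);
	}

	// asynchronous: frees the thread during step 2,
	// but the control flow is spread across callbacks
	static CompletableFuture<String> handleAsync(String request) {
		return CompletableFuture
			.supplyAsync(() -> interpret(request))
			.thenCompose(RequestStyles::queryAsync)
			.thenApply(RequestStyles::process);
	}
}
```

Both variants compute the same response. They differ in who waits and in how readable the stack trace is when something goes wrong.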

Conflict!

There’s a conflict between:

  • simplicity

  • throughput

Nota Bene

There are other conflicts:

  • design vs performance (⇝ Valhalla)

  • explicitness vs succinctness (⇝ Amber)

  • flexibility vs safety (⇝ Panama)

  • optimization vs specification (⇝ Leyden)

Enter virtual threads!

A virtual thread:

  • is a regular Thread

  • has a low memory footprint ([k]bytes)

  • has a small switching cost

  • is scheduled by the Java runtime

  • requires no OS thread when waiting
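
A minimal sketch of the "regular Thread" point, assuming JDK 21 or later: a virtual thread is created through the `Thread` API, can be named and joined like any other thread, and reports itself via `isVirtual()`. The class and thread names are made up for illustration.

```java
// hypothetical example class, not from the talk
final class VirtualThreadBasics {

	// starts a virtual thread and reports whether the code in it
	// really ran on a virtual thread
	static boolean runsOnVirtualThread() throws InterruptedException {
		var virtual = new boolean[1];
		Thread thread = Thread.ofVirtual()
			.name("my-virtual-thread")
			.start(() -> virtual[0] = Thread.currentThread().isVirtual());
		// join() works as for platform threads
		// (and makes the write to the array visible here)
		thread.join();
		return virtual[0];
	}

	public static void main(String[] args) throws InterruptedException {
		System.out.println("virtual: " + runsOnVirtualThread());
	}
}
```

`Thread.startVirtualThread(Runnable)` is a shortcut for the unnamed case.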

Virtual things

Virtual memory:

  • maps large virtual address space
    to limited physical memory

  • gives illusion of plentiful memory

Virtual threads:

  • map large number of virtual threads
    to a small number of OS threads

  • give the illusion of plentiful threads

Virtual things

Programs rarely care about virtual vs physical memory.

Programs rarely need to care about virtual vs platform threads.

Instead:

  • write straightforward (blocking) code

  • runtime shares available OS threads

  • reduces the cost of blocking to near zero

Example

try (var executor = Executors
		.newVirtualThreadPerTaskExecutor()) {
	IntStream
		.range(0, 1_000_000)
		.forEach(number -> {
			executor.submit(() -> {
				Thread.sleep(Duration.ofSeconds(1));
				return number;
			});
		});
} // executor.close() is called implicitly, and waits

Effects

Virtual threads:

  • remove "number of threads" as bottleneck

  • match app’s unit of concurrency to Java’s

simplicity && throughput

Performance

Virtual threads aren’t "faster threads":

  • same number of CPU cycles

  • each task takes the same time (same latency)

So why bother?

Parallelism vs concurrency

                Parallelism    Concurrency
Task origin     solution       problem
Control         developer      environment
Resource use    coordinated    competitive
Metric          latency        throughput
Abstraction     CPU cores      tasks
# of threads    # of cores     # of tasks

Performance

When workload is not CPU-bound:

  • start waiting as early as possible

  • for as many tasks as possible

⇝ Virtual threads increase throughput:

  • when workload is not CPU-bound

  • when number of concurrent tasks is high
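
A rough way to see this for yourself, sketched under the assumption that `Thread.sleep` is an acceptable stand-in for waiting on I/O. The helper `ThroughputSketch.run` is hypothetical: it submits the same blocking tasks to whatever executor it is given and measures the total time.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;

// hypothetical example class, not from the talk
final class ThroughputSketch {

	// submits `tasks` blocking tasks and returns how long all of them took
	static Duration run(ExecutorService executor, int tasks, Duration blockTime) {
		long start = System.nanoTime();
		try (executor) {
			for (int i = 0; i < tasks; i++) {
				executor.submit(() -> {
					Thread.sleep(blockTime); // stands in for waiting on I/O
					return null;
				});
			}
		} // close() waits for all submitted tasks
		return Duration.ofNanos(System.nanoTime() - start);
	}
}
```

Calling `run` with `Executors.newFixedThreadPool(10)` and then with `Executors.newVirtualThreadPerTaskExecutor()`, for 200 tasks that block 50 ms each, should show the difference: the pool works through the tasks ten at a time, while the virtual threads all start waiting right away. Each task still takes 50 ms in both cases. Only throughput changes.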

Server How-To

For servers:

  • request handling threads are started by web framework

  • frameworks will offer (easy) configuration options

We’re not there yet.

Spring Boot

Replace executors:

@Bean
public TomcatProtocolHandlerCustomizer<?>
		createExecutorForSyncCalls() {
	return handler -> handler.setExecutor(
			Executors.newVirtualThreadPerTaskExecutor());
}

@Bean
public AsyncTaskExecutor
		createExecutorForAsyncCalls() {
	return new TaskExecutorAdapter(
			Executors.newVirtualThreadPerTaskExecutor());
}

Quarkus

Annotate request handling method:

@GET
@Path("api")
@RunOnVirtualThread
public String handle() {
	// ...
}

(Requires --add-opens java.base/java.lang=ALL-UNNAMED.)

Virtual Threads

Go forth and multiply (your threads)

Preparing your code

Virtual threads:

  • always work correctly

  • may not scale perfectly

Code changes can improve scalability
(and maintainability, debuggability, observability).

Avoid thread pools

Only pool expensive resources,
but virtual threads are cheap.

⇝ Replace thread pools (for concurrency)
with virtual threads plus, e.g., semaphores.

With thread pools

// limits concurrent queries but pools 👎🏾
private static final ExecutorService DB_POOL =
	Executors.newFixedThreadPool(16);

public <T> Future<T> queryDatabase(Callable<T> query) {
	return DB_POOL.submit(query);
}

With semaphore

// limits concurrent queries without pool 👍🏾
private static final Semaphore DB_SEMAPHORE =
	new Semaphore(16);

public <T> T queryDatabase(Callable<T> query)
		throws Exception {
	DB_SEMAPHORE.acquire();
	try {
		return query.call();
	} finally {
		DB_SEMAPHORE.release();
	}
}

Where are the virtual threads? ⇝ Later.
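
As a preview, here is a sketch of how the semaphore version could be driven from virtual threads. It assumes one virtual thread per request. `SemaphoreSketch`, `maxConcurrentQueries`, and the fake query are made up for illustration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// hypothetical example class, not from the talk
final class SemaphoreSketch {

	private static final Semaphore DB_SEMAPHORE = new Semaphore(16);

	static <T> T queryDatabase(Callable<T> query) throws Exception {
		DB_SEMAPHORE.acquire();
		try {
			return query.call();
		} finally {
			DB_SEMAPHORE.release();
		}
	}

	// runs `requests` fake queries, one virtual thread each, and returns
	// the highest number of queries that were in flight at the same time
	static int maxConcurrentQueries(int requests) {
		var inFlight = new AtomicInteger();
		var maxInFlight = new AtomicInteger();
		Callable<Integer> fakeQuery = () -> {
			int now = inFlight.incrementAndGet();
			maxInFlight.accumulateAndGet(now, Math::max);
			Thread.sleep(10); // simulated database round trip
			inFlight.decrementAndGet();
			return now;
		};
		try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
			for (int i = 0; i < requests; i++) {
				executor.submit(() -> queryDatabase(fakeQuery));
			}
		} // close() waits for all queries
		return maxInFlight.get();
	}
}
```

However many virtual threads are started, the returned maximum stays at or below the 16 permits. Threads that don't get a permit simply block, which is cheap for virtual threads.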

Caveats

To understand virtual thread caveats
we need to understand how they work.

(Also, it’s very interesting.)

Under the hood

The Java runtime manages virtual threads:

  • runs them on a pool of carrier threads

  • on blocking call:

    • internally calls non-blocking operation

    • unmounts from carrier thread!

  • when call returns:

    • mounts to (other) carrier thread

    • continues

Stack chunks

A virtual thread stack:

  • when waiting, is stored on heap (stack chunk objects)

  • when continuing, is lazily streamed to stack

This keeps switching cheap.

The simple web request

Remember the hypothetical request:

  1. interpret request

  2. query database (blocks)

  3. process data for response

In a virtual thread:

  • runtime submits task to carrier thread pool

  • when 2. blocks, virtual thread unmounts

  • runtime hands carrier thread back to pool

  • when 2. unblocks, runtime resubmits task

  • virtual thread mounts and continues with 3.
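
Mounting and unmounting can be made visible with a small experiment, sketched here under the assumption that `Thread.sleep` stands in for the blocking database call. `MountSketch` and `Observation` are made-up names:

```java
// hypothetical example class, not from the talk
final class MountSketch {

	record Observation(long idBefore, String before, long idAfter, String after) { }

	// describes the current thread before and after a blocking call
	static Observation observeBlocking() throws InterruptedException {
		var observed = new Observation[1];
		Thread thread = Thread.startVirtualThread(() -> {
			var current = Thread.currentThread();
			long idBefore = current.threadId();
			String before = current.toString();
			try {
				Thread.sleep(20); // blocks, so the virtual thread unmounts
			} catch (InterruptedException ex) {
				current.interrupt();
			}
			// mounted again, possibly on another carrier thread
			observed[0] = new Observation(
				idBefore, before, current.threadId(), current.toString());
		});
		thread.join();
		return observed[0];
	}

	public static void main(String[] args) throws InterruptedException {
		var observation = observeBlocking();
		System.out.println(observation.before());
		System.out.println(observation.after());
	}
}
```

The virtual thread's identity stays the same across the blocking call. On OpenJDK builds, the printed text typically also names the carrier thread, which may differ between the two lines, but that format is an implementation detail and not something to rely on.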

Compatibility

Virtual threads work correctly with everything:

  • all blocking operations

  • synchronized

  • Thread, currentThread, etc.

  • thread interruption

  • thread-locals

  • native code

But not all scale perfectly.
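
A small sketch of two items from that list, interruption and thread-locals, behaving on a virtual thread as they would on a platform thread. `CompatibilitySketch` and `USER` are made up for illustration:

```java
import java.time.Duration;

// hypothetical example class, not from the talk
final class CompatibilitySketch {

	static final ThreadLocal<String> USER = new ThreadLocal<>();

	// starts a virtual thread that sets a thread-local and sleeps,
	// then interrupts it and reports what the thread saw
	static String interruptSleepingVirtualThread() throws InterruptedException {
		var result = new String[1];
		Thread thread = Thread.ofVirtual().unstarted(() -> {
			USER.set("alice");
			try {
				Thread.sleep(Duration.ofMinutes(1));
				result[0] = "finished sleeping";
			} catch (InterruptedException ex) {
				// interruption ends the sleep, the thread-local is intact
				result[0] = "interrupted, user=" + USER.get();
			}
		});
		thread.start();
		thread.interrupt();
		thread.join();
		return result[0];
	}
}
```

The method returns promptly instead of after a minute because the interrupt cuts the sleep short, exactly as with a platform thread.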

Caveat #1: capture

Despite lots of internal rework (e.g. JEPs 353, 373)
not all blocking operations unmount.

Some capture platform thread:

  • Object::wait

  • file I/O (⇝ io_uring)

⇝ Compensated by temporarily growing carrier pool.

⚠️ Problematic when capturing operations dominate.

Caveat #2: pinning

Some operations pin the virtual thread to its carrier (it can’t unmount):

  • native method call (JNI)

  • foreign function call (FFM)

  • synchronized block (for now)

⇝ No compensation

⚠️ Problematic when:

  • pinning is frequent

  • pinned code contains blocking operations

Avoid long-running pins

If possible:

  • avoid pinning operations

  • remove blocking operations
    from pinning code sections.
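
One way to apply the second point, sketched on a made-up `AuditLog` class and assuming a runtime where `synchronized` pins: keep only the quick in-memory work inside the synchronized block and do the blocking write outside of it. The "I/O" is simulated with `Thread.sleep`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// hypothetical example class, not from the talk
final class AuditLog {

	private final List<String> pending = new ArrayList<>();
	final List<String> written = new CopyOnWriteArrayList<>();

	void add(String entry) {
		synchronized (pending) {
			pending.add(entry);
		}
	}

	// before: the blocking write happens inside the synchronized block,
	// so the virtual thread is pinned for the entire write 👎🏾
	void flushWhilePinned() throws InterruptedException {
		synchronized (pending) {
			write(pending);
			pending.clear();
		}
	}

	// after: only the quick copy is synchronized,
	// the blocking write happens outside of it 👍🏾
	void flush() throws InterruptedException {
		List<String> batch;
		synchronized (pending) {
			batch = new ArrayList<>(pending);
			pending.clear();
		}
		write(batch);
	}

	// simulates slow, blocking I/O
	private void write(List<String> entries) throws InterruptedException {
		Thread.sleep(10);
		written.addAll(entries);
	}
}
```

Note the trade-off: in `flush`, entries leave `pending` before they are written, so a failed write needs its own error handling. Whether that is acceptable depends on the application.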

With synchronization

// guarantees sequential access, but pins (for now) 👎🏾
public synchronized String accessResource() {
	return access();
}

With lock

// guarantees sequential access without pinning 👍🏾
private static final ReentrantLock LOCK =
	new ReentrantLock();

public String accessResource() {
	// lock guarantees sequential access
	LOCK.lock();
	try {
		return access();
	} finally {
		LOCK.unlock();
	}
}

Caveat #3: thread locals

Thread-locals can hinder scalability:

  • can be inherited

  • to keep them thread-local,
    values are copied

  • can occupy lots of memory

(There are also API shortcomings.)

⇝ Refactor to scoped values (JEP 446).

With thread-local

// copies value for each inheriting thread 👎🏾
static final ThreadLocal<Principal> PRINCIPAL =
	new ThreadLocal<>();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	PRINCIPAL.set(principal);
	Application.handle(request, response);
}
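
The inheritance behavior can be observed directly. This sketch assumes an `InheritableThreadLocal` (a plain `ThreadLocal` is not inherited) and uses strings instead of the `Principal` type to stay self-contained. `InheritanceSketch` is a made-up name:

```java
// hypothetical example class, not from the talk
final class InheritanceSketch {

	static final InheritableThreadLocal<String> PRINCIPAL =
		new InheritableThreadLocal<>();

	// returns what the child thread saw initially
	// and what the parent sees after the child changed its value
	static String[] inheritAndOverwrite() throws InterruptedException {
		PRINCIPAL.set("ADMIN");
		var seen = new String[2];
		Thread child = Thread.ofVirtual().start(() -> {
			seen[0] = PRINCIPAL.get(); // inherited from the parent
			PRINCIPAL.set("GUEST"); // changes the child's value only
		});
		child.join();
		seen[1] = PRINCIPAL.get(); // parent's value is untouched
		PRINCIPAL.remove();
		return seen;
	}
}
```

Both entries are `"ADMIN"`: the child starts with the parent's value, and its later `set` stays invisible to the parent. That isolation is what requires per-thread storage for every inheriting thread, which adds up when there are very many of them.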

With scoped value

// immutable, so no copies needed 👍🏾
static final ScopedValue<Principal> PRINCIPAL =
	ScopedValue.newInstance();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	ScopedValue
		.where(PRINCIPAL, principal)
		.run(() -> Application
			.handle(request, response));
}

Preparing your code:

Most importantly:

  1. replace thread pools with semaphores

Also helpful:

  1. remove long-running I/O from pinned sections

  2. replace thread-locals with scoped values

  3. replace synchronized with locks

Unlocking the full potential

Virtual threads are cheap and plentiful:

  • no pooling necessary

  • allows thread per task

  • allows liberal creation
    of threads for subtasks

⇝ Enables new concurrency programming model.

A first step

Whenever you need concurrent subtasks,
spawn virtual threads for each:

private static final ExecutorService VIRTUAL =
	Executors.newVirtualThreadPerTaskExecutor();

void handle(Request request, Response response)
		throws InterruptedException, ExecutionException {
	var taskA = VIRTUAL.submit(this::doA);
	var taskB = VIRTUAL.submit(this::doB);
	response.send(taskA.get() + taskB.get());
}

But we can do (much) better!

Structured programming

Emerged when the sea of statements and GOTOs
became unmaintainable:

  • prescribes control structures

  • prescribes single entry point
    and clearly defined exit points

  • influenced languages and runtimes

The stricter approach made code (much) clearer!

Unstructured concurrency

private static final ExecutorService VIRTUAL =
	Executors.newVirtualThreadPerTaskExecutor();

void handle(Request request, Response response)
		throws InterruptedException, ExecutionException {
	// what's the relationship between
	// this and the two spawned threads?
	// what happens when one of them fails?
	var taskA = VIRTUAL.submit(this::doA);
	var taskB = VIRTUAL.submit(this::doB);
	// what if we only need the faster one?
	response.send(taskA.get() + taskB.get());
}

Structured concurrency

When the flow of execution splits
into multiple concurrent flows,
they rejoin in the same code block.

⇝ Thread lifecycle is simple:

  • starts when task begins

  • ends on completion

⇝ Enables parent-child/sibling relationships
and logical grouping of threads.

Structured concurrency

void handle(Request request, Response response)
		throws InterruptedException {
	// define explicit success/error handling
	try (var scope = new StructuredTaskScope
			.ShutdownOnFailure()) {
		var taskA = scope.fork(this::doA);
		var taskB = scope.fork(this::doB);
		// wait explicitly until success criteria met
		scope.join();
		scope.throwIfFailed();
		response.send(taskA.get() + taskB.get());
	} catch (ExecutionException ex) {
		response.fail(ex);
	}
}

Benefits

  • success/failure policy can be defined
    across all children

  • create your own (explicit) policies

  • forked tasks are children of the scope

  • creates relationship between threads

Observability

Every task scope thread knows its parent!

During debugging/analyzing, you can:

  1. navigate that thread’s stack

  2. navigate to its parent thread

  3. GOTO 1

A task’s entire context is visible.

Project Loom

JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models

Project Loom

Virtual threads:

  • code is simple to write, debug, profile

  • better scalability / higher throughput

Structured concurrency:

  • clearer concurrency code

  • simpler failure/success policies

  • better debugging

So long…

More

Slides at slides.nipafx.dev

Follow Nicolai

nipafx.dev
/nipafx

Follow Java

inside.java // dev.java
/java    //    /openjdk