Project Panama
Project Loom
Project Leyden
Project Valhalla
Project Babylon
Project Amber
Slides at slides.nipafx.dev/java-next.
Interconnecting JVM and native code
Profile:
launched July 2014
led by Maurizio Cimadamore
vector API
foreign memory API
foreign function API
Given two float arrays a and b, compute c = -(a² + b²):
// a, b, c have same length
void compute(float[] a, float[] b, float[] c) {
for (int i = 0; i < a.length; i++) {
// c = -(a² + b²)
c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
}
}
Vectorization - modern CPUs:
have multi-word registers (e.g. 512 bit)
can store several numbers (e.g. 16 floats)
can execute several computations at once
⇝ single instruction, multiple data (SIMD)
Just-in-time compiler tries to vectorize loops.
⇝ Auto-vectorization
Works but isn’t reliable.
static final VectorSpecies<Float> VS =
FloatVector.SPECIES_PREFERRED;
// a, b, c length is multiple of vector length
void compute(float[] a, float[] b, float[] c) {
int upperBound = VS.loopBound(a.length);
for (int i = 0; i < upperBound; i += VS.length()) {
var va = FloatVector.fromArray(VS, a, i);
var vb = FloatVector.fromArray(VS, b, i);
// c = -(a² + b²)
var vc = va.mul(va)
.add(vb.mul(vb))
.neg();
vc.intoArray(c, i);
}
}
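If the array lengths weren't a multiple of the vector length, a scalar tail loop (not shown on the slide) would finish the remaining elements after the vectorized loop, reusing the upperBound from VS.loopBound:

// after the vectorized loop: process the tail elements scalar-wise
for (int i = upperBound; i < a.length; i++) {
    c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
}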
Properties:
clear and concise API (given the requirements)
platform agnostic
reliable run-time compilation and performance
graceful degradation
Storing data off-heap is tough:
ByteBuffer is limited (2GB) and inefficient
Unsafe is… unsafe and not supported
Safe and performant foreign-memory API:
control (de)allocation: Arena, MemorySegment, SegmentAllocator
to access/manipulate: MemoryLayout, VarHandle
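A minimal sketch (not on the slide) of how these pieces fit together, using the finalized FFM API; the sizes and values are only illustrative:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// allocates 100 off-heap ints; the memory is freed deterministically
// when the confined arena is closed at the end of the try block
try (Arena arena = Arena.ofConfined()) {
    MemorySegment ints = arena.allocate(ValueLayout.JAVA_INT, 100);
    for (int i = 0; i < 100; i++) {
        ints.setAtIndex(ValueLayout.JAVA_INT, i, i * i);
    }
    int square = ints.getAtIndex(ValueLayout.JAVA_INT, 7); // 49
}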
JNI isn’t ideal:
involves several tedious artifacts (header file, impl, …)
can only interoperate with languages that align
with OS/architecture the JVM was built for
doesn’t reconcile Java/C type systems
Streamlined tooling/API for foreign functions
based on method handles:
jextract: generates method handles from header file
classes to call foreign functions: Linker, FunctionDescriptor, SymbolLookup
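A hedged sketch (not on the slide) of those classes in action, calling the C standard library's strlen; the surrounding method and variable names are made up for illustration:

import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

static long cStringLength(String text) throws Throwable {
    Linker linker = Linker.nativeLinker();
    // look up "strlen" in the standard library and describe its signature
    MethodHandle strlen = linker.downcallHandle(
        linker.defaultLookup().find("strlen").orElseThrow(),
        FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    try (Arena arena = Arena.ofConfined()) {
        // copy the Java string into off-heap memory as a C string
        MemorySegment cString = arena.allocateFrom(text);
        return (long) strlen.invokeExact(cString);
    }
}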
connects Java with the native world
offers safe, detailed, and performant APIs
Current work:
improve memory access performance
reduce startup/warmup cost
refine record mappers
improve jextract
Vector API:
🎥 Fast Java Code with the Vector API (Mar 2023)
🎥 The Vector API in JDK 17 (Sep 2021)
📝 FizzBuzz – SIMD Style! (Mar 2021)
Foreign APIs:
📝 design documents
🎥 Panama Update with Maurizio Cimadamore (Jul 2019)
🎥 ByteBuffers are dead, long live ByteBuffers! (Feb 2020)
🎥 The State of Project Panama with Maurizio Cimadamore (Jun 2021)
JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models
Profile:
project / wiki / mailing list
launched January 2018
led by Ron Pressler
An application with many blocking operations
had two options:
block platform (OS) threads until task completion:
simple-to-use programming paradigm
can limit throughput
use asynchronous programming:
harder to write and harder still to debug
allows higher throughput
Resolve the conflict between:
simplicity
throughput
A virtual thread:
is a regular Thread
low memory footprint (stack + bytes)
small switching cost
scheduled by the Java runtime
executes on platform thread
waits in memory
(no platform thread blocked)
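As a quick illustration (not on the slide), the thread-builder API creates and starts virtual threads directly:

// start a virtual thread and wait for it to finish
// (join() throws InterruptedException)
Thread virtual = Thread.ofVirtual()
    .name("my-virtual-thread")
    .start(() -> System.out.println(
        "running on " + Thread.currentThread()));
virtual.join();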
a pinned VT will block the PT
caused by object monitors,
native calls, class initialization
a captured VT blocks the PT
caused by file I/O
Object monitor implementation:
was bound to OS threads
required deep refactoring
to work with VTs
fix ships with JDK 24
⇝ No more pinning for synchronized.
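An illustrative sketch (not from the slides) of the pattern that caused pinning; the class and its I/O source are made up. Before the JDK 24 fix, a virtual thread blocking inside the synchronized block pinned its platform (carrier) thread:

class RemoteCounter {
    private final Object lock = new Object();

    // the blocking read happens while holding the monitor:
    // before JDK 24 this pinned the virtual thread's carrier thread
    int nextValue(java.io.InputStream remote) throws java.io.IOException {
        synchronized (lock) {
            return remote.read();
        }
    }
}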
Cause:
native code works on PT’s stack
switching PTs would wreak havoc
Fix:
possible in the JVM, but expensive
fairly easy to avoid
⇝ Don’t call native code, then back to Java, then block.
File I/O capture is caused by JVM/OS limitations.
Linux io_uring allows async I/O but:
adoption incurs overhead
considerable compared to cached SSD-reads
cost/benefit is not good
⇝ No fix for now.
Virtual threads aren’t "faster threads":
Each task takes the same time (same latency).
Virtual threads increase throughput:
when workload is not CPU-bound and
when number of concurrent tasks is high
Virtual threads are cheap and plentiful:
no pooling necessary
allows thread per task
allows liberal creation
of threads for subtasks
⇝ Enables new concurrency programming models.
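A sketch (not on the slide) of the thread-per-task model this enables; the sleeping task is just a placeholder for blocking work:

import java.time.Duration;
import java.util.concurrent.Executors;

// one cheap virtual thread per task - no pool sizing, no thread reuse
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (int i = 0; i < 10_000; i++) {
        executor.submit(() -> {
            Thread.sleep(Duration.ofSeconds(1)); // blocks only the virtual thread
            return "done";
        });
    }
} // close() waits for submitted tasks to finish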
Structured programming:
prescribes single entry point
and clearly defined exit points
influenced languages and runtimes
When the flow of execution splits into multiple concurrent flows, they rejoin in the same code block.
⇝ Threads are short-lived:
start when task begins
end on completion
⇝ Enables parent-child/sibling relationships
and logical grouping of threads.
void handle(Request request, Response response)
throws InterruptedException {
// implicitly short-circuits on error
try (var scope = StructuredTaskScope.open()) {
var subtaskA = scope.fork(this::taskA);
var subtaskB = scope.fork(this::taskB);
// wait explicitly for success
// (throws errors if there were any)
scope.join();
response.send(subtaskA.get() + subtaskB.get());
} catch (ExecutionException ex) {
response.fail(ex);
}
}
Use Joiner to configure completion:
how are results collected?
when are subtasks cancelled?
when does join throw?
Pass to StructuredTaskScope.open(Joiner).
Existing joiners for heterogeneous results:
awaitAllSuccessfulOrThrow():
cancels/throws on first error
default behavior of open()
awaitAll():
never cancels/throws
Existing joiners for homogeneous results:
allSuccessfulOrThrow():
cancels/throws on first error
returns Stream<RESULT>
anySuccessfulResultOrThrow():
cancels/throws if all fail
returns RESULT
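A sketch (not on the slide) of passing a joiner, following the preview API's shape where join() returns the joiner's result; details may shift while the API is in preview:

import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.StructuredTaskScope;
import java.util.concurrent.StructuredTaskScope.Joiner;

// race several equivalent tasks - the first successful result wins,
// remaining subtasks are cancelled by the joiner
String fetchFromAnyMirror(List<Callable<String>> mirrors)
        throws InterruptedException {
    try (var scope = StructuredTaskScope.open(
            Joiner.<String>anySuccessfulResultOrThrow())) {
        mirrors.forEach(scope::fork);
        return scope.join();
    }
}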
forked tasks are children of the scope
(visible in thread dumps)
creates relationship between threads
success/failure policy can be defined
across all children
With ThreadLocal:
static final ThreadLocal<Principal> PRINCIPAL =
new ThreadLocal<>();
public void serve(Request request, Response response) {
var level = request.isAdmin() ? ADMIN : GUEST;
var principal = new Principal(level);
PRINCIPAL.set(principal);
Application.handle(request, response);
}
// elsewhere
PRINCIPAL.get()
ThreadLocal downsides:
unconstrained mutability
unbounded lifetime
expensive inheritance
ScopedValues improve on that:
write-once (per thread)
clearly scoped
free inheritance
static final ScopedValue<Principal> PRINCIPAL =
ScopedValue.newInstance();
public void serve(Request request, Response response) {
var level = request.isAdmin() ? ADMIN : GUEST;
var principal = new Principal(level);
ScopedValue
.where(PRINCIPAL, principal)
.run(() -> Application
.handle(request, response));
}
// elsewhere
PRINCIPAL.get()
Virtual threads:
code is simple to write, debug, profile
allows high throughput
Structured concurrency:
clearer concurrency code
simpler failure/success policies
better debugging
Scoped values:
safer, more scalable data sharing
Current work:
finalize structured concurrency and
scoped values APIs
reduce pinning during class initialization
improve lock info in thread dumps
JDK 25:
structured concurrency in 5th preview (JEP 505)
scoped values 🤷🏾‍♂️
Faster startup, shorter time to peak performance, smaller footprint
Profile:
launched May 2022
led by Mark Reinhold
Java has really good peak performance,
but also tends to have:
slow startup time
slow warmup time
large footprint
For now, Leyden focusses on startup/warmup.
Two kinds of computation:
expressed by the program
done on behalf of the program, e.g.:
class-loading
JIT compilation
garbage collection
For now, Leyden focusses on the latter.
Early computation on behalf of the program:
class loading
callsite linkage
constant pool resolution
interpretation
profile gathering
JIT compilation (C1, C2)
Java already shifts computation:
compile-time constant folding
class loading
garbage collection
out-of-order execution
Let’s shift more computation ahead of time!
What computation?
Shift everything ahead of time?
class loading & linking
JIT compilation
method profiling
lambda resolution
dead-code elimination
…
But…
Java is highly dynamic:
class loading
class redefinition
linkage
access control
method dispatch
run-time typing (e.g. casting)
introspection
JIT compilation, decompilation
How to AOT everything?
Leyden introduces AOTCache:
observe JVM
capture decisions in AOTCache
(expansion of CDS Archive)
use as "initial state" during future run
fall back to live observation/optimization
if necessary and possible
# training run (⇝ profile)
$ java -XX:AOTMode=record \
    -XX:AOTConfiguration=app.aotconf \
    -cp app.jar com.example.App ...
# assembly phase (profile ⇝ AOTCache)
$ java -XX:AOTMode=create \
    -XX:AOTConfiguration=app.aotconf \
    -XX:AOTCache=app.aot \
    -cp app.jar
# production run (AOTCache ⇝ performance)
$ java -XX:AOTCache=app.aot \
    -cp app.jar com.example.App ...
(Open to improvements.)
Introduced by JEP 483:
Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot JVM starts.
Spring PetClinic benchmarks:
up to ~40% startup time reduction
AOT cache size of ~130 MB
Limitations:
same JDK release / hardware / OS
consistent class path for training and production
consistent module options
limited use of JVMTI agents
Otherwise, AOTCache is ignored.
Leyden’s early access builds AOT more:
method profiling
constant resolution
code compilation
dynamic proxies
reflection data
unfound classes
Benchmarks show ~70% startup time reduction.
Most cached data can be:
validated at runtime
replaced with more accurate
or better data (e.g. JIT code)
More optimizations are possible:
if dynamism is constrained
if program is constrained
Let developers accept constraints, e.g.:
limited class redefinition
closed-world assumption
fixed program configuration
Let Java apply suitable optimizations.
⇝ Performance is an emergent property.
improves Java’s overall footprint
for now: focusses on startup/warmup time
by caching early JVM work
in the future: explores stricter constraints
for more aggressive optimization
Advanced Java VM and Language feature candidates
Profile:
launched July 2014
led by Brian Goetz
Java has a split type system:
primitives
classes
We can only create classes, but classes always:
have identity
come as references
All classes come with identity:
extra memory for header
mutability
locking, synchronization, etc.
But not all custom types need that!
All class instances come as references:
memory access indirection
nullability
But not all custom types need that!
Valhalla’s goal is to unify the type system:
value types (disavow identity)
null-restriction + implicit constructors
(disavow identity + references)
universal generics (ArrayList<int>)
specialized generics (backed by int[])
value class ComplexNumber {
private double real;
private double imaginary;
// constructor, etc.
}
Codes (almost) like a class - exceptions:
class and fields are implicitly final
superclasses are limited
No identity:
some runtime operations throw exceptions
"identity" check == compares by state
null is default value
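A hedged sketch of those proposed semantics, assuming the ComplexNumber value class from before with a two-argument constructor:

var a = new ComplexNumber(1.0, 2.0);
var b = new ComplexNumber(1.0, 2.0);
// value classes have no identity, so == compares by state
boolean same = (a == b); // true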
Benefits:
guaranteed immutability
more expressiveness
more optimizations
The JDK (as well as other libraries) has many value-based classes, such as Optional and LocalDateTime. […] We plan to migrate many value-based classes in the JDK to value classes.
In general, value types have references:
allow null
prevent flattening
How do we get rid of them?
Details are in flux, but possibly:
null-restricted variables and fields:
// number can't be null
ComplexNumber! number = // ...
implicit constructor marks good default instance
value class ComplexNumber {
private double real;
private double imaginary;
// implicitly sets all fields to default values
public implicit ComplexNumber();
public ComplexNumber(double r, double i) {
// ...
}
// etc.
}
The just-in-time compiler can
inline/flatten variables …
of a value type
with implicit constructor
that are null-restricted
Performance comparable to today’s primitives! 🚀
Don’t create a type in order to get performance.
Instead:
"Is the type value-ish?" ⇝ value type
"Is all-fields-default usable?" ⇝ implicit constructor
"Is no null
needed?" ⇝ restrict nullness
Performance emerges from domain decisions!
When everybody creates their own value classes,
boxing becomes omni-present and very painful!
Universal generics allow value classes
as type parameters:
List<long> ids = new ArrayList<>();
List<RationalNumber> numbers = new ArrayList<>();
Healing the rift in the type system is great!
But if ArrayList<int> is backed by Object[], it will still be avoided in many cases.
Specialized generics will fix that:
Generics over primitives will avoid references!
Value types, implicit constructors, null-restriction
plus universal and specialized generics:
fewer trade-offs between
design and performance
no more manual specializations
better performance
can express design more clearly
more robust APIs
Makes Java more expressive and performant.
🤷🏾‍♂️
(All effort is focussed on JEP 401.)
📝 State of Valhalla
🎥 Valhalla - Java’s Epic Refactor (Aug 2021)
Extend the reach of Java to foreign programming models such as SQL, differentiable programming, machine learning models, and GPUs
Profile:
launched January 2024
led by Paul Sandoz
Java is adjacent to other programmable systems:
GPUs and FPGAs
SQL databases
differentiable functions
Allow programming them with Java code.
Don’t adapt to each realm in a separate project.
Instead:
make Java code accessible
provide API to read and transform it
let ecosystem provide adaptions
Babylons’s central mechanism is code reflection:
enhancement of "regular" reflection
reaches down into methods/lambdas
symbolic representation of (Java) code
These are called code models.
Abstract syntax tree:
constructed during compilation
closely aligned with Java grammar
too much syntactic info
Bytecode:
created by compiler
specified by JVM Specification
too little important info
The code model design is heavily influenced by the design of data structures used by many modern compilers to represent code. These data structures are commonly referred to as Intermediate Representations (IRs). The design is further influenced by Multi-Level Intermediate Representation (MLIR), a sub-project of the LLVM Compiler Infrastructure project.
Identify code (e.g. with annotation):
@CodeReflection
static double sub(double a, double b) {
return a - b;
}
Then:
compiler creates code model
stored in class files
accessible via reflection API
can be transformed by Java code
"Direct" GPU programming:
transform to GPU kernels (OpenCL C or CUDA C)
compile with GPU-specific toolchain
Triton-style:
offer class Triton
with static methods
transform to Triton code model
compile with Triton toolchain
@CodeReflection
static void add_kernel2(
Ptr xPtr, Ptr yPtr, Ptr result, int n, int size) {
var pid = Triton.programId(0);
var block_start = pid * size;
var range = Triton.arange(0, size);
var offsets = Triton.add(block_start, range);
var mask = Triton.compare(
offsets, n, Triton.CompareKind.LessThan);
var x = Triton.load(Triton.add(xPtr, offsets), mask);
var y = Triton.load(Triton.add(yPtr, offsets), mask);
var output = Triton.add(x, y);
Triton.store(
Triton.add(result, offsets), output, mask);
}
introduces code reflection & code models
allows their transformation
expands Java to foreign programming models
spearheads Java-on-GPU efforts (HAT)
🤷🏾‍♂️
📝 Exploring Triton GPU programming for neural networks in Java
🎥 Code Reflection (Aug 2024)
🎥 Heterogeneous Accelerator Toolkit (Sep 2024)
🎥 Translating Java to SPIR-V (Aug 2024)
Smaller, productivity-oriented Java language features
Profile:
project / wiki / mailing list
launched March 2017
led by Brian Goetz
Some downsides of Java:
can be cumbersome
tends to require boilerplate
situational lack of expressiveness
Amber continuously improves that situation.
multi-file source launcher ㉒ (JEP 458)
unnamed variables and patterns ㉒ (JEP 456)
patterns in switch ㉑ (JEP 441)
record patterns ㉑ (JEP 440)
sealed types ⑰ (JEP 409)
records ⑯ (JEP 395)
type pattern matching ⑯ (JEP 394)
text blocks ⑮ (JEP 378)
switch expressions ⑭ (JEP 361)
local-variable type inference with var ⑩ (JEP 286)
Amber’s main thrust is pattern matching:
records
sealed types
improved switch
patterns
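A small sketch (not from the slides) of how those pieces combine on JDK 21:

sealed interface Shape permits Circle, Rectangle { }
record Circle(double radius) implements Shape { }
record Rectangle(double width, double height) implements Shape { }

static double area(Shape shape) {
    // record patterns in an exhaustive switch over the sealed hierarchy
    return switch (shape) {
        case Circle(double radius) -> Math.PI * radius * radius;
        case Rectangle(double width, double height) -> width * height;
    };
}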
Other endeavors and conversations:
primitive types in patterns (JEP 488)
simplified main (JEP 495)
flexible constructor bodies (JEP 492)
deconstruction of classes
derived record creation ("withers") (JEP 468)
deconstruction assignment (announcement)
serialization 2.0 (talk at Devoxx BE)
concise method bodies (JEP draft)
makes Java more expressive
reduces amount of code
makes us more productive
JDK 21:
records & sealed types
pattern matching basics
text blocks
single-file source launcher
JDK 22:
unnamed patterns
multi-file source launcher
babylon: Pieter Brueghel the Elder, The Tower of Babel (Vienna), public domain
valhalla: Emil Doepler, public domain