// a, b, c have same length
void compute(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i++) {
        // c = -(a² + b²)
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

Project Panama
Project Loom
Project Amber
Project Leyden
Project Valhalla
Project Babylon
Slides at slides.nipafx.dev/java-next.
Interconnecting JVM and native code
Profile:
launched July 2014
led by Maurizio Cimadamore
vector API
foreign memory API
foreign function API
Given two float arrays a and b,
compute c = - (a² + b²):
// a, b, c have same length
void compute(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i++) {
        // c = -(a² + b²)
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

Vectorization - modern CPUs:
have multi-word registers (e.g. 512 bit)
can store several numbers (e.g. 16 floats)
can execute several computations at once
⇝ single instruction, multiple data (SIMD)
Just-in-time compiler tries to vectorize loops.
⇝ Auto-vectorization
Works but isn’t reliable.
static final VectorSpecies<Float> VS =
    FloatVector.SPECIES_PREFERRED;

// a, b, c length is multiple of vector length
void compute(float[] a, float[] b, float[] c) {
    int upperBound = VS.loopBound(a.length);
    for (int i = 0; i < upperBound; i += VS.length()) {
        var va = FloatVector.fromArray(VS, a, i);
        var vb = FloatVector.fromArray(VS, b, i);
        // c = -(a² + b²)
        var vc = va.mul(va)
            .add(vb.mul(vb))
            .neg();
        vc.intoArray(c, i);
    }
}

Properties:
clear and concise API (given the requirements)
platform agnostic
reliable run-time compilation and performance
graceful degradation
Storing data off-heap is tough:
ByteBuffer is limited (2GB) and inefficient
Unsafe is… unsafe and not supported
Safe and performant foreign-memory API:
control (de)allocation:
Arena, MemorySegment, SegmentAllocator
to access/manipulate: MemoryLayout, VarHandle
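To make these types concrete, here is a minimal off-heap sketch (class and method names are illustrative) that uses Arena, MemorySegment, and a ValueLayout to allocate, write, and read native memory; it assumes JDK 22+, where the FFM API is final:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffHeapDemo {

    static int demo() {
        // confined arena: off-heap memory is freed when the arena closes
        try (Arena arena = Arena.ofConfined()) {
            // allocate a segment of four ints outside the heap
            MemorySegment ints = arena.allocate(ValueLayout.JAVA_INT, 4);
            for (int i = 0; i < 4; i++)
                ints.setAtIndex(ValueLayout.JAVA_INT, i, i * i);
            return ints.getAtIndex(ValueLayout.JAVA_INT, 3);
        } // deterministic deallocation here, no GC involvement
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 9
    }
}
```

Unlike ByteBuffer, segments are not limited to 2 GB, and unlike Unsafe, out-of-bounds access or use-after-close fails with an exception.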
JNI isn’t ideal:
involves several tedious artifacts (header file, impl, …)
can only interoperate with languages that align
with OS/architecture the JVM was built for
doesn’t reconcile Java/C type systems
Streamlined tooling/API for foreign functions
based on method handles:
jextract: generates method handles from header file
classes to call foreign functions
Linker, FunctionDescriptor, SymbolLookup
connects Java with the native world
offers safe, detailed, and performant APIs
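As a sketch of the method-handle-based approach (assuming JDK 22+ and that the platform's standard C library exposes strlen), this hypothetical StrlenDemo combines Linker, FunctionDescriptor, and SymbolLookup to call a foreign function without any JNI artifacts:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {

    static long strlenOf(String text) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // look up strlen in the libraries the native linker knows about
        MemorySegment strlenAddress =
                linker.defaultLookup().find("strlen").orElseThrow();
        // describe the C signature: size_t strlen(const char*)
        MethodHandle strlen = linker.downcallHandle(
                strlenAddress,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // copy the Java string into off-heap memory as a C string
            MemorySegment cString = arena.allocateFrom(text);
            return (long) strlen.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlenOf("Panama")); // prints 6
    }
}
```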
Current work on FFM:
improve memory access performance
reduce startup/warmup cost
refine record mappers
improve jextract
Vector API:
🎥 Fast Java Code with the Vector API (Mar 2023)
🎥 The Vector API in JDK 17 (Sep 2021)
📝 FizzBuzz – SIMD Style! (Mar 2021)
Foreign APIs:
📝 design documents
🎥 Panama Update with Maurizio Cimadamore (Jul 2019)
🎥 ByteBuffers are dead, long live ByteBuffers! (Feb 2020)
🎥 The State of Project Panama with Maurizio Cimadamore (Jun 2021)
JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models
Profile:
project / wiki / mailing list
launched January 2018
led by Ron Pressler
A virtual thread:
is a regular Thread
low memory footprint (stack + bytes)
small switching cost
scheduled by the Java runtime
executes on platform thread
waits in memory
(no platform thread blocked)
a pinned VT will block the PT
caused by:
object monitors (synchronized; no longer pins since JDK 24)
class initialization
native calls
a captured VT blocks the PT
caused by file I/O
Resolve the conflict between:
simple-to-use, blocking programming
aligns with platform (tooling, debugging, …)
minimizes overhead while waiting
removes number-of-threads as bottleneck
Virtual threads aren’t "faster threads":
Each task takes the same time (same latency).
Virtual threads increase throughput:
when workload is not CPU-bound and
when number of concurrent tasks is high
Virtual threads are cheap and plentiful:
no pooling necessary
allows thread per task
allows liberal creation
of threads for subtasks
⇝ Enables new concurrency programming models.
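The thread-per-task model above can be sketched with a virtual-thread-per-task executor (JDK 21+; class name and task counts are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class VirtualThreadsDemo {

    static int runTasks() {
        var completed = new AtomicInteger();
        // one new virtual thread per submitted task - no pool sizing
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i -> executor.submit(() -> {
                // sleeping parks the virtual thread; its platform thread is freed
                Thread.sleep(10);
                completed.incrementAndGet();
                return i;
            }));
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks()); // prints 10000
    }
}
```

Ten thousand pooled platform threads would be prohibitively expensive; ten thousand virtual threads that mostly wait are not.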
prescribes single entry point
and clearly defined exit points
influenced languages and runtimes
When the flow of execution splits into multiple concurrent flows, they rejoin in the same code block.
⇝ Threads are short-lived:
start when task begins
end on completion
⇝ Enables parent-child/sibling relationships
and logical grouping of threads.
String executeTasks() throws InterruptedException {
    // implicitly short-circuits on error
    try (var scope = StructuredTaskScope.open()) {
        Subtask<String> taskA = scope.fork(this::doA);
        Subtask<String> taskB = scope.fork(this::doB);
        // wait explicitly for success
        // (throws errors if there were any)
        scope.join();
        // all tasks succeeded
        return taskA.get() + taskB.get();
    } catch (FailedException ex) {
        return ex.getMessage();
    }
}

forked tasks are children of the scope
(visible in thread dumps)
creates relationship between threads
success/failure policy can be defined
across all children
Use Joiner to configure success/failure policy:
how are results collected?
when are subtasks cancelled?
when does join throw?
Pass to StructuredTaskScope.open(Joiner).
Virtual threads:
code is simple to write, debug, profile
allows high throughput
Structured concurrency:
clearer concurrency code
simpler failure/success policies
better debugging
Scoped values:
safer, more scalable data sharing
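A minimal sketch of scoped values (final in JDK 25 via JEP 506; names are illustrative): a binding is visible to all code the scope runs, and automatically unbound afterwards.

```java
public class ScopedValuesDemo {

    // one immutable binding per scope, inherited by child threads
    static final ScopedValue<String> USER = ScopedValue.newInstance();

    static String currentUser() {
        // readable anywhere down the call stack while the scope is active
        return USER.isBound() ? USER.get() : "anonymous";
    }

    static String inScope() {
        var result = new String[1];
        // bind USER only for the duration of the runnable
        ScopedValue.where(USER, "alice").run(() -> result[0] = currentUser());
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(inScope());     // prints "alice"
        System.out.println(currentUser()); // prints "anonymous" - binding is gone
    }
}
```

Unlike a ThreadLocal, the binding cannot be mutated or forgotten; it disappears when the scope exits.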
Current work:
finalize structured concurrency
reduce pinning during class initialization
improve lock info in thread dumps
Smaller, productivity-oriented Java language features
Profile:
project / wiki / mailing list
launched March 2017
led by Brian Goetz
Some downsides of Java:
can be cumbersome
tends to require boilerplate
situational lack of expressiveness
Amber continuously improves that situation.
Amber’s main thrust is pattern matching:
records
sealed types
improved switch
patterns
makes Java more expressive
reduces amount of code
makes us more productive
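A small example combining these pieces (types are illustrative; runs on JDK 21): records and a sealed interface make the switch exhaustive, record patterns deconstruct in place.

```java
public class PatternsDemo {

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    static double area(Shape shape) {
        // exhaustive switch over the sealed hierarchy - no default needed
        return switch (shape) {
            case Circle(double radius) -> Math.PI * radius * radius;
            case Square(double side) -> side * side;
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(3))); // prints 9.0
    }
}
```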
JDK 21:
records & sealed types
pattern matching basics
text blocks
single-file source launcher
JDK 25:
unnamed variables and patterns
multi-file source launcher
simplified main & module imports
flexible constructor bodies
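As a sketch of unnamed variables and patterns (final since JDK 22; names are illustrative), `_` marks components that are matched but never read:

```java
public class UnnamedDemo {

    record Point(int x, int y) {}

    static String side(Object obj) {
        return switch (obj) {
            // unnamed pattern: y is deconstructed but irrelevant here
            case Point(int x, _) when x > 0 -> "right";
            // unnamed pattern variable: only the type matters
            case Point _ -> "left or axis";
            default -> "not a point";
        };
    }

    public static void main(String[] args) {
        System.out.println(side(new Point(3, -2))); // prints "right"
    }
}
```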
Current work:
primitive types in patterns (JEP 530)
deconstruction
Faster startup, shorter time to peak performance, smaller footprint
Profile:
launched May 2022
led by Mark Reinhold
Java has really good peak performance,
but also tends to have:
slow startup time
slow warmup time
Early work by the runtime:
class loading
callsite linkage
constant pool resolution
interpretation
profile gathering
JIT compilation (C1, C2)
Can we shift this work?
Java already shifts computation:
compile-time constant folding
class loading
garbage collection
out-of-order execution
…
Let’s shift more computation ahead of time!
But Java is highly dynamic:
class loading
class redefinition
linkage
access control
method dispatch
run-time typing (e.g. casting)
introspection
JIT compilation, decompilation
How to AOT everything?
Leyden introduces AOTCache:
observe JVM
capture decisions in AOTCache
(expansion of CDS Archive)
use as "initial state" during future run
fall back to live observation/optimization
if necessary and possible
# training run (⇝ profile)
$ java -XX:AOTMode=record \
    -XX:AOTConfiguration=app.aotconf \
    -cp app.jar com.example.App ...
# assembly phase (profile ⇝ AOTCache)
$ java -XX:AOTMode=create \
    -XX:AOTConfiguration=app.aotconf \
    -XX:AOTCache=app.aot \
    -cp app.jar
# production run (AOTCache ⇝ performance)
$ java -XX:AOTCache=app.aot \
    -cp app.jar com.example.App ...

Shortcut for most cases:
# training run (⇝ AOTCache)
$ java -XX:AOTCacheOutput=app.aot \
    -cp app.jar com.example.App ...
# production run (AOTCache ⇝ performance)
$ java -XX:AOTCache=app.aot \
    -cp app.jar com.example.App ...

(Open to further improvements.)
Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot JVM starts.
Spring PetClinic benchmarks:
up to ~40% startup time reduction
AOT cache size of ~130 MB
Improve warmup time by making method-execution profiles from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts.
Benchmark of a 100_000x loop over a simple stream:
~20% run time reduction
AOT cache size increased by ~2.5%
Limitation so far:
same JDK release / architecture / OS
consistent class path for training and production
consistent module options
limited use of JVMTI agents
Otherwise, AOT cache is ignored.
Leyden’s early access builds AOT more:
constant resolution
code compilation
dynamic proxies
reflection data
unfound classes
Benchmarks show ~70% startup time reduction.
improves Java’s overall footprint
focusses on startup/warmup time
by caching early JVM work
may explore stricter constraints
for more aggressive optimization
Advanced Java VM and Language feature candidates
Profile:
launched July 2014
led by Brian Goetz
Java has a split type system:
primitives
classes
We can only create classes, but:
have identity
have references
All classes come with identity:
extra memory for header
mutability
locking, synchronization, etc.
But not all custom types need that!
All class instances come as references:
memory access indirection
nullability
But not all custom types need that!
Valhalla’s goal is to unify the type system:
value types (disavow identity)
null-restriction + implicit constructors
(disavow references)
Potential follow-up work:
type classes (limited operator overloading)
universal generics (ArrayList<int>)
specialized generics (backed by int[])
value class ComplexNumber {
    private double real;
    private double imaginary;
    // constructor, etc.
}

Codes (almost) like a class - exceptions:
class and fields are implicitly final
superclasses are limited
No identity:
some runtime operations throw exceptions
the "identity" check (==) compares by state
null is default value
Benefits:
guaranteed immutability
more expressiveness
more optimizations
The JDK (as well as other libraries) has many value-based classes, such as
Optional and LocalDateTime. […] We plan to migrate many value-based classes in the JDK to value classes.
In general, value types have references:
allow null
prevent flattening
How do we get rid of them?
Details are in flux, but possibly:
null-restricted variables and fields:
// number can't be null
ComplexNumber! number = // ...

Implicit constructor marks good default instance:

value class ComplexNumber {
    private double real;
    private double imaginary;
    // implicitly sets all fields to default values
    public implicit ComplexNumber();
    public ComplexNumber(double r, double i) {
        // ...
    }
    // etc.
}

The just-in-time compiler can
inline/flatten variables …
of a value type
with implicit constructor
that are null-restricted
Performance comparable to today’s primitives! 🚀
Don’t create a type in order to get performance.
Instead:
"Is the type value-ish?" ⇝ value type
"Is all-fields-default usable?" ⇝ implicit constructor
"Is no null needed?" ⇝ restrict nullness
Performance emerges from domain decisions!
For value types to feel like primitives,
we need to use them with operators.
Maybe (!) Java will let us define common operations
for suitable types (with type classes):
var one = new ComplexNumber(1, 0);
var i = new ComplexNumber(0, 1);
var x = one + i; // maybe
var y = one * i; // maybe
var z = one $ i; // NO!

When everybody creates their own value classes,
boxing becomes omni-present and very painful!
Universal generics allow value classes
as type parameters:
List<long> ids = new ArrayList<>();
List<RationalNumber> numbers = new ArrayList<>();

Healing the rift in the type system is great!
But if ArrayList<int> is backed by Object[],
it will still be avoided in many cases.
Specialized generics will fix that:
Generics over primitives will avoid references!
Value types, implicit constructors, null-restriction
plus universal and specialized generics:
fewer trade-offs between
design and performance
no more manual specializations
better performance
can express design more clearly
more robust APIs
Makes Java more expressive and performant.
🤷🏾‍♂️
(All effort is focussed on JEP 401.)
📝 State of Valhalla
🎥 Valhalla - Java’s Epic Refactor (Dec 2024)
🎥 Growing the Java Language (Aug 2025)
Extend the reach of Java to foreign programming models such as SQL, differentiable programming, machine learning models, and GPUs
Profile:
launched January 2024
led by Paul Sandoz
Java is adjacent to other programmable systems:
GPUs and FPGAs
SQL databases
differentiable functions
Allow programming them with Java code.
Don’t adapt to each realm in a separate project.
Instead:
make Java code accessible
provide API to read and transform it
let ecosystem provide adaptions
Babylon’s central mechanism is code reflection:
enhancement of "regular" reflection
reaches down into methods/lambdas
symbolic representation of (Java) code
These are called code models.
Abstract syntax tree:
constructed during compilation
closely aligned with Java grammar
too much syntactic info
Bytecode:
created by compiler
specified by JVM Specification
too little important info
The code model design is heavily influenced by the design of data structures used by many modern compilers to represent code. These data structures are commonly referred to as Intermediate Representations (IRs). The design is further influenced by Multi-Level Intermediate Representation (MLIR), a sub-project of the LLVM Compiler Infrastructure project.
Identify code (e.g. with annotation):
@CodeReflection
static double sub(double a, double b) {
    return a - b;
}

Then:
compiler creates code model
stored in class files
accessible via reflection API
can be transformed by Java code
"Direct" GPU programming:
transform to GPU kernels (OpenCL C or CUDA C)
compile with GPU-specific toolchain
Triton-style:
offer class Triton with static methods
transform to Triton code model
compile with Triton toolchain
@CodeReflection
static void add_kernel2(
        Ptr x, Ptr y, Ptr result, int n, int size) {
    var pid = Triton.programId(0);
    var block_start = pid * size;
    var range = Triton.arange(0, size);
    var offsets = Triton.add(block_start, range);
    var mask = Triton.compare(
        offsets, n, Triton.CompareKind.LessThan);
    // loads renamed: locals must not shadow the x/y parameters
    var xValues = Triton.load(Triton.add(x, offsets), mask);
    var yValues = Triton.load(Triton.add(y, offsets), mask);
    var output = Triton.add(xValues, yValues);
    Triton.store(
        Triton.add(result, offsets), output, mask);
}

introduces code reflection & code models
allows their transformation
expands Java to foreign programming models
spearheads Java-on-GPU efforts (HAT)
🤷🏾‍♂️
📝 Accelerating Java on Parallel Architectures (Oct 2024)
🎥 Java for AI (Oct 2025)
🎥 Writing GPU-Ready AI Models in Pure Java (Oct 2025)
🎥 ONNX Based Generative AI LLMs in Java (Nov 2025)