Java Next!

Amber to Valhalla, Loom to Leyden, Babylon to Panama

Developer Advocate

Java Team at Oracle

Lots to talk about!

Project Panama
Project Loom
Project Leyden
Project Valhalla
Project Babylon
Project Amber

Project Panama

Interconnecting JVM and native code

Profile:

Subprojects

  • vector API

  • foreign memory API

  • foreign function API

Vectorization

Given two float arrays a and b,
compute c = - (a² + b²):

// a, b, c have same length
void compute(float[] a, float[] b, float[] c) {
	for (int i = 0; i < a.length; i++) {
		// c = -(a² + b²)
		c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
	}
}

Auto-vectorization

Vectorization - modern CPUs:

  • have multi-word registers (e.g. 512 bit)

  • can store several numbers (e.g. 16 floats)

  • can execute several computations at once

single instruction, multiple data (SIMD)

Just-in-time compiler tries to vectorize loops.
Auto-vectorization

Works but isn’t reliable.

Vector API

static final VectorSpecies<Float> VS =
	FloatVector.SPECIES_PREFERRED;

// a, b, c length is multiple of vector length
void compute(float[] a, float[] b, float[] c) {
	int upperBound = VS.loopBound(a.length);
	for (int i = 0; i < upperBound; i += VS.length()) {
		var va = FloatVector.fromArray(VS, a, i);
		var vb = FloatVector.fromArray(VS, b, i);
		// c = -(a² + b²)
		var vc = va.mul(va)
			.add(vb.mul(vb))
			.neg();
		vc.intoArray(c, i);
	}
}

Vector API

Properties:

  • clear and concise API (given the requirements)

  • platform agnostic

  • reliable run-time compilation and performance

  • graceful degradation

Foreign memory

Storing data off-heap is tough:

  • ByteBuffer is limited (2GB) and inefficient

  • Unsafe is… unsafe and not supported

Foreign-memory API

Safe and performant foreign-memory API:

  • control (de)allocation:
    Arena, MemorySegment, SegmentAllocator

  • to access/manipulate: MemoryLayout, VarHandle
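
A hedged sketch of the API in action (classes from java.lang.foreign; the
100-element size is just for illustration):

// allocate 100 ints off-heap; the arena frees them on close
try (Arena arena = Arena.ofConfined()) {
	MemorySegment ints = arena.allocate(ValueLayout.JAVA_INT, 100);
	for (int i = 0; i < 100; i++) {
		ints.setAtIndex(ValueLayout.JAVA_INT, i, i);
	}
	int first = ints.getAtIndex(ValueLayout.JAVA_INT, 0);
}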

Foreign functions

JNI isn’t ideal:

  • involves several tedious artifacts (header file, impl, …)

  • can only interoperate with languages that align
    with the OS/architecture the JVM was built for

  • doesn’t reconcile Java/C type systems

Foreign-function API

Streamlined tooling/API for foreign functions
based on method handles:

  • jextract: generates method handles from header file

  • classes to call foreign functions
    Linker, FunctionDescriptor, SymbolLookup
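
A hedged sketch of a hand-written downcall to the C standard library's
strlen (jextract would generate equivalent handles; classes from
java.lang.foreign and java.lang.invoke):

long stringLength(String text) throws Throwable {
	Linker linker = Linker.nativeLinker();
	MethodHandle strlen = linker.downcallHandle(
		linker.defaultLookup().find("strlen").orElseThrow(),
		FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
	try (Arena arena = Arena.ofConfined()) {
		// copy the Java string into off-heap memory as a C string
		MemorySegment cString = arena.allocateFrom(text);
		return (long) strlen.invokeExact(cString);
	}
}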

Project Panama

  • connects Java with the native world

  • offers safe, detailed, and performant APIs

Timeline

JDK 21:

  • foreign APIs in 3rd preview (JEP 442)

  • vector API in 6th incubation (JEP 448)

JDK 22:

  • foreign APIs finalize (JEP 454)

  • vector API in 7th incubation (JEP 460)

JDK 24:

  • vector API in 9th incubation (JEP 489),
    still waiting for Valhalla’s value types

Timeline

Current work:

  • improve memory access performance

  • reduce startup/warmup cost

  • refine record mappers

  • improve jextract

Deeper Dives

Project Loom

JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models

Profile:

Motivation

An application with many blocking operations
had two options:

  • block platform (OS) threads until task completion:

    • simple-to-use programming paradigm

    • can limit throughput

  • use asynchronous programming

    • harder to write and harder still to debug

    • allows higher throughput

Motivation

Resolve the conflict between:

  • simplicity

  • throughput

Enter virtual threads!

A virtual thread:

  • is a regular Thread

  • low memory footprint
    (resizable stack + a few hundred bytes of metadata)

  • small switching cost

  • scheduled by the Java runtime

  • executes on a platform thread

  • waits in memory
    (no platform thread blocked)
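
A minimal sketch of thread-per-task with virtual threads (the task count
and the sleep stand in for blocking work):

try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
	for (int i = 0; i < 10_000; i++) {
		executor.submit(() -> {
			// blocks the virtual thread, not the platform thread
			Thread.sleep(Duration.ofSeconds(1));
			return "done";
		});
	}
} // close() waits for all submitted tasks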

Exceptions

Pinning:
  • a pinned VT will block the PT

  • caused by object monitors,
    native calls, class initialization

Capture:
  • a captured VT blocks the PT

  • caused by file I/O
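
A sketch of the classic pinning pattern (relevant before the JDK 24 fix
described next): blocking while holding an object monitor kept the carrier
platform thread blocked, too.

private final Object lock = new Object();

void read(InputStream in, byte[] buffer) throws IOException {
	synchronized (lock) {
		// blocking I/O while holding the monitor
		// -> virtual thread pinned, platform thread blocked
		in.read(buffer);
	}
}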

Object monitor pinning

Object monitor implementation:

  • was bound to OS threads

  • required deep refactoring
    to work with VTs

  • fix ships with JDK 24

⇝ No more pinning for synchronized.

Native code pinning

Cause:

  • native code works on PT’s stack

  • switching PTs would wreak havoc

Fix:

  • possible in the JVM, but expensive

  • fairly easy to avoid

⇝ Don’t call native code, then back to Java, then block.

File I/O capture

File I/O capture is caused by JVM/OS limitations.

Linux io_uring allows async I/O but:

  • adoption incurs overhead

  • overhead is considerable compared to cached SSD reads

  • cost/benefit is not good

⇝ No fix for now.

Performance

Virtual threads aren’t "faster threads":
Each task takes the same time (same latency).

Virtual threads increase throughput:

  • when workload is not CPU-bound and

  • when number of concurrent tasks is high

Use Cases

Virtual threads are cheap and plentiful:

  • no pooling necessary

  • allows thread per task

  • allows liberal creation
    of threads for subtasks

⇝ Enables new concurrency programming models.

Structured programming

  • prescribes single entry point
    and clearly defined exit points

  • influenced languages and runtimes

Structured concurrency

When the flow of execution splits into multiple concurrent flows, they rejoin in the same code block.

⇝ Threads are short-lived:

  • start when task begins

  • end on completion

⇝ Enables parent-child/sibling relationships
and logical grouping of threads.

Structured concurrency

void handle(Request request, Response response)
		throws InterruptedException {
	// implicitly short-circuits on error
	try (var scope = StructuredTaskScope.open()) {
		var subtaskA = scope.fork(this::taskA);
		var subtaskB = scope.fork(this::taskB);
		// wait explicitly for success
		// (throws errors if there were any)
		scope.join();

		response.send(subtaskA.get() + subtaskB.get());
	} catch (StructuredTaskScope.FailedException ex) {
		response.fail(ex);
	}
}

Completion

Use Joiner to configure completion:

  • how are results collected?

  • when are subtasks cancelled?

  • when does join throw?

Pass to StructuredTaskScope.open(Joiner).

Joiners

Existing joiners for heterogeneous results:

  • awaitAllSuccessfulOrThrow():

    • cancels/throws on first error

    • default behavior of open()

  • awaitAll():

    • never cancels/throws

Joiners

Existing joiners for homogeneous results:

  • allSuccessfulOrThrow():

    • cancels/throws on first error

    • returns Stream<RESULT>

  • anySuccessfulResultOrThrow()

    • cancels/throws if all fail

    • returns RESULT
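
A hedged sketch against the preview API (shape of JEP 505): race two
lookups and keep whichever succeeds first; fetchFromCache and
fetchFromDatabase are hypothetical helpers returning String.

String fetch(String key) throws InterruptedException {
	try (var scope = StructuredTaskScope.open(
			StructuredTaskScope.Joiner.<String>anySuccessfulResultOrThrow())) {
		scope.fork(() -> fetchFromCache(key));    // hypothetical helper
		scope.fork(() -> fetchFromDatabase(key)); // hypothetical helper
		// returns the first successful result, cancels the other subtask
		return scope.join();
	}
}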

Structured concurrency

  • forked tasks are children of the scope
    (visible in thread dumps)

  • creates relationship between threads

  • success/failure policy can be defined
    across all children

Sharing data

With ThreadLocal:

static final ThreadLocal<Principal> PRINCIPAL =
	new ThreadLocal<>();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	PRINCIPAL.set(principal);
	Application.handle(request, response);
}

// elsewhere
PRINCIPAL.get()

Sharing data

ThreadLocal downsides:

  • unconstrained mutability

  • unbounded lifetime

  • expensive inheritance

Scoped values improve on that:

  • write-once (per thread)

  • clearly scoped

  • free inheritance

With scoped value

static final ScopedValue<Principal> PRINCIPAL =
	ScopedValue.newInstance();

public void serve(Request request, Response response) {
	var level = request.isAdmin() ? ADMIN : GUEST;
	var principal = new Principal(level);
	ScopedValue
		.where(PRINCIPAL, principal)
		.run(() -> Application
			.handle(request, response));
}

// elsewhere
PRINCIPAL.get()

Project Loom

Virtual threads:

  • code is simple to write, debug, profile

  • allows high throughput

Structured concurrency:

  • clearer concurrency code

  • simpler failure/success policies

  • better debugging

Scoped values:

  • safer, more scalable data sharing

Timeline

JDK 21:

  • virtual threads finalize (JEP 444)

  • structured concurrency previews (JEP 453)

  • scoped values preview (JEP 446)

JDK 24:

  • object monitors no longer pin virtual threads (JEP 491)

  • structured concurrency in 4th preview (JEP 499)

  • scoped values in 4th preview (JEP 487)

Timeline

Current work:

  • finalize structured concurrency and
    scoped values APIs

  • reduce pinning during class initialization

  • improve lock info in thread dumps

JDK 25:

  • structured concurrency in 5th preview (JEP 505)

  • scoped values 🤷🏾‍♂️

Deeper Dives

Project Leyden

Faster startup, shorter time to peak performance, smaller footprint

Profile:

Motivation

Java has really good peak performance,
but also tends to have:

  • slow startup time

  • slow warmup time

  • large footprint

For now, Leyden focusses on startup/warmup.

Computation

Two kinds of computation:

  • expressed by the program

  • done on behalf of the program, e.g.:

    • class-loading

    • JIT compilation

    • garbage collection

For now, Leyden focusses on the latter.

Startup & Warmup

Early computation on behalf of the program:

  • class loading

  • callsite linkage

  • constant pool resolution

  • interpretation

  • profile gathering

  • JIT compilation (C1, C2)

Shifting Computation

Java already shifts computation:

  • compile-time constant folding

  • class loading

  • garbage collection

  • out-of-order execution

Let’s shift more computation ahead of time!

What computation?

Shifting Everything

Shift everything ahead of time?

  • class loading & linking

  • JIT compilation

  • method profiling

  • lambda resolution

  • dead-code elimination

  • …

But…​

Dynamic Java

Java is highly dynamic:

  • class loading

  • class redefinition

  • linkage

  • access control

  • method dispatch

  • run-time typing (e.g. casting)

  • introspection

  • JIT compilation, decompilation

How to AOT everything?

Enter AOTCache

Leyden introduces AOTCache:

  • observe JVM

  • capture decisions in AOTCache
    (expansion of CDS Archive)

  • use as "initial state" during future run

  • fall back to live observation/optimization
    if necessary and possible

AOT workflow

# training run (⇝ profile)
$ java -XX:AOTMode=record
       -XX:AOTConfiguration=app.aotconf
       -cp app.jar com.example.App ...
# assembly phase (profile ⇝ AOTCache)
$ java -XX:AOTMode=create
       -XX:AOTConfiguration=app.aotconf
       -XX:AOTCache=app.aot
       -cp app.jar
# production run (AOTCache ⇝ performance)
$ java -XX:AOTCache=app.aot
       -cp app.jar com.example.App ...

(Open to improvements.)

AOT class loading & linking

Introduced by JEP 483:

Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot JVM starts.

Spring PetClinic benchmarks:

  • up to ~40% startup time reduction

  • AOT cache size of ~130 MB

AOT class loading & linking

Limitation:

  • same JDK release / hardware / OS

  • consistent class path for training and production

  • consistent module options

  • limited use of JVMTI agents

Otherwise, AOTCache is ignored.

AOT everything

Leyden’s early access builds AOT more:

  • method profiling

  • constant resolution

  • code compilation

  • dynamic proxies

  • reflection data

  • unfound classes

Benchmarks show ~70% startup time reduction.

Beyond Fallback

Most cached data can be:

  • validated at runtime

  • replaced with more accurate
    or better data (e.g. JIT code)

More optimizations are possible:

  • if dynamism is constrained

  • if program is constrained

Constraining Dynamism

Let developers accept constraints, e.g.:

  • limited class redefinition

  • closed-world assumption

  • fixed program configuration

Let Java apply suitable optimizations.

⇝ Performance is an emergent property.

Project Leyden

  • improves Java’s overall footprint

  • for now: focusses on startup/warmup time
    by caching early JVM work

  • in the future: explores stricter constraints
    for more aggressive optimization

Timeline

JDK 24:

  • AOT class loading & linking (JEP 483)

Current work:

  • AOT code compilation (JEP draft)

  • AOT method profiling (JEP draft)

  • work towards more limiting constraints

Deeper Dives

Project Valhalla

Advanced Java VM and Language feature candidates

Profile:

Motivation

Java has a split type system:

  • primitives

  • classes

We can only create classes, and those always:

  • have identity

  • have references

Identity

All classes come with identity:

  • extra memory for header

  • mutability

  • locking, synchronization, etc.

But not all custom types need that!

References

All class instances come as references:

  • memory access indirection

  • nullability

But not all custom types need that!

Project Valhalla

Valhalla’s goal is to unify the type system:

  • value types (disavow identity)

  • null-restriction + implicit constructors
    (disavow identity + references)

  • universal generics (ArrayList<int>)

  • specialized generics (backed by int[])

Value types

value class ComplexNumber {

	private double real;
	private double imaginary;

	// constructor, etc.
}

Codes (almost) like a class - exceptions:

  • class and fields are implicitly final

  • superclasses are limited

Value type behavior

No identity:

  • some runtime operations throw exceptions

  • "identity" check == compares by state

  • null is default value

Benefits:

  • guaranteed immutability

  • more expressiveness

  • more optimizations
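
A hedged sketch of that behavior under the current Valhalla drafts
(assuming ComplexNumber has a (real, imaginary) constructor):

var a = new ComplexNumber(1.0, 2.0);
var b = new ComplexNumber(1.0, 2.0);
// a == b evaluates to true: with no identity, == compares by state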

Migration to value types

The JDK (as well as other libraries) has many value-based classes, such as Optional and LocalDateTime. […] We plan to migrate many value-based classes in the JDK to value classes.

Getting rid of references

In general, value types have references:

  • allow null

  • prevent flattening

How do we get rid of them?

Null-restriction

Details are in flux, but possibly:

  • null-restricted variables and fields:

    // number can't be null
    ComplexNumber! number = // ...

  • implicit constructor marks good default instance

Implicit constructors

value class ComplexNumber {

	private double real;
	private double imaginary;

	// implicitly sets all fields to default values
	public implicit ComplexNumber();

	public ComplexNumber(double r, double i) {
		// ...
	}

	// etc.

}

No references

The just-in-time compiler can
inline/flatten variables …

  • of a value type

  • with implicit constructor

  • that are null-restricted

Performance comparable to today’s primitives! 🚀

Emergent performance

Don’t create a type in order to get performance.

Instead:

  • "Is the type value-ish?" ⇝ value type

  • "Is all-fields-default usable?" ⇝ implicit constructor

  • "Is no null needed?" ⇝ restrict nullness

Performance emerges from domain decisions!

Universal generics

When everybody creates their own value classes,
boxing becomes omnipresent and very painful!

Universal generics allow value classes
as type parameters:

List<long> ids = new ArrayList<>();
List<RationalNumber> numbers = new ArrayList<>();

Specialized generics

Healing the rift in the type system is great!

But if ArrayList<int> is backed by Object[],
it will still be avoided in many cases.

Specialized generics will fix that:
Generics over primitives will avoid references!

Project Valhalla

Value types, implicit constructors, null-restriction
plus universal and specialized generics:

  • fewer trade-offs between
    design and performance

  • no more manual specializations

  • better performance

  • can express design more clearly

  • more robust APIs

Makes Java more expressive and performant.

Timeline

🤷🏾‍♂️

(All effort is focussed on JEP 401.)

Deeper Dives

Project Babylon

Extend the reach of Java to foreign programming models such as SQL, differentiable programming, machine learning models, and GPUs

Profile:

Motivation

Java is adjacent to other programmable systems:

  • GPUs and FPGAs

  • SQL databases

  • differentiable functions

Allow programming them with Java code.

Approach

Don’t adapt to each realm in a separate project.

Instead:

  • make Java code accessible

  • provide API to read and transform it

  • let the ecosystem provide adaptations

Code Reflection

Babylon’s central mechanism is code reflection:

  • enhancement of "regular" reflection

  • reaches down into methods/lambdas

  • symbolic representation of (Java) code

These are called code models.

NIH?

Abstract syntax tree:

  • constructed during compilation

  • closely aligned with Java grammar

  • too much syntactic info

Bytecode:

  • created by compiler

  • specified by JVM Specification

  • too little important info

Code Models

The code model design is heavily influenced by the design of data structures used by many modern compilers to represent code. These data structures are commonly referred to as Intermediate Representations (IRs). The design is further influenced by Multi-Level Intermediate Representation (MLIR), a sub-project of the LLVM Compiler Infrastructure project.

Code Models

Identify code (e.g. with annotation):

@CodeReflection
static double sub(double a, double b) {
   return a - b;
}

Then:

  • compiler creates code model

  • stored in class files

  • accessible via reflection API

  • can be transformed by Java code

Transforming Code Models

"Direct" GPU programming:

  • transform to GPU kernels (OpenCL C or CUDA C)

  • compile with GPU-specific toolchain

Triton-style:

  • offer class Triton with static methods

  • transform to Triton code model

  • compile with Triton toolchain

Triton

@CodeReflection
static void add_kernel2(
		Ptr xPtr, Ptr yPtr, Ptr result, int n, int size) {
	var pid = Triton.programId(0);
	var block_start = pid * size;
	var range = Triton.arange(0, size);
	var offsets = Triton.add(block_start, range);
	var mask = Triton.compare(
		offsets, n, Triton.CompareKind.LessThan);
	var x = Triton.load(Triton.add(xPtr, offsets), mask);
	var y = Triton.load(Triton.add(yPtr, offsets), mask);
	var output = Triton.add(x, y);
	Triton.store(
		Triton.add(result, offsets), output, mask);
}

Project Babylon

  • introduces code reflection & code models

  • allows their transformation

  • expands Java to foreign programming models

  • spearheads Java-on-GPU efforts (HAT)

Timeline

🤷🏾‍♂️

Deeper Dives

Project Amber

Smaller, productivity-oriented Java language features

Profile:

Motivation

Some downsides of Java:

  • can be cumbersome

  • tends to require boilerplate

  • situational lack of expressiveness

Amber continuously improves that situation.

Delivered

Pattern Matching

Amber’s main thrust is pattern matching:

  • records

  • sealed types

  • improved switch

  • patterns
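
How the pieces combine, sketched with a hypothetical Shape hierarchy:

sealed interface Shape permits Circle, Rectangle { }
record Circle(double radius) implements Shape { }
record Rectangle(double width, double height) implements Shape { }

static double area(Shape shape) {
	// exhaustive over the sealed hierarchy, no default needed
	return switch (shape) {
		case Circle(double radius) -> Math.PI * radius * radius;
		case Rectangle(double width, double height) -> width * height;
	};
}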

Sum > Parts

Inside Java Newscast #29

Amber endeavors

Other endeavors and conversations:

String Templates?

Inside Java Newscast #71

Project Amber

  • makes Java more expressive

  • reduces amount of code

  • makes us more productive

Timeline

JDK 21:

  • records & sealed types

  • pattern matching basics

  • text blocks

  • single-file source launcher

JDK 22:

  • unnamed patterns

  • multi-file source launcher

Timeline

Current work:

  • simplified main (JEP 495)

  • flexible constructor bodies (JEP 492)

  • primitive types in patterns (JEP 488)

  • deconstruction

Deeper Dives

So long…

37% off with
code fccparlog

bit.ly/the-jms

More

Slides at slides.nipafx.dev
⇜ Get my book!

Follow Nicolai

nipafx.dev
/nipafx

Follow Java

inside.java // dev.java
/java    //    /openjdk

Image Credits