// a, b, c have same length
void compute(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i++) {
        // c = -(a² + b²)
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

Project Panama
Project Loom
Project Amber
Project Leyden
Project Valhalla
Project Babylon
Slides at slides.nipafx.dev/java-next.
Interconnecting JVM and native code
Profile:
launched July 2014
led by Maurizio Cimadamore
vector API
foreign memory API
foreign function API
Given two float arrays a and b,
compute c = - (a² + b²):
// a, b, c have same length
void compute(float[] a, float[] b, float[] c) {
    for (int i = 0; i < a.length; i++) {
        // c = -(a² + b²)
        c[i] = (a[i] * a[i] + b[i] * b[i]) * -1.0f;
    }
}

Vectorization - modern CPUs:
have multi-word registers (e.g. 512 bit)
can store several numbers (e.g. 16 floats)
can execute several computations at once
⇝ single instruction, multiple data (SIMD)
Just-in-time compiler tries to vectorize loops.
⇝ Auto-vectorization
Works but isn’t reliable.
static final VectorSpecies<Float> VS =
    FloatVector.SPECIES_PREFERRED;

// a, b, c length is multiple of vector length
void compute(float[] a, float[] b, float[] c) {
    int upperBound = VS.loopBound(a.length);
    for (int i = 0; i < upperBound; i += VS.length()) {
        var va = FloatVector.fromArray(VS, a, i);
        var vb = FloatVector.fromArray(VS, b, i);
        // c = -(a² + b²)
        var vc = va.mul(va)
            .add(vb.mul(vb))
            .neg();
        vc.intoArray(c, i);
    }
}

Properties:
clear and concise API (given the requirements)
platform agnostic
reliable run-time compilation and performance
graceful degradation
Storing data off-heap is tough:
ByteBuffer is limited (2GB) and inefficient
Unsafe is… unsafe and not supported
Safe and performant foreign-memory API:
control (de)allocation:
Arena, MemorySegment, SegmentAllocator
to access/manipulate: MemoryLayout, VarHandle
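To make these types concrete, here is a minimal off-heap sketch (class and method names are illustrative) that uses Arena, MemorySegment, and a ValueLayout to allocate, write, and read native memory; it assumes JDK 22+, where the FFM API is final:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class OffHeapDemo {

    static int demo() {
        // confined arena: off-heap memory is freed when the arena closes
        try (Arena arena = Arena.ofConfined()) {
            // allocate a segment of four ints outside the heap
            MemorySegment ints = arena.allocate(ValueLayout.JAVA_INT, 4);
            for (int i = 0; i < 4; i++)
                ints.setAtIndex(ValueLayout.JAVA_INT, i, i * i);
            return ints.getAtIndex(ValueLayout.JAVA_INT, 3);
        } // deterministic deallocation here, no GC involvement
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 9
    }
}
```

Unlike ByteBuffer, segments are not limited to 2 GB, and unlike Unsafe, out-of-bounds access or use-after-close fails with an exception.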
JNI isn’t ideal:
involves several tedious artifacts (header file, impl, …)
can only interoperate with languages that align
with OS/architecture the JVM was built for
doesn’t reconcile Java/C type systems
Streamlined tooling/API for foreign functions
based on method handles:
jextract: generates method handles from header file
classes to call foreign functions
Linker, FunctionDescriptor, SymbolLookup
connects Java with the native world
offers safe, detailed, and performant APIs
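As a sketch of the method-handle-based approach (assuming JDK 22+ and that the platform's standard C library exposes strlen), this hypothetical StrlenDemo combines Linker, FunctionDescriptor, and SymbolLookup to call a foreign function without any JNI artifacts:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class StrlenDemo {

    static long strlenOf(String text) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // look up strlen in the libraries the native linker knows about
        MemorySegment strlenAddress =
                linker.defaultLookup().find("strlen").orElseThrow();
        // describe the C signature: size_t strlen(const char*)
        MethodHandle strlen = linker.downcallHandle(
                strlenAddress,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // copy the Java string into off-heap memory as a C string
            MemorySegment cString = arena.allocateFrom(text);
            return (long) strlen.invokeExact(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(strlenOf("Panama")); // prints 6
    }
}
```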
Current work on FFM:
improve memory access performance
reduce startup/warmup cost
refine record mappers
improve jextract
Vector API:
🎥 Fast Java Code with the Vector API (Mar 2023)
🎥 The Vector API in JDK 17 (Sep 2021)
📝 FizzBuzz – SIMD Style! (Mar 2021)
Foreign APIs:
📝 design documents
🎥 Panama Update with Maurizio Cimadamore (Jul 2019)
🎥 ByteBuffers are dead, long live ByteBuffers! (Feb 2020)
🎥 The State of Project Panama with Maurizio Cimadamore (Jun 2021)
JVM features and APIs for supporting easy-to-use, high-throughput, lightweight concurrency and new programming models
Profile:
project / wiki / mailing list
launched January 2018
led by Ron Pressler
A virtual thread:
is a regular Thread
low memory footprint (stack + bytes)
small switching cost
scheduled by the Java runtime
executes on platform thread
waits in memory
(no platform thread blocked)
a pinned VT will block the PT
caused by:
object monitors (synchronized; no longer pins since JDK 24)
class initialization
native calls
a captured VT blocks the PT
caused by file I/O
Resolve the conflict between:
simple-to-use, blocking programming
aligns with platform (tooling, debugging, …)
minimizes overhead while waiting
removes number-of-threads as bottleneck
Virtual threads aren’t "faster threads":
Each task takes the same time (same latency).
Virtual threads increase throughput:
when workload is not CPU-bound and
when number of concurrent tasks is high
Virtual threads are cheap and plentiful:
no pooling necessary
allows thread per task
allows liberal creation
of threads for subtasks
⇝ Enables new concurrency programming models.
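The thread-per-task model above can be sketched with a virtual-thread-per-task executor (JDK 21+; class name and task counts are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class VirtualThreadsDemo {

    static int runTasks() {
        var completed = new AtomicInteger();
        // one new virtual thread per submitted task - no pool sizing
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, 10_000).forEach(i -> executor.submit(() -> {
                // sleeping parks the virtual thread; its platform thread is freed
                Thread.sleep(10);
                completed.incrementAndGet();
                return i;
            }));
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks()); // prints 10000
    }
}
```

Ten thousand pooled platform threads would be prohibitively expensive; ten thousand virtual threads that mostly wait are not.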
prescribes single entry point
and clearly defined exit points
influenced languages and runtimes
When the flow of execution splits into multiple concurrent flows, they rejoin in the same code block.
⇝ Threads are short-lived:
start when task begins
end on completion
⇝ Enables parent-child/sibling relationships
and logical grouping of threads.
String executeTasks() throws InterruptedException {
    // implicitly short-circuits on error
    try (var scope = StructuredTaskScope.open()) {
        Subtask<String> taskA = scope.fork(this::doA);
        Subtask<String> taskB = scope.fork(this::doB);
        // wait explicitly for success
        // (throws errors if there were any)
        scope.join();
        // all tasks succeeded
        return taskA.get() + taskB.get();
    } catch (FailedException ex) {
        return ex.getMessage();
    }
}

forked tasks are children of the scope
(visible in thread dumps)
creates relationship between threads
success/failure policy can be defined
across all children
Use Joiner to configure success/failure policy:
how are results collected?
when are subtasks cancelled?
when does join throw?
Pass to StructuredTaskScope.open(Joiner).
Virtual threads:
code is simple to write, debug, profile
allows high throughput
Structured concurrency:
clearer concurrency code
simpler failure/success policies
better debugging
Scoped values:
safer, more scalable data sharing
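A minimal sketch of scoped values (final in JDK 25 via JEP 506; names are illustrative): a binding is visible to all code the scope runs, and automatically unbound afterwards.

```java
public class ScopedValuesDemo {

    // one immutable binding per scope, inherited by child threads
    static final ScopedValue<String> USER = ScopedValue.newInstance();

    static String currentUser() {
        // readable anywhere down the call stack while the scope is active
        return USER.isBound() ? USER.get() : "anonymous";
    }

    static String inScope() {
        var result = new String[1];
        // bind USER only for the duration of the runnable
        ScopedValue.where(USER, "alice").run(() -> result[0] = currentUser());
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(inScope());     // prints "alice"
        System.out.println(currentUser()); // prints "anonymous" - binding is gone
    }
}
```

Unlike a ThreadLocal, the binding cannot be mutated or forgotten; it disappears when the scope exits.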
Current work:
finalize structured concurrency
reduce pinning during class initialization
improve lock info in thread dumps
Smaller, productivity-oriented Java language features
Profile:
project / wiki / mailing list
launched March 2017
led by Brian Goetz
Some downsides of Java:
can be cumbersome
tends to require boilerplate
situational lack of expressiveness
Amber continuously improves that situation.
Amber’s main thrust is pattern matching:
records
sealed types
improved switch
patterns
makes Java more expressive
reduces amount of code
makes us more productive
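A small example combining these pieces (types are illustrative; runs on JDK 21): records and a sealed interface make the switch exhaustive, record patterns deconstruct in place.

```java
public class PatternsDemo {

    sealed interface Shape permits Circle, Square {}
    record Circle(double radius) implements Shape {}
    record Square(double side) implements Shape {}

    static double area(Shape shape) {
        // exhaustive switch over the sealed hierarchy - no default needed
        return switch (shape) {
            case Circle(double radius) -> Math.PI * radius * radius;
            case Square(double side) -> side * side;
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Square(3))); // prints 9.0
    }
}
```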
JDK 21:
records & sealed types
pattern matching basics
text blocks
single-file source launcher
JDK 25:
unnamed variables and patterns
multi-file source launcher
simplified main & module imports
flexible constructor bodies
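As a sketch of unnamed variables and patterns (final since JDK 22; names are illustrative), `_` marks components that are matched but never read:

```java
public class UnnamedDemo {

    record Point(int x, int y) {}

    static String side(Object obj) {
        return switch (obj) {
            // unnamed pattern: y is deconstructed but irrelevant here
            case Point(int x, _) when x > 0 -> "right";
            // unnamed pattern variable: only the type matters
            case Point _ -> "left or axis";
            default -> "not a point";
        };
    }

    public static void main(String[] args) {
        System.out.println(side(new Point(3, -2))); // prints "right"
    }
}
```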
Current work:
primitive types in patterns (JEP 530)
deconstruction
Faster startup, shorter time to peak performance, smaller footprint
Profile:
launched May 2022
led by Mark Reinhold
Java has really good peak performance,
but also tends to have:
slow startup time
slow warmup time
Early work by the runtime:
class loading
callsite linkage
constant pool resolution
interpretation
profile gathering
JIT compilation (C1, C2)
Can we shift this work?
Java already shifts computation:
compile-time constant folding
class loading
garbage collection
out-of-order execution
…
Let’s shift more computation ahead of time!
But Java is highly dynamic:
class loading
class redefinition
linkage
access control
method dispatch
run-time typing (e.g. casting)
introspection
JIT compilation, decompilation
How to AOT everything?
Leyden introduces AOTCache:
observe JVM
capture decisions in AOTCache
(expansion of CDS Archive)
use as "initial state" during future run
fall back to live observation/optimization
if necessary and possible
# training run (⇝ profile)
$ java -XX:AOTMode=record \
    -XX:AOTConfiguration=app.aotconf \
    -cp app.jar com.example.App ...
# assembly phase (profile ⇝ AOTCache)
$ java -XX:AOTMode=create \
    -XX:AOTConfiguration=app.aotconf \
    -XX:AOTCache=app.aot \
    -cp app.jar
# production run (AOTCache ⇝ performance)
$ java -XX:AOTCache=app.aot \
    -cp app.jar com.example.App ...

Shortcut for most cases:
# training run (⇝ AOTCache)
$ java -XX:AOTCacheOutput=app.aot \
    -cp app.jar com.example.App ...
# production run (AOTCache ⇝ performance)
$ java -XX:AOTCache=app.aot \
    -cp app.jar com.example.App ...

(Open to further improvements.)
Improve startup time by making the classes of an application instantly available, in a loaded and linked state, when the HotSpot JVM starts.
Spring PetClinic benchmarks:
up to ~40% startup time reduction
AOT cache size of ~130 MB
Improve warmup time by making method-execution profiles from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts.
Benchmark of a 100_000x loop over a simple stream:
~20% run time reduction
AOT cache size increased by ~2.5%
Limitation so far:
same JDK release / architecture / OS
consistent class path for training and production
consistent module options
limited use of JVMTI agents
Otherwise, AOT cache is ignored.
Leyden’s early access builds AOT more:
constant resolution
code compilation
dynamic proxies
reflection data
unfound classes
Benchmarks show ~70% startup time reduction.
improves Java’s overall footprint
focusses on startup/warmup time
by caching early JVM work
may explore stricter constraints
for more aggressive optimization
Advanced Java VM and Language feature candidates
Profile:
launched July 2014
led by Brian Goetz
Java has a split type system:
primitives
classes
We can only create classes, but:
have identity
have references
All classes come with identity:
extra memory for header
mutability
locking, synchronization, etc.
But not all custom types need that!
All class instances come as references:
memory access indirection
nullability
But not all custom types need that!
Valhalla’s goal is to unify the type system:
value types (disavow identity)
null-restriction + implicit constructors
(disavow references)
Potential follow-up work:
type classes (limited operator overloading)
universal generics (ArrayList<int>)
specialized generics (backed by int[])
value class ComplexNumber {
    private double real;
    private double imaginary;
    // constructor, etc.
}

Codes (almost) like a class - exceptions:
class and fields are implicitly final
superclasses are limited
No identity:
some runtime operations throw exceptions
the "identity" check (==) compares by state
null is default value
Benefits:
guaranteed immutability
more expressiveness
more optimizations
The JDK (as well as other libraries) has many value-based classes, such as
Optional and LocalDateTime. […] We plan to migrate many value-based classes in the JDK to value classes.
In general, value types have references:
allow null
prevent flattening
How do we get rid of them?
Details are in flux, but possibly:
null-restricted variables and fields:
// number can't be null
ComplexNumber! number = // ...

Implicit constructor marks good default instance:

value class ComplexNumber {
    private double real;
    private double imaginary;
    // implicitly sets all fields to default values
    public implicit ComplexNumber();
    public ComplexNumber(double r, double i) {
        // ...
    }
    // etc.
}

The just-in-time compiler can
inline/flatten variables …
of a value type
with implicit constructor
that are null-restricted
Performance comparable to today’s primitives! 🚀
Don’t create a type in order to get performance.
Instead:
"Is the type value-ish?" ⇝ value type
"Is all-fields-default usable?" ⇝ implicit constructor
"Is no null needed?" ⇝ restrict nullness
Performance emerges from domain decisions!
For value types to feel like primitives,
we need to use them with operators.
Maybe (!) Java will let us define common operations
for suitable types (with type classes):
var one = new ComplexNumber(1, 0);
var i = new ComplexNumber(0, 1);
var x = one + i; // maybe
var y = one * i; // maybe
var z = one $ i; // NO!

When everybody creates their own value classes,
boxing becomes omni-present and very painful!
Universal generics allow value classes
as type parameters:
List<long> ids = new ArrayList<>();
List<RationalNumber> numbers = new ArrayList<>();

Healing the rift in the type system is great!
But if ArrayList<int> is backed by Object[],
it will still be avoided in many cases.
Specialized generics will fix that:
Generics over primitives will avoid references!
Value types, implicit constructors, null-restriction
plus universal and specialized generics:
fewer trade-offs between
design and performance
no more manual specializations
better performance
can express design more clearly
more robust APIs
Makes Java more expressive and performant.
🤷🏾‍♂️
(All effort is focussed on JEP 401.)
📝 State of Valhalla
🎥 Valhalla - Java’s Epic Refactor (Dec 2024)
🎥 Growing the Java Language (Aug 2025)
Extend the reach of Java to foreign programming models such as SQL, differentiable programming, machine learning models, and GPUs
Profile:
launched January 2024
led by Paul Sandoz
Java is adjacent to other programmable systems:
GPUs and FPGAs
SQL databases
differentiable functions
Allow programming them with Java code.
Don’t adapt to each realm in a separate project.
Instead:
make Java code accessible
provide API to read and transform it
let ecosystem provide adaptions
Babylon’s central mechanism is code reflection:
enhancement of "regular" reflection
reaches down into methods/lambdas
symbolic representation of (Java) code
These are called code models.
Abstract syntax tree:
constructed during compilation
closely aligned with Java grammar
too much syntactic info
Bytecode:
created by compiler
specified by JVM Specification
too little important info
The code model design is heavily influenced by the design of data structures used by many modern compilers to represent code. These data structures are commonly referred to as Intermediate Representations (IRs). The design is further influenced by Multi-Level Intermediate Representation (MLIR), a sub-project of the LLVM Compiler Infrastructure project.
Identify code (e.g. with annotation):
@CodeReflection
static double sub(double a, double b) {
    return a - b;
}

Then:
compiler creates code model
stored in class files
accessible via reflection API
can be transformed by Java code
"Direct" GPU programming:
transform to GPU kernels (OpenCL C or CUDA C)
compile with GPU-specific toolchain
Triton-style:
offer class Triton with static methods
transform to Triton code model
compile with Triton toolchain
@CodeReflection
static void add_kernel2(
        Ptr x, Ptr y, Ptr result, int n, int size) {
    var pid = Triton.programId(0);
    var block_start = pid * size;
    var range = Triton.arange(0, size);
    var offsets = Triton.add(block_start, range);
    var mask = Triton.compare(
        offsets, n, Triton.CompareKind.LessThan);
    // loads renamed: locals must not shadow the x/y parameters
    var xValues = Triton.load(Triton.add(x, offsets), mask);
    var yValues = Triton.load(Triton.add(y, offsets), mask);
    var output = Triton.add(xValues, yValues);
    Triton.store(
        Triton.add(result, offsets), output, mask);
}

introduces code reflection & code models
allows their transformation
expands Java to foreign programming models
spearheads Java-on-GPU efforts (HAT)
🤷🏾‍♂️
📝 Accelerating Java on Parallel Architectures (Oct 2024)
🎥 Java for AI (Oct 2025)
🎥 Writing GPU-Ready AI Models in Pure Java (Oct 2025)
🎥 ONNX Based Generative AI LLMs in Java (Nov 2025)