Data-Oriented Programming

In Java (21)

Nicolai Parlog

nipafx.dev / @nipafx

Voxxed Days Bucharest

@VoxxedBucharest

Developer Advocate

Java Team at Oracle

Data-Oriented Programming

What is DOP?

A Lengthy Example

That was DOP!

Slides at slides.nipafx.dev.


Programming Paradigms

Paradigms often come with an
"Everything is a …" sentence.

The goal of any programming paradigm is to manage complexity.

complexity comes in many forms
not all paradigms handle all forms equally well

⇝ "It depends"

Object-Oriented Programming

Everything is an object

combines state and behavior
hinges on encapsulation
polymorphism through inheritance

Works best when defining/upholding boundaries.

Mixed Programming

Great use cases for OOP:

boundaries between libraries and clients
in large programs to enable modular reasoning

Consider a data-oriented approach for:

smaller (sub)systems
focused on data

Data-Oriented Programming

Guiding principles:

model the data, the whole data,
and nothing but the data
data is immutable
validate at the boundary
make illegal states unrepresentable

From Brian Goetz' seminal article:
Data Oriented Programming in Java

Data-Oriented Programming

What is DOP?

A Lengthy Example

That was DOP!

Crawling GitHub

Starting with a seed URL:

1. connect to URL
2. identify kind of page
3. identify interesting section
4. identify outgoing links
5. for each link, start at 1.
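The steps above can be sketched as a recursive walk. This is a hypothetical sketch, not the talk's `PageTreeFactory`: instead of connecting to real URLs, it looks up links in an in-memory map (`web`), only follows links from github.com pages, and tracks visited URLs so cycles don't cause endless recursion.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the network: `web` maps a URL to the URLs it links to.
class Crawler {

	private final Map<URI, List<URI>> web;
	private final Set<URI> visited = new LinkedHashSet<>();

	Crawler(Map<URI, List<URI>> web) {
		this.web = web;
	}

	// crawls from the seed URL and returns all reached URLs in visiting order
	static Set<URI> crawl(Map<URI, List<URI>> web, URI seed) {
		Crawler crawler = new Crawler(web);
		crawler.visit(seed);
		return crawler.visited;
	}

	private void visit(URI url) {
		// skip URLs that were already visited (links can form cycles)
		if (!visited.add(url))
			return;
		// steps 1 to 4 (simulated): "load" the page and find its outgoing links
		List<URI> links = web.getOrDefault(url, List.of());
		// assumption: only GitHub pages are crawled further
		if (!"github.com".equals(url.getHost()))
			return;
		// step 5: for each link, start again at step 1
		links.forEach(this::visit);
	}

}
```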

Crawling GitHub

That logic is implemented in:

public class PageTreeFactory {

	public static Page loadPageTree(/*...*/) {
		// [...]
	}

}

What does Page look like?

Requirements

Pages:

all pages have a url
unresolved pages have an error
resolved pages have content
GitHub pages have:
- outgoing links
- issueNumber or prNumber

Requirements

Operations on pages
(and their subtree):

pretty print
evaluate statistics

A Possible Implementation

A single Page class with this API:

public URI url();
public Exception error();
public String content();
public int issueNumber();
public int prNumber();
public Set<Page> links();
public Stream<Page> subtree();

public Stats evaluateStatistics();
public String printPageList();

A Possible Implementation

Problems:

page "type" is implicit
legal combination of fields is unclear
clients must "divine" the type
disparate operations on same class
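A hypothetical, trimmed-down version of that single class makes the problems concrete: which fields are set depends on the page kind, so any code that needs the kind has to reconstruct it from null checks and sentinel values (the `-1` convention here is an assumption), and nothing rejects contradictory combinations.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical single-class model: the page "type" is only implied by which fields are set.
class Page {

	private final URI url;
	private final Exception error;  // only set for unresolved pages
	private final String content;   // only set for resolved pages
	private final int issueNumber;  // -1 unless this is an issue page
	private final int prNumber;     // -1 unless this is a PR page
	private final Set<Page> links;  // empty unless this is a GitHub page

	// accepts any combination, including illegal ones
	Page(URI url, Exception error, String content, int issueNumber, int prNumber, Set<Page> links) {
		this.url = url;
		this.error = error;
		this.content = content;
		this.issueNumber = issueNumber;
		this.prNumber = prNumber;
		this.links = links;
	}

	// code that needs the page kind must "divine" it from the field values
	String describe() {
		if (error != null)
			return "error";
		if (issueNumber >= 0)
			return "issue";
		if (prNumber >= 0)
			return "PR";
		return "external";
	}

}
```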

Applying DOP

Model the data, the whole data,
and nothing but the data.

There are four kinds of pages:

error page
external page
GitHub issue page
GitHub PR page

⇝ Use four records to model them!

Detour: Records

[Finalized in Java 16]

[T]ransparent carriers for immutable data

opt out of encapsulation
allow compiler to understand internals

Most obvious consequence: less boilerplate.

Detour: Records

record ExternalPage(URI url, String content) { }

ExternalPage is final
private final fields: URI url and String content
constructor: ExternalPage(URI url, String content)
accessors: URI url() and String content()
equals(), hashCode(), toString() that use the two fields

All method/constructor bodies can be customized!
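The generated members in action, plus one customized body. The `toString` override, which prints the content's length instead of the content, is a hypothetical customization for illustration.

```java
import java.net.URI;

// The compiler generates the constructor, the accessors url() and content(),
// equals(), and hashCode(); toString() is customized here.
record ExternalPage(URI url, String content) {

	// hypothetical customization: show the content's length, not the content
	@Override
	public String toString() {
		return "ExternalPage[url=" + url + ", content=" + content.length() + " chars]";
	}

}
```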

Modeling The Data

public record ErrorPage(
	URI url, Exception ex) { }

public record ExternalPage(
	URI url, String content) { }

public record GitHubIssuePage(
	URI url, int issueNumber,
	String content, Set<Page> links) { }

public record GitHubPrPage(
	URI url, int prNumber,
	String content, Set<Page> links) { }

Applying DOP

Model the data, the whole data,
and nothing but the data.

There are additional relations between them:

a page (load) is either successful or not
a successful page is either external or GitHub
a GitHub page is either for a PR or an issue

⇝ Use sealed types to model the alternatives!

Detour: Sealed Types

[Finalized in Java 17]

Sealed types limit inheritance
by only allowing specific subtypes.

communicates intention to developers
allows compiler to check exhaustiveness

Detour: Sealed Types

public sealed interface Page
		permits ErrorPage, SuccessfulPage {
	// ...
}

Only ErrorPage and SuccessfulPage
can implement/extend Page.

⇝ interface MyPage extends Page doesn’t compile

Detour: Sealed Types

public sealed interface Page
		permits ErrorPage, SuccessfulPage {
	// ...
}

Inheriting types must be:

in the same module as the sealed type
(in the unnamed module: the same package)
direct subtypes of the sealed type
final, sealed, or non-sealed
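A hypothetical hierarchy (unrelated to pages) showing all three modifiers: a `final` record ends the hierarchy, a `sealed` interface restricts its own subtypes again, and a `non-sealed` class reopens it to arbitrary subclasses. All types are in one file and thus in the same package and module.

```java
// sealed root: only these three types may directly implement it
sealed interface Shape permits Circle, Polygon, Blob { }

// final: records are implicitly final, so no further subtypes
record Circle(double radius) implements Shape { }

// sealed: restricts its own subtypes again
sealed interface Polygon extends Shape permits Square { }
record Square(double side) implements Polygon { }

// non-sealed: any class may extend it again
non-sealed class Blob implements Shape { }
class Splotch extends Blob { }

// would not compile, Shape doesn't permit it:
// interface Other extends Shape { }
```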

Modeling Alternatives

public sealed interface Page
		permits ErrorPage, SuccessfulPage {
	URI url();
}

public sealed interface SuccessfulPage
		extends Page permits ExternalPage, GitHubPage {
	String content();
}

public sealed interface GitHubPage
		extends SuccessfulPage
		permits GitHubIssuePage, GitHubPrPage {
	Set<Page> links();
	default Stream<Page> subtree() { ... }
}
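The slide elides the body of `subtree()`. One possible implementation, a sketch that assumes the pages form a tree (cycles would recurse forever), yields the GitHub page itself and then the subtrees of all its links, depth-first. The helper `subtreeOf` is hypothetical.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Stream;

sealed interface Page permits ErrorPage, SuccessfulPage {
	URI url();
}

sealed interface SuccessfulPage extends Page permits ExternalPage, GitHubPage {
	String content();
}

sealed interface GitHubPage extends SuccessfulPage permits GitHubIssuePage, GitHubPrPage {

	Set<Page> links();

	// this page first, then the subtree of each linked page (depth-first)
	default Stream<Page> subtree() {
		Stream<Page> children = links().stream().flatMap(GitHubPage::subtreeOf);
		return Stream.concat(Stream.of(this), children);
	}

	// hypothetical helper: only GitHub pages have links to descend into
	private static Stream<Page> subtreeOf(Page page) {
		return page instanceof GitHubPage ghPage ? ghPage.subtree() : Stream.of(page);
	}

}

record ErrorPage(URI url, Exception ex) implements Page { }
record ExternalPage(URI url, String content) implements SuccessfulPage { }
record GitHubIssuePage(URI url, int issueNumber, String content, Set<Page> links) implements GitHubPage { }
record GitHubPrPage(URI url, int prNumber, String content, Set<Page> links) implements GitHubPage { }
```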

Applying DOP

Make illegal states unrepresentable.

Many already are, e.g. pages:

with error and with content
with issueNumber and prNumber
with issueNumber or prNumber but no links

Validation

Validate at the boundary.

⇝ Reject other illegal states in constructors.

record ExternalPage(URI url, String content) {
	// compact constructor
	ExternalPage {
		Objects.requireNonNull(url);
		Objects.requireNonNull(content);
		if (content.isBlank())
			throw new IllegalArgumentException();
	}
}

Applying DOP

Data is immutable.

Records are shallowly immutable,
but field types may not be.

⇝ Fix that during construction.

// compact constructor
GitHubPrPage {
	// [...]
	links = Set.copyOf(links);
}
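Validation and defensive copying can share one compact constructor. In this sketch the concrete checks (non-null components, positive PR number) are assumptions, and links are simplified to `Set<URI>` so the record stands alone. `Set.copyOf` copies the set into an unmodifiable one, so changes to the caller's set don't leak into the record.

```java
import java.net.URI;
import java.util.Objects;
import java.util.Set;

// Simplified, hypothetical variant: links are plain URIs instead of pages.
record GitHubPrPage(URI url, int prNumber, String content, Set<URI> links) {

	// compact constructor: validate first, then replace mutable input with an immutable copy
	GitHubPrPage {
		Objects.requireNonNull(url);
		Objects.requireNonNull(content);
		if (prNumber <= 0)
			throw new IllegalArgumentException("PR number must be positive: " + prNumber);
		// throws NullPointerException if links is null or contains null
		links = Set.copyOf(links);
	}

}
```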

Where Are We?

page "type" is explicit in Java’s type
only legal combinations of fields are possible
API is more self-documenting
code is easier to test

But where did the operations go?

Operations On Data

Model the data, the whole data,
and nothing but the data.

⇝ Operations should be limited to derived quantities.

public Stats evaluateStatistics();
public String printPageList();

Our two operations actually are derived quantities.

But what if it didn’t? 😁

Operations On Data

Pattern matching on sealed types is perfect
to apply polymorphic operations to data!

And records eschew encapsulation,
so everything is accessible.

Detour: Type Patterns

[Finalized in Java 16]

Typecheck, cast, and declaration all in one.

if (rootPage instanceof GitHubPage ghPage)
	// do something with `ghPage`

checks rootPage instanceof GitHubPage
declares variable GitHubPage ghPage

ghPage is only in scope where the check passed
(flow scoping).

Detour: Flow Scoping

ghPage is only in scope
where the check passed.

if (!(rootPage instanceof GitHubPage ghPage))
	// can't use `ghPage` here
	return;

// do something with `ghPage` here 😈
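Flow scoping also covers `&&`, whose right operand only runs when the pattern matched. A standalone sketch over `Object` and `String` (hypothetical names), with the pre-pattern version for comparison:

```java
class FlowScoping {

	// before type patterns: check, then cast, then declare
	static int lengthOld(Object obj) {
		if (obj instanceof String) {
			String s = (String) obj;
			return s.length();
		}
		return -1;
	}

	// `s` is in scope in the right operand of `&&` and in the `if` body
	static int lengthIfNonBlank(Object obj) {
		if (obj instanceof String s && !s.isBlank())
			return s.length();
		return -1;
	}

	// negated check with early return: `s` is in scope after the `if`
	static int lengthOrMinusOne(Object obj) {
		if (!(obj instanceof String s))
			return -1;
		return s.length();
	}

}
```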

Detour: Patterns in Switch

[Preview in Java 17 to 20; finalized in Java 21 - JEP 441]

All patterns can be used in switches

switch (page) {
	case ExternalPage ext -> // use `ext`
	case GitHubPrPage pr -> // use `pr`
	// ...
};

checks page against all listed types
executes matching branch with respective variable
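Two more features came with pattern matching for switch in Java 21 (JEP 441): `when` guards refine a case, and `case null` handles `null` instead of throwing a `NullPointerException`. In this standalone sketch, the switch over `Object` needs a `default` because `Object` is not sealed.

```java
class Describer {

	static String describe(Object obj) {
		return switch (obj) {
			case null -> "null";
			// guarded pattern: only matches blank strings
			case String s when s.isBlank() -> "blank string";
			case String s -> "string of length " + s.length();
			case Integer i when i < 0 -> "negative int";
			case Integer i -> "int " + i;
			// Object isn't sealed, so exhaustiveness requires a default
			default -> "something else";
		};
	}

}
```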

Gathering Statistics

In class Statistician:

public static Stats evaluate(Page rootPage) {
	Statistician statistician = new Statistician();
	statistician.evaluateTree(rootPage);
	return statistician.result();
}

private void evaluateTree(Page page) {
	if (page instanceof GitHubPage ghPage)
		ghPage.subtree().forEach(this::evaluatePage);
	else
		evaluatePage(page);
}

Gathering Statistics

In class Statistician:

private void evaluatePage(Page page) {
	// `numberOf...` are fields
	switch (page) {
		case ErrorPage __ -> numberOfErrors++;
		case ExternalPage __ -> numberOfExternals++;
		case GitHubIssuePage __ -> numberOfIssues++;
		case GitHubPrPage __ -> numberOfPrs++;
	}
}
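A runnable version of `Statistician`, assuming the page records and sealed interfaces from the slides and a hypothetical `Stats` record, since the slides don't show it. `subtree()` is one possible implementation that assumes the pages form a tree.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Stream;

sealed interface Page permits ErrorPage, SuccessfulPage { URI url(); }
sealed interface SuccessfulPage extends Page permits ExternalPage, GitHubPage { String content(); }
sealed interface GitHubPage extends SuccessfulPage permits GitHubIssuePage, GitHubPrPage {

	Set<Page> links();

	// this page, then the subtrees of all linked pages (assumes no cycles)
	default Stream<Page> subtree() {
		Stream<Page> children = links().stream().flatMap(GitHubPage::subtreeOf);
		return Stream.concat(Stream.of(this), children);
	}

	private static Stream<Page> subtreeOf(Page page) {
		return page instanceof GitHubPage ghPage ? ghPage.subtree() : Stream.of(page);
	}

}

record ErrorPage(URI url, Exception ex) implements Page { }
record ExternalPage(URI url, String content) implements SuccessfulPage { }
record GitHubIssuePage(URI url, int issueNumber, String content, Set<Page> links) implements GitHubPage { }
record GitHubPrPage(URI url, int prNumber, String content, Set<Page> links) implements GitHubPage { }

// hypothetical result type
record Stats(int errors, int externals, int issues, int prs) { }

class Statistician {

	private int numberOfErrors;
	private int numberOfExternals;
	private int numberOfIssues;
	private int numberOfPrs;

	public static Stats evaluate(Page rootPage) {
		Statistician statistician = new Statistician();
		statistician.evaluateTree(rootPage);
		return statistician.result();
	}

	private void evaluateTree(Page page) {
		if (page instanceof GitHubPage ghPage)
			ghPage.subtree().forEach(this::evaluatePage);
		else
			evaluatePage(page);
	}

	private void evaluatePage(Page page) {
		// exhaustive over the sealed hierarchy, so no default branch
		switch (page) {
			case ErrorPage __ -> numberOfErrors++;
			case ExternalPage __ -> numberOfExternals++;
			case GitHubIssuePage __ -> numberOfIssues++;
			case GitHubPrPage __ -> numberOfPrs++;
		}
	}

	private Stats result() {
		return new Stats(numberOfErrors, numberOfExternals, numberOfIssues, numberOfPrs);
	}

}
```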

Printing A Page List

In class Pretty:

public static String printPageList(Page rootPage) {
	if (!(rootPage instanceof GitHubPage ghPage))
		return createPageName(rootPage);

	return ghPage
			.subtree()
			.map(Pretty::createPageName)
			.collect(joining("\n"));
}

Printing A Page List

In class Pretty:

private static String createPageName(Page page) {
	return switch (page) {
		case ErrorPage err
			-> "💥 ERROR: " + err.url().getHost();
		case ExternalPage ext
			-> "💤 EXTERNAL: " + ext.url().getHost();
		case GitHubIssuePage issue
			-> "🐈 ISSUE #" + issue.issueNumber();
		case GitHubPrPage pr
			-> "🐙 PR #" + pr.prNumber();
	};
}

⇝ Simpler access with record/deconstruction patterns.

Detour: Record Patterns

[Preview in Java 19 and 20; finalized in Java 21 - JEP 440]

Records are transparent, so you can
deconstruct them in if and switch:

record ExternalPage(URI url, String content) { }

// elsewhere
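Object obj = // ...
if (obj instanceof ExternalPage(var url, var content)) {
	// use `url` and `content` here
}
switch (obj) {
	case ExternalPage(var url, var content) ->
		// use `url` and `content` here
	// `Object` isn't sealed ⇝ exhaustiveness requires a default
	default -> // handle all other objects
}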
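Record patterns also nest, so one pattern can reach several levels deep. A standalone sketch with hypothetical `Point` and `Line` records:

```java
record Point(int x, int y) { }
record Line(Point start, Point end) { }

class Lines {

	static String describe(Object obj) {
		// nested record patterns: deconstruct the Line and both Points at once
		if (obj instanceof Line(Point(var x1, var y1), Point(var x2, var y2)))
			return x1 == x2 || y1 == y2 ? "axis-aligned" : "diagonal";
		return "not a line";
	}

	static boolean startsAtOrigin(Line line) {
		return switch (line) {
			// nested pattern with a guard
			case Line(Point(int x, int y), Point end) when x == 0 && y == 0 -> true;
			// unconditional pattern makes the switch exhaustive
			case Line other -> false;
		};
	}

}
```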

Deconstructing Data

Use deconstruction patterns:

public static String createPageName(Page page) {
	return switch (page) {
		case ErrorPage(var url, var ex)
			-> "💥 ERROR: " + url.getHost();
		case GitHubIssuePage(
				var url, int issueNumber,
				var content, var links)
			-> "🐈 ISSUE #" + issueNumber;
		// ...
	};
}

⇝ Even simpler access with unnamed patterns.

Detour: Unnamed Patterns

[Preview in Java 21 - JEP 443; finalized in Java 22]

Replace variables you don’t need with _:

case ErrorPage(var url, _)
	-> "💥 ERROR: " + url.getHost();
case GitHubIssuePage(_, int issueNumber, _, _)
	-> "🐈 ISSUE #" + issueNumber;

Deconstructing Data

Use record and unnamed patterns for simple access:

private static String createPageName(Page page) {
	return switch (page) {
		case ErrorPage(var url, _)
			-> "💥 ERROR: " + url.getHost();
		case ExternalPage(var url, _)
			-> "💤 EXTERNAL: " + url.getHost();
		case GitHubIssuePage(_, int issueNumber, _, _)
			-> "🐈 ISSUE #" + issueNumber;
		case GitHubPrPage(_, int prNumber, _, _)
			-> "🐙 PR #" + prNumber;
	};
}
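A runnable variant of `Pretty` with record patterns. Because `_` needs Java 22 (or preview in Java 21), this sketch names the unused components instead; the hierarchy mirrors the slides and `subtree()` is one possible implementation that assumes the pages form a tree.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Stream;

import static java.util.stream.Collectors.joining;

sealed interface Page permits ErrorPage, SuccessfulPage { URI url(); }
sealed interface SuccessfulPage extends Page permits ExternalPage, GitHubPage { String content(); }
sealed interface GitHubPage extends SuccessfulPage permits GitHubIssuePage, GitHubPrPage {

	Set<Page> links();

	// this page, then the subtrees of all linked pages (assumes no cycles)
	default Stream<Page> subtree() {
		Stream<Page> children = links().stream().flatMap(GitHubPage::subtreeOf);
		return Stream.concat(Stream.of(this), children);
	}

	private static Stream<Page> subtreeOf(Page page) {
		return page instanceof GitHubPage ghPage ? ghPage.subtree() : Stream.of(page);
	}

}

record ErrorPage(URI url, Exception ex) implements Page { }
record ExternalPage(URI url, String content) implements SuccessfulPage { }
record GitHubIssuePage(URI url, int issueNumber, String content, Set<Page> links) implements GitHubPage { }
record GitHubPrPage(URI url, int prNumber, String content, Set<Page> links) implements GitHubPage { }

class Pretty {

	public static String printPageList(Page rootPage) {
		if (!(rootPage instanceof GitHubPage ghPage))
			return createPageName(rootPage);

		return ghPage
				.subtree()
				.map(Pretty::createPageName)
				.collect(joining("\n"));
	}

	// components are matched by position, in declaration order
	private static String createPageName(Page page) {
		return switch (page) {
			case ErrorPage(var url, var ex)
				-> "💥 ERROR: " + url.getHost();
			case ExternalPage(var url, var content)
				-> "💤 EXTERNAL: " + url.getHost();
			case GitHubIssuePage(var url, int issueNumber, var content, var links)
				-> "🐈 ISSUE #" + issueNumber;
			case GitHubPrPage(var url, int prNumber, var content, var links)
				-> "🐙 PR #" + prNumber;
		};
	}

}
```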

Operations On Data

Looks good?

"Isn’t switching over types icky?"

Yes, but why?

Extending Operations On Data

What happens when we add:

public record GitHubCommitPage(
		URI url, String hash,
		String content, Set<Page> links)
	implements GitHubPage {

	// [...]

}

Follow the compile errors!

Extending Operations On Data

First stop: the sealed supertype.

⇝ Permit the new subtype!

public sealed interface GitHubPage
		extends SuccessfulPage
		permits GitHubIssuePage, GitHubPrPage,
			GitHubCommitPage {
	// [...]
}

Extending Operations On Data

Next stop: every switch without a default.

// non-exhaustive ⇝ compile error
switch (page) {
	case ErrorPage __ -> numberOfErrors++;
	case ExternalPage __ -> numberOfExternals++;
	case GitHubIssuePage __ -> numberOfIssues++;
	case GitHubPrPage __ -> numberOfPrs++;
}

Extending Operations On Data

⇝ Handle the new subtype!

switch (page) {
	case ErrorPage __ -> numberOfErrors++;
	case ExternalPage __ -> numberOfExternals++;
	case GitHubIssuePage __ -> numberOfIssues++;
	case GitHubPrPage __ -> numberOfPrs++;
	case GitHubCommitPage __ -> numberOfCommits++;
}

Operations On Data

To keep operations maintainable:

switch over sealed types
enumerate all possible types
(even if you need to ignore some)
avoid default branch

⇝ Compile error when new type is added.

Where Are We?

operations separate from data
adding new operations is easy
adding new data types is more work,
but supported by the compiler

⇝ Like the visitor pattern, but less painful.
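For comparison, a hypothetical two-type hierarchy with one operation written both ways: as a classic visitor, which needs an `accept` method in every data type and one `visit...` method per type, and as a single switch, which needs neither and is still checked for exhaustiveness as long as it has no `default`.

```java
import java.util.List;

// hypothetical hierarchy; in a pure DOP design, accept() and NodeVisitor wouldn't exist
sealed interface Node permits Leaf, Branch {

	// visitor style: every data type must implement accept
	<R> R accept(NodeVisitor<R> visitor);

}

record Leaf(String name) implements Node {
	public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitLeaf(this); }
}

record Branch(String name, List<Node> children) implements Node {
	public <R> R accept(NodeVisitor<R> visitor) { return visitor.visitBranch(this); }
}

interface NodeVisitor<R> {
	R visitLeaf(Leaf leaf);
	R visitBranch(Branch branch);
}

// visitor implementation: counts all leaves
class LeafCounter implements NodeVisitor<Integer> {
	public Integer visitLeaf(Leaf leaf) { return 1; }
	public Integer visitBranch(Branch branch) {
		return branch.children().stream().mapToInt(child -> child.accept(this)).sum();
	}
}

// DOP style: the same operation as one exhaustive switch
class Leaves {
	static int count(Node node) {
		return switch (node) {
			case Leaf leaf -> 1;
			case Branch branch -> branch.children().stream().mapToInt(Leaves::count).sum();
		};
	}
}
```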

Data-Oriented Programming

What is DOP?

A Lengthy Example

That was DOP!

Algebraic Data-Types

records are product types
sealed types are sum types

This simple combination of mechanisms — aggregation and choice — is deceptively powerful
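Aggregation and choice combined, in a hypothetical arithmetic-expression type that is not part of the talk: each record aggregates its operands, the sealed interface chooses between them, and one exhaustive switch with record patterns evaluates any expression.

```java
// sum type: an expression is a number, a sum, a product, or a negation
sealed interface Expr permits Num, Add, Mul, Neg { }

// product types: each record aggregates its parts
record Num(int value) implements Expr { }
record Add(Expr left, Expr right) implements Expr { }
record Mul(Expr left, Expr right) implements Expr { }
record Neg(Expr operand) implements Expr { }

class Eval {

	// exhaustive over the sealed interface, so no default branch
	static int eval(Expr expr) {
		return switch (expr) {
			case Num(int value) -> value;
			case Add(Expr left, Expr right) -> eval(left) + eval(right);
			case Mul(Expr left, Expr right) -> eval(left) * eval(right);
			case Neg(Expr operand) -> -eval(operand);
		};
	}

}
```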

Applications

ad-hoc data structures
complex return types
complex domains

Ad-hoc Data Structures

Often local, throw-away types used in one class or package.

record PageWithLinks(Page page, Set<URI> links) {

	PageWithLinks {
		requireNonNull(page);
		requireNonNull(links);
		links = new HashSet<>(links);
	}

	public PageWithLinks(Page page) {
		this(page, Set.of());
	}

}

Complex Return Types

Return values that are deconstructed immediately:

// type declaration
sealed interface Match<T> { }

record None<T>() implements Match<T> { }
record Exact<T>(T entity) implements Match<T> { }
record Fuzzies<T>(Collection<Fuzzy<T>> entities)
	implements Match<T> { }

record Fuzzy<T>(T entity, int distance) { }

// method declaration
Match<User> findUser(String userName) { ... }

Complex Return Types

Return values that are deconstructed immediately:

// calling the method
switch (findUser("John Doe")) {
	case None<User> none -> // ...
	case Exact<User> exact -> // ...
	case Fuzzies<User> fuzzies -> // ...
}
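A runnable sketch of `findUser` under assumptions the slides don't make: a hypothetical `User` record and `UserDirectory` class, and a toy distance (number of differing characters between names of equal length, fuzzy if at most 2).

```java
import java.util.Collection;
import java.util.List;

sealed interface Match<T> permits None, Exact, Fuzzies { }
record None<T>() implements Match<T> { }
record Exact<T>(T entity) implements Match<T> { }
record Fuzzies<T>(Collection<Fuzzy<T>> entities) implements Match<T> { }
record Fuzzy<T>(T entity, int distance) { }

// hypothetical domain type
record User(String name) { }

class UserDirectory {

	private final List<User> users;

	UserDirectory(List<User> users) {
		this.users = List.copyOf(users);
	}

	Match<User> findUser(String userName) {
		List<Fuzzy<User>> candidates = users.stream()
				.filter(user -> user.name().length() == userName.length())
				.map(user -> new Fuzzy<>(user, distance(user.name(), userName)))
				.filter(candidate -> candidate.distance() <= 2)
				.toList();
		for (Fuzzy<User> candidate : candidates)
			if (candidate.distance() == 0)
				return new Exact<>(candidate.entity());
		if (candidates.isEmpty())
			return new None<>();
		return new Fuzzies<>(candidates);
	}

	// toy metric: positions with different characters (names of equal length only)
	private static int distance(String a, String b) {
		int distance = 0;
		for (int i = 0; i < a.length(); i++)
			if (a.charAt(i) != b.charAt(i))
				distance++;
		return distance;
	}

	// the caller deconstructs the result immediately
	static String describe(Match<User> match) {
		return switch (match) {
			case None<User> none -> "no such user";
			case Exact<User> exact -> "found " + exact.entity().name();
			case Fuzzies<User> fuzzies -> fuzzies.entities().size() + " similar users";
		};
	}

}
```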

Complex Domains

Long-living objects that are part
of the program’s domain.

For example Page.

Functional Programming?!

immutable data structures
methods (functions?) that operate on them

Isn’t this just functional programming?!

Kind of.

DOP vs FP

Functional programming:

Everything is a function

⇝ Focus on creating and composing functions.

Data-oriented programming:

Model data as data.

⇝ Focus on correctly modeling the data.

DOP vs OOP

OOP is not dead (again):

valuable for complex entities or rich libraries
use whenever encapsulation is needed
still a good default on high level

DOP — consider when:

mainly handling outside data
working with simple or ad-hoc data
data and behavior should be separated

Data-Oriented Programming

Use Java’s strong typing to model data as data:

use classes to represent data, particularly:
- data as data with records
- alternatives with sealed classes
use methods (separately) to model behavior, particularly:
- exhaustive switch without default
- pattern matching to destructure polymorphic data

Guiding Principles

model the data, the whole data,
and nothing but the data
data is immutable
validate at the boundary
make illegal states unrepresentable

Data Oriented Programming in Java

So long…

37% off with
code fccparlog

bit.ly/the-jms

Slides at slides.nipafx.dev
⇜ Get my book!

Follow Nicolai

nipafx.dev
/nipafx

Follow Java

inside.java // dev.java
/java // /openjdk
