Data-Oriented Programming

In Java 19

Developer Advocate

Java Team at Oracle

Data-Oriented Programming

What is DOP?
A Lengthy Example
That was DOP!

Das Ist Alles Nur Geklaut 🎶

(It’s all just stolen.)

Seminal article by Brian Goetz on InfoQ:
Data Oriented Programming in Java

Programming Paradigms

Paradigms often come with an
"Everything is a …​" sentence.

The goal of any programming paradigm is to manage complexity.

  • complexity comes in many forms

  • not all paradigms handle all forms equally well

⇝ "It depends"

Object-Oriented Programming

Everything is an object

  • combines state and behavior

  • works best when defining/upholding boundaries

Great use cases:

  • boundaries between libraries and clients

  • in large programs to enable modular reasoning

Smaller programs/subsystems have less need for boundaries.

Data-Oriented Programming

Use Java’s strong typing to model data as data:

  • use classes to represent data, particularly:

    • data as data with records

    • alternatives with sealed classes

  • use methods (separately) to model behavior, particularly:

    • exhaustive switch without default

    • pattern matching to destructure polymorphic data

Guiding Principles

  • model the data, the whole data,
    and nothing but the data

  • data is immutable

  • validate at the boundary

  • make illegal states unrepresentable

Crawling GitHub

Starting with a seed URL:

  1. connect to URL

  2. identify kind of page

  3. identify interesting section

  4. identify outgoing links

  5. for each link, start at 1.
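
The steps above can be sketched as a small recursive loop. This is only an illustration under assumptions: `CrawlSketch`, `crawl`, the `linksOf` function, and the depth limit are hypothetical names that do not come from the talk, and `linksOf` stands in for steps 1 to 4 (connect, classify, find the section, find the links).

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the crawl loop; all names are illustrative.
class CrawlSketch {

	// `linksOf` is an assumed stand-in for steps 1-4: it returns a page's outgoing links.
	static Set<URI> crawl(URI seed, int maxDepth, Function<URI, Set<URI>> linksOf) {
		Set<URI> visited = new LinkedHashSet<>();
		crawl(seed, maxDepth, linksOf, visited);
		return visited;
	}

	private static void crawl(
			URI url, int depth, Function<URI, Set<URI>> linksOf, Set<URI> visited) {
		// stop at the depth limit or when the URL was already seen (guards against cycles)
		if (depth < 0 || !visited.add(url))
			return;
		// step 5: for each outgoing link, start again at step 1
		for (URI link : linksOf.apply(url))
			crawl(link, depth - 1, linksOf, visited);
	}

}
```

The visited set and the depth limit are additions of this sketch; without them, pages that link to each other would make the recursion run forever.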

Crawling GitHub

That logic is implemented in:

public class PageTreeFactory {

	public static Page loadPageTree(/*...*/) {
		// [...]
	}

}

What does Page look like?

Page Requirements

  • all pages have a url

  • pages that couldn’t be resolved have an error

  • pages that could be resolved have content

  • GitHub pages have:

    • outgoing links

    • issueNumber or prNumber

Requirements

Operations on pages and their subtree:

  • pretty print

  • collect statistics

A Possible Implementation

A single Page class with this API:

public URI url();
public Exception error();
public String content();
public int issueNumber();
public int prNumber();
public Set<Page> links();
public Stream<Page> subtree();

public Stats evaluateStatistics();
public String printPageList();

A Possible Implementation

Problems:

  • page "type" is implicit

  • legal combination of fields is unclear

  • clients must "divine" the type

  • disparate operations on same class
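
To make the "divining" concrete, here is a minimal sketch of what client code against such a class might look like. `LegacyPage` and `LegacyClient` are hypothetical stand-ins, and the convention that a number of `0` means "not set" is an assumption of this sketch.

```java
import java.net.URI;

// Hypothetical stand-in for the single-class design; fields instead of accessors for brevity.
class LegacyPage {

	final URI url;
	final Exception error;  // null unless loading failed
	final String content;   // null if loading failed
	final int issueNumber;  // 0 unless this is an issue page (assumed convention)
	final int prNumber;     // 0 unless this is a PR page (assumed convention)

	LegacyPage(URI url, Exception error, String content, int issueNumber, int prNumber) {
		this.url = url;
		this.error = error;
		this.content = content;
		this.issueNumber = issueNumber;
		this.prNumber = prNumber;
	}

}

class LegacyClient {

	// The order of these checks encodes knowledge that the type itself does not express.
	static String describe(LegacyPage page) {
		if (page.error != null)
			return "error: " + page.error.getMessage();
		if (page.issueNumber > 0)
			return "issue #" + page.issueNumber;
		if (page.prNumber > 0)
			return "PR #" + page.prNumber;
		return "external: " + page.url.getHost();
	}

}
```

Nothing stops a caller from creating a page with an error, content, an issue number, and a PR number all at once; `describe` then silently picks whichever check happens to come first.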

Applying DOP

Model the data, the whole data, and nothing but the data.

There are four kinds of pages:

  • error page

  • external page

  • GitHub issue page

  • GitHub PR page

⇝ Use four records to model them!

Modeling The Data

public record ErrorPage(
	URI url, Exception ex) { }

public record ExternalPage(
	URI url, String content) { }

public record GitHubIssuePage(
	URI url, int issueNumber,
	String content, Set<Page> links) { }

public record GitHubPrPage(
	URI url, int prNumber,
	String content, Set<Page> links) { }

Applying DOP

Model the data, the whole data, and nothing but the data.

There are additional relations between them:

  • a page (load) is either successful or not

  • a successful page is either external or GitHub

  • a GitHub page is either for a PR or an issue

⇝ Use sealed types to model the alternatives!

Modeling Alternatives

public sealed interface Page
		permits ErrorPage, SuccessfulPage {
	URI url();
}

public sealed interface SuccessfulPage
		extends Page permits ExternalPage, GitHubPage {
	String content();
}

public sealed interface GitHubPage
		extends SuccessfulPage
		permits GitHubIssuePage, GitHubPrPage {
	Set<Page> links();
	default Stream<Page> subtree() { ... }
}
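
Put together, the records and sealed interfaces could look like the following compilable sketch. Two things here are assumptions: each record gains an `implements` clause (required once the interfaces list it under `permits`), and the body of `subtree()` is only one possible traversal, since the slide elides it. The types are package-private so that they fit in one file.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Stream;

sealed interface Page permits ErrorPage, SuccessfulPage {
	URI url();
}

sealed interface SuccessfulPage extends Page permits ExternalPage, GitHubPage {
	String content();
}

sealed interface GitHubPage extends SuccessfulPage permits GitHubIssuePage, GitHubPrPage {
	Set<Page> links();

	// Assumed implementation: this page, followed by the subtrees of all linked pages.
	default Stream<Page> subtree() {
		Stream<Page> descendants = links().stream().flatMap(GitHubPage::subtreeOf);
		return Stream.concat(Stream.of(this), descendants);
	}

	// Only GitHub pages have links, so every other page is a leaf.
	private static Stream<Page> subtreeOf(Page page) {
		return page instanceof GitHubPage gh ? gh.subtree() : Stream.of(page);
	}
}

record ErrorPage(URI url, Exception ex) implements Page { }

record ExternalPage(URI url, String content) implements SuccessfulPage { }

record GitHubIssuePage(URI url, int issueNumber, String content, Set<Page> links)
		implements GitHubPage { }

record GitHubPrPage(URI url, int prNumber, String content, Set<Page> links)
		implements GitHubPage { }
```

The record accessors (`url()`, `content()`, `links()`) implement the interface methods automatically, so the records need no additional code.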

Applying DOP

Make illegal states unrepresentable.

Many are already, e.g.:

  • with error and with content

  • with issueNumber and prNumber

  • with issueNumber or prNumber
    but no links

Validation

Validate at the boundary.

⇝ Reject other illegal states in constructors.

public ExternalPage {
	Objects.requireNonNull(url);
	Objects.requireNonNull(content);
	if (content.isBlank())
		throw new IllegalArgumentException();
}
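
For context, a minimal sketch of the compact constructor inside its record. The record is shown without the `SuccessfulPage` interface so that the snippet compiles on its own, and the exception messages are additions of this sketch.

```java
import java.net.URI;
import java.util.Objects;

record ExternalPage(URI url, String content) {

	// Compact canonical constructor: runs before the fields are assigned,
	// so no invalid instance can ever be observed.
	ExternalPage {
		Objects.requireNonNull(url, "url");
		Objects.requireNonNull(content, "content");
		if (content.isBlank())
			throw new IllegalArgumentException("content must not be blank");
	}

}
```

With this in place, `new ExternalPage(url, "  ")` throws an `IllegalArgumentException` and a `null` argument throws a `NullPointerException`; every instance that exists has passed validation.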

Applying DOP

Data is immutable.

Records are shallowly immutable,
but field types may not be.

⇝ Fix that during construction.

public GitHubPrPage {
	// [...]
	links = Set.copyOf(links);
}
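
A small sketch of the effect of that defensive copy. To keep it self-contained, `links` holds `URI` instead of `Page` here, which is a simplification of this sketch and not the talk's model.

```java
import java.net.URI;
import java.util.Objects;
import java.util.Set;

// Simplified variant: links are URIs so that the example needs no other types.
record GitHubPrPage(URI url, int prNumber, String content, Set<URI> links) {

	GitHubPrPage {
		Objects.requireNonNull(url);
		Objects.requireNonNull(content);
		// Set.copyOf returns an unmodifiable copy and rejects null sets and null elements
		links = Set.copyOf(links);
	}

}
```

Adding to the set that was passed in after construction leaves `links()` unchanged, and calling `links().add(...)` on the record throws an `UnsupportedOperationException`.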

Where Are We?

  • page "type" is explicit in Java’s type system

  • only legal combinations of fields are possible

  • API is more self-documenting

  • code is easier to test

But where did the operations go?

Operations On Data

Model the data, the whole data, and nothing but the data.

⇝ Operations should be limited to derived quantities.

public Stats evaluateStatistics();
public String printPageList();

This actually applies to our operations.

But what if it didn’t? 😁

Operations On Data

Pattern matching on sealed types is perfect
to apply polymorphic operations to data!

And records eschew encapsulation,
so everything is accessible.

Printing A Page List

In class Pretty:

// assumes: import static java.util.stream.Collectors.joining;
public static String printPageList(Page rootPage) {
	if (!(rootPage instanceof GitHubPage ghPage))
		return createPageName(rootPage);

	return ghPage
			.subtree()
			.map(Pretty::createPageName)
			.collect(joining("\n"));
}

Printing A Page List

In class Pretty:

private static String createPageName(Page page) {
	return switch (page) {
		case ErrorPage err
			-> "💥 ERROR: " + err.url().getHost();
		case ExternalPage ext
			-> "💤 EXTERNAL: " + ext.url().getHost();
		case GitHubIssuePage issue
			-> "🐈 ISSUE #" + issue.issueNumber();
		case GitHubPrPage pr
			-> "🐙 PR #" + pr.prNumber();
	};
}

Gathering Statistics

In class Statistician:

public static Stats evaluate(Page rootPage) {
	Statistician statistician = new Statistician();
	statistician.evaluateTree(rootPage);
	return statistician.result();
}

private void evaluateTree(Page page) {
	if (page instanceof GitHubPage ghPage)
		ghPage.subtree().forEach(this::evaluatePage);
	else
		evaluatePage(page);
}

Gathering Statistics

In class Statistician:

private void evaluatePage(Page page) {
	// `numberOf...` are fields
	switch (page) {
		case ErrorPage __ -> numberOfErrors++;
		case ExternalPage __ -> numberOfExternalLinks++;
		case GitHubIssuePage __ -> numberOfIssues++;
		case GitHubPrPage __ -> numberOfPrs++;
	}
}
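
The fragments above can be assembled into one runnable unit, sketched below. Several parts are assumptions: the `Stats` record and its component names, the `result()` method, and the `subtree()` traversal are not shown in the talk and are filled in hypothetically.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Stream;

sealed interface Page permits ErrorPage, SuccessfulPage { URI url(); }
sealed interface SuccessfulPage extends Page permits ExternalPage, GitHubPage { String content(); }
sealed interface GitHubPage extends SuccessfulPage permits GitHubIssuePage, GitHubPrPage {
	Set<Page> links();

	// Assumed traversal: this page, then the subtrees of all linked pages.
	default Stream<Page> subtree() {
		return Stream.concat(Stream.of(this), links().stream().flatMap(GitHubPage::subtreeOf));
	}

	private static Stream<Page> subtreeOf(Page page) {
		return page instanceof GitHubPage gh ? gh.subtree() : Stream.of(page);
	}
}

record ErrorPage(URI url, Exception ex) implements Page { }
record ExternalPage(URI url, String content) implements SuccessfulPage { }
record GitHubIssuePage(URI url, int issueNumber, String content, Set<Page> links)
		implements GitHubPage { }
record GitHubPrPage(URI url, int prNumber, String content, Set<Page> links)
		implements GitHubPage { }

// Hypothetical result type; the talk does not show its definition.
record Stats(int errors, int externalLinks, int issues, int prs) { }

class Statistician {

	private int numberOfErrors;
	private int numberOfExternalLinks;
	private int numberOfIssues;
	private int numberOfPrs;

	static Stats evaluate(Page rootPage) {
		Statistician statistician = new Statistician();
		statistician.evaluateTree(rootPage);
		return statistician.result();
	}

	private void evaluateTree(Page page) {
		if (page instanceof GitHubPage ghPage)
			ghPage.subtree().forEach(this::evaluatePage);
		else
			evaluatePage(page);
	}

	// Exhaustive switch without default: each kind of page increments its own counter.
	private void evaluatePage(Page page) {
		switch (page) {
			case ErrorPage err -> numberOfErrors++;
			case ExternalPage ext -> numberOfExternalLinks++;
			case GitHubIssuePage issue -> numberOfIssues++;
			case GitHubPrPage pr -> numberOfPrs++;
		}
	}

	private Stats result() {
		return new Stats(numberOfErrors, numberOfExternalLinks, numberOfIssues, numberOfPrs);
	}

}
```

The mutable counters live in the short-lived `Statistician` instance, not in the data; the pages themselves stay immutable.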

Operations On Data

Yes, switching over types is icky.

But switching over sealed types is safe.

What happens when we add:

public record GitHubCommitPage(
		URI url, String hash,
		String content, Set<Page> links)
	implements GitHubPage {

	// [...]

}

Follow the compile errors!

Extending Operations On Data

First stop: the sealed supertype.

⇝ Permit the new subtype!

public sealed interface GitHubPage
		extends SuccessfulPage
		permits GitHubIssuePage, GitHubPrPage,
			GitHubCommitPage {
	// [...]
}

Extending Operations On Data

Next stop: all switches without default.

⇝ Handle the new subtype!

switch (page) {
	case ErrorPage __ -> numberOfErrors++;
	case ExternalPage __ -> numberOfExternalLinks++;
	case GitHubIssuePage __ -> numberOfIssues++;
	case GitHubPrPage __ -> numberOfPrs++;
	case GitHubCommitPage __ -> numberOfCommits++;
}
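
The same effect shows up in a switch expression, sketched here on a deliberately flattened hierarchy. The flat `permits` list, the reduced record components, and the names `PageKinds` and `kindOf` are assumptions made to keep the example short.

```java
import java.net.URI;

// Flattened, simplified variant of the page hierarchy (an assumption of this sketch).
sealed interface Page
		permits ErrorPage, ExternalPage, GitHubIssuePage, GitHubPrPage, GitHubCommitPage {
	URI url();
}

record ErrorPage(URI url, Exception ex) implements Page { }
record ExternalPage(URI url, String content) implements Page { }
record GitHubIssuePage(URI url, int issueNumber) implements Page { }
record GitHubPrPage(URI url, int prNumber) implements Page { }
record GitHubCommitPage(URI url, String hash) implements Page { }

class PageKinds {

	// No default branch: deleting any case below is a compile error,
	// which is how the compiler points at every switch that needs an update.
	static String kindOf(Page page) {
		return switch (page) {
			case ErrorPage err -> "error";
			case ExternalPage ext -> "external";
			case GitHubIssuePage issue -> "issue #" + issue.issueNumber();
			case GitHubPrPage pr -> "PR #" + pr.prNumber();
			case GitHubCommitPage commit -> "commit " + commit.hash();
		};
	}

}
```

A `default` branch would compile even after a new subtype is added, which is why the talk recommends leaving it out.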

Where Are We?

  • operations separate from data

  • adding new operations is easy

  • adding new data types is more work,
    but supported by the compiler

⇝ Like the visitor pattern, but less painful.

Algebraic Data Types

  • records are product types

  • sealed types are sum types

This simple combination of mechanisms, aggregation and choice, is deceptively powerful.

Applications

  • ad-hoc data structures

  • complex return types

  • complex domains

Ad-hoc Data Structures

Often local, throw-away types used in one class or package.

record PageWithLinks(Page page, Set<URI> links) {

	PageWithLinks {
		requireNonNull(page);
		requireNonNull(links);
		links = new HashSet<>(links);
	}

	public PageWithLinks(Page page) {
		this(page, Set.of());
	}

}

Complex Return Types

Return values that are deconstructed immediately:

sealed interface MatchResult<T> {
	record NoMatch<T>() implements MatchResult<T> { }
	record ExactMatch<T>(T entity)
		implements MatchResult<T> { }
	record FuzzyMatches<T>(
			Collection<FuzzyMatch<T>> entities)
		implements MatchResult<T> { }

	record FuzzyMatch<T>(T entity, int distance) { }
}

MatchResult<User> findUser(String userName) { ... }
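
How a caller might deconstruct such a result right away is sketched below. The `User` record, the extra `users` parameter, the case-insensitive "fuzzy" rule, and the fixed distance of `1` are hypothetical placeholders; the talk elides the body of `findUser`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

sealed interface MatchResult<T> {
	record NoMatch<T>() implements MatchResult<T> { }
	record ExactMatch<T>(T entity) implements MatchResult<T> { }
	record FuzzyMatches<T>(Collection<FuzzyMatch<T>> entities) implements MatchResult<T> { }

	record FuzzyMatch<T>(T entity, int distance) { }
}

class UserSearch {

	// Hypothetical user type.
	record User(String name) { }

	// Placeholder lookup: exact name match first, then case-insensitive matches as "fuzzy".
	static MatchResult<User> findUser(String userName, List<User> users) {
		List<MatchResult.FuzzyMatch<User>> fuzzy = new ArrayList<>();
		for (User user : users) {
			if (user.name().equals(userName))
				return new MatchResult.ExactMatch<>(user);
			if (user.name().equalsIgnoreCase(userName))
				fuzzy.add(new MatchResult.FuzzyMatch<>(user, 1)); // distance is a placeholder
		}
		if (fuzzy.isEmpty())
			return new MatchResult.NoMatch<>();
		return new MatchResult.FuzzyMatches<>(fuzzy);
	}

	// The caller takes the result apart immediately; no default branch is needed.
	static String describe(MatchResult<User> result) {
		return switch (result) {
			case MatchResult.NoMatch<User> none -> "no user found";
			case MatchResult.ExactMatch<User> exact -> "found " + exact.entity().name();
			case MatchResult.FuzzyMatches<User> fuzzy
				-> fuzzy.entities().size() + " similar user(s)";
		};
	}

}
```

Compared to returning `null`, an `Optional`, or throwing an exception, the sealed result type names every outcome and lets the compiler check that the caller handles all of them.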

Complex Domains

Long-living objects that are part
of the program’s domain.

For example Page.

Functional Programming?!

  • immutable data structures

  • methods (functions?) that operate on them

Isn’t this just functional programming?!

Kind of.

FP vs DOP

Functional programming:

Everything is a function

⇝ Focus on creating and composing functions.


Data-oriented programming:

Model data as data.

⇝ Focus on correctly modeling the data.

OOP vs DOP

OOP is not dead (again):

  • valuable for complex entities
    or rich libraries

  • use whenever encapsulation is needed

  • still a good default at a high level

Consider DOP when:

  • handling outside data (like JSON)

  • working with simple or ad-hoc data

  • data and behavior should be separated

Guiding Principles

  • model the data, the whole data,
    and nothing but the data

  • data is immutable

  • validate at the boundary

  • make illegal states unrepresentable

So long…​

More

Slides at slides.nipafx.dev

Follow Nicolai

nipafx.dev
/nipafx

Follow Java

inside.java // dev.java
/java    //    /openjdk
