we’ll implement a GitHub crawler
we’ll aggressively use, abuse, and overuse
modern Java features
this is a showcase, not a tutorial
⇝ go to youtube.com/@java for more
slides at slides.nipafx.dev
ask questions at any time
Starting with a seed URL:
1. connect to URL
2. identify kind of page
3. identify interesting section
4. identify outgoing links
5. for each link, start at 1.
Then:
print statistics
print page list
show pages on localhost
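The crawl steps above can be sketched as a depth-limited recursion. This is a minimal sketch, not the talk's implementation: `CrawlSketch` and its in-memory `web` map are hypothetical stand-ins for connecting to a URL and extracting links, so the example runs without network access.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: crawls an in-memory "web" instead of real pages.
final class CrawlSketch {

    // Assumption: maps each URL to its outgoing links (replaces HTTP + parsing).
    private final Map<URI, List<URI>> web;

    CrawlSketch(Map<URI, List<URI>> web) {
        this.web = web;
    }

    // Visits `url` and follows outgoing links for up to `depth` further hops.
    Set<URI> crawl(URI url, int depth) {
        var visited = new LinkedHashSet<URI>();
        crawl(url, depth, visited);
        return visited;
    }

    private void crawl(URI url, int depth, Set<URI> visited) {
        // Set.add returns false for known pages, which also breaks link cycles
        if (depth < 0 || !visited.add(url))
            return;
        // connecting, classifying and link extraction collapse into a lookup here
        for (URI link : web.getOrDefault(url, List.of()))
            // "for each link, start at 1."
            crawl(link, depth - 1, visited);
    }

    public static void main(String[] args) {
        var seed = URI.create("https://example.org/a");
        var b = URI.create("https://example.org/b");
        var c = URI.create("https://example.org/c");
        var web = Map.of(seed, List.of(b), b, List.of(seed, c));
        System.out.println(new CrawlSketch(web).crawl(seed, 2));
    }
}
```

The visited set doubles as the page list that gets printed once crawling is done.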
Domain model:
create with records and sealed interfaces
operate on with pattern matching
Fetching pages:
HTTP client to fetch from GitHub
virtual threads via structured concurrency
Present results:
format with text blocks and string templates
host with simple file server
(And modules for reliability.)
JDK 23 EA with preview features!
Features that aren’t final on JDK 21:
unnamed patterns (final in JDK 22, JEP 456)
StructuredTaskScope
string templates
public sealed interface Page
        permits ErrorPage, SuccessfulPage {
    URI url();
}
public record GitHubPrPage(
        URI url, String content, Set<Page> links, int number)
        implements GitHubPage {
    public GitHubPrPage {
        // argument validation
    }
    public GitHubPrPage(
            URI url, String content, int number) {
        this(url, content, new HashSet<>(), number);
    }
    // `equals` and `hashCode` based on `url`
}
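The slide elides the `equals` and `hashCode` overrides. A minimal sketch of what URL-based equality on a record could look like follows; `SketchPage` is a hypothetical, simplified type and its validation is an assumption, not the talk's code. One likely reason for the override: the generated methods would include the mutable `links` set, and pages that link to each other would then recurse.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified page type; the talk's GitHubPrPage has more components.
record SketchPage(URI url, String content, Set<URI> links) {

    // compact constructor: argument validation (null checks are an assumption)
    SketchPage {
        Objects.requireNonNull(url);
        Objects.requireNonNull(content);
        Objects.requireNonNull(links);
    }

    // convenience constructor that starts with an empty, mutable link set
    SketchPage(URI url, String content) {
        this(url, content, new HashSet<>());
    }

    // two pages are equal if they share a URL, regardless of content or links
    @Override
    public boolean equals(Object other) {
        return other instanceof SketchPage page && url.equals(page.url);
    }

    @Override
    public int hashCode() {
        return url.hashCode();
    }
}
```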
public static String pageName(Page page) {
    return switch (page) {
        case ErrorPage(var url, _)
            -> "💥 ERROR: " + url.getHost();
        case ExternalPage(var url, _)
            -> "💤 EXTERNAL: " + url.getHost();
        case GitHubIssuePage(_, _, _, int number)
            -> "🐈 ISSUE #" + number;
        case GitHubPrPage(_, _, _, int number)
            -> "🐙 PR #" + number;
    };
}
// creation
var client = HttpClient.newHttpClient();
// use
var request = HttpRequest
    .newBuilder(url)
    .GET()
    .build();
return client
    .send(request, BodyHandlers.ofString())
    .body();
try (var scope =
        new StructuredTaskScope.ShutdownOnFailure()) {
    var futurePages = new ArrayList<Subtask<Page>>();
    for (URI link : links)
        futurePages.add(
            scope.fork(() -> createPage(link, depth)));
    scope.join();
    scope.throwIfFailed();
    return futurePages.stream()
        .map(Subtask::get)
        .collect(toSet());
} catch (ExecutionException ex) {
    // [...]
}
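`StructuredTaskScope` is a preview API, so the snippet above only compiles with preview features enabled. For comparison, here is a sketch of the same fan-out using only APIs that are final in JDK 21: a virtual-thread-per-task executor. It is a swapped-in technique, not the talk's approach, and it does not cancel sibling tasks when one fails the way `ShutdownOnFailure` does. `VirtualThreadFetch` and `mapConcurrently` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: one virtual thread per input, no preview features needed.
final class VirtualThreadFetch {

    // Applies `task` to each input on its own virtual thread; results keep input order.
    static <T, R> List<R> mapConcurrently(List<T> inputs, Function<T, R> task) {
        // closing the executor waits for all submitted tasks to finish
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = new ArrayList<Future<R>>();
            for (T input : inputs) {
                Callable<R> call = () -> task.apply(input);
                futures.add(executor.submit(call));
            }
            var results = new ArrayList<R>();
            for (Future<R> future : futures)
                results.add(future.get());
            return results;
        } catch (InterruptedException ex) {
            // restore the interrupt flag before giving up
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("a task failed", ex.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(mapConcurrently(List.of("a", "bb", "ccc"), String::length));
    }
}
```

In the crawler, `task` would play the role of `createPage(link, depth)`.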
var html = HTML."""
    <!DOCTYPE html>
    <html lang="en">
        <head>
            <meta charset="utf-8">
            <title>\{Pretty.pageName(rootPage)}</title>
            <link rel="stylesheet" href="style.css">
        </head>
        <body>
            <div class="container">
                \{pageTreeHtml(rootPage)}
            </div>
        </body>
    </html>
    """;
public class Html {
    public static final Processor<Document, ...> HTML =
        new JsoupHtmlProcessor();
    private static class JsoupHtmlProcessor
            implements Processor<Document, ...> {
        @Override
        public Document process(StringTemplate template) {
            return Jsoup.parse(template.interpolate());
        }
    }
}
SimpleFileServer.createFileServer(
        address,
        serverDir.toAbsolutePath(),
        OutputLevel.INFO)
    .start();
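The snippet above leaves `address` and `serverDir` undefined. A minimal sketch of how they might be set up follows; the loopback address, port 8080, the temp directory, and the names `ServeSketch` and `serve` are all assumptions for illustration, not the talk's setup.

```java
import com.sun.net.httpserver.HttpServer;
import com.sun.net.httpserver.SimpleFileServer;
import com.sun.net.httpserver.SimpleFileServer.OutputLevel;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper around the JDK's simple file server (JDK 18+).
final class ServeSketch {

    // Starts a file server for `serverDir` on the loopback interface;
    // passing port 0 lets the system pick a free port.
    static HttpServer serve(Path serverDir, int port) {
        var address = new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
        // createFileServer requires an absolute path to an existing directory
        var server = SimpleFileServer.createFileServer(
                address, serverDir.toAbsolutePath(), OutputLevel.INFO);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // assumption: the generated HTML would be written to this directory
        var serverDir = Files.createTempDirectory("crawler-site");
        Files.writeString(serverDir.resolve("index.html"), "<h1>pages</h1>");
        var server = serve(serverDir, 8080);
        System.out.println("Serving on port " + server.getAddress().getPort());
    }
}
```

The server keeps running until the process is stopped or `server.stop(0)` is called.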
Let’s watch Jose’s exploration…