Tabs and Brackets: Mixing Java and Python using GraalPy

Java and Python are both excellent tools, and even though programmers usually like nothing better than debating which language, framework, or editor is best, that is not the goal of this post. Each language has strengths and weaknesses, but what I consider interesting is that Java and Python complement each other so well. However, systems using multiple languages face increased complexity in deployment, maintenance, testing, integration, and interoperability. This post aims to answer two main questions through a set of practical hands-on experiments.

First, we explore how we might partly address this complexity by using GraalVM and GraalPy to create a single standalone binary containing both Java and Python application code, their respective dependencies, and runtimes!

Second, early adoption always comes with risks, and while GraalVM has achieved a fair level of maturity, GraalPy is still a relatively new technology. Rather than making definitive recommendations regarding its general suitability for production, this post aims to share my thoughts and practical experiences using it, possibly providing you with extra insights into its capabilities and limitations when evaluating whether GraalPy aligns well with your needs.

This post is about bridging two languages and aims to help both Java and Python programmers. To minimize confusion, the post is augmented with extra background information and explanations, requiring only basic Java or Python knowledge.

We’ll start with a brief introduction to each technology, followed by some code examples, and finally, a conclusion of our findings.

Some Background

Java is used to build everything from web services and mobile apps to desktop applications and games. Java code is compiled into bytecode, which can be executed on any device equipped with a Java Virtual Machine (JVM), following the "write once, run anywhere" (WORA) principle. Its reliability, backward compatibility, and mature ecosystem have made it popular for large-scale enterprise software. However, its static typing, focus on the object-oriented programming (OOP) paradigm, and arguably somewhat verbose syntax make it less popular for scripting. Python, on the other hand, is interpreted, is dynamically typed, has a more compact syntax, and allows the mixing of different programming paradigms, leaving the user with great freedom. Moreover, an ecosystem containing some of the most popular AI/ML and data science libraries, such as TensorFlow, PyTorch, scikit-learn, Pandas, and NumPy, has made Python the language of AI and data science. Hence, a toolbox containing both Java and Python can solve a broad spectrum of problems.

So, how can we achieve this magic team-up using GraalVM and GraalPy, and what are they exactly? In their own words, the former is "an advanced JDK with ahead-of-time native image compilation" [1]. This means that instead of compiling Java to bytecode and running that on an already installed Java runtime, GraalVM can create a standalone binary. This breaks the WORA principle, as these binaries are system/OS dependent, but it also yields the benefits of a reduced memory footprint, improved performance, and faster startup. In addition, GraalVM has well-established support in major frameworks such as Spring Boot, Micronaut, Helidon, and Quarkus.

GraalPy is a Python implementation for the JVM built on GraalVM. It provides a high-performance embeddable runtime compatible with CPython 3.11 [2], enabling the transformation of Python applications into standalone binaries. Moreover, the GraalPy team claims that pure Python code (without libraries) often runs faster on GraalPy than on CPython.

This all sounds amazing, yet it should be noted that not all PyPI packages are supported, and native extension support is still experimental. While this is not optimal, most popular libraries are supported fully (or close to). Moreover, the GraalVM team continuously performs and publishes compatibility testing for the top packages [3].

Installation and Project Setup

Before we can get our hands dirty, we need to install the required tooling and set up a project.

Note: While we cover how to set everything up yourself, this post comes with a repo containing build configurations for both Gradle and Maven, as well as all example code.

Minimal Project Setup

First, set up a minimal Java application structure:

 └── app
     ├── build.gradle.kts (For Gradle)
     ├── pom.xml (For Maven)
     └── src
         └── main
             ├── java
             │   └── demo
             │       └── App.java
             └── resources

You can create this manually or automagically generate it using Gradle or Maven; in this post I will be using Gradle.

gradle init --type java-application  --project-name demo  \
             --package demo --no-split-project

If you are using Maven, there is also an archetype available to generate a GraalPy starter project:

mvn archetype:generate \
  -DarchetypeGroupId=org.graalvm.python \
  -DarchetypeArtifactId=graalpy-archetype-polyglot-app \
  -DarchetypeVersion=24.1.0

Installing GraalVM

The GraalVM website provides detailed instructions on the different installation methods. Personally, I am a big fan of sdkman, a version manager for SDKs on Unix systems that makes installing GraalVM, and other JDKs, no more than a one-liner. Keep in mind that not all GraalPy/GraalVM versions are compatible with each other. I chose the latest stable release for Java 23, which at the time of writing is 23.0.2.

Installing GraalPy

Installation of GraalPy can be done in a number of ways. By far the most straightforward is through either the Maven or Gradle plugin. Upon building, the plugin will automagically set up GraalPy and create a virtual environment for your project. To set up the project using Gradle, make the following changes to build.gradle.kts.

Apply the Gradle plugin for GraalVM and GraalPy:

plugins {
    id("org.graalvm.buildtools.native") version "0.10.5"
    id("org.graalvm.python") version "24.1.2"
}

Include the relevant dependencies:

implementation("org.graalvm.polyglot:polyglot:24.1.2")
implementation("org.graalvm.polyglot:python:24.1.2")

Alternatively, all GraalPy releases can be downloaded from its GitHub repository.

Hello World

GraalVM supports polyglot programming to exchange values between different languages (not just Python). Effectively, polyglot programming forms the bridge between GraalVM and GraalPy. Let’s see this in practice! The following snippet shows the basic pattern: we create a GraalVM Polyglot context and subsequently call its eval method with a language id, in this case python, and a greeting statement - it’s that simple!

try (Context context = Context.newBuilder().allowAllAccess(true).build()) {
    context.eval("python", "print('Hello to you!')");
}

To perform the native compilation and create the binary (which can take a few minutes), run gradle nativeCompile. The generated binary can be found in build/native/nativeCompile/application.

Let’s see what happens when we execute it!

$ build/native/nativeCompile/application
> Hello to you!

While you can run the executable directly from the terminal, the gradle nativeRun task will do it for you and, if required, (re)compile the code.

Basic Language Interoperability

You might have noticed the Python and Java code do not really interact yet; we simply run Python code from within Java. The next example shows how to call a Python function from Java. The Python function, shown in the snippet further below, accepts a message and returns a dictionary with the longest word and the message’s total length. To call it, we first define a Java functional interface matching its signature. Fun fact: there are no requirements on the method, parameter, or interface name (they do not need to match those in Python). Here is such a functional interface:

@FunctionalInterface
public interface CoolFunction {
    Map<String, Object> apply(String message);
}

As before, we call context.eval with our Python code, but now convert the result into an instance of CoolFunction. Calling its apply method with a string of our choice executes the Python function and yields a Java Map. In code:

String pythonCode = """
    def cool_function(message):
        words = message.split()
        longest_word = max(words, key=len)
        total_length = len(message)
        return {'longest_word': longest_word, 'total_length': total_length}

    cool_function
    """;

try (Context context = Context.newBuilder().allowAllAccess(true).build()) {
    // Evaluate Python code and retrieve the function
    Value coolFunctionValue = context.eval("python", pythonCode);

    // Convert to Java interface
    CoolFunction coolFunction = coolFunctionValue.as(CoolFunction.class);

    // Call the function
    Map<String, Object> result = coolFunction.apply("Hello GraalPy and Java interoperability!");

    System.out.println("Longest word: " + result.get("longest_word"));
    System.out.println("Total length: " + result.get("total_length"));
}
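To get a feel for what the Java side receives, it can help to run the Python function on its own first. The following is a minimal sketch, not part of the original example: it adds default="" to the max call, an assumption on my part, so that an empty message no longer raises a ValueError.

```python
def cool_function(message):
    words = message.split()
    # default="" is an addition in this sketch: without it, max() raises
    # ValueError when the message contains no words at all.
    longest_word = max(words, key=len, default="")
    total_length = len(message)
    return {"longest_word": longest_word, "total_length": total_length}


# Same input as the Java snippet uses
result = cool_function("Hello GraalPy and Java interoperability!")
print(result)  # {'longest_word': 'interoperability!', 'total_length': 40}
```

The keys of this dictionary are exactly the strings the Java code passes to result.get.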

We did not need to define any mapping for the types of the argument and return values, as GraalPy already defines type mappings for most primitive types and common collections; for example, in this case, the returned Python dict is automagically mapped to a Java Map<K, V>. Custom types are possible through the interoperability extension API [4].

You might also have noticed that the Python code embedded in the Java string ends with a line containing just cool_function. Like code executed in the Python REPL, the return value of context.eval is the last evaluated expression; hence, to make eval return a reference to the function, we need to evaluate it after its definition.

# Example of Python REPL output
>>> def a(): return 1
... # No output
>>> a
<function a at 0x101567100>
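The "last evaluated expression" behavior can be imitated in plain Python, which may make it easier to reason about. The sketch below is only an illustration of the idea using the standard ast module; eval_last_expression is a hypothetical helper of my own and says nothing about how GraalPy implements context.eval internally.

```python
import ast


def eval_last_expression(source, namespace=None):
    """Run `source` and return the value of its final expression, if any."""
    namespace = {} if namespace is None else namespace
    tree = ast.parse(source)
    if not tree.body:
        return None
    *statements, last = tree.body
    # Execute everything except the final statement.
    module = ast.Module(body=statements, type_ignores=[])
    exec(compile(module, "<sketch>", "exec"), namespace)
    if isinstance(last, ast.Expr):
        # The final statement is a bare expression: evaluate and return it.
        expression = ast.Expression(body=last.value)
        return eval(compile(expression, "<sketch>", "eval"), namespace)
    # Otherwise (a def, an assignment, ...) just run it; there is no value.
    module = ast.Module(body=[last], type_ignores=[])
    exec(compile(module, "<sketch>", "exec"), namespace)
    return None


source = "def a(): return 1\n\na\n"
function = eval_last_expression(source)
print(function())  # 1
```

Without the trailing line containing just a, the helper returns None, which mirrors why the Java example needs the bare cool_function line at the end.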

So, what if we want to export more functions - or even classes? Hardcoding all Python code inside a Java string is not exactly optimal. Luckily, there are some solutions, which our next example will illustrate.

The goal: parse CSV data from Java using a CSVParser class. The parser reads the data and creates a Dataset object with the parsed rows, column names, and some basic dataset statistics. Both classes (CSVParser and Dataset) are implemented in Python. Once the parsing is done, we want to use the resulting Dataset instance directly from Java.

As in the previous example, we first define their respective interfaces in Java:

public interface CSVParser {
    Dataset parse(String csvData);
}

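public interface Dataset {
    // Unlike with a functional interface, these method names are looked up
    // on the Python object, so they must match the Python method names.
    List<String> get_column_names();
    List<Object> get_rows();
    Map<String, Object> get_stats();
}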

Next, to avoid hardcoding Python in Java strings, we create a Python file called demo.py. In Python, a module is a set of statements and definitions in a .py file. The name of the file becomes the module name, in this case demo. Place the file in src/main/resources/org.graalvm.python.vfs/src. The directory name is important, as Python files (modules) placed there will be exposed to GraalPy. Here is a toy Python implementation of the Dataset and CSVParser:

import polyglot

class Dataset:
    def __init__(self, rows, column_names):
        self.rows = rows
        self.column_names = column_names

    def get_column_names(self):
        return self.column_names

    def get_rows(self):
        return self.rows

    def get_stats(self):
        return {"row_count": len(self.rows), "column_count": len(self.column_names)}


class CSVParser:
    def parse(self, data: str):
        rows = [row.split(",") for row in data.split("\n")]
        column_names = rows[0]
        rows = rows[1:]
        return Dataset(rows, column_names)


polyglot.export_value(CSVParser, "CSVParser")
polyglot.export_value(Dataset, "Dataset")

The most important thing to note in this code is the use of the polyglot module. This module allows us to expose Python objects to Java (and other Graal languages). By calling the polyglot.export_value function, we export the Python CSVParser and Dataset classes and register them in the polyglot bindings of the context. This brings us to the final step: using them in Java. The following code snippet shows how we can import the demo module by evaluating a Python import statement in the context. After importing the “demo” module into the context, we can retrieve CSVParser from the registered polyglot bindings. As we exported a class definition, we first need to create a new instance of it. Subsequently, we cast the instance to the corresponding Java interface defined earlier.

try (var context = GraalPyResources.createContext()) {
    // Execute a regular Python import statement of the demo module
    context.eval("python", "import demo");

    // Retrieve the CSVParser class from the polyglot bindings
    Value pyCSVParserCls = context.getPolyglotBindings().getMember("CSVParser");

    // Create a new instance of the CSVParser class and view it as the Java interface
    CSVParser pyCSVParserInst = pyCSVParserCls.newInstance().as(CSVParser.class);
}

All that remains now is to parse some CSV data, for example using:

String csvData = """
        "a","b","c"
        1,2,3
        4,5,6
        7,8,9
        10,11,12""";

Dataset dataset = pyCSVParserInst.parse(csvData);

System.out.println("Dataset has columns: " + dataset.get_column_names());
System.out.println("Dataset has rows: " + dataset.get_rows());
System.out.println("Dataset has stats: " + dataset.get_stats());

The code above involves some reflection, which requires extra configuration before we can build a native binary with GraalVM; more details on this can be found in [5]. For this example, all we need is to add a file resources/META-INF/native-image/proxy-config.json with the names of the two interfaces:

[
    ["com.demo.graalpy.CSVParserExample$CSVParser"],
    ["com.demo.graalpy.CSVParserExample$Dataset"]
]

To finalize, we run the application, which will yield the output:

Dataset has columns: ['"a"', '"b"', '"c"']
Dataset has rows: [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9'], ['10', '11', '12']]
Dataset has stats: {'row_count': 4, 'column_count': 3}

It works!
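Note that the column names in the output still carry their double quotes, because the toy parser only splits on commas. If that matters for your data, one option is to delegate to Python’s standard csv module. The sketch below assumes that module is available in your GraalPy runtime; QuoteAwareCSVParser is a hypothetical name of my own and not part of the original demo.py.

```python
import csv
import io


class Dataset:
    """Same shape as the Dataset class in demo.py."""

    def __init__(self, rows, column_names):
        self.rows = rows
        self.column_names = column_names

    def get_column_names(self):
        return self.column_names

    def get_rows(self):
        return self.rows

    def get_stats(self):
        return {"row_count": len(self.rows), "column_count": len(self.column_names)}


class QuoteAwareCSVParser:
    """Hypothetical variant of CSVParser built on the csv module."""

    def parse(self, data: str):
        # csv.reader removes surrounding quotes and keeps commas that appear
        # inside quoted fields; blank lines come back as empty lists, which
        # are filtered out here.
        reader = csv.reader(io.StringIO(data, newline=""))
        rows = [row for row in reader if row]
        if not rows:
            return Dataset([], [])
        return Dataset(rows[1:], rows[0])


dataset = QuoteAwareCSVParser().parse('"a","b","c"\n1,2,3\n4,5,6\n')
print(dataset.get_column_names())  # ['a', 'b', 'c']
```

Because the method names are unchanged, the Java interfaces defined earlier would not need to change for this variant.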

While we are using Python code in Java, exporting Java types to Python using the same method is also possible.

Packages

GraalPy allows you to install packages by specifying them in your Gradle or Maven build file. For instance, for Gradle, adding Pandas 1.5.2 looks like this:

packages = setOf(
    "pandas==1.5.2"
)

Python has an unrivaled ecosystem of AI and data science libraries, and after achieving basic interoperability with relative ease, the possibilities of connecting these technologies made me rather enthusiastic. As you might have guessed by now, this is where things start to get tricky!

Many CPython packages, such as NumPy, rely on native extensions, which are often C-based and may depend on an OS-specific library or a specific compiler. Conveniently, for most popular packages, precompiled binaries, called wheels, are available for different operating systems, architectures, and Python versions. Hence, you will hardly ever need to worry about compiling your packages or about having the tools to do so. While GraalPy’s landing page states it is “Compatible with many Python packages,” the reality is not that straightforward. In my opinion, a more accurate statement would be: GraalPy is (often partially) compatible with a specific version of some packages, particularly for those with native extensions.
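To see why a wheel built for one runtime may not fit another, it can help to print the properties a wheel has to match. This is a small sketch using only the standard library; describe_runtime is a hypothetical helper. On CPython the implementation name is cpython, and I would expect GraalPy to report a different name, but treat that as an assumption to verify on your own setup.

```python
import platform
import sys


def describe_runtime():
    """Collect the facts a wheel must match: interpreter, version, OS, CPU."""
    return {
        "implementation": sys.implementation.name,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "system": platform.system(),
        "machine": platform.machine(),
    }


for key, value in describe_runtime().items():
    print(f"{key}: {value}")
```

Running this once with CPython and once with GraalPy gives a quick impression of which of these properties differ between the two.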

When you install a package, the GraalPy implementation of pip (Python’s package manager) will automatically look for package "patches". GraalPy provides these patches to remedy incompatible code, which is very nice. However, patches tend to be available only for very specific versions of the original package, which is why, as I mentioned, the package version matters. Moreover, recompilation is often required after applying a patch, which requires you to have all the tooling and dependencies for the compilation available and can take a long time. You usually find out what you are missing (or that what you have is incompatible) only halfway through the compilation process, via a cryptic stack trace. Hence, I became quite frustrated after spending considerable time in transitive dependency hell determining which libraries play nice together. For example, at some point, I required a specific version of library X on my PATH for package A to compile but needed another version on my PATH for package B. Since the GraalPy package manager builds, compiles, and installs every dependency specified in the build requirements file in one go…things break. I worked around this by manually building and caching each package separately, each with its own configuration. Sidenote: If you are reading this and have some suggestions on how to handle this more effectively, please share!

Let’s stay pragmatic; this experience, while frustrating, is not limited to GraalPy and will eventually be encountered by virtually every software developer, especially those working with native code. Many packages play nice with GraalPy, but as stated before, choosing the right version is important. So how do you select the right version and install a package? Conveniently, the GraalPy team performs compatibility testing for the top 500 packages [3]. Yet, the results leave some room for surprise, as compatibility is not reported as a yes or no but as a percentage. So, suppose we want to install Pandas. After looking up its compatibility, we find that 1.5.2 is our best bet with 86.84 percent compatibility.

However, Pandas has a transitive dependency on NumPy, specified as numpy>=1.21.0; the package manager resolved this to numpy==2.0.2, which is incompatible with GraalPy. After checking the compatibility index again, we find NumPy is compatible with GraalPy, but we should instead use numpy==1.24.3. After explicitly specifying this, all was well, and I could use Pandas without hassle!
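Since a transitive dependency can silently resolve to a different version than the one you intended, a small runtime check can catch this early. The following is a hedged sketch using importlib.metadata from the standard library, assuming it behaves under GraalPy as it does on CPython; find_mismatches and EXPECTED are hypothetical names, and the version numbers are the ones used in this post.

```python
from importlib import metadata

# Versions pinned in this post; adjust to whatever your build file specifies.
EXPECTED = {"pandas": "1.5.2", "numpy": "1.24.3"}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for every package that deviates.

    `found` is None when the package is not installed at all. The `lookup`
    parameter exists only so the function can be exercised without the
    packages being installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {found}")
```

Evaluating such a check through context.eval at application startup is one way to fail fast, although pinning the versions explicitly in the build file remains the actual fix.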

Conclusion and Recommendations

GraalVM and GraalPy offer Java-Python interoperability with a surprisingly straightforward setup for many basic use cases. I found interoperability reliable when the interop types are restricted to core types for which default type mappings have been defined by GraalPy.

The ability to combine the strengths of both languages shows great promise. The ability to use libraries from Python, such as those for AI/ML, is incredibly powerful yet also the biggest challenge. While I believe enough GraalPy-compatible packages are available to make it viable for real-world applications, finding compatible versions and possibly recompiling these can take a lot of time, can be frustrating, and there is no guarantee of success. Yet, consider that other solutions also introduce overhead, which can add up over time. For instance, implementing (and testing) the required functionality in Java costs time too. Alternatively, a microservice might be straightforward but needs to be maintained, tends to be slower, and can make traceability more challenging. And then there is deployment. For me, this is one of the areas where GraalPy really shines. The self-contained binary streamlines deployments by removing the need for a Python or Java runtime, offers performance improvements, and has reduced storage requirements.

As we have seen earlier in this post, adding GraalPy to your build is straightforward. But this does assume that your Java application already runs on GraalVM. If not, migration to GraalVM might require additional configuration, for example, for code relying on reflection, and some libraries may not be compatible with GraalVM. Hence, if you are not already using GraalVM, I highly recommend first assessing any challenges that might occur there.

As always, no solution is perfect, and each comes with trade-offs. With this post, I hope I provided you with a picture of what GraalPy can do, the challenges you might face, and helped you gain more insight when assessing its suitability for your product.

Thank you for reading.