Adding new enable_java_integration method + clean up
horakivo committed Jan 23, 2025
1 parent 4609369 commit ac8f326
Showing 4 changed files with 65 additions and 17 deletions.
63 changes: 57 additions & 6 deletions graalpy/graalpy-apache-arrow-guide/README.md
Original file line number Diff line number Diff line change
@@ -34,6 +34,14 @@ Add the required dependencies for GraalPy and JArrow in the dependency section o
</dependency>
```

or

`build.gradle`
```
implementation "org.graalvm.python:python-community:$pythonVersion" // ①
implementation "org.graalvm.python:python-embedding:$pythonVersion" // ③
```

❶ The `python-community` dependency is a meta-package that transitively depends on all resources and libraries to run GraalPy.

❷ Note that the `python-community` package is not a JAR - it is simply a `pom` that declares more dependencies.
@@ -56,6 +64,14 @@ Add the required dependencies for GraalPy and JArrow in the dependency section o
</dependency>
```

or

`build.gradle`
```java
implementation "org.apache.arrow:arrow-vector:$arrowVersion" // ①
implementation "org.apache.arrow:arrow-memory-unsafe:$arrowVersion" // ②
```

❶ The `arrow-vector` dependency is used for managing in-memory columnar data structures.

❷ The `arrow-memory-unsafe` dependency backs the data structures defined in `arrow-vector` with the `sun.misc.Unsafe` library.
@@ -94,6 +110,27 @@ There is also another option `arrow-memory-netty`. You can read more about Apach
</build>
```

or

`build.gradle`
```
plugins {
id 'org.graalvm.python' version '25.0.0'
// ...
}
```

`build.gradle`
```
graalPy {
community = true
packages = [ // ①
'pandas', // ②
'pyarrow' // ③
]
}
```

❶ The `packages` section lists all Python packages optionally with [requirement specifiers](https://pip.pypa.io/en/stable/reference/requirement-specifiers/).

❷ Python packages and their versions can be specified as if used with pip. You can either install the latest version or pin a specific one, e.g. `pandas==2.2.2`.
@@ -136,20 +173,22 @@ All Python source code should be placed in `src/main/resources/org.graalvm.pytho
Let's create a `data_analysis.py` file to calculate the mean and median for the Float8Vector using Pandas:
```python
 import pandas as pd
-from polyglot.arrow import Float8Vector # ①
+from polyglot.arrow import Float8Vector, enable_java_integration

+enable_java_integration() # ①
+
 def calculateMean(valueVector: Float8Vector) -> float:
-    series = pd.Series(valueVector, dtype="float64[pyarrow]") # ②
-    return series.mean()
+    series = pd.Series(valueVector, dtype="float64[pyarrow]") # ②
+    return series.mean()


 def calculateMedian(valueVector: Float8Vector) -> float:
-    series = pd.Series(valueVector, dtype="float64[pyarrow]")
-    return series.median()
+    series = pd.Series(valueVector, dtype="float64[pyarrow]")
+    return series.median()

```

-❶ This import is crucial. Without it zero copy memory won't be achieved.
+❶ You need to call this method to enable the zero copy integration.

❷ In pandas, you need to specify that the series is backed by pyarrow by adding `[pyarrow]` to the dtype.

@@ -240,7 +279,19 @@ To compile the application:
./mvnw package
```

or

```bash
./gradlew build
```

To run the application:
```bash
./mvnw exec:java -Dexec.mainClass="com.example.Main"
```

or

```bash
./gradlew run
```
12 changes: 6 additions & 6 deletions graalpy/graalpy-apache-arrow-guide/build.gradle
@@ -1,7 +1,7 @@
 plugins {
-    id 'org.graalvm.python' version '25.0.0'
     id 'application'
     id 'java'
+    id 'org.graalvm.python' version '25.0.0'
 }

group = 'com.example'
@@ -34,17 +34,17 @@ compileJava {
}

 tasks.withType(JavaExec) {
-    jvmArgs = ['--enable-preview']
+    jvmArgs = ['--enable-preview', '--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED']
 }

 dependencies {
     // Apache Arrow
-    implementation "org.apache.arrow:arrow-vector:$arrowVersion"
-    implementation "org.apache.arrow:arrow-memory-unsafe:$arrowVersion"
+    implementation "org.apache.arrow:arrow-vector:$arrowVersion" // ①
+    implementation "org.apache.arrow:arrow-memory-unsafe:$arrowVersion" // ②

     // GraalPy
-    implementation "org.graalvm.python:python-community:$pythonVersion"
-    implementation "org.graalvm.python:python-embedding:$pythonVersion"
+    implementation "org.graalvm.python:python-community:$pythonVersion" // ①
+    implementation "org.graalvm.python:python-embedding:$pythonVersion" // ③
 }

application {
4 changes: 0 additions & 4 deletions graalpy/graalpy-apache-arrow-guide/pom.xml
@@ -59,10 +59,6 @@
<package>pandas</package> <!--②-->
<package>pyarrow</package> <!--③-->
</packages>
-<pythonHome>
-<includes></includes>
-<excludes>.*</excludes>
-</pythonHome>
</configuration>
<goals>
<goal>process-graalpy-resources</goal>
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
 import pandas as pd
-from polyglot.arrow import Float8Vector # ①
+from polyglot.arrow import Float8Vector, enable_java_integration

+enable_java_integration() # ①

 def calculateMean(valueVector: Float8Vector) -> float:
     series = pd.Series(valueVector, dtype="float64[pyarrow]")
