Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix incorrect Javadoc #24

Closed
wants to merge 1 commit into from

Conversation

Marcono1234
Copy link
Contributor

For loading a Language cannot use Arena.ofConfined() because that will directly unload the library again and then seems to crash the JVM afterwards, at least on Windows.

And SymbolLookup#libraryLookup does not consider java.library.path, see https://bugs.openjdk.org/browse/JDK-8311090 and my comment on tree-sitter/tree-sitter-java#182 (comment).

For loading a `Language` cannot use `Arena.ofConfined()` because that will
directly unload the library again and then seems to crash the JVM afterwards,
at least on Windows.

And `SymbolLookup#libraryLookup` does not consider `java.library.path`,
see https://bugs.openjdk.org/browse/JDK-8311090.
* language = Language.load(symbols, "tree_sitter_java");
* }
* String library = System.mapLibraryName("tree-sitter-java");
* SymbolLookup symbols = SymbolLookup.libraryLookup(library, Arena.ofAuto());
Copy link
Contributor Author

@Marcono1234 Marcono1234 Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope Arena.ofAuto() is actually safe here and jtreesitter always holds a strong reference to the Language. Otherwise this could crash the JVM as well in case the Arena is not reachable anymore and is garbage collected (?).

Arena.ofAuto() is currently also used in the TreeSitterJava binding (here in the tests and in tree-sitter/tree-sitter-java#182; but there at least the TreeSitterJava class has a strong reference to the Arena.

@ObserverOfTime
Copy link
Member

For loading a Language cannot use Arena.ofConfined() because that will directly unload the library again and then seems to crash the JVM afterwards, at least on Windows.

The language pointer is copied into the library arena anyway, so the initial arena shouldn't affect it.

this.self = self.reinterpret(LIBRARY_ARENA, TreeSitter::ts_language_delete);

And SymbolLookup#libraryLookup does not consider java.library.path, see https://bugs.openjdk.org/browse/JDK-8311090 and my comment on tree-sitter/tree-sitter-java#182 (comment).

But SymbolLookup#loaderLookup does. I'll figure out how to deal with the libraryLookup failure.

@ObserverOfTime
Copy link
Member

The issues should be fixed in v0.23.1.

@Marcono1234
Copy link
Contributor Author

The issues should be fixed in v0.23.1.

Thanks!

This only applies to the Java parser bindings though, not to the jextract generated code, right? That will not respect java.library.path for libtree-sitter.so / tree-sitter.dll. Should the package documentation mention that?

Library loading for jextract can be configured, but it seems there is no mode where it uses SymbolLookup#libraryLookup and otherwise falls back to SymbolLookup.loaderLookup() (but I might be wrong).


Regarding the Arena.ofConfined() usage: I tested it now in a Linux Docker container and it is crashing there as well with a SIGSEGV. To be safe, could you please try as well?
My code looked like this:

Language language;
try (Arena arena = Arena.ofConfined()) {
    SymbolLookup symbols = SymbolLookup.libraryLookup(Path.of("libtree-sitter-java.so"), arena);
    language = Language.load(symbols, "tree_sitter_java");
}
try (var parser = new Parser(language)) {
    try (var tree = parser.parse(source).get()) {
        var rootNode = tree.getRootNode();
    }
}

The language pointer is copied into the library arena anyway, so the initial arena shouldn't affect it.

Do you mean this code in the Language constructor?

public Language(MemorySegment self) throws IllegalArgumentException {
this.self = self.reinterpret(LIBRARY_ARENA, TreeSitter::ts_language_delete);

The documentation of reinterpret says (emphasis mine):

Returns a new memory segment with the same address and size as this segment, but with the provided scope.

So maybe this is not actually performing a copy, and therefore after the library was unloaded still refers to that (now unloaded) memory area?
(maybe applies to the other reinterpret calls in this library as well then)

@ObserverOfTime
Copy link
Member

Why can't Java ever do anything right…

@Marcono1234
Copy link
Contributor Author

Marcono1234 commented Sep 15, 2024

Thanks for adjusting the documentation!

Are some the reinterpret calls in jtreesitter then still needed when they don't actually behave as desired (not only for Language but also in the other parts). As seen here when the original Arena is closed the java.lang.foreign API is unable to detect this and can crash the JVM.

Maybe also the TreeSitterJava class under src/test/java should use Arena.global() as well then?

If I understand it correctly, the main problem here is that you cannot perform a 'real' copy because the actual size, at least for the Language, is unknown; it is Long.MAX_VALUE.

But for the other structs I think the actual size is known (through the jextract generated code), so you could implement copying there? (not sure how well this would work for pointers though)


Why can't Java ever do anything right…

Yes some parts of java.lang.foreign still seem cumbersome or error-prone. Though if you have specific questions or suggestions, might be worth mentioning them on the jextract-dev or panama-dev mailing list.

@ObserverOfTime
Copy link
Member

Are some the reinterpret calls in jtreesitter then still needed when they don't actually behave as desired (not only for Language but also in the other parts).

They are needed for the cleanup functions, not for actually copying the data.

Maybe also the TreeSitterJava class under src/test/java should use Arena.global() as well then?

The Arena used there will only be closed when the singleton instance gets garbage collected.

@Marcono1234 Marcono1234 deleted the javadoc-fixes branch September 15, 2024 14:47
@Marcono1234
Copy link
Contributor Author

They are needed for the cleanup functions, not for actually copying the data.

Ah ok, thanks for the explanation.

But for Language it currently uses reinterpret(LIBRARY_ARENA, ...) which therefore likely never actually invokes the cleanup since LIBRARY_ARENA is probably never closed, and it seems ts_language_delete is no-op at the moment for non-WASM.

So would it make sense to remove the reinterpret call in the Language constructor to prevent JVM crashes in case the user (or the garbage collector) accidentally unloads the library while it is still in use? Then (hopefully) java.lang.foreign throws an exception instead of crashing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants