Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce cli for invoking spotless from shell without any build tool #2419

Draft
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

simschla
Copy link
Contributor

@simschla simschla commented Feb 7, 2025

Motivation

Sometimes, I would love to be able to quickly format some files from my command line, like I'm used to when working on bigger projects, where we've introduced spotless-gradle or spotless-maven. There was no easy way to get the modularity or functionality of spotless without introducing a build for applying the plugin.

So I've asked myself: What if there were a command line interface, a binary that runs superfast and can simply be invoked from the shell without any further setup and even for single files?

This is my proposal to start building such a spotless cli tool - I'm very interested in what @nedtwigg (and of course other contributors and users) think about this approach.

Current State

The current code is still a WIP and has some (very 😉) rough edges that we'd need to smooth out before any of this could be merged, but I think the existing code shows that this could indeed work - and I love it already from using it in my local tests.

The general Idea is to build a cli, that can be used like a "pipeline" (similar to how current spotless-build-plugins are working).

E.g. I'd like to be able to invoke it somehow like this

spotless --generic-params-like-target step-1-eg-like-gjf --params-for-step1 step-2-eg-copyright --params-for-step-2 ...

The current WIP provides the ability to set target-globs for selecting files, allows google-java-format, prettier and copyright/license-header. Adding further formatters is most probably a notable amount of (diligent) work, but I think the current code shows that we can get there.

Example Usage Output

Current API looks like this (showing --help usage output):

  1. General Usage:
    SCR-20250207-sxdk

  2. License Header:
    SCR-20250207-sxgw

  3. Google Java Format
    SCR-20250207-sxkz

  4. Prettier
    SCR-20250207-sxoh

@jbduncan
Copy link
Member

jbduncan commented Feb 7, 2025

Wow, I love the idea! I've been dreaming of something like this on and off over the last few years - I imagine it would be useful for Bazel and make integration - and there's a 5-year-old issue on this very idea: #527.

I'll have to find the time to review this in the foreseeable future. Stay tuned.

@jbduncan
Copy link
Member

jbduncan commented Feb 7, 2025

Turns out, I've been dreaming of this for longer than I realised; all the way back since 2017! #76

If this becomes a reality, it could be the foundation for a lightweight formatting tool in my non-JVM-based projects. 😁

@nedtwigg
Copy link
Member

This looks great! I think you have built something useful that should get published. I'm think it makes sense for diffplug/spotless to point to it as the official CLI. I'm not sure if it should live in this repo or on its own

Reasons to keep it in this repo:

  • so that it can interact and sync with the gradle or maven plugins somehow
  • so that people making changes to the steps have to deal with accidentally breaking the cli

Reasons to keep it separate

  • to make sure it stays isolated from the gradle and maven plugins
  • so that people making changes to steps only have 2 downstream projects to worry about instead of 3

I am open to being overruled and putting it into this repo directly, but my preference is for it to live on its own, at least for now. I'm happy to make diffplug/spotless-cli if you want and set it up with the same CI and maven group and all that. Or I'm happy to point to simschla/spotless-cli and you can publish it however you see fit.

@jbduncan
Copy link
Member

This looks great! I think you have built something useful that should get published. I'm think it makes sense for diffplug/spotless to point to it as the official CLI. I'm not sure if it should live in this repo or on its own

Reasons to keep it in this repo:

  • so that it can interact and sync with the gradle or maven plugins somehow

  • so that people making changes to the steps have to deal with accidentally breaking the cli

Reasons to keep it separate

  • to make sure it stays isolated from the gradle and maven plugins

  • so that people making changes to steps only have 2 downstream projects to worry about instead of 3

I am open to being overruled and putting it into this repo directly, but my preference is for it to live on its own, at least for now. I'm happy to make diffplug/spotless-cli if you want and set it up with the same CI and maven group and all that. Or I'm happy to point to simschla/spotless-cli and you can publish it however you see fit.

I'm not too fussed if it lives here or in a spotless-cli repo either. My preference would be a diffplug repo so it's more official, but I'm happy to leave it to @simschla to decide.

I was about to query if having it in a separate repo was even possible, until I remembered that we publish spotless-lib to Maven Central. 😅

@simschla
Copy link
Contributor Author

Thank you, @jbduncan and @nedtwigg for the positive feedback, glad to hear you like the idea too 😍

To be honest, I'd also thought about where and how to integrate the cli with current spotless build plugins - a lot of (build and release) stuff is similar, but not quite the same - starting but not limited to the different JVM (Graal for native compilation) and release preparation and distribution (building native for various platforms, maybe delivering via homebrew and friends).

So moving the cli to its own repo makes great sense to me. Even though this might complicate things in the beginning (e.g. when we need to adapt spotless-lib code so it can also be used with the cli).

If you are ok with that, I'd love to host the spotless-cli project under the diffplug umbrella. It is the home of spotless after all and if you are willing to welcome this addition to the family, that would be spectacular! 🎉

Nevertheless: for initial discussions, it might make sense to keep the PR here in draft state open, so we may discuss the poc code easily, if that is OK with you.

@nedtwigg
Copy link
Member

Sounds good, happy to keep this open. New repo is open at https://github.com/diffplug/spotless-cli, the same Spotless team that has write access here has write access there.

@simschla simschla force-pushed the feature/introduce-cli-for-shellinvocation-poc branch from 6a94d57 to c086333 Compare February 12, 2025 20:05
@@ -37,6 +37,13 @@
* catch changes in a SNAPSHOT version.
*/
public final class JarState implements Serializable {

private static ClassLoader OVERRIDE_CLASS_LOADER = null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nedtwigg / @jbduncan This here is one of the spots, where the cli needs to change things in order to be able to compile to a nativeImage -> loading stuff dynamically via JarState (e.g. choosable google-java-format) is not possible. Things need to be pinned down at build time. So here I took a nasty shortcut to just override the class loader in JarState - that is definitively not the way to go.

To actually allow the cli to change the strategy for retrieving a class loader (which contains classes that have been burned into the nativeImage) here, maybe we could use Java's ServiceLoader approach? That way maven/gradle plugin can keep using the flexible and dynamic approach, while cli could provide an implementation, that returns a fixed ClassLoader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to have a static variable that the CLI sets, rather than ServiceLoader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to clarify: something along the lines of the current solution is fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, what you have is great!

// server instances.
// I'm not sure if this is intended/expected or if it is a bug, will have to check with the team.
// For now, I will cache the formatter function based on the state, so that it is only initialized once.
private static final ConcurrentHashMap<String, FormatterFunc> CACHED_FORMATTERS = new ConcurrentHashMap<>();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nedtwigg / @jbduncan This here is a second piece of code that needs some thinking.

I've tried to design the cli in a way, that it can make use of parallel processing, allowing to spread formatting over a threadpool, so that multiple files can be processed in parallel.

Here I ran into a problem: When creating a FormatterStep using createLazy using multiple threads, I thought that the actual FormatterStep would be created/initialized only once and that instance would then be reused across threads - but that was not the case. Actually, the FormatterStep has been initialized multiple times (in parallel) which lead to multiple node-processes running a npm install-call on the same directory at the same time - which obviously was doomed to fail 😉

This right here is a hack to make sure each FormatterFunc is only created once per State, preventing multiple npm installs on the same folder. I'm not quite sure, but I think this might actually be a bug in the spotless-lib as of now. Or maybe the cli is the first client of the lib that instantiates the formatters in this way? Maybe one of you could shed some light? 🔦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single formatter should only be used on a single thread. We integrate with a ton of third party libraries, who knows what kinds of machinery they might be using - ThreadLocal, all kinds of stuff.

If you want to use 8 threads, you will need to instantiate a given step 8 times, one on each thread, and only use that step on that thread.

I would advise starting with single-threaded, and make it multi-thread later after there is a performance problem. I would expect that start-up time is going to be the biggest issue for this CLI, and multiple threads are going to hurt that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the number of threads is basically dependent on the number of files to be formatted. If there are many files to be formatted, then it is worth paying the price for setting up several threads (and - as I understand now - per-thread copies of the formatters), as the overhead for the setup is compensated by the faster processing.
Local tests with the CLI show that the speed gain is significant when formatting one of my larger projects, especially since there is no incremental formatting or caching in the CLI.

So I think I will try keeping the parallel approach in general. Later, we could introduce either a configuration option or a heuristic that enables parallelism only if the number of files to be formatted is above a threshold.

Regarding start-up time: I'm rather impressed how fast the CLI starts in the binary form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local tests with the CLI show that the speed gain is significant when formatting one of my larger projects,

Performance in a real-world use-case is the most important thing, so if it's worth it for you then I'm all for it.

@simschla simschla closed this Feb 20, 2025
@simschla simschla deleted the feature/introduce-cli-for-shellinvocation-poc branch February 20, 2025 19:57
@jbduncan
Copy link
Member

@simschla Sorry for the radio silence! I was planning to review things this weekend, but I see you've just closed the PR, so will you be reopening things on spotless-cli or have you lost interest or something? :)

@simschla
Copy link
Contributor Author

simschla commented Feb 20, 2025

@jbduncan ah sorry, deleting it was a mistake (too much enthusiasm when housekeeping my fork). I'm working on porting the code over to spotless-cli atm. Will report back as soon as I have something over there 😄

@simschla simschla restored the feature/introduce-cli-for-shellinvocation-poc branch February 20, 2025 20:15
@simschla simschla reopened this Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants