Metadata | |
---|---|
cEP | 22 |
Version | 1.0 |
Title | Quickstart Green Mode |
Authors | Ishan Srivastava <ishan.srt@gmail.com> |
Status | Implementation Due |
Type | Feature |
This document describes what the Green Mode for `coala-quickstart` is and how it will be implemented.
This cEP describes a new type of config file which will be generated by `coala-quickstart`, how it will differ from the ones currently being created, how this is helpful, and the action plan for adding this feature.
- Currently `coala-quickstart` generates config files which point out too many errors in the code-base of the project.
- It asks too many questions in the "interactive mode" while generating the config file.
- The config files generated in "non-interactive mode" are not that specific to the project i.e. many optional bear settings are skipped and even many bears are skipped due to non-availability of "guessing" features.
- The errors pointed out by coala using generated config files are generally undesirable for the project owners due to many organizations following their own custom code styles rather than following pre-defined standards implemented by most linters and bears.
- Preparing "green" configuration files which agree to the project's code style and adapt to their standards, with least user interaction, producing absolutely zero errors in the code base.
- The configuration file should be as specific as possible i.e. can point out maximum errors with least amount of manual edits in the future, for future inconsistencies in commits.
- Storing metadata in the new types of files and classes proposed to enhance the config file produced for future runs.
- Maintaining a GitHub repo for storing metadata of combinations of bear setting values already in use by existing organisations and using them as initial checks before resorting to Brute Force.
- Automating the task of adding a `.coafile` and CI to enforce it (example) to the level of generating config files, so that it is easier for newcomers to create easily mergeable Pull Requests in other communities, which leads to easier adoption of coala.
This project will be implemented in five phases:
- Classification of settings and Metadata Generation
- The General Operations
- Acting upon the newly available metadata
- The Brute Force way and further optimization on the Brute Force
- Learning from Real World Projects
We categorize the settings into 4 different types:
- The ones that attain a bool value. (From here onwards referred to as Type 1 or Type bool settings)
- The ones that can possibly attain an infinite number of values. (From here onwards referred to as Type 2 or Type infinite settings)
- The ones that can attain some fixed discrete set of values. (From here onwards referred to as Type 3 or Type discrete settings)
- The settings which require an additional config file to be parsed by `coala-quickstart`. (From here onwards referred to as Type 4 or Type config settings)
Since these settings can obtain a bool value, either True or False, the Brute Force or the lite-mode will run coala over and over again until it is able to find a "green" setting.
- In case of inability to find a "green" value for a setting, the `.quickfile` will store the setting along with a weight in section [severe]. The weight will indicate how many inconsistencies were found in the project on assuming that particular value of the setting.
- On successfully finding a "green" combination of setting values, a `QUICKINFO.yaml` file will be generated storing the combination of setting values along with the project GitHub links.
`SpaceConsistencyBear`:

    @deprecate_settings(indent_size='tab_width')
    def run(self,
            filename,
            file,
            use_spaces: bool,
            allow_trailing_whitespace: bool = False,
            indent_size: int = SpacingHelper.DEFAULT_TAB_WIDTH,
            enforce_newline_at_EOF: bool = True,
            ):

Here `use_spaces`, `allow_trailing_whitespace` and `enforce_newline_at_EOF` are Type 1 settings.
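
The Type 1 search described above can be sketched as follows. Both values of a bool setting are tried; the first value that yields no results is "green", and otherwise the per-value result counts are kept as weights for the [severe] section. The `count_results` callback is a hypothetical stand-in for one coala run with the setting fixed to a value; it is not part of coala or `coala-quickstart`.

```python
from typing import Callable, Dict, Optional, Tuple


def guess_bool_setting(
    count_results: Callable[[bool], int],
) -> Tuple[Optional[bool], Dict[bool, int]]:
    """Try True and False for one bool setting.

    Returns (green_value, {}) when a value produces no results, otherwise
    (None, weights) where weights maps each value to its result count.
    """
    weights: Dict[bool, int] = {}
    for value in (True, False):
        weights[value] = count_results(value)
        if weights[value] == 0:
            return value, {}
    return None, weights


# Hypothetical stand-in for running coala with use_spaces set to `value`.
def fake_run(value: bool) -> int:
    return 0 if value else 12


green, weights = guess_bool_setting(fake_run)
print(green, weights)
```

When neither value is "green", the returned weights are what would be stored next to the setting in the `.quickfile`.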
Since these settings can obtain any integer or other such value types, these settings will be guessed with the help of the `QuickstartBear`.
This proposed bear does not generate any patches for the `file_dict` provided. It parses the `file_dict` and generates `QUICKDATA.yaml`, which records every value (except on ignored lines) that each setting attains on every line of the project. Statistical data about the average, median and standard deviation of the values is also calculated and stored in `QUICKDATA.yaml` for further analysis.
Most of these settings deal with the maximum limit of these values, so a "green" value for the setting can be decided in an instant.
- Suspected errors and a more suitable value of the setting will be guessed by analysing the probability distribution functions of the setting values and stored in the `.quickfile` under the section [amenable].
`PEP8Bear`:

    @deprecate_settings(indent_size='tab_width')
    def run(self, filename, file,
            max_line_length: int = 79,
            indent_size: int = SpacingHelper.DEFAULT_TAB_WIDTH,
            pep_ignore: typed_list(str) = (),
            pep_select: typed_list(str) = (),
            local_pep8_config: bool = False,
            ):

Here `max_line_length` is a Type 2 setting.
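
A minimal sketch of the kind of statistics the proposed `QuickstartBear` might collect for a Type 2 setting such as `max_line_length`. The function names, the dictionary keys and the `mean + k * stdev` outlier rule are illustrative assumptions; the cEP does not fix the layout of `QUICKDATA.yaml` or the exact analysis.

```python
import statistics
from typing import Dict, Iterable, List


def line_length_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Summarise the line lengths observed in a file or project."""
    lengths = [len(line.rstrip("\r\n")) for line in lines]
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "stdev": statistics.pstdev(lengths),
        # The smallest limit that yields no results is the observed maximum.
        "green_max_line_length": max(lengths),
    }


def suspected_outliers(lines: Iterable[str], k: float = 3.0) -> List[int]:
    """Return 1-based numbers of lines longer than mean + k * stdev."""
    lines = list(lines)
    stats = line_length_stats(lines)
    limit = stats["mean"] + k * stats["stdev"]
    return [number for number, line in enumerate(lines, start=1)
            if len(line.rstrip("\r\n")) > limit]


sample = ["ab\n", "abcd\n", "abcdef\n"]
print(line_length_stats(sample))
print(suspected_outliers(sample, k=1.0))
```

Both functions expect at least one line; `statistics.mean` raises `StatisticsError` on empty input.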
These settings will be guessed using the metadata obtained by applying Type Annotations to the setting values (possibly using the contracts package). The Brute Force or the lite-mode will run coala over and over again until it gets a "green" value for the setting.
- In case of inability to find a "green" value for a setting, the `.quickfile` will store the setting along with a weight in section [severe]. The weight will indicate how many inconsistencies were found in the project on assuming that particular value of the setting.
`PHPMessDetectorBear`:

    @staticmethod
    def create_arguments(filename, file, config_file,
                         phpmd_rulesets: typed_list(str)):
        """
        :param phpmd_rulesets:
            A list of rulesets to use for analysis.
            Available rulesets: cleancode, codesize, controversial, design,
            naming, unusedcode.
        """

Here `phpmd_rulesets` is a Type 3 setting.
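
One way the allowed values of a Type 3 setting could be exposed through type annotations is sketched below. It uses the standard library's `typing.Literal` instead of the contracts package mentioned above, purely as an assumption for illustration; the re-annotated `create_arguments` and the `discrete_candidates` helper are hypothetical and are not the real bear code.

```python
from typing import List, Literal, get_args, get_origin, get_type_hints

# The ruleset names are taken from the docstring shown above.
Ruleset = Literal["cleancode", "codesize", "controversial",
                  "design", "naming", "unusedcode"]


def create_arguments(filename: str, phpmd_rulesets: List[Ruleset]) -> None:
    """Hypothetical re-annotated signature, used only for this sketch."""


def discrete_candidates(func, setting: str) -> tuple:
    """Return the values a Literal-annotated setting is allowed to take."""
    hint = get_type_hints(func)[setting]
    if get_origin(hint) is not Literal:
        # For example List[Literal[...]]: unwrap the single element type.
        (hint,) = get_args(hint)
    return get_args(hint)


print(discrete_candidates(create_arguments, "phpmd_rulesets"))
```

The resulting tuple is the finite search space over which the Brute Force or the lite-mode would iterate for this setting.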
These settings will be guessed by parsing the config files for the specific linters, collecting metadata and then applying the appropriate value to the setting, exactly as already done by `coala-quickstart` for `.editorconfig`, `.gemfile`, `Gruntfile` and `Package.json` files. The bear settings already detected using this method will be appended to the Type 4 category.
`PyDocStyleBear`:

    def create_arguments(self, filename, file, config_file,
                         pydocstyle_select: typed_list(str) = (),
                         pydocstyle_ignore: typed_list(str) = (),
                         pydocstyle_add_ignore: typed_list(str) = (),
                         pydocstyle_add_select: typed_list(str) = (),
                         ):

Here `pydocstyle_select`, `pydocstyle_ignore`, `pydocstyle_add_ignore` and `pydocstyle_add_select` are Type 4 settings.
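
To illustrate the idea of deriving setting values from an existing config file, here is a minimal sketch that reads the `[*]` section of an `.editorconfig` file with the standard `configparser` module. The mapping from EditorConfig keys to the `SpaceConsistencyBear` settings shown earlier is an assumption made for this example and does not describe how `coala-quickstart` actually extracts this information.

```python
import configparser
from typing import Dict


def settings_from_editorconfig(text: str) -> Dict[str, object]:
    """Map a few EditorConfig keys onto bear setting names (assumed mapping)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # Keys such as "root = true" precede the first section, so a dummy
    # section header is prepended to keep configparser happy.
    parser.read_string("[__preamble__]\n" + text)
    settings: Dict[str, object] = {}
    if not parser.has_section("*"):
        return settings
    section = parser["*"]
    if "indent_style" in section:
        settings["use_spaces"] = section["indent_style"].lower() == "space"
    if section.get("indent_size", "").isdigit():
        settings["indent_size"] = int(section["indent_size"])
    if "insert_final_newline" in section:
        settings["enforce_newline_at_EOF"] = section.getboolean(
            "insert_final_newline")
    return settings


example = """root = true

[*]
indent_style = space
indent_size = 4
insert_final_newline = true
"""
print(settings_from_editorconfig(example))
```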
The classification of settings into their types will be done by using the instances of bears and checking their default values with the `inspect` module at the metaclass `bearclass`. The operations performed at the `bearclass` will include populating classes (referred to as the `Quickclass` class from here onwards) which will store metadata about the type of settings. Manual checks will be needed for determining which settings are Type 3 settings.
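
A rough sketch of this classification using `inspect.signature`. The `classify_settings` helper, the skipped parameter names and the stand-in `run` function are assumptions for illustration; in the proposal this logic would live in the `bearclass` metaclass and populate `Quickclass`. Type 3 and Type 4 settings are passed in explicitly because, as noted above, they need manual identification.

```python
import inspect
from typing import Dict, Iterable

# Parameters that every bear receives and that are not settings.
NON_SETTINGS = ("self", "filename", "file", "config_file")


def classify_settings(method,
                      discrete: Iterable[str] = (),
                      config_based: Iterable[str] = ()) -> Dict[str, int]:
    """Return {setting name: type number 1..4} for a bear method."""
    types: Dict[str, int] = {}
    for name, param in inspect.signature(method).parameters.items():
        if name in NON_SETTINGS:
            continue
        if name in config_based:
            types[name] = 4
        elif name in discrete:
            types[name] = 3
        elif param.annotation is bool or isinstance(param.default, bool):
            types[name] = 1
        else:
            types[name] = 2
    return types


# Stand-in mirroring the SpaceConsistencyBear signature shown earlier;
# the literal 4 is a placeholder for SpacingHelper.DEFAULT_TAB_WIDTH.
def run(self, filename, file,
        use_spaces: bool,
        allow_trailing_whitespace: bool = False,
        indent_size: int = 4,
        enforce_newline_at_EOF: bool = True):
    pass


print(classify_settings(run))
```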
Many bear settings of different bears achieve the same function. We will either pick one of the settings at random or bias towards a particular setting if we see certain issues arising; for example, `PEP8Bear` failing to do line length checks in some cases is a known issue. The settings that are not required will be removed from `Quickclass`. Manual sorting needs to be performed to identify such settings.
So we are applying restrictions on bear settings using Type Annotations while distinction of bear settings is done at the meta-class.
The General Operations consist of developing classes and methods which will be used both by the Brute Force and when we improve upon it.
- coala will be run over and over again on the given file dict, using `Quickclass` for correct detection of the type of setting and then guessing its value the appropriate way as described above.
- Undetermined values for Type 1, Type 2 and Type 3 settings go into the `.quickfile`, while successful Type 1 combinations of setting values go into `QUICKINFO.yaml`. The data generated by `QuickstartBear` goes into `QUICKDATA.yaml`.
- "Green" values for settings are added to `.coafile`.
The metadata is generated in the files `.quickfile`, `QUICKDATA.yaml` and `QUICKINFO.yaml`, and in the class `Quickclass`.
The `.quickfile` will have 3 sections, created in the same way as sections in a `.coafile`:
- [severe] This will contain the Type 1 and Type 3 dropped settings along with their values and weights for each value.
- [amenable] This will contain the Type 2 dropped settings along with their weights and values.
- [permanent] This will contain the values of settings that are obtained when the user answers the questions prepared by `coala-quickstart`. A special string may be used to represent confirmed settings that need to be dropped.
`coala-quickstart` will always ask at the end of the run whether the user is interested in answering some questions which will lead to more secure creation of config files. If the user chooses to answer them, `coala-quickstart` will present a set of questions drawn from the sections [severe] and [amenable], asking whether a detected inconsistency is a mistake in the code-base or whether the project follows no style rule regarding that particular setting. The answered bear settings will be moved to the [permanent] section.
This set of questions can be invoked directly without a `--green-mode` run, or suppressed entirely, by providing special tags. (Check out the tags section for details)
The user may also be asked to provide a `SEVERITY` value which will be mapped to the weights of the values stored in the `.quickfile`. This mapping will be decided after testing the `--green-mode` on some projects. All inconsistencies with a lesser `SEVERITY` value than the one entered will be displayed.
In case the user wants to check exactly which lines produce the inconsistencies, coala may be run again with just that bear and setting value to show the user the erroneous lines in the code-base.
All the necessary information regarding the `SEVERITY` value will be provided by the initial question.
`QUICKDATA.yaml` will contain the data generated by `QuickstartBear`, as described earlier, which is used to guess Type 2 settings.
`QUICKINFO.yaml` will contain the accepted combinations of values of bear settings along with the project URL. These files will be uploaded as Pull Requests to a repository created specifically to store this data.
Encryption will be applied so that users cannot tamper with the data generated by `coala-quickstart` and open Pull Requests containing junk data. The data will be available to view only once it has been merged into the repository.
Tools such as gitmate-plugins that accept valid Pull Requests automatically, or that even upload data directly to the repository, may be created as a stretch goal.
The Brute Force will always check for these combinations of settings values before resorting to checking all possible combinations.
`Quickclass` is created dynamically by the metaclass `bearclass` and groups the settings into Types 1 to 4.
The Brute Force will perform the General Operations to generate the first complete "green" config file. coala will be run over and over again for the entire project, with all instances of bears and all combinations of settings running in parallel. As soon as the instances for a particular setting are done and we get a value for which no errors were generated, that value goes directly into the `.coafile`. If no value for a setting matches, the setting goes into the metadata and we drop the optional setting, or the bear itself if no "green" value can be found for one of its necessary settings.
In order to launch bears in such a fashion, some modified methods of `Processing.py` will have to be called instead, depending upon whether the function call stack includes methods from `coala-quickstart`.
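
The Brute Force loop described above can be sketched as follows, under simplifying assumptions: each setting is searched independently, a thread pool stands in for coala's own process handling, and `count_results` is a hypothetical callback representing one coala run with a single setting fixed to a value. None of these names come from coala itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple


def brute_force(
    candidates: Dict[str, Sequence[object]],
    count_results: Callable[[str, object], int],
) -> Tuple[Dict[str, object], Dict[str, Dict[object, int]]]:
    """Return (green settings for .coafile, dropped settings with weights)."""
    jobs = [(setting, value)
            for setting, values in candidates.items()
            for value in values]
    # Evaluate every (setting, value) pair in parallel.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(lambda job: count_results(*job), jobs))

    green: Dict[str, object] = {}
    dropped: Dict[str, Dict[object, int]] = {}
    for (setting, value), count in zip(jobs, counts):
        if count == 0:
            green.setdefault(setting, value)  # first "green" value wins
        else:
            dropped.setdefault(setting, {})[value] = count
    # A setting with a "green" value is not dropped.
    for setting in green:
        dropped.pop(setting, None)
    return green, dropped


# Hypothetical result counts standing in for real coala runs.
FAKE_COUNTS = {
    ("use_spaces", True): 0, ("use_spaces", False): 7,
    ("max_line_length", 79): 3, ("max_line_length", 100): 1,
}
green, dropped = brute_force(
    {"use_spaces": [True, False], "max_line_length": [79, 100]},
    lambda setting, value: FAKE_COUNTS[(setting, value)],
)
print(green, dropped)
```

Here `green` corresponds to what would be written to the `.coafile` and `dropped` to the weighted entries kept as metadata.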
Premature optimization should not be performed on anything, although some of the limitations of the brute force method are clear even at the beginning. We try to get through the brute force implementation as quickly as possible and, in the meantime, keep thinking about ways to enhance it so that it detects more specific bear settings and takes less total run time.
From what is evident, the Brute Force is going to take a huge amount of time to run, especially on large projects. For this, `coala-quickstart` will have a `lite-mode`. As a starting point, we assume that the lowest level at which there can be variations is the file type. We therefore try to guess the settings not from the entire project but per file type: for each file type we grab a set of files at random and run coala again and again, looking for "green" values for settings.
The `--lite-mode` will drop settings and bears if it is unable to find a "green" setting, while appending the weights at the same time.
`lite-mode` should build upon the data from previous runs to generate a less error-prone `.coafile` (less error-prone here means less likely to generate errors in the project), so that running `lite-mode` a number of times is still faster than Brute Force. We choose this method as it is highly unlikely that different files of the same file type follow different sets of code style rules in a project.
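
The per-file-type sampling that the lite-mode relies on might look like the sketch below. Grouping by file extension, the `per_type` sample size and the `seed` parameter are assumptions chosen for illustration; the cEP does not specify how many files are drawn or how a file type is determined.

```python
import random
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List, Optional


def sample_by_file_type(paths: Iterable[str],
                        per_type: int = 5,
                        seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Pick up to `per_type` random files for every file extension."""
    rng = random.Random(seed)
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[PurePath(path).suffix].append(path)
    return {suffix: rng.sample(files, min(per_type, len(files)))
            for suffix, files in groups.items()}


project = ["src/a.py", "src/b.py", "tests/c.py", "web/app.js"]
print(sample_by_file_type(project, per_type=2))
```

Each successive lite-mode run would call this again without a fixed seed, so that different files are examined over time.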
If a `.quickfile` is present in the directory, successive runs will try to correct the configuration files (always acting upon `.coafile.new` in preference to `.coafile`), assuming we intend to find a config file which is even more specific than the one already present. In the absence of config files, they will be built from scratch.
- Successive runs of `--lite-mode` will check all settings and will again choose files at random. If a run finds a setting that conflicts with the config file, the setting will be appended to the `.quickfile`. If a setting is already present in the `.quickfile`, the weights of the other values of that bear setting will be appended.
- Successive runs of Brute Force will assume that they are being run on the results of a few `--lite-mode` runs and will only run for the settings listed in the `.quickfile`, appending to the weights.
It is clear that successive runs of `--lite-mode` will overestimate the weights in the `.quickfile`, so it seems advantageous to run `--lite-mode` several times and then run the Brute Force for the settings dropped by `--lite-mode`, which are now in the `.quickfile`, to generate the correct value of the weight. This combination of `--lite-mode` and Brute Force is done collectively by `--smart-mode`.
Every run will have the annoying question in the end whether the user wants to answer a few questions about the project to get better results unless changed upon by the provided tags. (For more details check out the tags section)
There can be an endless number of possibilities and assumptions:
- Files with similar kinds of names may follow different code styles.
- Files in a given directory may follow different code styles.
- Different parts of a file may even follow a different code style depending on function name, which can also be handled by placing `ignore`s at appropriate places in the file dict.

Such functionality can only be added to the bears themselves, so we stop our train of thought here.
We save ourselves the additional work for building some feature that may hardly ever be used by any org, instead we stop assuming and start implementing and learning from our available resources.
We now choose a list of orgs on which we test our `green-mode` ourselves. We check whether brute force takes too much time and whether `--lite-mode` or `--smart-mode` produces correct (i.e. green) results. We choose these orgs so that they are very well known or act as upstream repositories for a large number of orgs. This way, a very large number of orgs may be mimicking the code style of these orgs, and if our further optimizations fix the problem of generating green config files for these organizations, we in turn solve the problem for those other orgs and communities at the same time.
We learn manually from these orgs which combinations of bear-setting values they use, whether they use them all over their project or only in certain scenarios, and whether different directories or different file-name patterns require a different set of settings in their project. We give `coala-quickstart` the ability to recognize these scenarios in further runs.
We can only look at a finite number of orgs within the coding period, so the last few weeks of the project should deal with writing docs and adding newcomer tasks for either opening PRs with the "green" config files in other orgs or feeding our repository with `QUICKINFO.yaml` files.
- `--green-mode` or `--green` or `-gm`: Invokes the "Green mode" for `coala-quickstart`.
- `--lite-mode` or `--lite` or `-lm`: To be used alongside `--green-mode`. Accepts an integer value as parameter indicating the number of times the lite-mode will be run on the project. Will resort to a default value in the absence of this parameter.
- `--smart-mode` or `--smart` or `-sm`: To be used alongside `--green-mode`. Accepts an integer value as parameter indicating the number of times the lite-mode will be run on the project. Will resort to a default value in the absence of this parameter.
- `-d`: Don't ask the annoying question.
- `-a`: Just ask the annoying question. Takes the `SEVERITY` value as parameter.
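
The tags listed above could be declared with `argparse` roughly as follows. This is only a sketch: `build_parser`, the destination names and `DEFAULT_RUNS = 3` are hypothetical, the actual default is not specified in this cEP, and the type of `SEVERITY` is left as a plain string because the cEP does not define it.

```python
import argparse

# Assumed default number of lite-mode runs; the cEP leaves this open.
DEFAULT_RUNS = 3


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed green-mode tags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="coala-quickstart")
    parser.add_argument("--green-mode", "--green", "-gm", dest="green_mode",
                        action="store_true", help="invoke the Green mode")
    # nargs="?" makes the integer optional; const is used when it is omitted.
    parser.add_argument("--lite-mode", "--lite", "-lm", dest="lite_mode",
                        type=int, nargs="?", const=DEFAULT_RUNS, default=None,
                        help="number of lite-mode runs")
    parser.add_argument("--smart-mode", "--smart", "-sm", dest="smart_mode",
                        type=int, nargs="?", const=DEFAULT_RUNS, default=None,
                        help="number of lite-mode runs before Brute Force")
    parser.add_argument("-d", dest="skip_question", action="store_true",
                        help="do not ask the final question")
    parser.add_argument("-a", dest="severity", metavar="SEVERITY",
                        help="only ask the final question")
    return parser


args = build_parser().parse_args(["-gm", "--lite", "5", "-d"])
print(args.green_mode, args.lite_mode, args.smart_mode, args.skip_question)
```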