This repository has been archived by the owner on Dec 16, 2018. It is now read-only.
forked from saltudelft/software-analytics-book
# Ecosystem Analytics
## Motivation
Modern software projects commonly rely on external software or libraries to obtain functionality
(for example, parsing JSON) without having to develop this functionality themselves. For example,
in 2016, the package manager npm recorded 18 billion package downloads in one month, according to
@Linux2016. Moreover, multiple languages, such as Python and Rust, provide package managers
(pip[^1] and Cargo[^2] respectively) which can be used to easily manage this third-party
functionality, as well as distribute it.
In parallel to this, the popularity of creating open source projects is on the rise as well. On
the npm package repository alone, 4,685 libraries were added in one week in 2016, according to
@Linux2016. On platforms such as GitHub[^3], it is quick and easy to create a new software project,
which can be developed, reviewed and used by the whole community. As a result, more libraries are
being developed and made available for public use.
As a result of these two developments, software projects are becoming increasingly
interdependent. Inspecting the dependency relations between projects leads to a graph-like
structure of software projects. In this structure, the nodes are the projects and the edges
represent a dependency between two software projects. This structure is known as a _software
ecosystem_. Messerschmitt and Szyperski in @Messerschmitt2003 state that a _software ecosystem_ is "a
collection of software products that have some given degree of symbiotic relationships." Another,
similar definition is given by Lungu in @Lungu2009: "A software ecosystem is a collection of software
projects which are developed and co-evolve in the same environment." Mens et al. in @Mens2013 extend
this definition, "by explicitly considering the communities involved (e.g. user and developer
communities) as being part of the software ecosystem." Stallman in @Stallman2002 opposes the overall
notion of calling this structure a software ecosystem: "It is inadvisable to describe the free
software community, or any human community, as an ecosystem, because that word implies the absence
of ethical judgment."
Although Stallman in @Stallman2002 objects to the use of the term ecosystem, he does not
necessarily disagree with the definitions themselves. This chapter uses the definition of Mens et
al. in @Mens2013, since it captures the essence of the other two definitions while also including
the human communities involved.
The aim of analysing these software ecosystems is to generate meaningful insights. These insights
can then be used to improve the efficiency and effectiveness of the software development process,
as well as to identify and report potential problems. For example, a warning could be displayed if
a dependency has a security vulnerability.
The field of research on software ecosystems, _ecosystem analytics_, focuses on performing such
analysis. This chapter examines the current progress in this field of research through a
literature survey. The survey is not limited to the theoretical perspective, but also covers the
practical implications and the open challenges of the field. To cover each of these aspects, we
have formulated three research questions:
* **RQ1**: What is the current state of the art in software analytics for ecosystem analytics?
* **RQ2**: What are the practical implications of the state of the art?
* **RQ3**: What are the open challenges in ecosystem analytics, for which future research is
required?
Each of these research questions will be answered using recent papers written in this field of
research.
This chapter is structured as follows. First, the research protocol is described in detail. This
includes decisions on which papers are included in the review. After this, the research questions
are answered using the previously stated set of papers.
## Research Protocol
To review the literature and answer the research questions given in the previous section, we use
the survey method suggested by Kitchenham @Kitchenham2004. This method provides a systematic way
to select a set of papers that is relevant to the research question(s).
Search strategies, as described by Kitchenham @Kitchenham2004, are usually iterative and benefit
from consultations with experts in the field, among other things. Our search strategy consists of
three parts:
* the initial seed, given by an expert in the field, MSc. Joseph Hejderup
* a search using a digital search engine, namely Google Scholar[^3]
* a selection of referenced papers within papers selected before in the above two searches
### Initial seed
MSc. Joseph Hejderup has provided us with a total of thirteen papers, as shown in Table 4.1.
As each of these papers comes from an expert in the field, each paper is assumed to be relevant to
at least the field of software ecosystems. Each of these papers was therefore judged only on its
relevance to the research questions. In Table 4.1, this relevance judgment is shown in the first
column; a paper is only selected if it is relevant to at least one of the research questions.
Table 4.2 gives the reason why each paper that was not selected has been left out of the
literature survey.
### Digital Search Engine
The second strategy used to select relevant papers for this literature study is a search with a
digital search engine. In this literature survey, Google Scholar[^3] is used. Common keywords were
extracted from the initial seed, and the following queries were used to search for relevant
papers:
* "engineering software ecosystems" _(2014)_
* "software ecosystems" AND "empirical analysis" _(2018)_
* "software ecosystem" AND "empirical" _(2014)_
* "software ecosystem analytics" _(2014)_
* "software ecosystem" AND "analysis" _(2017)_
For each of these queries, the results were first filtered by publication year. The cut-off year
is given in italics after each query above; papers published before that year were excluded.
These specific years were chosen since the survey focuses on the state of the art in ecosystem
analytics. The years differ per query because some queries did not return relevant papers for
more recent years (e.g. for "software ecosystem analytics", the year is _2014_ because setting the
filter to _2018_ or _2017_ did not result in relevant papers).
After this filtering, we first determined whether a paper was relevant to the literature survey by
examining its title. If the title alone did not make clear whether the paper was relevant, the
abstract of the paper was examined closely. Each paper was judged on these two criteria and
selected accordingly. The papers selected using this method can be found in Table 4.3.
### Referenced papers
The third method is to examine the references of the papers selected using the two methods above.
A selection has been made from these references. The selection process is similar to that used
for the digital search engine: a paper is selected when both its title and its abstract are deemed
relevant to the research questions. This has led to the selection of the papers in Table 4.4.
## Answers
In this section, an aggregation of information, found in the papers, is presented. Each subsection
of this section focuses on one of the three research questions posed in Section 1.
### RQ1: What is the current state of the art in software analytics for ecosystem analytics?
<!-- Topics that are being explored, research methods, tools and datasets being used -->
To answer this research question, we examine the explored topics in ecosystem analytics. Moreover,
we summarize which research methods, tools and datasets are being used to explore these topics.
#### Explored Topics
The main topics explored in ecosystem analytics are the usage of trivial packages, (breaking)
changes in dependencies and their impact, the quality of dependencies, and dependency networks.
Trivial packages are a recurring topic in the area of software analytics. Incidents such as
left-pad, described by Schlueter @NPM2016, illustrate the impact that trivial packages can have
on a software ecosystem. Research on this topic explores trivial packages both quantitatively, by
analysing how they are used, and qualitatively, by examining the reasons why developers choose to
use them.
Breaking changes are another major topic. As with trivial packages, research on this topic is done
both quantitatively and qualitatively. The notable problems considered are the impact of breaking
changes and the way in which developers react to them.
Establishing a metric for the health of an ecosystem dependency is also a heavily explored topic.
Researchers are looking for metrics that can be used to establish the quality of dependencies.
Dependency networks allow us to gain insight into the way in which ecosystems evolve over time.
#### Used Research Methods
The studied papers cover a wide range of research methods. These methods can be divided into two
categories: quantitative and qualitative.
Many quantitative research papers analyse data from software ecosystems statistically. The types
of data depend heavily on the ecosystem used for analysis. Some papers go as far as using the
source code of packages @Abdalkareem2017, while other research focuses on the metadata of
software ecosystems, such as dependency networks @Kikas2017.
Another recurring research method is survival analysis, as used by @Decan2018, which can be used
to estimate the survival rate of a population over time. In software engineering, this has
been successfully applied to open source projects.
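As an illustration of the technique (not the analysis pipeline of @Decan2018), the following
minimal sketch computes a Kaplan-Meier survival curve, one common estimator in survival analysis.
The function name `kaplan_meier` and the sample data are hypothetical. The event could be, for
instance, a package receiving its next update; packages that have not been updated by the end of
the observation period are treated as censored.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] for each time with at least one event.

    durations: time until the event, or until observation stopped
    observed:  True if the event happened, False if the subject was censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)  # subjects leaving the risk set (event or censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical data: months until the event, and whether the event was observed.
durations = [2, 3, 3, 5, 8, 8, 12]
observed = [True, True, False, True, True, False, False]
curve = kaplan_meier(durations, observed)
for month, probability in curve:
    print(f"month {month}: {probability:.3f}")
```

Censored subjects still count towards the number at risk until they drop out, which is what
distinguishes this estimator from simply dividing the number of events by the population size.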
Some qualitative research has been done on software ecosystems to gain a better understanding
of the interactions between developers and software ecosystems. Some papers rely solely on the
results of qualitative research, whereas others use both quantitative and qualitative research
to triangulate their findings.
#### Used Ecosystems
As stated above, the most commonly explored topics within the body of studied papers relate to
dependencies between different software projects. It therefore makes sense that the most commonly
used ecosystems are those of package managers, since these package managers are central
repositories from which the packages, as well as their dependencies, can easily be retrieved.
The most commonly used ecosystem in the studied papers is _npm_[^5], as used by Abdalkareem et al.
@Abdalkareem2017, Bogart et al. @Bogart2016, Decan, Mens, and Claes @Decan2017, Kikas et al.
@Kikas2017, and Decan, Mens and Grosjean @Decan2018. There are a few reasons for this. Firstly,
npm is the largest software registry, being more than double the size of the next most populated
package registry in 2016 @Linux2016. Moreover, npm is the package manager of JavaScript, which is
the most used programming language according to a RedMonk survey @RedMonk2018. Kikas et al.
@Kikas2017 explain that it is beneficial to use this ecosystem, because the majority of its
packages are hosted on GitHub and developers specify required packages in their project's
dependency files. Other ecosystems have the same properties, such as RubyGems (Ruby) and Crates.io
(Rust) @Kikas2017.
However, these are not the only ecosystems used. Decan, Mens, and Grosjean @Decan2018 use the
_libraries.io_ dataset for their research, which includes seven different packaging ecosystems:
Cargo (Rust), CPAN (Perl), CRAN (R), npm (JavaScript), NuGet (.NET), Packagist (PHP) and RubyGems
(Ruby). Decan, Mens, and Grosjean @Decan2018 therefore study the largest number of ecosystems in
one paper among the papers studied in this survey.
Apart from these datasets based on packages in package repositories, some papers use other
datasets, and therefore other ecosystems. Bavota et al. @Bavota2014 research the Apache community
and therefore use a dataset corresponding to the full Java subset of the Apache ecosystem. Cox et
al. @Cox2015 instead use the dataset of the Apache Maven Project to validate their metric. Claes
et al. @Claes2015 research package incompatibilities and therefore use a ten-year period of the
Debian i386 testing and stable distributions. Robbes, Lungu, and Röthlisberger @Robbes2012 opted
for the Squeak/Pharo ecosystem, as they stated that this ecosystem would provide support for
answering their research questions.
#### Main research findings
According to the findings of Abdalkareem et al. @Abdalkareem2017, trivial packages make up 16.8%
of the packages on npm. Although about 10% of Node.js applications use trivial packages, only 45%
of these trivial packages have tests.
Robbes et al. @Robbes2012 mention the large impact API changes can have on an ecosystem.
Bogart et al. @Bogart2016 studied the attitude of developers towards breaking changes in
dependencies. Their main finding is that the ecosystem plays an essential role in the way
breaking changes are dealt with. Both papers conclude that developers generally do not respond
in time to breaking changes and that, as a result, breaking changes can have a large impact on a
software ecosystem. This conclusion is reinforced by the findings of Decan et al. @Decan2018:
frequent changes can lead to an unstable dependency network due to transitive dependencies.
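To illustrate why transitive dependencies amplify the effect of a change, the following sketch
collects every package that depends, directly or indirectly, on a given package. The package
names and the `dependencies` mapping are made up for the example and do not come from any of the
studied datasets.

```python
from collections import defaultdict, deque


def reverse_graph(dependencies):
    """Map each package to the set of packages that depend on it directly."""
    dependents = defaultdict(set)
    for package, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(package)
    return dependents


def transitive_dependents(dependencies, package):
    """Return all packages that directly or indirectly depend on `package`."""
    dependents = reverse_graph(dependencies)
    seen = set()
    queue = deque([package])
    while queue:  # breadth-first search over reverse dependency edges
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(package)  # only relevant if the graph contains a cycle
    return seen


# Hypothetical dependency network: package -> packages it depends on.
dependencies = {
    "app": ["web-framework", "logger"],
    "web-framework": ["http-parser", "pad-string"],
    "logger": ["pad-string"],
    "http-parser": [],
    "pad-string": [],
}
affected = transitive_dependents(dependencies, "pad-string")
print(sorted(affected))
```

In this toy network, a breaking change in the small `pad-string` package reaches `app` even though
`app` never declares it as a dependency.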
Not only do developers fail to react in a timely fashion to breaking changes, Robbes et al.
@Robbes2012 also found that developers are slow to respond to API deprecation. To combat this
issue, Bavota et al. @Bavota2014 suggest that updates should only be done when they consist of
bug fixes, not API changes.
Attempts have also been made to find a metric that establishes the 'health' of a
dependency. Cox et al. @Cox2015 contribute to this by providing a metric for the freshness of a
dependency.
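A simplified sketch of what a freshness measure could look like is given below. It is an
assumption for illustration and not necessarily the metric defined by Cox et al. @Cox2015: it
reports how many releases, and how many days, the used version of a dependency lags behind the
latest release. The function `freshness` and the release history are hypothetical.

```python
from datetime import date


def freshness(releases, used_version):
    """Return (releases behind, days behind) for the version in use.

    releases: list of (version, release date) tuples, ordered oldest to newest
    """
    versions = [version for version, _ in releases]
    index = versions.index(used_version)  # raises ValueError for unknown versions
    releases_behind = len(releases) - 1 - index
    days_behind = (releases[-1][1] - releases[index][1]).days
    return releases_behind, days_behind


# Hypothetical release history of a single dependency.
releases = [
    ("1.0.0", date(2016, 1, 10)),
    ("1.1.0", date(2016, 6, 1)),
    ("2.0.0", date(2017, 3, 15)),
]
lag = freshness(releases, "1.1.0")
print(lag)  # (1, 287)
```

A value of `(0, 0)` means the dependency is fully up to date; larger values indicate a staler
dependency.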
An interesting finding in the topic of package dependency networks by Kikas et al.
@Kikas2017 is that ecosystems, over time, become less dependent on a single popular package.
### RQ2: What are the practical implications of the state of the art?
<!-- Rather than theoretical, actual case studies done with findings -->
In this research question we aim to find out the practical implications of the state of the art as
discussed in the previous section. The papers discussed in this section are mostly case studies
and we will summarize their findings.
Most papers find that developers are slow to update their dependencies, or sometimes do not
update them at all. Hora et al. @Hora2016 suggest that a main reason for this is that
breaking changes cannot be solved in a uniform manner throughout the ecosystem, but rather need a
specific implementation for each system. We have also found that breaking changes are constantly
introduced when dependencies are updated. According to Raemaekers, van Deursen and Visser
@Raemaekers2017, about 33% of releases, either minor or major, contain a breaking change that
needs looking into. Breaking changes can cause compilation errors, thereby breaking the system
that depends on the changed library.
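Under semantic versioning, the scheme studied by Raemaekers, van Deursen and Visser
@Raemaekers2017, breaking changes are only expected in major releases. The following sketch
classifies a version bump and flags a release that breaks this expectation. The helper names are
hypothetical, and the sketch assumes plain `MAJOR.MINOR.PATCH` version strings; pre-release
labels and the special rules for `0.x` versions are ignored.

```python
def parse(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_type(old, new):
    """Classify the step from version `old` to version `new`."""
    old_v, new_v = parse(old), parse(new)
    if new_v <= old_v:
        raise ValueError("new version must be greater than old version")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def violates_semver(old, new, has_breaking_change):
    """A breaking change is only allowed when the major version increases."""
    return has_breaking_change and release_type(old, new) != "major"


print(release_type("1.4.2", "2.0.0"))                               # major
print(violates_semver("1.4.2", "1.5.0", has_breaking_change=True))  # True
```

Whether a release actually contains a breaking change has to come from elsewhere, for example
from comparing the public API of both versions; the sketch only checks the version numbers.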
Developers tend to react poorly to changes in their dependencies; Kula et al. @Kula2017 have found
that, of the 4,600 surveyed projects, 81.5% contain outdated dependencies, which can lead to
security risks. Not only do developers not update their dependencies; according to an empirical
study of the Android API conducted by McDonnell, Ray and Kim @McDonnell2013, they also do not
update their codebase with respect to the changes introduced by dependencies.
Regarding ecosystem health, Constantinou and Mens @Constantinou2017 have researched which factors
indicate that a developer is likely to abandon an ecosystem. Their study, which analysed
GitHub[^3] issues and commits, has found that developers are more likely to abandon an ecosystem
when they 1) do not communicate with their fellow developers, 2) do not participate often in
social or technical activities and 3) do not react or commit for an extended period of time.
Another interesting characteristic of ecosystem health, studied by Kula et al. @Kula2017-2, is the
way in which projects age over time. Their study found that the usage over time of 81.7% of 4,659
popular GitHub projects can be fitted with a function of order higher than two.
Malloy and Power @Malloy2018 have studied the transition from Python 2 to Python 3. Python 3 has
been out since 2008, and the final major Python 2 version, Python 2.7, was released in 2010. Both
events are (almost) ten years in the past. Even so, their study found that most Python developers
choose to maintain compatibility with both Python 2 and Python 3 by only using a subset of the
Python 3 language. Malloy and Power @Malloy2018 state that these developers are severely limiting
themselves in their language capabilities by not using the newly developed features.
Another interesting topic of research is the impact tools can have on ecosystems. Among these
tools are badges. Badges are annotations on software projects which display some information
about a software project. One of these badges can warn developers about outdated packages.
Based on the results of Trockman @Trockman2018, badges can have a positive impact on the speed at
which developers update their dependencies.
Overall, we can conclude that there is a lot of room for improvement. The way in which most
developers currently manage their dependencies is lacking: whether they update late or not at
all, this carries many risks. Dietrich, Jezek, and Brada @Dietrich2014 have also found that there
are a lot of problems in the Java ecosystem, and have proposed a set of relatively minor changes
to both development tools and the Java language itself that could be very effective. Such
improvements are discussed further in the answer to the last research question.
### RQ3: What are the open challenges in ecosystem analytics, for which future research is required?
<!--List of challenges, research questions and an aggregated set of open research items -->
This research question gives insight into the current open challenges in the field of ecosystem
analytics. It focuses on the challenges described in the studied papers.
The most common open challenge across almost all papers is the generalization of results. Most of
the studied papers base their results on a single ecosystem. This in turn means that it is
uncertain whether these results hold for other ecosystems in a similar fashion. For example,
Claes et al. @Claes2015 state that a possibility for future work is to investigate to what extent
the findings can be reused in the framework of other package-based software distributions.
However, even when multiple ecosystems are studied within one paper, this still turns out not to
be enough to generalize to every ecosystem. Decan, Mens, and Grosjean @Decan2018 state, after
researching dependency network evolution for seven ecosystems, that they do not make any claims
that their results can be generalized beyond the main package managers for specific languages.
This is because they do not expect similar results for networks such as WordPress, as these
packages tend to be more high-level (e.g. used by end users instead of reused by other packages).
The different results obtained by Bogart et al. @Bogart2016 also show that values differ per
ecosystem. Overall, this shows that there is a lot of room for future research on generalizing
results beyond the ecosystems already studied.
Another persistent open challenge is the ability to determine the health of an ecosystem. Although
Jansen @Jansen2014 has provided OSEHO, "a framework that is used to establish the health of an
open source ecosystem", Jansen @Jansen2014 notes that "there is surprisingly little literature
available about open source ecosystem health". Kikas et al. @Kikas2017 agree, stating that a
general goal is to provide analytics to maintainers about the overall ecosystem trends.
This challenge is related to determining a system's health. Kikas et al. @Kikas2017 state that "a
measure quantifying dependency health in an ecosystem should be developed". Moreover, according to
Jansen @Jansen2014, determining the health of a system from an ecosystem perspective is required
to determine which systems to use.
Because of this, this challenge also ties into assisting developers in selecting packages, that
is, selecting the best dependency according to the functionality needs of the existing
application. Abdalkareem et al. @Abdalkareem2017 state that helping developers find the packages
best suited to their needs is a challenge that needs to be addressed. Kikas et al. @Kikas2017
agree that another general goal is to provide maintainers with tools to manage their dependencies.
However, once these dependencies are chosen, another open challenge is to assist maintainers in
keeping them up to date. In order to find out when dependencies have to be updated, metrics have
to be defined. Bavota et al. @Bavota2014 state that their observations could be a starting point
to build recommenders for supporting developers in complex dependency upgrade activity. Cox et
al. @Cox2015 provide "a metric to aid stakeholders in deciding on whether the dependencies of a
system should be updated". However, Cox et al. @Cox2015 describe multiple refinements of this
metric which could still be researched.
## Appendix
| Selected | Author(s) | Title | Year | Keywords |
|---|-----------------------|-----------------------------------------------------------------------------------------------|------|-------------------------------------------------------------------------------------------------------------|
| - | Abate, Di Cosmo, Boender, Zacchiroli @Abate2009 | Strong dependencies between software components | 2009 | |
| - | Abate, Di Cosmo @Abate2011 | Predicting upgrade failures using dependency analysis | 2011 | |
| + | Abdalkareem, Nourry, Wehaibi, Mujahid, Shihab @Abdalkareem2017 | Why do developers use trivial packages? An empirical case study on NPM | 2017 | JavaScript; Node.js; Code Reuse; Empirical Studies |
| + | Bogart, Kästner, Herbsleb, Thung @Bogart2016 | How to break an API: Cost negotiation and community values in three software ecosystems | 2016 | Software ecosystems; Dependency management; semantic versioning; Collaboration; Qualitative research |
| + | Claes, Mens, Di Cosmo, Vouillon @Claes2015 | A historical analysis of Debian package incompatibilities | 2015 | debian, conflict, empirical, analysis, software, evolution, distribution, package, dependency, maintenance |
| + | Constantinou, Mens @Constantinou2017 | An empirical comparison of developer retention in the RubyGems and NPM software ecosystems | 2017 | Software ecosystem, Socio-technical interaction, Software evolution, Empirical analysis, Survival analysis |
| + | Hejderup, van Deursen, Gousios @Hejderup2018 | Software Ecosystem Call Graph for Dependency Management | 2018 | |
| + | Kikas, Gousios, Dumas, Pfahl @Kikas2017 | Structure and evolution of package dependency networks | 2017 | |
| + | Kula, German, Ouni, Ishio, Inoue @Kula2017 | Do developers update their library dependencies? | 2017 | Software reuse, Software maintenance, Security vulnerabilities |
| - | Mens, Claes, Grosjean, Serebrenik @Mens2013 | Studying Evolving Software Ecosystems based on Ecological Models | 2013 | Coral Reef, Natural Ecosystem, Open Source Software, Ecological Model, Software Project |
| + | Raemaekers, van Deursen, Visser @Raemaekers2017 | Semantic versioning and impact of breaking changes in the Maven repository | 2017 | Semantic versioning, Breaking changes, Software libraries |
| + | Robbes, Lungu, Röthlisberger @Robbes2012 | How do developers react to API deprecation? The case of a smalltalk ecosystem | 2012 | Ecosystems, Mining Software Repositories, Empirical Studies |
| + | Trockman @Trockman2018 | Adding sparkle to social coding: An empirical study of repository badges in the npm ecosystem | 2018 | |
_Table 4.1: Papers provided by MSc. Joseph Hejderup. The first column describes whether the paper
of the row will be used. A '+' means it will be used, a '-' means it will not._
| Paper Reference | Reason not selected |
|---|---|
| @Abate2009 | This paper seems to delve more into one software project itself whereas we are more interested in the relationship between different software projects |
| @Abate2011 | Similarly to @Abate2009, we are more interested in the relationship between different software projects |
| @Mens2013 | We were in doubt over this one, it could be useful but we weren't convinced that it was. Since we already had a lot of material we decided to not use this |
_Table 4.2: Papers from the initial seed that were not selected for the literature survey, along
with a specification of the reason why this is the case._
| Author(s) | Title | Year | Keywords | Query Used |
|-------------------|------------------------------------------------------------------------------------------------------------------|------|-------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------|
| Decan, Mens, Grosjean @Decan2018 | An empirical comparison of dependency network evolution in seven software packaging ecosystems | 2018 | Software repository mining, Software ecosystem, Package manager, Dependency network, Software evolution | "software ecosystems" AND "empirical analysis" |
| Dittrich @Dittrich2014 | Software engineering beyond the project – Sustaining software ecosystems | 2014 | | engineering software ecosystems |
| Hora, Robbes, Valente, Anquetil, Etien, Ducasse @Hora2016 | How do developers react to API evolution? A large-scale empirical study | 2016 | API evolution, API deprecation, Software ecosystem, Empirical study | "software ecosystem" AND "empirical" |
| Izquierdo, Gonzalez-Barahona, Kurth, Robles @Izquierdo2018 | Software Development Analytics for Xen: Why and How | 2018 | Companies, Ecosystems, Software, Measurement, Object recognition, Monitoring, Virtualization | software ecosystem analytics |
| Jansen @Jansen2014 | Measuring the Health of Open Source Software Ecosystems: Beyond the Scope of Project Health | 2014 | | "open source software ecosystems" |
| Kula, German, Ishio, Inoue @Kula2017-2 | An exploratory study on library aging by monitoring client usage in a software ecosystem | 2017 | | "software ecosystem" AND "analysis" |
| Malloy, Power @Malloy2018 | An empirical analysis of the transition from Python 2 to Python 3 | 2018 | Python programming, Programming language evolution, Compliance | "software ecosystem" AND "empirical" |
| Manikas @Manikas2016 | Revisiting software ecosystems Research: A longitudinal literature study | 2016 | Software ecosystems; Longitudinal literature study; Software ecosystem maturity | "Software ecosystems" OR "Dependency management" OR "semantic version" |
| Rajlich @Rajlich2014 | Software evolution and maintenance | 2014 | | Software Evolution and Maintenance |
| Teixeira, Robles, Gonzalez-Barahona @Teixeira2015 | Lessons learned from applying social network analysis on an industrial Free/Libre/Open Source Software Ecosystem | 2015 | Social network analysis Open source Open-coopetition Software ecosystems Business models Homophily Cloud computing OpenStack | "software ecosystem analytics" |
_Table 4.3: Papers selected from searches using Google Scholar. The column "Query Used" describes
which of the queries is used to retrieve the paper._
| Author(s) | Title | Year | Keywords | Referenced In |
|-----------------------------------------|----------------------------------------------------------------------------------------------------------|------|----------------------------------------------------------------------------------|-------------------------------------------------------|
| Bavota, Canfora, Di Penta, Oliveto, Panichella @Bavota2014 | How the Apache community upgrades dependencies: an evolutionary study | 2014 | Software Ecosystems · Project dependency upgrades · Mining software repositories | @Kula2017 |
| Blincoe, Harrison, Damian @Blincoe2015 | Ecosystems in GitHub and a method for ecosystem identification using reference coupling. | 2015 | Reference Coupling, Ecosystems, Technical Dependencies, GitHub, cross-reference | @Constantinou2017 |
| Cox, Bouwers, Eekelen, Visser @Cox2015 | Measuring Dependency Freshness in Software Systems | 2015 | software metrics, software maintenance | @Kikas2017 |
| Decan, Mens, Claes @Decan2017 | An empirical comparison of dependency issues in OSS packaging ecosystems | 2017 | | @Abdalkareem2017, @Constantinou2017, @Decan2018 |
| Dietrich, Jezek, Brada @Dietrich2014 | Broken Promises - An Empirical Study into Evolution Problems in Java Programs Caused by Library Upgrades | 2014 | | @Raemaekers2017 |
| Malloy, Power @Malloy2017 | Quantifying the transition from Python 2 to 3: an empirical study of Python applications. | 2017 | Python, programming language evolution, language features | @Malloy2018 |
| McDonnell, Ray, Kim @McDonnell2013 | An empirical study of api stability and adoption in the android ecosystem | 2013 | | @Manikas2016 |
_Table 4.4: Papers selected which are referenced in previously selected papers. The column
"Referenced In" describes in which selected paper the paper is referenced._
| Reference | Explored topic(s) | Research method(s) | Tool(s) | Dataset(s) | Ecosystem(s) | Conclusion |
| --------- | ----------------- | ------------------ | ------- | ---------- | ------------ | ---------- |
| @Abdalkareem2017 | Use of trivial packages | Quantitative, Qualitative (Statistics over data, survey) | - | npm, GitHub | npm | Trivial packages are used because they are assumed to be well implemented and tested (although only 45% actually have tests) and because they increase productivity. The quantitative analysis shows that 10% of NodeJS applications use trivial packages and that 16.8% of the packages in npm are trivial. |
| @Bavota2014 | Upgrading of dependencies | Quantitative, Qualitative (Statistics, Looking through mailing lists) | - | Apache | Apache | Dependencies are upgraded when a new release contains bug fixes, but not when it introduces API changes. |
| @Bogart2016 | Attitude towards breaking changes for different ecosystems | Qualitative (Interviews) | - | - | Eclipse, CRAN, npm | There are numerous ways of dealing with breaking changes, and the ecosystem plays an essential role in which one is chosen. |
| @Claes2015 | Debian Package Incompatibilities | Quantitative | - | Debian i386 testing / stable | Debian | - |
| @Cox2015 | Metric for dependency freshness of a system | Qualitative / Quantitative (Statistics, Interviews) | - | Maven | Maven | A metric was defined and validated using Maven. |
| @Decan2017 | Dependency Issues in OSS Packaging Ecosystems | Quantitative analysis (Survival analysis, statistics) | - | Eclipse, CRAN, npm | Eclipse, CRAN, npm | In all ecosystems, dependency updates result in issues; however, the problems and solutions vary. |
| @Kikas2017 | Package dependency networks | Quantitative (Statistics)| - | npm, RubyGems, Crates.io | npm, RubyGems, Crates.io | All ecosystems are growing and over time, ecosystems become less dependent on a single popular package. |
| @Robbes2012 | Developers' response to API deprecation | Quantitative (Statistics) | - | Squeak, Pharo | Squeak, Pharo | API changes can have a large impact on an ecosystem. Projects take a long time to adapt to an API change. |
| @Decan2018 | Quantitative empirical analysis of differences and similarities between the evolution of 7 varying ecosystems | Survival analysis | - | libraries.io | Cargo, CPAN, CRAN, npm, NuGet, Packagist, RubyGems | Package updates, which may cause dependent package failures, are done on average every few months. Many packages in the analyzed package dependency networks were found to have a high number of transitive reverse dependencies, implying that package failures can affect a large number of other packages in the ecosystem. |
| @Dittrich2014 | The article provides a holistic understanding of the observed and reported practices as a starting point to devise specific support for the development in software ecosystems | Qualitative interview study | - | - | - | The main contribution of this article is the presentation of common features of product development and evolution in four companies. Although size, kind of software and business models differ |
| @Izquierdo2018 | Code review analysis | Virtualization of process | - | Xen GitHub data | Xen | Analysis of code review has led to more reviews and a more thoughtful and participatory review process. Giving new developers easy access to the OSS project is also very important. |
_Table 4.5: Papers and findings for RQ1._
| Reference | Explored topic(s) | Research method(s) | Dataset(s) | Ecosystem(s) | Conclusion |
| --------- | ----------------- | ------------------ | ---------- | ------------ | ---------- |
| @Hora2016 | Exploratory study aimed at observing API evolution and its impact | Empirical study | 3600 distinct systems | Pharo | After API changes, clients need time to react, and many rarely react at all. Replacements cannot be resolved in a uniform manner throughout the ecosystem. API changes and deprecation can present different characteristics. |
| @Kula2017 | An Empirical Study on the Impact of Security Advisories on Library Migration | Empirical study | 4,600 GitHub software projects and 2,700 library dependencies | GitHub, Maven | Currently, developers do not actively update their libraries, leading to security risks. |
| @Constantinou2017 | Empirical comparison of developer retention in the RubyGems and npm software ecosystems | Measurement of frequency and intensity of activity and retention | GitHub | npm and RubyGems | Developers are more likely to abandon an ecosystem when they: 1) do not communicate with other developers, 2) do not have strong social and technical activity intensity, 3) communicate or commit less frequently and 4) do not communicate or commit for a longer period of time. |
| @Raemaekers2017 | To what degree do versioning schemes provide information signals about breaking changes | Snapshot analysis | More than 100,000 jar files from Maven Central | Maven | 1) Whether a release is minor or major does not matter: in both cases about 33% contain breaking changes, 2) breaking changes have a significant impact and need to be fixed before an upgrade, 3) bigger libraries introduce more breaking changes, while maturity is not a factor. |
| @Malloy2018 | Analysis of the transition from Python 2 to Python 3 | Empirical impact study | 51 applications in the Qualitas suite for Python | Python | Most Python developers choose to maintain compatibility with Python 2 and 3, thereby only using a subset of the Python 3 language and not using the new features but instead limiting themselves to a language that is no longer under active development. |
| @Dietrich2014 | Evolution problems in Java programs caused by library upgrades | Empirical study | Qualitas corpus (Java OSS) | Java | Library upgrades currently cause many problems, but some relatively minor changes to development tools and the language could be very effective. |
_Table 4.6: Papers and findings for RQ2._
| Reference | Open Challenges Found |
| --------- | --------------------- |
| @Abdalkareem2017 | Examine relationship between team experience and project maturity and usage of trivial packages |
| @Abdalkareem2017 | Compare use of code snippets on Q&A sites and trivial packages |
| @Abdalkareem2017 | How to manage and help developers choose the best packages |
| @Decan2018 | Findings for one ecosystem cannot necessarily be generalized to another |
| @Decan2018 | Transitive dependencies are very frequent, meaning that package failures can affect a large number of other packages in the ecosystem |
| @Jansen2014 | Determining the health of a system from an ecosystem perspective instead of project level is needed to determine which systems to use. This paper provides an initial approach but a lot more research could and should be done to determine system health. |
_Table 4.7: Papers and open challenges found for RQ3._
[^1]: https://pypi.org/project/pip/
[^2]: https://crates.io/
[^3]: https://github.com
[^4]: https://scholar.google.com
[^5]: https://npmjs.com