Project for Squeak 5.3 (and above) to explore and learn about the history of Squeak and Squeak-related projects.
- Get a Squeak image (5.3 or newer).
- Load the Squeak History Project:

Metacello new
    baseline: 'SqueakHistory';
    repository: 'github://hpi-swa/squeak-history/packages';
    load.
Once the code has loaded, the mailing list archives are downloaded. See BaselineOfSqueakHistory >> #loadData for details.
The SqhMailmanAggregator can be used to enumerate all mail messages. Example queries can be found in its queries protocol. The message cache holds the metadata for each message in the image, that is, instances of SqhMailWrapper. Reading a message body requires accessing the archive's file contents on disk, which is slower. Here is an example query that requires disk access:
countMessageLines
    "Answer the total number of body lines over all messages. Each body is read from disk."
    | count |
    count := 0.
    self messagesCachedDo: [:wrapper |
        count := count + wrapper mailMessage bodyText lineCount].
    ^ count
You should also derive rules for author-key normalization to improve the quality of query results. Run this:
SqhMailmanAggregator new
    showProgress: true; "optional"
    deriveRulesForAuthorKeyNormalization. "ignore the warning"
For more information on normalization, see below.
There are rules to normalize different kinds of information: author names, timestamps, and mail addresses. The goal is to identify contributors and, eventually, relevant discussions. Hand-selected rules can be found in SqhMailmanAggregator >> #rulesForAuthorKeyNormalization and #rulesForAuthorKeyClarification. Here is an excerpt:
"rulesForAuthorKeyNormalization"
'alankay' -> 'alancurtiskay'.
'hhirzel' -> 'hanneshirzel'.
"rulesForAuthorKeyClarification"
'squeakdev@reider.net' -> ('squeakdev' -> 'alanreider').
'squeak@bike-nomad.com' -> ('squeak' -> 'nedkonz').
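To illustrate how the two kinds of rules could work together, here is a minimal workspace sketch. It is not the project's implementation: the plain Dictionary representation, the variable names, the order of the two steps, and the address 'alan@example.org' are all assumptions made for illustration.

```smalltalk
"Workspace sketch. All names and the lookup order are assumptions, not the project's code."
| normalization clarification resolve |
normalization := Dictionary new.
normalization at: 'alankay' put: 'alancurtiskay'.
normalization at: 'hhirzel' put: 'hanneshirzel'.

"Each clarification rule is keyed by address and holds (ambiguous key -> actual key)."
clarification := Dictionary new.
clarification at: 'squeakdev@reider.net' put: 'squeakdev' -> 'alanreider'.
clarification at: 'squeak@bike-nomad.com' put: 'squeak' -> 'nedkonz'.

resolve := [:authorKey :address | | rule result |
    result := authorKey.
    "Clarification: replace an ambiguous key only for the matching address."
    rule := clarification at: address ifAbsent: [nil].
    (rule notNil and: [rule key = result]) ifTrue: [result := rule value].
    "Normalization: map a known alias to its canonical key."
    normalization at: result ifAbsent: [result]].

resolve value: 'alankay' value: 'alan@example.org'. "hypothetical address; answers 'alancurtiskay'"
resolve value: 'squeak' value: 'squeak@bike-nomad.com'. "answers 'nedkonz'"
```

In this sketch, a clarification rule only applies when both the address and the ambiguous key match, so a generic-looking key such as 'squeak' stays untouched for any other sender.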
In addition, there is a simple algorithm that derives more normalization rules by using e-mail addresses as identifiers. If two messages come from the same address, their author names are taken to identify the same contributor. There is also a check to avoid mapping cycles. See SqhMailmanAggregator >> #deriveRulesForAuthorKeyNormalization for details. We filter generic addresses such as:
notifications@github.com
noreply@github.com
no-reply@appveyor.com
builds@travis-ci.org
Note that there is also a (hand-crafted) list of generic author names (or keys) in #genericAuthorKeys, including github and travisci.
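The derivation idea described above can be sketched in a workspace as follows. This is only an approximation under stated assumptions: the sample messages, the choice of the longest key as the canonical one, and every variable name are hypothetical, and the actual logic lives in #deriveRulesForAuthorKeyNormalization.

```smalltalk
"Workspace sketch. Sample data, the 'longest key wins' choice, and all names are assumptions."
| genericAddresses genericKeys messages keysByAddress rules leadsTo |
genericAddresses := Set withAll: #('notifications@github.com' 'noreply@github.com'
    'no-reply@appveyor.com' 'builds@travis-ci.org').
genericKeys := Set withAll: #('github' 'travisci').

"Hypothetical input: one (address -> author key) pair per message."
messages := {
    'hannes@example.org' -> 'hhirzel'.
    'hannes@example.org' -> 'hanneshirzel'.
    'noreply@github.com' -> 'github'.
    'noreply@github.com' -> 'someone'}.

"Group author keys by address, skipping generic addresses and generic keys."
keysByAddress := Dictionary new.
messages do: [:each |
    ((genericAddresses includes: each key) or: [genericKeys includes: each value])
        ifFalse: [(keysByAddress at: each key ifAbsentPut: [Set new]) add: each value]].

rules := Dictionary new.

"Answer whether following the existing rules from start eventually reaches target."
leadsTo := [:start :target | | current seen |
    current := start.
    seen := Set new.
    [current notNil and: [current ~= target and: [(seen includes: current) not]]]
        whileTrue: [
            seen add: current.
            current := rules at: current ifAbsent: [nil]].
    current = target].

"For each address, map every other key to the canonical one unless that closes a cycle."
keysByAddress do: [:keys | | canonical |
    canonical := keys detectMax: [:each | each size].
    keys do: [:each |
        (each ~= canonical and: [(leadsTo value: canonical value: each) not])
            ifTrue: [rules at: each put: canonical]]].

rules "holds the single rule 'hhirzel' -> 'hanneshirzel'"
```

Both messages from noreply@github.com are dropped before grouping, so generic senders never contribute a rule, and the leadsTo check refuses any rule whose target already maps back to its source.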