The Long Overdue (Pre-) Release
About the Release
This pre-release is long past due. In the meantime Travis-CI.org services were terminated, and other things in the maintainer's life insisted on being more important than learning about GitHub Actions. As a result, many improvements didn't make it into a binary release for a long time.
Now the transition to GitHub Actions is complete. In addition to the old features, the volatile release tag will contain the binaries from the latest successful build-test workflow.
New Features
Improved CREATE ... DOES>
Issue #427 provides a much better implementation of DOES>, and better means both faster and leaner.
The execution speed of the new solution is on par with an ordinary CREATE, VARIABLE or CONSTANT as can be shown in the following example:
: EMPTY CREATE DOES> ; \ just return the address
: CONS CREATE , DOES> @ ; \ return data
CREATE tcreate
VARIABLE tvariable
EMPTY tempty
0 CONS tcons
0 CONSTANT tconstant
The following measurements were done with PulseView on an STM8S001J3M3 with 16 MHz HSI and code compiled to Flash ROM.
Test | runtime | cycles |
---|---|---|
tcreate | 1.3µs | 21 |
tempty | 1.82µs | 29 |
tvariable | 1.89µs | 30 |
tconstant | 3.2µs | 51 |
tcons | 3.3µs | 53 |
This means that the runtime of an "empty" DOES>, which returns the address of any data stored by a word definition, is 1.82µs. That's marginally faster than VARIABLE and just a bit slower than CREATE.
The simple constant value implementation : CONS CREATE , DOES> @ ; is also just a bit slower than the literal stored by CONSTANT (the latter uses the STM8 instruction TRAP and requires just 3 bytes, just like the CALL to the word defined through CONS).
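The cycle counts in the table above follow from the measured runtimes and the 16 MHz clock. The following Python sketch only redoes that arithmetic; the dictionary labels and the helper name `cycles` are made up for this illustration and are not part of STM8 eForth.

```python
# Convert the measured runtimes into CPU cycles at 16 MHz (HSI).
F_CPU_HZ = 16_000_000

# Runtimes in microseconds, copied from the table above.
# The keys are illustrative labels for the defining word behind each test.
runtimes_us = {
    "CREATE": 1.3,
    "empty DOES>": 1.82,
    "VARIABLE": 1.89,
    "CONSTANT": 3.2,
    "CONS": 3.3,
}


def cycles(runtime_us, f_cpu_hz=F_CPU_HZ):
    """Return the runtime expressed in whole CPU cycles (rounded)."""
    return round(runtime_us * f_cpu_hz / 1_000_000)


cycle_counts = {name: cycles(t) for name, t in runtimes_us.items()}

for name, count in cycle_counts.items():
    print(f"{name:12s} {runtimes_us[name]:5.2f} us  {count:3d} cycles")
```

At 16 MHz one cycle takes 62.5 ns, so the difference between an "empty" DOES> and CREATE (29 versus 21 cycles) amounts to 8 cycles.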
The memory requirements compare as follows:
[bytes] | old | new | diff |
---|---|---|---|
: empty CREATE DOES> ; | 22 | 18 | 4 |
empty a | 13 | 7 | 6 |
STM8S001J3 binary | 4697 | 4662 | 35 |
The old implementation needs 4 bytes more for a "defining word" and 6 bytes more for a "defined word" (a word defined with the new DOES> has the same memory needs as a word defined by CREATE, CONSTANT or VARIABLE).
Also the STM8 eForth binary is 35 bytes smaller than before.
Improved >REL
>REL is an implementation of IF ... ELSE ... THEN using relative addressing modes. It's meant to be used as a compiler extension loaded into RAM as a scaffold for, e.g., compiling fast and extra compact ISRs (interrupt service routines) into Flash ROM.
@Eelkhoorn noticed that RAM space for the scaffolding code can be reduced and provided an improved implementation.
Words for Forth Standard compatibility
Issues #430 and #438 added library words for making STM8 eForth a bit more compatible with the Forth Standard. Some of the words are just "No Operation" dummy words (e.g. ALIGN), some are aliases (e.g., INVERT), some are simple definitions (e.g. >BODY) and some are genuine extensions (e.g., VALUE ... TO).
Please be aware that not all of these Forth Standard words will always do what you expect, e.g.:
- VALUE ... TO (like DEFER ... IS) assumes a writable dictionary
- some words like STATE emulate just some of the standard semantics
Forth Standard | STM8 eForth implementation |
---|---|
>BODY | : >BODY ( xt -- a-addr ) 3 + ; |
ALIGN | no op |
ALIGNED | no op |
C" | ' $" ALIAS C" |
CHAR+ | ' 1+ ALIAS CHAR+ ( c-addr1 -- c-addr2 ) |
CHAR | : CHAR ( "char" -- c ) BL WORD CHAR+ C@ ; |
CHARS | no op |
[CHAR] | : [CHAR] ( "name"<spaces -- ) CHAR POSTPONE LITERAL ; IMMEDIATE |
COMPILE, | ' CALL, ALIAS COMPILE, ( xt -- ) |
ENVIRONMENT? | : ENVIRONMENT? ( c-addr u -- false ) 2DROP 0 ; |
INVERT | ' NOT ALIAS INVERT ( x1 -- x2 ) |
J | like I (only for DO ... LOOP, not FOR ... NEXT) |
STATE | "kludge" using STATE? and a variable stateflag |
TO | see VALUE |
VALUE | limited to writable dictionary (RAM or NVM when writable), see lib/VALUE |
Issue #430 refactored CREATE and VARIABLE in order to facilitate implementing the Forth Standard words VALUE and TO.
The following additional words are already available in volatile and they will be available in the next release (2.2.29):
Forth Standard | STM8 eForth implementation |
---|---|
CELL+ | ' 2+ ALIAS CELL+ ( c-addr1 -- c-addr2 ) |
CELLS | ' 2* ALIAS CELLS ( n1 -- n2 ) |
FALSE | ' 0 ALIAS FALSE ( -- false ) |
RSHIFT | like LSHIFT ( n1 u -- n2 ) |
TRUE | ' -1 ALIAS TRUE ( -- true ) |
Improved "pictured number" words
While working on optional words for Forth Standard compatibility it became clear that Forth Standard compliant "pictured number output" with # ( ud -- ud ) instead of # ( u -- u ) (double instead of single math) would increase the code size only marginally, but the double math would make printing numbers in a background process slower. This might break applications that print numbers in a background task because the limit of 1ms task run-time would be exceeded (unless a fast 32bit/8bit division or buffered I/O is used).
Issue #433 explored options for improving the code. It turned out that # can be made faster by using the instruction DIV X,A (with the DIV/DIVW erratum work-around). The code could also be made leaner by in-lining the code of DIGIT and EXTRACT (these are eForth words which are not available in other 16bit Forth implementations, e.g., the well known F83, and they don't appear in the Forth Standard either).
PulseView and the word .. (which toggles a GPIO with PLo and PHi) were used for testing:
: .. ( u -- u ) PLo <# PHi #S PLo #> PHi TYPE ;
For example, here is the timing for DECIMAL 65535 .. :
[Figure: PulseView trace of DECIMAL 65535 ..]
The following table shows that # and #S are much faster now:
.. input | Base | <# #S #> old [µs] | <# #S #> improved [µs] |
---|---|---|---|
65535 | 10 | 155 | 31 |
6 | 10 | 53 | 22 |
65535 | 16 | 131 | 29 |
65535 | 2 | 446 | 60 |
The toggles around <# and #> revealed that about 4µs can be saved by coding the 16bit <literal> + in PAD in assembler (13µs to 9µs; the numbers in the table include this optimization). In a BG task PAD is slightly faster as it returns a constant address. When numeric output is used in a background task (e.g. CR . for presenting measurements on an LED display), the more efficient "pictured number" words make a real difference.
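The timings in the table above depend on the number of digits produced, because each # step extracts one digit by dividing by BASE. The following Python sketch is only a model of that loop for 16bit unsigned numbers, not the STM8 assembler code; the function name `pictured` and the lists of measured values are introduced here for illustration.

```python
# Model of <# #S #> for a 16-bit unsigned number: one division per digit.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def pictured(u, base):
    """Return (text, steps): the digit string and the number of '#' steps."""
    if not 0 <= u <= 0xFFFF:
        raise ValueError("u must fit in 16 bits")
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    out = []
    steps = 0
    while True:
        u, rem = divmod(u, base)   # one '#' step: divide by BASE
        out.append(DIGITS[rem])    # digits are produced right to left
        steps += 1
        if u == 0:                 # '#S' stops when the quotient is zero
            break
    return "".join(reversed(out)), steps


# Measured values copied from the table above: (input, base, old, improved)
measured = [
    (65535, 10, 155, 31),
    (6, 10, 53, 22),
    (65535, 16, 131, 29),
    (65535, 2, 446, 60),
]

speedups = [round(old / new, 1) for (_, _, old, new) in measured]

for (u, base, old, new), factor in zip(measured, speedups):
    text, steps = pictured(u, base)
    print(f"{text:>16s}  {steps:2d} digits  {old:3d} -> {new:2d} us  x{factor}")
```

In this model 65535 needs 5 division steps in base 10, 4 in base 16 and 16 in base 2, which matches the ordering of the measured times; the gain is largest for the 16-digit binary output.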
Note: Forth Standard compatible "pictured number" words with double number output (e.g. D.) can be provided later through library words. In a 16bit Forth it's important to keep in mind a limitation of UM/MOD ( ud un -- ur uq ): since the quotient is a 16bit result, correct output for double numbers is limited to "65536 x BASE - 1" (e.g., 655359 for base 10). For larger numbers a 32bit division with a 32bit result is required (with an 8bit divisor).
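The limit of "65536 x BASE - 1" can be checked with a small model of UM/MOD. This is a sketch under the assumption that a quotient which doesn't fit in 16 bits counts as a failure; what the real word returns in that case is not stated here, so the model simply raises an error. The names `um_mod` and `max_printable` are hypothetical.

```python
# Model of UM/MOD ( ud un -- ur uq ): 32-bit dividend, 16-bit divisor,
# 16-bit remainder and 16-bit quotient.
def um_mod(ud, un):
    """Return (ur, uq); raise OverflowError if uq does not fit in 16 bits."""
    if not 0 <= ud <= 0xFFFFFFFF:
        raise ValueError("ud must fit in 32 bits")
    if not 0 < un <= 0xFFFF:
        raise ValueError("un must be in 1..65535")
    uq, ur = divmod(ud, un)
    if uq > 0xFFFF:
        # Assumption of this model: flag the overflow instead of guessing
        # what the hardware implementation would return.
        raise OverflowError("quotient exceeds 16 bits")
    return ur, uq


def max_printable(base):
    """Largest ud whose first division by base still has a 16-bit quotient."""
    return 65536 * base - 1


print(max_printable(10))            # 655359
print(um_mod(max_printable(10), 10))  # remainder 9, quotient 65535
```

The first division step of a number output divides the whole value by BASE, so the quotient stays within 16 bits only up to 65536 x BASE - 1; one more (655360 in base 10) already overflows in this model.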
Bug fixes and other improvements
Improved .0 (3-digit signed number print)
Issue #432 fixes a few edge cases of .0, the signed number output for 3 digit (LED) displays: numbers smaller than -994 or larger than 9994 had digit overruns and thus potentially wrong display values.
The updated version was shown to work for the following values:
-999 .0 DEF. ok
-995 .0 -99 ok
-99 .0 -9.9 ok
0 .0 0.0 ok
999 .0 99.9 ok
1000 .0 100 ok
7876 .0 788 ok
9995 .0 DEF. ok
Leaner console text input words
Issue #435 saved some ROM space in the input words ACCEPT, KTAP, and QUERY.
CREATE and VARIABLE refactored
Common functionality from CREATE and VARIABLE was refactored into the new word ENTRY (used by VALUE).
Set INT_TLI to COLD
@Eelkhoorn ran into a problem when changing ISR code in a development cycle:
Uploading the I2C interrupt service routine to STM8L (both 051F3 and 151K4) can lead to corrupted ITC_SPR registers, persistent even after power cycle. Writing xt of COLD to INT_TLI (reset vector) solved the issue.
The last four entries of the interrupt vector table (0x8070 to 0x8080) seem to be corrupted after boot for STM8L.
Pull request #440 appears to solve the issue. The problem needs further analysis.