Math

CORDIC

CORDIC (COordinate Rotation DIgital Computer) is an efficient algorithm for calculating trigonometric and hyperbolic functions. This is very useful, because trigonometric functions are simple to define and widely used, yet expensive to compute. Although newer and more powerful algorithms exist, the CORDIC algorithm has the advantage of not needing any complex operations: just addition, subtraction, and bit shifts.
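The following is a minimal software sketch of the CORDIC algorithm in rotation mode, computing cosine and sine. It makes "just addition, subtraction, and bit shifts" concrete. It is only an illustration and not how the STM32 unit is built. The Q2.30 fixed-point format, the 24 iterations and the names cordic_cos_sin, to_fixed and to_double are assumptions of this example. The arctangent table and the gain constant are computed once with floating point. The rotation loop itself only adds, subtracts and shifts.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative software CORDIC (rotation mode), not the STM32 hardware design.
// Hypothetical Q2.30 fixed point: value = raw / 2^30.
constexpr int kIterations = 24;        // roughly one extra bit of precision per iteration
constexpr double kOne = 1073741824.0;  // 2^30

inline int32_t to_fixed(double v) { return static_cast<int32_t>(std::llround(v * kOne)); }
inline double to_double(int32_t v) { return v / kOne; }

// atan(2^-i) table, built once with floating point; the rotation loop never calls atan.
static const std::array<int32_t, kIterations> kAtanTable = [] {
    std::array<int32_t, kIterations> table{};
    for (int i = 0; i < kIterations; ++i) table[i] = to_fixed(std::atan(std::ldexp(1.0, -i)));
    return table;
}();

// Start at x = 1/gain so the magnitude growth of the iterations cancels out.
static const int32_t kInvGain = [] {
    double k = 1.0;
    for (int i = 0; i < kIterations; ++i) k /= std::sqrt(1.0 + std::ldexp(1.0, -2 * i));
    return to_fixed(k);
}();

struct CosSin {
    int32_t cos;
    int32_t sin;
};

// Rotates (1/gain, 0) by theta using only add, subtract and shift.
// theta is in radians (Q2.30) and must lie within [-pi/2, pi/2].
inline CosSin cordic_cos_sin(int32_t theta) {
    assert(theta >= -to_fixed(1.5707963267948966) && theta <= to_fixed(1.5707963267948966));
    int32_t x = kInvGain, y = 0, z = theta;
    for (int i = 0; i < kIterations; ++i) {
        // >> on negative values is an arithmetic shift on mainstream compilers (guaranteed in C++20).
        const int32_t dx = y >> i;
        const int32_t dy = x >> i;
        if (z >= 0) {  // remaining angle positive: rotate counterclockwise by atan(2^-i)
            x -= dx;
            y += dy;
            z -= kAtanTable[i];
        } else {       // remaining angle negative: rotate clockwise
            x += dx;
            y -= dy;
            z += kAtanTable[i];
        }
    }
    return {x, y};
}
```

For example, cordic_cos_sin(to_fixed(0.3)) returns cos(0.3) and sin(0.3) in Q2.30, each within about 1e-6 of the exact values. Each step rotates by plus or minus atan(2^-i), so multiplying by tan of that step angle (2^-i) reduces to a shift.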

Some STM32 microcontrollers, including ours, contain a hardware CORDIC acceleration unit. It is connected to a data bus through which you write the inputs and read the results. From now on, "CORDIC" refers to this hardware unit, and the algorithm is called "the CORDIC algorithm". A faster algorithm is not used because it would need more logic gates. It would rely on floating-point operations, including multiplication and division as well as all the operations the CORDIC uses, only in floating point, which is slower. The CORDIC therefore ends up faster, since it needs fewer logic gates and less time per cycle.

The CORDIC can be accessed through the HAL and LL libraries, or by reading and writing its registers directly. The last option is the best one if you are willing to spend some time learning how to configure it at that level. The bus itself prevents reading before the operation has completed. It also lets you write while an operation is running without changing that operation (the write affects the following ones), so it is nearly foolproof.

Since the libraries made to handle the CORDIC work directly on the registers, here is a brief overview of how the registers work:

CSR

The CSR register holds the configuration, and reading it tells you whether a result is ready. It is a 32-bit register in which most bits have a different purpose. The list below describes, using masks and a bit of abstraction, what each bit does:

0x0000000F: Function bits (FUNC). 0 is cosine, 1 sine, 2 phase, 3 modulus, 4 arctangent, 5 hyperbolic cosine, 6 hyperbolic sine, 7 hyperbolic arctangent, 8 natural logarithm and 9 square root.
0x000000F0: Precision bits (PRECISION). Set the number of iterations the calculation takes (four iterations per unit). Depending on the operation, increasing it beyond a certain point only adds processing time.
0x00000700: Scaling bits (SCALE). Any value other than 0 shifts the input and output values right by that amount. This lets the CORDIC output values higher than one, at the cost of the code having to shift them back left and losing some decimal precision.
0x00010000: Interrupt enable (IEN). Lets the CORDIC raise an interrupt when it completes a calculation. Useful with DMA.
0x00020000: DMA read enable (DMAREN). Lets the DMA read results from the CORDIC.
0x00040000: DMA write enable (DMAWEN). Lets the DMA write new arguments, issuing new operations without changing the configuration.
0x00080000: Number of results (NRES). 0 means one result and 1 means two. The correct value depends on the operation.
0x00100000: Number of arguments (NARGS). 0 means one argument and 1 means two. The correct value depends on the operation.
0x00200000: Half-word size for results (RESSIZE). 0 uses 32-bit words and 1 uses 16-bit half words instead.
0x00400000: Half-word size for arguments (ARGSIZE). 0 uses 32-bit words and 1 uses 16-bit half words instead.
0x80000000: Result ready flag (RRDY). In normal use, this is the only part of the register that is read rather than written. It tells the microcontroller whether a result is ready to be read.
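A configuration word is just the fields from the masks above OR-ed into place. This is a minimal sketch. make_csr and CordicFunction are hypothetical names and are not part of HAL or LL. Only function, precision, scale, number of results and number of arguments are covered. The interrupt, DMA and half-word bits stay at 0, meaning 32-bit arguments and results.

```cpp
#include <cassert>
#include <cstdint>

// Function codes from the FUNC list above (hypothetical enum, not a HAL/LL type).
enum class CordicFunction : uint32_t {
    Cosine = 0, Sine = 1, Phase = 2, Modulus = 3, Arctangent = 4,
    HCosine = 5, HSine = 6, HArctangent = 7, NaturalLog = 8, SquareRoot = 9
};

constexpr uint32_t kFuncMask = 0x0000000F;
constexpr uint32_t kPrecisionMask = 0x000000F0;
constexpr uint32_t kScaleMask = 0x00000700;
constexpr uint32_t kTwoResults = 0x00080000;    // number of results bit
constexpr uint32_t kTwoArguments = 0x00100000;  // number of arguments bit

// Builds a CSR value with 32-bit arguments/results, no interrupt and no DMA.
constexpr uint32_t make_csr(CordicFunction func, uint32_t precision, uint32_t scale,
                            bool two_results, bool two_arguments) {
    return (static_cast<uint32_t>(func) & kFuncMask) |
           ((precision << 4) & kPrecisionMask) |
           ((scale << 8) & kScaleMask) |
           (two_results ? kTwoResults : 0u) |
           (two_arguments ? kTwoArguments : 0u);
}

// Cosine with a second result (the sine), precision 6, one 32-bit argument.
static_assert(make_csr(CordicFunction::Cosine, 6, 0, true, false) == 0x00080060u,
              "cos_and_sin style configuration");
```

A word like 0x00080060 is what a cos_and_sin style method would write to CSR before feeding WDATA.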

WDATA

The WDATA register is where you write the arguments of the operation. Unless the operation takes two arguments (number of arguments bit = 1) and each of them takes a full word (half-word argument bit = 0), a single write starts the operation. Writing more than once results in the first value being used and the next one being saved for the following operation.

When there are two full-word arguments, two writes are needed to start the operation. The two writes after those are saved for the next operation, and a fifth write is lost.

RDATA

The RDATA register is where you fetch the results of the operation. If an operation uses full words (half-word result bit = 0) and outputs two results (number of results bit = 1), two reads are needed to get all of them.
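The WDATA and RDATA rules above boil down to a small count of bus accesses per operation. This is a minimal sketch with hypothetical helper names. It also packs two 16-bit arguments into one write, putting the first argument in the low half. That layout is an assumption based on ST's reference manuals, so check it against the manual for your part.

```cpp
#include <cassert>
#include <cstdint>

// WDATA writes needed to start one operation: only two 32-bit arguments need two writes.
constexpr int writes_per_operation(bool two_arguments, bool half_word_arguments) {
    return (two_arguments && !half_word_arguments) ? 2 : 1;
}

// RDATA reads needed to collect every result: only two 32-bit results need two reads.
constexpr int reads_per_operation(bool two_results, bool half_word_results) {
    return (two_results && !half_word_results) ? 2 : 1;
}

// Packs two 16-bit arguments into one WDATA word.
// Assumption: first argument in bits 15:0, second argument in bits 31:16.
constexpr uint32_t pack_q15_pair(int16_t first, int16_t second) {
    return static_cast<uint32_t>(static_cast<uint16_t>(first)) |
           (static_cast<uint32_t>(static_cast<uint16_t>(second)) << 16);
}
```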

RotationComputer class

RotationComputer is the class that directly handles the CORDIC. Its structure is quite simple once you understand how the CORDIC works. It has one method per operation, plus a variable that remembers the last configuration written to the CORDIC. If that configuration differs from the one the operation needs, the method configures the CORDIC first.

Once the invoked method has made sure the CORDIC is correctly configured, it starts operating. A for loop writes the inputs from the given memory pointers and reads the outputs back into them through the CORDIC registers. The bus handles the waiting (a read is held until the result is ready), so these accesses are safe. When no values are left in the arrays, the method returns.

Every method follows a signature like operation(int32_t *input1, int32_t *input2, int32_t *output1, int32_t *output2, int32_t size). Some methods have only one input or output, and each argument uses one of the units below.
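The sketch below shows the RotationComputer pattern: remember the last configuration, reconfigure only when it changes, then stream the arrays through WDATA and RDATA. It is not the project's class. FakeCordic is a hypothetical stand-in for the registers so the example runs on a PC, and only cos_and_sin is shown. The CSR value 0x00080060 (cosine, precision 6, two results) is an illustrative choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the CORDIC registers so the sketch runs on a PC.
// It ignores csr and always behaves like cosine with two results (cos, then sin).
struct FakeCordic {
    uint32_t csr = 0;
    std::deque<int32_t> rdata;

    void write_wdata(int32_t angle) {  // angle unit: -2^31 is -pi
        const double kPi = 3.14159265358979323846;
        const double a = angle / 2147483648.0 * kPi;
        rdata.push_back(to_q31(std::cos(a)));
        rdata.push_back(to_q31(std::sin(a)));
    }
    int32_t read_rdata() {
        assert(!rdata.empty());  // reading with nothing queued is a usage error in this fake
        const int32_t value = rdata.front();
        rdata.pop_front();
        return value;
    }
    static int32_t to_q31(double v) {  // saturating conversion to a unitary
        const double scaled = std::round(v * 2147483648.0);
        if (scaled > 2147483647.0) return INT32_MAX;
        if (scaled < -2147483648.0) return INT32_MIN;
        return static_cast<int32_t>(scaled);
    }
};

// Hypothetical sketch of the RotationComputer structure, not the real class.
class RotationComputerSketch {
public:
    explicit RotationComputerSketch(FakeCordic& hw) : hw_(hw) {}

    // One angle input, two unitary outputs: first cosine, then sine.
    void cos_and_sin(const int32_t* angles, int32_t* cos_out, int32_t* sin_out, int32_t size) {
        configure(kCosAndSinCsr);
        for (int32_t i = 0; i < size; ++i) {
            hw_.write_wdata(angles[i]);
            cos_out[i] = hw_.read_rdata();
            sin_out[i] = hw_.read_rdata();
        }
    }
    int config_writes() const { return config_writes_; }

private:
    static constexpr uint32_t kCosAndSinCsr = 0x00080060;  // cosine, precision 6, two results

    void configure(uint32_t csr) {
        if (csr == last_csr_) return;  // already configured: skip the register write
        hw_.csr = csr;
        last_csr_ = csr;
        ++config_writes_;
    }

    FakeCordic& hw_;
    uint32_t last_csr_ = 0xFFFFFFFFu;  // impossible value, forces the first configuration
    int config_writes_ = 0;
};
```

Calling cos_and_sin twice in a row writes CSR only once. Skipping that write is the whole point of caching the last configuration.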

The units are:

angle: uses the whole int range to represent angles in [-pi, pi). -2147483648 is -pi and 2147483647 is just below pi.
unitary: uses the whole int range to represent values in [-1, 1). Also used to represent one coordinate of a point in a Euclidean space bounded to the [-1, 1] range.
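One practical consequence of the angle unit is that angle arithmetic wraps around at ±pi by itself. This is a minimal sketch with hypothetical helper names. The addition is done in uint32_t because signed overflow is undefined behavior in C++. unitary_from_double shows that exactly 1.0 does not fit and has to saturate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Adds two angles (-2^31 is -pi) modulo 2*pi. Unsigned arithmetic makes the
// wrap-around well defined; converting back to int32_t is two's complement on
// mainstream compilers (guaranteed since C++20).
inline int32_t add_angles(int32_t a, int32_t b) {
    const uint32_t sum = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
    return static_cast<int32_t>(sum);
}

// unitary: value = raw / 2^31, so exactly 1.0 is out of range and saturates.
inline int32_t unitary_from_double(double v) {
    const double scaled = std::round(v * 2147483648.0);
    if (scaled > 2147483647.0) return INT32_MAX;
    if (scaled < -2147483648.0) return INT32_MIN;
    return static_cast<int32_t>(scaled);
}

inline double unitary_to_double(int32_t u) { return u / 2147483648.0; }
```

For example, 3pi/4 (1610612736) plus pi/2 (1073741824) wraps to -3pi/4 (-1610612736) with no explicit modulo.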

And the methods:

cos: cosine function. One angle input, one unitary output. Works over the whole int range.
sin: sine function. One angle input, one unitary output. Works over the whole int range.
cos_and_sin: cosine and sine function. One angle input, two unitary outputs: first the cosine, then the sine. Works over the whole int range.
phase: angle between the (1,0) vector and the input vector. Two coordinate inputs, x and y (unitary), one angle output. It draws a vector from (0,0) to (x,y) and calculates its angle with (1,0). Works over the whole int range, but loses precision on vectors whose modulus is close to 0.
modulus: modulus of a given vector. Two coordinate inputs, x and y (unitary), one unitary output. As the result approaches one, it starts to fail and gives aberrant results. When using this method, it is recommended not to exceed 1 500 000 000 on either input.
phase_and_modulus: phase and modulus function. Two unitary inputs (x and y), two outputs: first the phase as an angle, then the modulus as a unitary. It shares the weaknesses of the separate phase and modulus functions.
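A double-precision reference in the same units is handy when checking these methods on a PC. This is a minimal sketch with hypothetical names, using std::atan2 and std::hypot. It only mirrors the definitions above and says nothing about how the CORDIC computes them internally. +pi and moduli of 1 or more are not representable, so they saturate.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr double kQ31 = 2147483648.0;  // 2^31
constexpr double kPi = 3.14159265358979323846;

// Saturating conversion of a real value to the q31 int format.
inline int32_t saturate_q31(double v) {
    const double scaled = std::round(v * kQ31);
    if (scaled > 2147483647.0) return INT32_MAX;
    if (scaled < -kQ31) return INT32_MIN;
    return static_cast<int32_t>(scaled);
}

// Expected phase: angle of (x, y) against (1, 0), in the angle unit.
inline int32_t reference_phase(int32_t x, int32_t y) {
    return saturate_q31(std::atan2(static_cast<double>(y), static_cast<double>(x)) / kPi);
}

// Expected modulus of (x, y) as a unitary.
inline int32_t reference_modulus(int32_t x, int32_t y) {
    return saturate_q31(std::hypot(x / kQ31, y / kQ31));
}

// Input check for the modulus recommendation above (|x|, |y| <= 1 500 000 000).
inline bool modulus_inputs_recommended(int32_t x, int32_t y) {
    constexpr int32_t kLimit = 1500000000;
    return x >= -kLimit && x <= kLimit && y >= -kLimit && y <= kLimit;
}
```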

There are also two additional testing methods, q31_to_radian_f32 and radian_f32_to_q31. They convert angles between the int angle unit and radians as float. They are slow and are not meant for continuous calculations, only for testing.
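The bodies of q31_to_radian_f32 and radian_f32_to_q31 are not shown here. The following is only a guess at what they could look like, assuming the angle unit described above. Wrapping the input first lets radian_f32_to_q31 also accept angles outside [-pi, pi).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical body: angle unit (-2^31 is -pi) to radians as float.
inline float q31_to_radian_f32(int32_t angle) {
    return static_cast<float>(angle / 2147483648.0 * 3.14159265358979323846);
}

// Hypothetical body: radians to the angle unit, wrapping into [-pi, pi) first.
inline int32_t radian_f32_to_q31(float radians) {
    const double kPi = 3.14159265358979323846;
    // Half turns (radians / pi), wrapped into [-1, 1).
    double half_turns = std::fmod(static_cast<double>(radians) / kPi, 2.0);
    if (half_turns >= 1.0) half_turns -= 2.0;
    else if (half_turns < -1.0) half_turns += 2.0;
    double scaled = std::round(half_turns * 2147483648.0);
    if (scaled > 2147483647.0) scaled = -2147483648.0;  // rounding up to +pi wraps to -pi
    return static_cast<int32_t>(scaled);
}
```

Both functions go through double and std::fmod, which is part of why helpers like these are too slow for continuous calculations.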

Math class

Math is the abstraction layer over the RotationComputer class. It tries to simplify the use of RotationComputer while losing as little efficiency as possible. It also provides some additional operations, built from mathematical identities on top of the existing CORDIC operations. The Math class uses the same units as the CORDIC, and adds new units for values higher than one. These units are tied to a few operations and are described with them.

This is the updated list of working methods:

sin: receives an angle and returns a unitary. It just calls the CORDIC.
cos: works the same way as sin.
tg: receives an angle and returns a tg value. tg values have an adjustable range, set by the number of decimal-precision bits in the constant TG_DECIMAL_BITS. With TG_DECIMAL_BITS = 32 it works as a unitary. With TG_DECIMAL_BITS = 16 (recommended) it covers the range (-65536, 65534). Lower values give less decimal precision and more range, doubling the range for each bit removed. The way it works is quite simple: it gets the cosine and sine from the CORDIC and divides the sine by the cosine, which gives the tangent. The maximum positive int value is reserved as a special value meaning "infinite".
phase: receives the x and y coordinates and returns an angle. It just calls the CORDIC.
modulus: receives the x and y coordinates and returns the modulus. To handle results higher than one, it scales x and y down before passing them to the CORDIC and scales the result back up afterwards. It returns the modulus in unitary units, and the result can be as high as about 1.41 (the square root of 2). It therefore uses the negative range to represent values over one, since a modulus can never be negative. To read it correctly, use uint instead of int (or handle negative values yourself).
atg: receives a tg value and returns an angle (see tg above for how tg values work). It computes the arctangent with phase, using the relation between x and y that a tg value encodes (tan = y / x). Chaining arctangents and tangents many times can accumulate some error, but an occasional round trip gives approximately the same result.
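tg and atg can be sketched on a PC as follows. The tg format here is an assumption: tan multiplied by 2^TG_DECIMAL_BITS, with TG_DECIMAL_BITS = 16. The exact scaling used by the Math class may differ. phase_stub is a hypothetical stand-in for the CORDIC phase, computed with std::atan2. In tg, the division is done in 64 bits, and any quotient that does not fit (including a zero cosine) becomes the "infinite" value. atg lengthens the vector before calling phase, because phase loses precision on vectors with a small modulus.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed tg format (not confirmed by the source): tan = raw / 2^TG_DECIMAL_BITS.
constexpr int TG_DECIMAL_BITS = 16;
constexpr int32_t TG_INFINITE = INT32_MAX;  // special "infinite" value
static_assert(TG_DECIMAL_BITS >= 0 && TG_DECIMAL_BITS <= 30, "sketch supports 0..30 bits");

// tg: tan = sin / cos, with sin and cos given as unitaries (value / 2^31).
inline int32_t tg_from_sin_cos(int32_t sin_u, int32_t cos_u) {
    if (cos_u == 0) return TG_INFINITE;  // avoid division by zero
    // The common 2^31 scale cancels; 64 bits keep the product from overflowing.
    const int64_t q = static_cast<int64_t>(sin_u) * (int64_t{1} << TG_DECIMAL_BITS) / cos_u;
    if (q >= TG_INFINITE || q < INT32_MIN) return TG_INFINITE;  // too large to represent
    return static_cast<int32_t>(q);
}

// Hypothetical stand-in for the CORDIC phase: angle of (x, y), where -2^31 is -pi.
inline int32_t phase_stub(int32_t x, int32_t y) {
    const double kPi = 3.14159265358979323846;
    const double scaled =
        std::round(std::atan2(static_cast<double>(y), static_cast<double>(x)) / kPi * 2147483648.0);
    return scaled > 2147483647.0 ? INT32_MAX : static_cast<int32_t>(scaled);
}

// atg: since tan = y / x, the vector (1.0 in tg units, tg) has the wanted angle.
inline int32_t atg(int32_t tg) {
    if (tg == TG_INFINITE) return 1073741824;  // "infinite" maps to pi/2
    int64_t x = int64_t{1} << TG_DECIMAL_BITS;
    int64_t y = tg;
    // Lengthen the vector; doubling both coordinates keeps the angle unchanged.
    const int64_t limit = int64_t{1} << 29;
    while (x < limit && y < limit && y > -limit) {
        x *= 2;
        y *= 2;
    }
    return phase_stub(static_cast<int32_t>(x), static_cast<int32_t>(y));
}
```

With this format, a 45 degree angle (equal sine and cosine) gives a tg of 65536 (1.0), and atg(65536) gives back 536870912, which is pi/4 in the angle unit.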

There are also fast methods to convert from tg to unitary and from unitary to tg. They can be used between calculations without a noticeable increase in computation time.
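This sketch assumes a tg format of tan multiplied by 2^TG_DECIMAL_BITS, and a unitary of value multiplied by 2^31. Under those assumptions, converting between the two is one shift by 31 - TG_DECIMAL_BITS bits, which is why it costs almost nothing. The names tg_to_unitary and unitary_to_tg are hypothetical. This version saturates, because a tg of magnitude 1 or more has no unitary representation.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tg format (not confirmed by the source): tan = raw / 2^TG_DECIMAL_BITS.
constexpr int TG_DECIMAL_BITS = 16;
constexpr int kShift = 31 - TG_DECIMAL_BITS;  // bits between the two formats

// tg -> unitary: multiply by 2^kShift, saturating to the unitary range.
inline int32_t tg_to_unitary(int32_t tg) {
    const int64_t wide = static_cast<int64_t>(tg) * (int64_t{1} << kShift);
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return static_cast<int32_t>(wide);
}

// unitary -> tg: divide by 2^kShift, dropping the extra decimal bits.
// >> on negative values is arithmetic on mainstream compilers (guaranteed in C++20).
inline int32_t unitary_to_tg(int32_t unitary) {
    return unitary >> kShift;
}
```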

Warnings and extra information

All methods except modulus, tg and atg have a worst-case error below ±0.0004%, meaning the relative difference between the given result and the expected result stays below that percentage. Modulus reaches this error in its preferred use case, but some values can make the error skyrocket, so use it with care. tg and atg have larger errors, limited by their representation. Simply put, more precision cannot fit in their bits without making some sacrifices.
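The error figures in this section are relative errors, expressed in percent. A trivial sketch with hypothetical names spells this out, so test results can be compared against the ±0.0004% bound. A relative error is undefined when the expected value is 0, so the sketch asserts against that case.

```cpp
#include <cassert>
#include <cmath>

// Relative error in percent: |given / expected - 1| * 100.
inline double relative_error_percent(double given, double expected) {
    assert(expected != 0.0);  // undefined for an expected value of 0
    return std::fabs(given / expected - 1.0) * 100.0;
}

// True if a result is within the ±0.0004 % bound quoted for most methods.
inline bool within_typical_bound(double given, double expected) {
    return relative_error_percent(given, expected) < 0.0004;
}
```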

tg: With the standard TG_DECIMAL_BITS, when the angle is between 89.99825155 and 90.00174845 degrees (or between 269.99825155 and 270.00174845), tg returns the maximum value instead of calculating it, to avoid dividing by 0. This is why the maximum int value is reserved as the special "infinite" value. Increasing TG_DECIMAL_BITS narrows the range of actual values and widens the range that returns the special value. For 32 bits, the special ranges should be (45, 135) and (225, 315) degrees (not tested).

modulus: The modulus method is sensitive in some ranges: the smaller the inputs, the lower the precision. More exactly, precision drops as x + y gets smaller, with about one decimal digit lost for every factor of ten it decreases. The bare minimum for a reasonable result is around 2 000 000, where the deviation is about ±0.01% of the real result. For very small numbers (such as 2000) the error always goes downwards, so results near 0 come out as 0. Keep in mind that these figures apply to the (-2147483648, 2147483647) int range.

Also keep in mind that, unlike the other methods, modulus works in uint rather than int. It needs to return values higher than one in unitary units, which is not possible in int. If you store the value in an int and later cast it to uint it still works perfectly. Just remember that -180000000 is not a negative result but a result higher than one read in the wrong format.
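The sketch below shows the scale-down, scale-up trick behind modulus and why the result must be read as uint. It assumes a scale factor of 2. cordic_modulus_stub is a hypothetical stand-in for the RotationComputer modulus call, computed with std::hypot so the example runs on a PC. Halving the coordinates keeps every input at or below 2^30 (1 073 741 824), under the recommended 1 500 000 000. The final shift back is done in uint32_t so that the top bit can carry values above one.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the hardware modulus (unitary inputs, unitary output).
inline int32_t cordic_modulus_stub(int32_t x, int32_t y) {
    const double m = std::hypot(static_cast<double>(x), static_cast<double>(y));
    return static_cast<int32_t>(std::llround(m));  // inputs are halved, so m < 2^31
}

// Modulus of a full-range unitary vector, read as uint32_t (value / 2^31, up to ~1.41).
inline uint32_t modulus_u(int32_t x, int32_t y) {
    // Scale down by 2 so |x| and |y| stay at or below 2^30.
    const int32_t hx = x / 2;
    const int32_t hy = y / 2;
    // Scale back up in unsigned arithmetic; bit 31 now means "above one".
    return static_cast<uint32_t>(cordic_modulus_stub(hx, hy)) << 1;
}
```

Storing the result in an int32_t makes values above one look negative, but casting back to uint32_t recovers them unchanged.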

Efficiency of each method and tests

tg: One benchmark on an STM32H723ZG at 550 MHz ran a for loop over every possible int value (-2147483648 to 2147483647), with a % operation, two assignments and a tg call inside. It took around 101 minutes, so roughly 708 740 tg calculations per second can be expected in practice. The test code was very simple:

Some samples of the results were checked against the expected values, and all of them were correct.
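The throughput figure follows from the loop count and the run time. Dividing the 550 MHz clock by it also gives an approximate cycle budget per loop iteration. That budget includes the % operation and the assignments, not only tg.

```math
\frac{2^{32}\ \text{calls}}{101 \times 60\ \text{s}} = \frac{4\,294\,967\,296}{6060\ \text{s}} \approx 708\,740\ \text{calls/s},
\qquad
\frac{550 \times 10^{6}\ \text{cycles/s}}{708\,740\ \text{calls/s}} \approx 776\ \text{cycles per iteration}
```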
