tags |
---|
scroll documentation |
code: https://github.com/scroll-tech/zkevm-circuits/blob/develop/zkevm-circuits/src/keccak_circuit.rs develop
branch
link to the original HackMD file: https://hackmd.io/@dieGzUCgSGmRZFQ7SDxXCA/H1vMF0_u2
We shall first explain the Keccak hash function to be proved by our Keccak Circuit. This follows NIST Keccak Spec, Keccak Team Keccak Spec and the Keccak256 Implementation in our codebase.
Any instance of the Keccak sponge function family makes use of one of the seven Keccak-f permutations, denoted Keccak-f[b], where
We focus on Keccak-f[1600] which has
-
$\theta$ -step:
-
$\rho$ and$\pi$ -steps:
$$B[y,2x+3y]=\text{ROT}(A[x,y], r[x,y]) \ , \ x=0,...,4, y=0,...,4 \ ,$$ -
$\chi$ -step:
$$A[x,y]=B[x,y]\oplus ((\text{NOT} \ B[x+1,y]) \ \text{AND} \ B[x+2,y]) \ , \ x=0,...,4, y=0,...,4 \ ,$$ -
$\iota$ -step:$$A[0,0]=A[0,0]\oplus RC[k] \ .$$
In the above all the operations on the indices are done modulo
Cyclic shift offset constants
25 | 39 | 3 | 10 | 43 | |
55 | 20 | 36 | 44 | 6 | |
28 | 27 | 0 | 1 | 62 | |
56 | 14 | 18 | 2 | 61 | |
21 | 8 | 41 | 45 | 15 |
Round constants
The final Keccak hash makes use of the sponge function, with arbitrary bit-length input and 256-bit output. Implemented using Padding, Absorb and Squeeze as below:
NUM_WORDS_TO_ABSORB
= 17, NUM_BYTES_PER_WORD
= 8, then RATE
= 17 NUM_BITS_PER_BYTE
= 8 , then RATE_IN_BITS
= 17
Given an input with length counted in bytes, compute padding_total=RATE-input.len()%RATE
, this gives total number of padding bytes. Note that if input length in bytes is a multiple of RATE
, the number of padding bytes is RATE
but not 0. This means we always do padding.
If the above padding_total
is 1, then we pad a byte for padding_total
.
In terms of bits, if padding_total
is 1, we pad padding_total
. Here the bit representations are understood in big-endian form, so padding_total
. Here
After padding, the input is extended to be with length in bytes a multiple of RATE
, so in bits a multiple of RATE_IN_BITS
. This result represented in bits is divided into chunks of size RATE_IN_BITS
.
Following the result of padding represented in bits, in each chunk further divide this chunk into 17 sections with length 64 for each section. At the first chunk, initialize from all
At the end of the above absorb process, take
We are given the Keccak original input in bytes but without the padding bytes in the following order
Then RLC using Keccak input FirstPhase
) to obtain
In the Keccak Circuit, it corresponds to the column
We are given the Keccak hash output being the FirstPhase
) to obtain
The whole of zkevm-circuits at the level of super circuit uses the following structure of the Keccak table which contains the following 4 advice columns
-
is_final
: when true, it means that this row is the final part of a variable-length input extended with padding, so all Keccak rounds including absorb and permutation and squeeze are done, and the row's corresponding output is the hash result; -
input_rlc
: each row takes one byte of the input, and add it to the previous row'sinput_rlc
multiplied by the challenge for Keccak input. At the row for the last byte of the input (without padding, sois_final
may not be true), this row is theinput_rlc
for the whole input. After that row, if there are padding rows untilis_final
becomes true, thisinput_rlc
keeps the same; -
input_len
: similarly toinput_rlc
, it adds up$8$ bits for each row until the last byte (without padding) which gives the actual input length in bytes; -
output_rlc
: it is zero for all rows except the row corresponding to the last byte of input (with padding, so thatis_final
is true), where result is theoutput_rlc
as we defined in the previous section.
The multi-packed implementation for the Keccak Circuit uses special arithmetic called (Keccak) sparse-word-representation, this splits a 64-bit word into parts.
Each bit in the sparse-word-representation holds BIT_COUNT
(=3) number of 0/1 bits, so equivalent to BIT_SIZE
(=8) as the base of bit in the sparse-word-representation.
A 64-bit word in sparse-word-representation can thus be expressed as a single field element since 64 x 3=192<254.
A pack_table
indicates the relation between the standard bit-representation and the Keccak-sparse-word-representation, e.g. map 3 = pack
. The reverse operation is unpack
. So
These operations have to go back-and-forth between input (to Keccak internal) and output (from Keccak internal).
It must be noticed that the packing related functions/modules pack
, unpack
, into_bits
, to_bytes
etc used inside Keccak Circuit are all in terms of little-endian order (in bits).
The underlying reason why we use these packing and unpacking operations is because of the simple fact that for two bit variables BIT_SIZE
is chosen to be 8 because in the
Each part
has num_bits
indicating how many bits it contains in a sparse-word-representation. Each bit holds a base of BIT_SIZE
(=8). So if
This is a util struict that returns a description of how a 64-bit word will be split into parts. If
uniform
is true, thentarget_sizes
consists of dividing 64 bit positions into evenpart_size
plus a possible remainder;uniform
is false, thentarget_sizes
consists of dividing both therot
(rotation) and (64-rot
)-bit words into evenpart_size
but namedpart_a
andpart_b
plus a possible remainder for each.
The parts splitted from the word is then determined bit-by-bit from target_size
, so that each part will consist of a collection of bit positions that it represents, starting from 0-th bit position. The way bit positions are determined ensures that if
uniform
is true, then the remainingrot
-sized bits [63-rot
+1,...,63] are divided bypart_size
plus a remainder, and first 64-rot
bits are determined by a section compensating previous remainder, plus divide bypart_size
, and plus the remainder fromtarget_size
division;uniform
is false, then the remainingrot
-sized bits [63-rot
+1,...,63] are divided bypart_size
plus remainder, and first 64-rot
bits determined bypart_size
plus a remainder.
The way we do the above split when uniform
is true enables an optimization shown below, where after rotation the front and tail remainders combined together becomes an inverval of length part_size
:
Such a property is useful in split_uniform
, because in that case it can use this specific partition to split an input word to parts and then rotation transform it to an input that will have the same partition layout as the output after rotation operation in Keccak permutations. This enables same input-output shape so that we can do lookup uniformly on input-output pairs without re-partition (uniform_lookup = true
), i.e, splitting up the words into parts, then recombining and then splitting up again. This saves a significant amount of columns.
The module splits a parts
according to a given part_size
using Word_Parts
with uniform=false
. Each part
consists of a collection of connected bit positions (bit_pos
from 0-63) that it represents, so it corresponds to a field element (done via pack_part
).
input
at the part
.
The module splits a parts
according to a given part_size
using Word_Parts
with uniform=true
.
When this module is called, it will use CellManager
to allocate a region to store the input_parts
, which are to be the partition returned by WordParts
. It then merges the splitted part_size
interval that contains 0 into one part with size part_size
in the output_parts
, while stores the original splitted input_parts
that are divided by 0 as two separate parts.
The returned parts from this module are the output_parts
, which are used for inputs in uniform lookup to check relations constrained by Keccak-permutation operations that involve rotations, such as
It calls transform_to
module to bitwise transform values to cells using a function and constrain the input-output relation using a lookup table.
It bitwise transforms input
expressed in parts
to output_parts
that are in cells using a function f: fn(&u8) -> u8
. It constrains the input-output part relation using transform_table
, i.e., the (input_part, output_part)
must be found in (transform_table[0], transform_table[1])
.
Circuit uses Keccak specific CellManager
, which layouts advice columns in SecondPhase
. They form a rectangular region of Cell
's.
CellManager.rows
records the current length of used cells at each row. New cells are queried and inserted at the row with shortest length of used cells. The start_region()
method pads all rows to the same length according to the longest row, making sure all rows start at the same column. This is used when a new region is started to fill in witnesses.
It is worth to notice here that the Keccak specific CellManager
behaves very differently from EVM Circuit's CellManager
(see here), since the latter aligns new cells in a horizontal way instead of vertical.
Following this link is a figure showing the Keccak Circuit layout. This figure only shows a
Each row in the table is a KeccakRow
, with the following items
-
KeccakRow
:-
q_enable
: Fixed Column, bool; -
q_first
: Fixed Column, bool, the first row; -
q_round
: Fixed Column, bool; -
q_absorb
: Fixed Column, bool, absorb row; -
q_round_last
: Fixed Column, bool, last round; -
q_padding
: Fixed Column, bool, selector of padding row; -
q_padding_last
: Fixed Column, bool, last padding row; -
round_cst
: Fixed Column, the round constant$RC[k]$ ; -
is_final
: bool, previous hash is done; -
Cell_Values
: values in cells which are used in Keccak Circuit's configuration, such asabsorb_fat
,absorb_res
,input_bytes
, etc.; -
length
: the actual input length in each round throwing away padding; -
data_rlc
: RLC of input data, after absorb; -
hash_rlc
: RLC of hash output, after squeeze;
-
The first round of rows must be initialized with dummy values, in particular, set is_final
= false.
In the current set-up, the number of rows that each round will occupy is DEFAULT_KECCAK_ROWS = 12
. This means CellManager
at each region aligns cells vertically to reach a height of 12 before it starts to align cells at the next consecutive column.
The number of columns consumed by each region (with different color in the figure) shown in the figure is just schematic. In reality this is determined by the size of data to be processed and the part_size
, so that CellManager
will insert each part into a cell and (vertically) align them.
Spread input, absorb data and squeeze data along with the rounds of Keccak permutation in the circuit layout
Unlike the standard Keccak function, circuit spreads 17 absorb data and 4 squeeze data along with the 24 rounds of Keccak permutation. It uses selectors q_absorb
and q_round_last
to control absorb and squeeze data related constraints. In the figure the 4 squeeze data are attached at the end of diagram, but in reality this only happens at the last block of the input hash (with padding) so that squeeze will be applied (i.e. is_final
is true), and then the final hash result will be obtained. However, each 24-round will record squeeze_from_prev
, though they are only needed when the final hash result is obtained.
The Keccak circuit has to process variable length input which is divided into chunks, and for each chunk there will be allocated 24-rounds of Keccak permutation. So the Keccak Circuit configuration frequently uses negative rotations to fetch results from previous rounds (such as squeeze_from_prev
). This of course cannot be done at the initial round, so in the witness generation function multi_keccak
we will assign dummy first rows for 1 round (12 Keccak rows).
The internal lookup tables in Keccak Circuit are used mainly in transform
related modules. These tables include
-
normalize_3
: [TableColumn; 2], parity normalize table with each bit in range$[0,2)$ ; -
normalize_4
: [TableColumn; 2], parity normalize table with each bit in range$[0,4)$ ; -
normalize_6
: [TableColumn; 2], parity normalize table with each bit in range$[0,6)$ ; -
chi_base_table
: [TableColumn; 2], for$\chi$ -step lookup,$[0,1,1,0,0]$ ; -
pack_table
: [TableColumn; 2], indicates the pack relation, i.e. 0..256 encoded as Keccak-sparse-word representation;
Since these tables can be used in multiple bits where each bit will do lookup according to the table, e.g. with number of bits equals part_size
, Keccak Circuit needs to determine the optimal part_size
for each table. This is done in the function get_num_bits_per_lookup_impl
, which determines the maximum number of bits by the condition KECCAK_DEGREE
=19.
In some other circuits such as the aggregation and compression circuits, there is a need to exactly compute the number of rows consumed by one hash. This can be done by first compute the number of bytes in the input plus padding, and then divide into chunks of size RATE
=136 bytes. The number of Keccak rows that each chunk will consume is equal to
DEFAULT_KECCAK_ROWS=12
, NUM_ROUNDS=24
this number will be 300. The number of chunks is determined by the input length plus padding and is equal to
normalize_3
, normalize_4
, normalize_6
are normalization tables that are used to record the parity transformation of a sequence of bits with length part_size
with each bit in [0u64,...,range)
(range
=3,4,6). Parity is taken as 0 for even and 1 for odd.
State data (used to be denoted as [0,...,7]
).
Witnesses filled in a newly started region by CellManager.start_region()
.
First bitwise add
[0,5]
partitioned into parts with Keccak's split
module with rot
=1, then use normalize_6
table to reduce it part by part to
After that, calculate
-
Soundness: Use the symbols in the previous section on Keccak-f permutation function, it can be checked that
$C[x]$ is the same as the parity of$A[x,0]+A[x,1]+...+A[x,4]$ . So this is what$bc[i]$ checks at thenormalize_6
table lookup step. In a same rationale,$os[i][j]$ after normalization stands for the parity of$A[x,y]\oplus D[x]$ . This normalization is postponed to$\rho/\pi$ -step usingnormalize_4
table lookup. -
Completeness: Since
$C[x]$ is the same as the parity of$A[x,0]+A[x,1]+...+A[x,4]$ , any selection of witnesses that satisfy original$\theta$ -step in the Section on Keccak-f permutation function will pass the constraints.
Witnesses filled in a newly started region by CellManager.start_region()
.
There are 3 consecutive regions for rho_pi_chi_cells
:
-
rho_pi_chi_cells[0]
: position$[j][2i+3j]$ is rotation of$s[i][j]$ by$r[i][j]$ (as in the$\rho/\pi$ -step) saved as parts usingsplit_uniform
withrot=r[i][j]
. Each 5-cells row counted as 1-column; -
rho_pi_chi_cells[1]
: transformrho_pi_chi_cells[0]
using normalize tablenormalize_4
; -
rho_pi_chi_cells[2]
: record the transform$3-2a+b-c$ used for$\chi$ -step, will explain in that step.
Lookup in normalize_4
table already gives a constraint. This is for the split_uniform
will combine small parts that are seperated by normalize_4
of these small parts in a uniform way.
-
Soundness: The
normalize_4
table lookup is used for the$os[i][j]$ -parity check that is responsible for the$A[x,y]\oplus D[x]$ substep in the$\theta$ -step. The other parts of this step are mostly to record values and do lookup range check. -
Completeness: Same as
$\theta$ -step, using equivalence of$A[x,y]\oplus D[x]$ with parity of$A[x,y]+D[x]$ .
Witnesses filled in a newly started region by CellManager.start_region()
.
Do
- Soundness: We have
1 | 1 | 1 | 1 | 1 |
1 | 1 | 0 | 1 | 2 |
1 | 0 | 1 | 0 | 0 |
1 | 0 | 0 | 1 | 1 |
0 | 1 | 1 | 0 | 3 |
0 | 1 | 0 | 0 | 4 |
0 | 0 | 1 | 1 | 2 |
0 | 0 | 0 | 0 | 0 |
So we lookup to table generated from mapping
-
Completeness: From the above table we see that any array of 0/1 bits
$(a,b,c,d)$ that satisfy$a\oplus (\text{NOT } b \text{ AND } c)=d$ must satisfy the$3a+2b-c$ lookup table constraint.
Witnesses filled in a newly started region by CellManager.start_region()
.
Do, for round normalize_3
table.
-
Soundness: This is simply because the parity of
$s[0][0]+\verb"round_cst"[k]$ gives the xor result$s[0][0]\oplus \verb"round_cst"[k]$ . -
Completeness: Same as
$\theta$ -step, using equivalence of$s[0,0]\oplus \verb"round_cst"[k]$ with parity of$s[0,0]+\verb"round_cst"[k]$ .
Verify the absorb data in parallel with each round of permutation, so in the first normalize_3
table.
-
Soundness: Clear from fact that
$a\oplus b$ has the same parity as$a+b$ (i.e. normalize to${0,1}$ ). - Completeness: Same rationale.
When the previous hash is done:
(1) Verify that the hash RLC result from squeeze_from_prev
is the same as hash_rlc
.
(2) Verify that squeeze_from_prev
are indeed filled in with the same values as state words
The Rationale of above constraints are pretty straightforward
Use length
to record the actual length of input bytes, and pad until total length is an integer multiple of RATE_IN_BITS
. On padding rows set a padding selector is_padding
to be
(1) is_padding
is boolean;
(2) is_padding
is zero on absorb row;
(3) is_padding
changes from 0 to 1 only once;
(4) padding input byte starts with
The Rationale of the above constraints are also straightforward following the definition of padding. Only one thing that needs to pay attention: Keccack internal uses byte representation together with Keccack-sparse-word-representation.
RLC uses keccak input
Update data_rlcs
on each of the
and
data_rlc
of the current round, and it is passed to the next round so that after all 17 absorb rounds, we get the final data_rlc
for the current chunk data. This is then passed to the next chunk until the last byte of input data, where data_rlc
must exclude padding data. So in the final part of the input byte data_rlc
will be equal to the corresponding input_rlc
in the Keccak table.