Padding-kk state machine
All Keccak-related state machines are accessed through the Padding-KK state machine. It is therefore responsible for handling queries from the Main state machine. The common queries are requests for digests of messages, together with validation of these digests.
In this document, the workings of the Padding-KK SM are described. How it validates the validity of hash values, input string lengths, and input string reading to ensure that padding requirements are followed.
First, keep in mind that the Padding-KK SM’s operations are byte-based, whereas the Keccak-F SM’s actions are bit-based. This discrepancy in string formats is handled by the Padding-KK-Bit SM.
Padding and the Keccak-F SM¶
Although the hashing of messages is carried out by the Keccak-F SM, the padding happens in the Padding-KK SM. Messages are presented to the Padding-KK SM as byte-strings in hexadecimal form. But the Padding-KK-Bit SM ensures that these are presented as bit-strings to the Keccak-F SM.
Even though Keccak-F SM receives strings of any length as inputs, each input-message to the Keccak-F SM is first split into blocks of 1088 bits (i.e., 136 bytes), called the bit rate (or rate). If the tail-end of the splits is shorter than 136 bytes, or if the original message is shorter than 136 bytes, a specific string is appended to it in order to form a full 136-byte string. The appended bits (or bytes) are referred to as the padding.
Keccak’s first padding-rule is to append the string
The second padding rule: If the input-message is exactly 136 bytes long, or a multiple of 136 bytes, then a block of 136 bytes consisting of just the padding
It is crucial to emphasise that the Polygon zkEVM follows the Keccak construction used in the Ethereum. So then the Padding-KK SM does not append any other ‘fixed’ bits to the padding, such as appending “
Input strings and padding¶
Consider the strings
Note that the bits (or bytes) of these strings cannot be simply combined and presented as one stream of bits (or bytes) as though they belong to one long string. But each string
The idea here is to map each string to one or several blocks of 136 bytes (1088 bits), and include the proper padding at the tail-end part.
Observe that, as shown in the first string in the Figure above; Whenever the length of a certain string is a multiple of the block length of 136 bytes, a new block containing only the padding must be appended. Once this is done, the new resulting string can be provided to the Keccak-F SM for hashing.
Dealing with Main SM queries¶
The Padding-KK SM is in charge of validating that the padding rule is correctly performed, as well as validating each of the prescribed operations;
- Validate the lengths of given strings
. (i.e., length check), - Validate the hashes of certain strings
. (i.e., digest check), - Validate reads from 1 to 32 bytes of string
. (i.e., read check).
Next is to prepare the machinery that will enable the Padding-KK SM achieve these three checks.
Setting up columns for verification purposes¶
We now design a series of columns that will enable us to completely verify correctness of every state transitions.
: This register will store every byte of the padded input (as commented before), one byte per row.-
: This register will store an increasing sequence of integers starting from 0, changing its value at the beginning of a new string. Of course, an address completely determines the corresponding string of a certain byte. -
: This register represents the connection between two blocks. If is 1, the actual block is in the same string with the previous block. Otherwise, the last block is in the previous string. -
: This register flags the last row of every block. If is 1, the next block will start in the next row of the table. -
: For each string, this register is a decreasing sequence of signed integers. Starting at the length of the current string and it keeps decreasing until the last block of the string is reached. Observe that this value can be negative since a padding may be present in the string. ( is short for ‘remaining’.) -
: For each string, this register stores original length (i.e., the length of the string before padding was appended). Therefore, this register remains constant for all rows of each string. Observe that, at the first row of each string, coincides with . -
: A computed register which is 1 whenever is zero, and 0 otherwise. -
: The register is 1 just after the byte , corresponding to the appearance of padding bits. Observe that it can happen that is constantly 0 among a full string. This is because we can have the situation where the padding only consists of the byte . -
: This register is actually computed from the registers , and as follows,
This means
Observe that
Example: Padding check using columns¶
Let us illustrate this design with a table, using the columns defined above. Suppose the following strings have been padded and are ready for hashing:
where “|” indicates the end of a 136-byte block. In the below table, these 136-byte splits between blocks are indicated with a horizontal line.
The above table illustrates how the columns can be used to record the executional trace of the Padding-KK State Machine. As it is our general approach, a strategy towards achieving verifiable computations, the next task is to describe the state transitions of the Padding-KK SM in terms of polynomials.
Description of state transitions in PIL¶
By capturing the relationships between and among the columns (registers defined above) in terms of equations, amounts to translating the execution of the padding into a verification code written in PIL.
-
In order to guarantee that the value recorded in the
register decreases until is 1 (that is, the end of the string), use the relation, -
How can we validate the fact that
was constructed as expected?First observe that
changes to 1 immediately after a 1 was recorded in the register. (i.e., immediately when the padding starts, and when .)Secondly, notice that after this point,
remains 1 until (and including when) equals 1. After which, becomes 0.Hence, these behaviour can be captured as, $$ \texttt{spare}’ \mathtt{ = (spare + remIsZero)\cdot (1 - lastHash)} \tag{Eqn.3} $$
-
Verifying correctness of the
register requires two constraints;→ Checking that
is constant in each block→ Checking two specific situations,
-
When
is 1 and is 0 : in this case, the next value of should be 0, because of the block change but within the same string. -
When both
and are 1 : in this case, the next value of should be 0, due to the string change.
These two scenarios are verified with the following constraint,
-
-
The
register is constant within each string. It must therefore satisfy this relationship, -
Checking that
and coincide at the first state of each string, use the constraint,where
is a committed column and it is such that .In fact,
is a shifted version of , which is used to ensure that, when starting a string (and therefore, ), then . -
Let us now specify the relations that satisfy the
register. As commented on before, is constant within each string. Hence, if and only if .The constraint for the
register is therefore,Note that going from one string to the next, the values of the
register form an increasing sequence (increasing by steps of 1 from one string to the next).However, since the polynomials utilized in this scheme are all cyclic (due to evaluations on roots of unity), there is a need to ensure that the
register resets to whenever the reading returns to the first row.For this purpose, a register called
is added. And is 1, if is 1 and the string, that the last block belongs to, is not the last one.Similarly, another register called
is added, and it is defined such thatSee below table for a comparison between the
and non- registers. -
In order to grapple with the increasing but cyclic sequence of
, the following constraint is used, -
Now, checking whether
is 1 if and only if is 0, is done reversely by first committing the column , the inverse of , and computing as: $$ \mathtt{remIsZero = 1 − rem · remInv} \tag{Eqn.11} $$And then, as usual, check the relation,
. -
Next is the
register which stores the input byte if and only if the current row does not corresponding to the padding. is computed from the , and registers. This ensures loading the padding bytes at their correct positions.In fact, this register will be used in the Plookup of the next state machine.
Observe that
can be computed asLet us carefully analyze this equation:
- If
, and are all , then equals . As mentioned above, at the non-padding rows, we just store . - If
is 1, is 0 and is 0, then equals , which is the first byte of the padding. - If
is 0, is 1 and is 0, then equals , which are the intermediate bytes of the padding. - If
is 0, is 1 and is 1, then equals , which is the last byte of the padding. - Lastly, we should consider the special case where the padding is only the byte
. In this case, is 0 at the last row meanwhile and are both 1. Therefore, equals , as we wanted.
See below table for all the above cases when computing
. - If
Hash output check¶
We have thus far only dealt with correct inputs of the padding. Now, we will introduce several columns to deal with the hash itself.
Since KECCAK-256’s output is 256 bits long, we use eight (8) registers each of 32 bits to store the result of the hash function. Denote these registers by
As columns, these
A combination of other KECCAK-related state machines will be used to verify correctness of the output hash. The reason for this is that, for compatibility with the KECCAK-256 hash function, we first need to describe all inputs in terms of bits.
Length and read check operations¶
In this section we give a description of the design that will allow us to verify the lengths of input strings and the read operations.
Length check¶
The length check is almost trivial because the
Suppose one is given a length
The strategy is to use Plookup to verify that the given table contains a row with
The PIL code for this Plookup looks like this:
where
Read check¶
Checking reads requires more columns to be introduced. Recall that, given given three parameters; the address
It takes 8 registers, each of 32-bits, to store the output of the read operation. Let us denote these registers by:
Further 8 factor polynomials
Before looking into the roles of the
-
: A register that specifies the number bytes to be read. It is a number register, containing numbers between 1 and 32, and it remains constant in each of the readings we want to perform. -
: A decreasing sequence of values, starting from the length value of the read minus 1 (i.e., ) and ends at 0, for each read. This is important for positioning each byte at the correct power of 2. -
: This is a computed column, using the same inverse technique as before. It records instances when the register is 0. First, the register is committed, allowing to be expressed as $$ \mathtt{crLatch = 1 − crOffset \cdot crOffsetInv} \tag{Eqn.14} $$ which satisfies the constraint, .
Example: Read check using columns¶
Suppose we want to read the first 10 bytes of the address 6, and once finished, read the next two bytes of the same address thereafter. Here is the correct way of constructing the polynomials;
Note that the horizontal line is used to separate every 8 bytes, which are the exact number of bytes stored in a
Several observations
Firstly, note that
Secondly, if we want to express the two read values as an array, the output will be
$$
\mathtt{[0xff00111a6e6e1f02ef10, 0x7355]}
$$
where the first element of the array has 10 bytes, whilst the second has 2 bytes. This coincides with each of the
Thirdly, observe the relationships that these registers;
The registers
Let us now turn to the registers
Consider the previous example, where the first element of the ‘output’ array is
$$
\mathtt{0xff00111a6e6e1f02ef10}
$$
Obviously, this element does not fit in a 32-bit register. So it needs to be split over three registers;
Henceforth, we should reflect this in our state machine using the columns
Observe that registers
The registers
Lastly, observe that the rows corresponding to
Constraints pertaining to
The idea here is to compute a column
and then verify that the next state
So far so good.
Now observe that the tuple
Read operation and Main state machine¶
There are a few subtle details concerning the connection between the Read operation and the Main SM to be taken into account.
For instance, in order to avoid problems, reading from the last (probably smaller) block is not allowed. A constant column
Please note that these quoted code lines will differ with different versions. They should only be taken as examples for a particular code version.
Note that
Let us make use of the following example to clearly see why this linear combination works.
Example: Main SM and Padding-KK connection¶
Suppose we want to read bytes 2, 3 and 4 of the string
Observe that, among the rows with