Skip to content

Latest commit

 

History

History
1492 lines (1227 loc) · 59 KB

jq255.md

File metadata and controls

1492 lines (1227 loc) · 59 KB

The jq255e and jq255s Groups, Key Exchange and Signatures

  • Version: 0.0.1
  • Author: Thomas Pornin

Introduction

This document specifies the jq255e and jq255s groups. These are prime order groups based on elliptic curves, with characteristics suitable for building cryptographic protocols:

  • The order of each group is a prime integer close to 2^254, hence formally offering "127-bit security" against discrete logarithm attacks.

  • Each group element (including the neutral element) can be encoded as a 32-byte sequence. The encoding is canonical: a given element can yield only a single possible sequence of 32 bytes, and the decoding process verifies that the encoding was done properly. Decoding cannot yield an out-of-group element, so that invalid curve attacks are not possible, and there are no cofactor issues.

  • Operations on group elements can be implemented with complete formulas that have no exceptional case, and are amenable to constant-time implementations.

  • Operations are efficient, both on large powerful CPUs and on small microcontrollers. The jq255e group is furthermore defined over a curve that admits an easily computed non-trivial endomorphism, that can be leveraged to further speed up multiplication of group elements by scalars.

On top of these groups, the following primitives are specified by this document:

  • A digital signature scheme. Signatures have size 48 bytes, which is shorter than the more usual 64-byte signatures associated with elliptic curves of similar security (e.g. EdDSA signatures on Curve25519, or ECDSA over P-256 or secp256k1 curves), but still provides the security level expected from the curve. Signature verification is also faster than with the common 64-byte signatures.

  • A key exchange protocol (Diffie-Hellman).

  • Map-to-group and hash-to-group procedures, that allow one-way mapping of arbitrary data into group elements with purely constant-time implementations.

The underlying curves are specific double-odd curves (curves whose order is twice an odd integer) interpreted with a coordinate system such that the curve equation is a Jacobi quartic, as described in the double-odd Jacobi quartic whitepaper. More information on double-odd curves can be found on the double-odd elliptic curves Web site.

Conventions and Notation Used in this Document

  • Uppercase key words such as "MUST" or "MUST NOT" are to be interpreted as described in RFC 2119.

  • +, - and * have their usual respective meanings of addition, subtraction and multiplication when applied to elements of a ring (e.g. integers, or a finite field). / denotes division, i.e. multiplication of the left operand by an inverse of the right operand. + and - are also be applied to group elements to embody the group law.

  • x^y denotes exponentiation of x by the integer y. This is applicable to any type of x for which a multiplication operation is defined (e.g. integers or finite field elements).

  • Each group is defined over the finite field GF(q) consisting of integers modulo q = 2^255 - mq for some small integer mq such that q is prime. For both jq255e and jq255s, mq < 2^15 so that implementations on small microcontrollers remain efficient.

  • The sign of a field element x, denoted sgn(x), is defined as the least significant bit of the binary representation of x as an integer in the 0 to q-1 range. Thus, for any x:

    • sgn(x) is equal to 0 or 1.
    • If x = 0, then sgn(x) = sgn(-x) = 0.
    • If x != 0, then sgn(x) = 1 - sgn(-x).

    A value x is said to be negative if sgn(x) = 1, or non-negative if sgn(x) = 0.

  • The elliptic curve is the set of points (e, u), with both coordinates being elements of GF(q) that fulfill the curve equation e^2 = (a^2 - 4*b)*u^4 - 2*a*u^2 + 1, for two constants a and b that are part of the curve definition. The curve neutral point (the "point-at-infinity" which is not actually "at infinity" in this coordinate system) is I = (1,0). The curve also contains a single point of order 2 denoted N = (-1,0).

  • The underlying elliptic curve contains exactly n = 2*r points, with r being a prime integer. The group itself (jq255e or jq255s) has order r. For both groups, r is close to 2^254.

  • For ease of notation, we define a' = -2*a and b' = a^2 - 4*b. With these constants, the curve equation becomes: e^2 = b'*u^4 + a'*u^2 + 1

  • A group element is defined as a pair { P, P+N } for any curve point P. The points P and P+N are the two possible representants of the group element. The group law is denoted additively: for any two group elements, their sum is computed by simply adding together their representant points over the curve; since N has order 2, it does not matter which representants are used. The encoding process uses, for each group element, a specific representant, and automatically computes it if given the other representant, so that the encoded format is canonical. Similarly, group element equality abstracts away the representants.

  • Integers modulo r are called scalars. Since r is prime, the scalars constitute a finite field. Group elements can be multiplied by scalars: for a scalar s and a group element P, the product is s*P = P + P + ... + P, where the group law is applied si - 1 times (with si being the integer representant of s, in the 0 to r-1 range).

The jq255e group parameters are the following:

jq255e:

  q = 2^255 - 18651
  a = 0
  b = -2
  a' = -2*a      = 0
  b' = a^2 - 4*b = 8
  equation: e^2 = 8*u^4 + 1
  group order: r = 2^254 - 131528281291764213006042413802501683931
  curve order: n = 2*r
  conventional generator: G = (eg, ug)
     eg = -3
     ug = -1

For jq255s, the group parameters are as follows:

jq255s:

  q = 2^255 - 3957
  a = -1
  b = 1/2 = 2^254 - 1978
  a' = -2*a      = 2
  b' = a^2 - 4*b = -1
  equation: e^2 = -u^4 + 2*u^2 + 1
  group order: r = 2^254 + 56904135270672826811114353017034461895
  curve order: n = 2*r
  conventional generator: G = (eg, ug)
     eg = 6929650852805837546485348833751579670837850621479164143703164723313568683024
     ug = 3

For the group order r, we define UNIFORM_SIZE(r) as the minimum size (in bytes) of a uniformly random input string such that interpreting that string as an integer (using the unsigned little-endian convention), and then reducing it modulo r, yields a value which is indistinguishable from a uniformly selected integer modulo r, up to the security level that is normally expected from a group of order r. In practice, we use the following formulas (with log2() being the logarithm in base 2 and abs() the absolute value):

lr = round(log2(r))
usl = max(lr, log2(abs(r - 2^lr)) + lr/2)
UNIFORM_SIZE(r) = ceil(usl/8)

For both jq255e and jq255s, we obtain that UNIFORM_SIZE(r) = 32. In all generality, for groups with order less than 2^256, UNIFORM_SIZE() may range up to 48, but the respective orders of jq255e and jq255s are sufficiently close to a power of two (namely 2^254) that 32 bytes are enough to ensure the required indistinguishability from uniform selection.

A note on constant-time implementations: The expression "constant-time" designates code such that the execution time and the memory access pattern (i.e. the addresses of memory slots that are accessed throughout execution) do not vary in a way that can be correlated with secret data. Some operations described in this document with conditional execution (if clauses in pseudocode) are to be executed depending on a secret Boolean condition, and thus SHOULD in practice use a constant-time implementation strategy: the clause contents should be computed systematically, then their results either kept or discarded through a constant-time selection primitive that typically uses bitwise Boolean operations on machine words. In the text below, we will use the following to denote a constant-time selection of value v (if cond is true) and value u (if cond is false):

CT_SELECT(v IF cond ELSE u)

(This notation is imported from the Ristretto255 draft.)

Group Element Operations

In this section, practical procedures for computing point operations are presented using pseudocode in a syntax compatible with the Python language and the SageMath mathematics system. This syntax closely follows the abstract mathematical notation otherwise used in this document, with the following exceptions:

  • Exponentiation uses the ** operator (instead of ^).
  • Points with extended coordinates (E:Z:U:T) become tuples (E,Z,U,T).

The pseudocode assumes that the following functions are available:

  • GFq(j) converts the integer j to a field element by reducing j modulo q.

  • int(x) converts the field element x into an integer by using its unique representant in the 0 to q-1 range.

  • half(x) divides the field element x by two. It is usually implemented by a right-shift by 1 bit, then the conditional addition of the constant (q+1)/2 if the dropped bit was non-zero. Such an implementation is considered to be a "fast" operation (i.e. at least as fast as an addition in the field).

  • invert(x) returns the multiplicative inverse of x in the field. Zero has no multiplicative inverse, but it is sometimes convenient for constant-time implementations to return 0 as a (formal) inverse of 0; this is obtained naturally if inversion is implemented using Fermat's Little Theorem (i.e. raising x to the power q-2). This feature is not needed for the formulas described in this document.

  • sgn(x) returns the "sign" of x, as defined previously: the sign is the value of the least significant bit of int(x).

  • sqrt(x) returns either a square root of x (if x is indeed a square in the finite field), or False (if x is not a square in the finite field). Important: when x is a square, the returned square root is non-negative.

  • is_square(x) returns True if x is a square in the finite field, or False otherwise.

  • scalar_to_int(s) and int_to_scalar(j) compute the same operations as int() and GFq(), respectively, but for scalars, which are integers modulo the prime r.

  • secure_random(m), for an integer m >= 1, returns a sequence of m random bytes obtained from a cryptographically secure source, with a uniform probability distribution (e.g. /dev/urandom or the getrandom() system call on many Unix-like operating systems). This is used only for private key generation in the specification of high-level protocols.

If an addition, subtraction or multiplication uses as operands a field element and an integer (usually small), then the integer is first implicitly converted to a field element as if by using GFq(). In practice, the implementation may provide a dedicated multiply-by-small-integer function; on usual software platforms, such an operation is about as fast as an addition in the field.

The implementation of sqrt(x) depends on the used field. For jq255s, the field is such that q = 3 mod 4, which allows a simple square root implementation by computing the square root candidate z = x^((q+1)/4). For jq255e, the field is such that q = 5 mod 8, which can be handled with Atkin's algorithm:

  1. Compute: c = (2*x)^((q-5)/8)
  2. Compute: d = 2*x*c^2
  3. Compute the square root candidate: z = x*c*(d - 1)

The obtained candidate MUST be checked (by comparing z^2 with x and finding them equivalent). Moreover, if z is indeed a square root of x, then it MUST be normalized to a non-negative value (i.e. if sgn(z) = 1, then z must be replaced with -z). The use of this fixed convention for square roots is needed for proper interoperability of the map-to-group and hash-to-group operations. If the source data is secret, then these operations should be implemented in constant-time; notably, the selection of the non-negative root should use systematic evaluation of -z, and a constant-time selection, as in: CT_SELECT(-z IF sgn(z) = 1 ELSE z).

is_square(x) can be implemented as a simple wrapper around sqrt(x), though faster implementations are possible with the Legendre/Jacobi symbol. A fast, constant-time implementation of that operation can be obtained as a straightforward derivative of the optimized binary GCD; on some small microcontrollers, this is_square() implementation can be about six times faster than the full square root computation.

Affine Formulas

We recall here the affine formulas applicable to points on double-odd Jacobi quartic curves. Implementations are expected to use the extended representation described later on in this document; the affine formulas are potentially useful if some alternate point representations are explored.

For any point P = (e, u) on the curve, the point P+N (i.e. the other representant of the same group element) is P+N = (-e, -u). The opposite of the point P is -P = (e, -u). The identity (or neutral element) of the group is { I, N } (i.e. I = (1, 0) and N = (-1, 0) are the two representants of the group identity).

Let P1 = (e1, u1) and P2 = (e2, u2) be two points on the curve. Their sum P3 = P1 + P2 = (e3, u3) can be computed as follows:

     (1 + b'*u1^2*u2^2)*(e1*e2 + a'*u1*u2) + 2*b'*u1*u2*(u1^2 + u2^2)
e3 = ----------------------------------------------------------------
                          (1 - b'*u1^2*u2^2)^2

      e1*u2 + e2*u1
u3 = ----------------
     1 - b'*u1^2*u2^2

These formulas are complete: they work on all pairs of curve points, including the points I and N (that represent the identity). We can therefore use them to compute the group law, regardless of which representants are used for the group elements used as operands.

The Jacobi quartic curve e^2 = b'*u^4 + a'*u^2 + 1 is birationally equivalent to the curve with Weierstraß equation y^2 = x*(x^2 + a*x + b) with the following mappings (defined for all points excepts I and N):

x = (e + 1 - a*u^2)/(2*u^2)
y = x/u

e = (x^2 - b)/(x^2 + a*x + b)
u = x/y

On the Weierstraß curve, N has coordinates (0, 0), and I has no defined coordinates (it is then truly "at infinity"). The Weierstraß equation can then be converted to a short Weierstraß equation Y^2 = X^3 + A*X + B by using the following:

A = (3*b - a^2)/3
B = a*(2*a^2 - 9*b)/27
X = x + a/3
Y = y

Converting to Weierstraß coordinates and back is not required for implementing the jq255e and jq255s groups; such coordinates have furthermore special cases for the points I and N. These conversions may be useful in some specific contexts, e.g. to leverage preexisting generic implementations of (short) Weierstraß curves, if only for testing purposes.

Extended Coordinates

In order to reduce the number of field element divisions (which are usually quite expensive in software implementations, relative to additions and multiplications), the following internal representation is recommended: a point P = (e, u) is represented by the quadruplet (E:Z:U:T) of field elements, with the following rules:

  • Z != 0 (in all cases)
  • e = E/Z
  • u = U/Z
  • u^2 = T/Z (i.e. U^2 = T*Z)

These coordinates are called EZUT coordinates.

The points I and N, which represent the identity element in the group, have coordinates (Z:Z:0:0) and (-Z:Z:0:0), respectively, for any field element Z != 0. It does not matter, for correct computations, which of I or N is used to represent the identity; similarly, any non-zero Z can be used.

Extended affine coordinates are a special case of extended coordinates where Z = 1 (in which case, E = e, U = u and T = u^2). Such coordinates are appropriate for use in fixed precomputed tables of points, e.g. small multiples of the conventional generator; setting Z = 1 for such points reduces the table size in memory, and allows for some shortcuts in operations on these points.

Jacobian coordinates are used transiently in point-doubling formulas. They are hereafter called XWJ coordinates. They relate to the (x, y) coordinates of curve points when using the Weierstraß equation y^2 = x*(x^2 + a*x + b). For a point (x, y), its XWJ coordinates are a triplet (X:W:J) such that:

  • W != 0
  • x = W/(J^2)
  • w = y/x = W/J

The points I and N can be represented on XWJ coordinates with the triplets (W^2:W:0) and (0:W:0), respectively, for any W != 0. For all curve points other than I and N, the coordinate J is non-zero. In general, Jacobian coordinates do not lead to efficient and complete point addition formulas, but they offer fast and complete formulas for point doublings. Thus, a strategy for computing a sequence of several successive point doublings is to convert the source point from EZUT to XWJ coordinates, compute the doublings in XWJ, then convert the result back into EZUT coordinates. Such a strategy is used later in doubling formulas.

Encoding and Decoding (Group Elements)

All encodings specified in this document are expressed in terms of bytes. We use the term "byte" to denote what is, formally speaking, an octet (a group of eight bits). A byte has a numerical value in the 0 to 255 range. The least significant bit of the byte has weight 1, while the most significant bit has weight 128.

Encoding formats are fixed-size sequences of bytes. Within a sequence, the first byte has index 0, the next byte has index 1, and so on; thus, the last byte of a 32-byte sequence has index 31.

Field Elements

A field element x is encoded as a fixed 32-byte sequence using the unsigned little-endian convention:

def field_encode(x):
    return int(x).to_bytes(32, byteorder='little')

In other words:

  • The field element is considered as an integer z in the 0 to q-1 range.
  • For j = 0 to 31, output byte of index j is set to the integer value bb[j] = floor(z / (256^j)) mod 256.

Since q is slightly below 2^255, the last output byte (index 31) necessarily has value at most 127, which means that the most significant bit of that byte is necessarily equal to 0.

When decoding, the following rules MUST be observed:

  • The decode MUST reject as invalid any input sequence whose length is not exactly 32 bytes.
  • The integer value z MUST be recomputed by taking into account all input bytes completely. In particular, the most significant bit of the last byte MUST NOT be ignored.
  • The recomputed z MUST be checked to be lower than q. Any implicit or explicit reduction modulo q is incorrect. If z >= q, then the decoder MUST reject the input.

The following function implements these rules:

def field_decode(bb):
    if len(bb) != 32:
        return False
    z = int.from_bytes(bb, byteorder='little')
    if z >= q:
        return False
    return GFq(z)

Group Elements

A group element, represented by the point P = (E:Z:U:T), can be encoded unambiguously into a field element v as follows:

def group_to_field(P):
    (E, Z, U, T) = P
    iZ = invert(Z)
    v = iZ*U
    if sgn(iZ*E) == 1:
        v = -v
    return v

Here, iZ*U and iZ*E are the u and e coordinates, respectively. The group element is really encoded as the u coordinate of the representant point whose e coordinate is non-negative. If the point to encode is secret (e.g. it is the "shared secret" from a Diffie-Hellman key exchange process) then the conditional negation of v SHOULD be implemented in constant-time: v = CT_SELECT(-v IF sgn(iZ*E) == 1 ELSE v).

The decoding process rebuilds either the original P, or the other representant P+N of the same group element:

def field_to_group(v):
    t = v**2
    delta = (a**2 - 4*b)*(t**2) - 2*a*t + 1
    E = sqrt(delta)       # Note: sqrt() returns the non-negative root
    if E is False:
        return False
    Z = GFq(1)
    U = v
    T = t
    return (E, Z, U, T)

The field_to_group(v) function returns a representant of the (unique) group element that encodes to the value v, or False if there is no such element. Take care that sqrt() MUST return the non-negative root.

Note: The decoding process naturally returns a point in (extended) affine coordinates (Z = 1).

The group_to_field() and field_to_group() functions implements encoding/decoding of group elements to/from field elements. Serialization to bytes is then obtained by combining these operations with the serialization rules of field elements, as shown below:

def group_encode(P):
    return field_encode(group_to_field(P))

def group_decode(bb):
    v = field_decode(bb)
    if v is False:
        return False
    return field_to_group(v)

Note: When decoding, both the decoding of the bytes to a field element, and the decoding of the field element to a group element, may fail (i.e. return False). If a strictly constant-time process is required, that does not even leak whether the process succeeded or not, or the cause of the failure, then the test on the output of the field_decode() call can be delayed until the end of the function, invoking field_to_group() on some conventional value in that case.

Point Addition

The sum of two group elements is computed by adding the respective representants of the elements on the curve. In EZUT coordinates, the points P1 and P2 are added as follows:

def point_add(P1, P2):
    (E1, Z1, U1, T1) = P1
    (E2, Z2, U2, T2) = P2
    ap = -2*a              # The constant a'
    bp = a^2 - 4*b         # The constant b'
    e1e2 = E1*E2
    z1z2 = Z1*Z2
    u1u2 = U1*U2
    t1t2 = T1*T2
    tz = (Z1 + T1)*(Z2 + T2) - z1z2 - t1t2
    eu = (E1 + U1)*(E2 + U2) - e1e2 - u1u2
    hd = z1z2 - bp*t1t2
    E3 = (z1z2 + bp*t1t2)*(e1e2 + ap*u1u2) + 2*bp*u1u2*tz
    Z3 = hd**2
    T3 = eu**2
    U3 = half((hd + eu)**2 - Z3 - T3)   # Alternatively: U3 = hd*eu
    return (E3, Z3, U3, T3)

If the point P2 is in extended affine coordinates, i.e. such that Z2 is statically known to be equal to 1, then the multiplication Z1*Z2 can be avoided, and the computation of tz simplifies to Z1*T2 + T1. This specific case is known as a mixed addition and is typically encountered when multiplying the conventional generator by a scalar, using precomputed tables of small multiples of the generator.

Point Negation and Subtraction

Given the point P = (E:Z:U:T), its opposite is obtained as -P = (E:Z:-U:T), i.e. simply negating the U coordinate in the field. The subtraction of P2 from P1 can be obtained by adding -P2 to P1.

def point_neg(P):
    (E, Z, U, T) = P
    return (E, Z, -U, T)

def point_sub(P1, P2):
    return point_add(P1, point_neg(P2))

Point Doublings

Suppose that we have k >= 1 successive doublings to perform on an input point P (i.e. we want (2^k)*P). The general strategy is the following:

  • Perform the first doubling with formulas that receive P in EZUT coordinates, and output 2*P in XWJ coordinates.
  • Perform the other k-1 doublings in XWJ coordinates.
  • Convert the final result from XWJ to EZUT.

Depending on the used curve, it can be beneficial to combine some of these doublings with a switch to the other representant for either the input or the output point (i.e. computing 2*(P+N), (2*P)+N or even (2*(P+N))+N instead of 2*P); this does not change the output as a group element, since we are free to use either of the two representants, but it can reduce the cost when working in XWJ coordinates.

Two distinct functions, described below, are used for computing sequences of doublings on jq255e and jq255s, each leveraging the specific definition of the curve to achieve better performance. The subsequent pseudocode will assume that the correct implementation for the current curve is accessible under the generic function name point_xdouble().

On jq255e

Given a jq255e element represented by point P, and integer k >= 1, the following function computes k successive doublings:

def point_xdouble_jq255e(P, k):
    (E, Z, U, T) = P

    # P (EZUT) -> 2*P (XWJ)
    s = E**2
    X = s**2
    W = 2*(Z**2) - s
    J = 2*(E*U)

    # k-1 times P (XWJ) -> 2*P (XWJ)
    for _ in range(1, k):
        s1 = W**2
        s2 = s1 - 2*X
        s3 = s2**2
        X = s3**2
        J = J*((W + s2)**2 - s1 - s3)   # Alternatively: J = 2*J*W*s2
        W = s3 - 2*(s1**2)

    # Conversion XWJ -> EZUT
    Z = W**2
    T = J**2
    U = half((W + J)**2 - Z - T)        # Alternatively: U = J*W
    E = 2*X - Z
    return (E, Z, U, T)

On jq255s

Given a jq255s element represented by point P, and integer k >= 1, the following function computes k successive doublings:

def point_xdouble_jq255s(P, k):
    (E, Z, U, T) = P

    # P (EZUT) -> 2*P+N (XWJ)
    s = U**2
    X = 8*(s**2)
    W = 2*s - (T + Z)**2
    J = 2*(E*U)

    # k-1 times P (XWJ) -> 2*P+N (XWJ)
    for _ in range(1, k):
        s1 = W*J
        s2 = s1**2
        s3 = (W + J)**2 - 2*s1
        J = 2*s1*(2*X - s3)
        X = 8*(s2**2)
        W = 2*s2 - s3**2

    # Conversion XWJ -> EZUT
    Z = W**2
    T = J**2
    U = half((W + J)**2 - Z - T)        # Alternatively: U = J*W
    E = 2*X - Z - T
    return (E, Z, U, T)

Element Equality

The following function determines whether two points P1 and P2 represent the same group element:

def same_element(P1, P2):
    (E1, Z1, U1, T1) = P1
    (E2, Z2, U2, T2) = P2
    return U1*E2 == U2*E1

This function handles all cases properly, including when one or both of the operands represent the group identity.

A consequence is that a point P = (E:Z:U:T) represents the identity element in the group if and only if U = 0.

Field to Point Map

We define two field-to-point maps (for jq255e and jq255s, respectively) which turn a given field element into a group element (really, a curve point) such that the discrete logarithm of the output relative to the conventional generator is unknown. These maps can be implemented in an efficient and safe (constant-time) way.

The two jq255e and jq255s curves use distinct implementations, shown below. The subsequent pseudocode will assume that the correct implementation for the current curve is available under the name map_to_jq255().

On jq255e

This map uses the constant sqrtm1 = sqrt(-1) on the field. Since there are two such square roots modulo q, we conventionally use the non-negative root, i.e.:

sqrtm1 = 7656063742463026568679823572395325799027601838558345258426535816504372595438

The field element f is mapped to a point in EZUT coordinates as follows:

def map_to_jq255e(f):
    if f == GFq(0):
        return (GFq(1), GFq(1), GFq(0), GFq(0))   # 0 is mapped to the identity
    x1 = 4*(f**2) - 7
    x2 = (4*(f**2) + 7)*sqrtm1   # sqrtm1 is the non-negative square root of -1
    x0 = 4*f
    z1 = 64*(f**7) + 176*(f**5) - 308*(f**3) - 343*f
    z2 = -sqrtm1*(64*(f**7) - 176*(f**5) - 308*(f**3) + 343*f)
    y0 = 8*(f**2)
    if is_square(z1):
        (x, xx, y, yy) = (x1, x0, sqrt(z1), y0)
    elif is_square(z2):
        (x, xx, y, yy) = (x2, x0, sqrt(z2), y0)
    else:
        (x, xx, y, yy) = (x1*x2, x0**2, sqrt(z1*z2), y0**2)
    (u, uu) = (x*yy, xx*y)
    (X, XX) = (-8*(u**2), uu**2)
    (U, UU) = (2*x*xx*uu, u*(x**2 - 8*(xx**2)))
    (E, EE) = (X**2 + 2*(XX**2), X**2 - 2*(XX**2))
    return (E*(UU**2), EE*(UU**2), U*UU*EE, (U**2)*EE)

A constant-time implementation can be achieved by delaying the test f == 0 until the end of the computation, and performing a constant-time conditional overwrite of the result with (GFq(1), GFq(1), GFq(0), GFq(0)) in that case. Similarly, is_square(z1) and is_square(z2) would be systematically computed, and constant-time conditional move operations performed to get the appropriate operand to the (single) sqrt() operation, and also set the x, xx, y and yy variables appropriately.

On jq255s

On jq255s, we use the Elligator2 map. The field element f is mapped to a point as follows:

def map_to_jq255s(f):
    if f == GFq(1) or f == GFq(-1):
        return (GFq(1), GFq(1), GFq(0), GFq(0))
    z1 = -2*(f**6) + 14*(f**4) - 14*(f**2) + 2
    z2 = -z1*(f**2)
    xx = 1 - f**2
    if is_square(z1):
        (x, y) = (GFq(-2), sqrt(z1))
    else:
        (x, y) = (2*(f**2), -sqrt(z2))
    if y == GFq(0):
        return (GFq(1), GFq(1), GFq(0), GFq(0))
    (u, uu) = (x*xx, y)
    (X, XX) = (2*(u**2), uu**2)
    (U, UU) = (2*uu, x**2 + xx**2)
    (s1, s2) = (X*(2*X - XX), XX*(X - XX))
    (E, EE) = (s1 + s2, s1 - s2)
    return (E*(UU**2), EE*(UU**2), U*UU*EE, (U**2)*EE)

As with jq255e, a constant-time implementation is achieved by delaying the handling of exceptional cases until the end of the function, and using constant-time conditional moves to get the right operand for the sqrt() operation and set the x and y values.

Point Multiplication by a Scalar

A group element P can be multiplied by a scalar s through the usual double-and-add and variant algorithms that leverage the point addition and doublings exposed above.

An efficient and constant-time implementation can be obtained with a window optimization and recoding of the scalar:

  • A given window width w is chosen; w is expressed in bits. Usual values are 4 or 5 (the optimal value depends on the relative performance of the various operations in a given implementation; w = 5 is typically best, though w = 4 may be preferred in small microcontrollers with severe constraints on RAM usage).

  • The scalar is converted to an integer in base 2^w, with signed digits in the -(2^(w-1)-1) to +2^(w-1) range (nclusive). This is easily done as follows (a process known as Booth recoding):

    1. Split s into w-bit chunks, each with an unsigned integer value between 0 and (2^w)-1.
    2. Modify each digit in least-to-most significant order: if a digit has a value greater than 2^(w-1), subtract 2^w from it, and add 1 to the next digit.

    Since r is very close to 2^254, and assuming that the source scalar was properly reduced (i.e. it was converted to an integer in the 0 to r-1 range), it can be shown that Booth recoding yields at most ceil(255/w) digits, and the most significant digit is necessarily non-negative.

  • Compute a window (table) of small multiples of the source point P, containing j*P for j = 1 to 2^(w-1).

  • Perform a double-and-add algorithm over the signed digits in most-to-least significant order:

    1. Set an accumulator element R corresponding to the most significant digit, i.e. to the group identity if the digit is zero, or by looking up the value in the window otherwise.

    2. Process all other digits in most-to-least significant order. For each digit d, first multiply R by 2^w (using the point_xdouble() function); then, look up point Q = d*P (if d = 0, then Q is the group identity; if d > 0, then the point Q is found in the window; if d < 0, find (-d)*P in the window and negate it to get Q) and add Q to R.

  • R contains the result (s*P) once all digits have been processed.

def point_mul(P, s):
    # Build window: win[i - 1] = i*P  (with i = 1 to 16)
    win = []
    win.append(P)
    for i in range(2, 16, 2):
        P2 = point_xdouble(win[(i >> 1) - 1], 1)
        P3 = point_add(P, P2)
        win.append(P2)
        win.append(P3)
    win.append(point_add(P, win[14]))

    # Booth recoding of the scalar with a 5-bit window
    j = int(s)
    sd = []
    cc = 0
    for i in range(0, 51):
        nd = (j & 31) + cc
        j >>= 5
        if nd > 16:
            sd.append(nd - 32)
            cc = 1
        else:
            sd.append(nd)
            cc = 0

    # Point multiplication itself
    if sd[50] == 0:
        R = (GFq(1), GFq(1), GFq(0), GFq(0))
    else:
        R = win[sd[50] - 1]
    for i in reversed(range(0, 50)):
        if sd[i] > 0:
            Q = win[sd[i] - 1]
        elif sd[i] < 0:
            Q = point_neg(win[(-sd[i]) - 1])
        else:
            Q = (GFq(1), GFq(1), GFq(0), GFq(0))
        R = point_xdouble(R, 5)
        R = point_add(R, Q)
    return R

This process becomes constant-time if all the steps are constant-time. In particular, the lookup process from the window must then scan all elements from the window systematically, and select the correct one with constant-time conditional move operations that use bitwise Boolean operations. With a primitive such as CT_SELECT(), the lookup of point Q in the loop can be expressed as follows:

        Q = (GFq(1), GFq(1), GFq(0), GFq(0))
        abs_digit = CT_SELECT(-sd[i] IF sd[i] < 0 ELSE sd[i])
        for j in range(0, 16):
            Qj = win[j]
            Q = CT_SELECT(Qj IF abs_digit == j + 1 ELSE Q)
        Qn = point_neg(Q)
        Q = CT_SELECT(Qn IF sd[i] < 0 ELSE Q)

Scalar Split and Curve Endomorphism on jq255e

The curve over which the jq255e group is defined is a special type of curve known as a GLV curve: there exists an easily computed non-trivial endomorphism on the curve (which is thus a group homomorphism on the jq255e group). This endomorphism can be leveraged to speed up the generic point multiplication by a scalar (cost of the operation is typically reduced by about 25 to 30%).

The endomorphism is expressed as the function zeta():

zeta(e, u) = (e, u*sqrtm1)

with sqrtm1 being the non-negative square root of -1 in the field:

sqrtm1 = 7656063742463026568679823572395325799027601838558345258426535816504372595438

This is the same value sqrtm1 which was already used in the specification of the map_to_jq255e() function. In (x, y) coordinates on the corresponding Weierstraß equation, the zeta() function returns (-x, sqrtm1*y).

In EZUT coordinates, zeta() is computed as zeta(E:Z:U:T) = (E:Z:sqrtm1*U:-T). It can be shown that applying zeta() to a group element yields exactly the same group element as multiplying by a given scalar mu, which happens to be a square root of -1 in the scalar field (integers modulo r). With the value of sqrtm1 defined above, the value of mu is:

mu = 23076176648693837106500022901799924463072024427516564762134831823525232195341

Note: The value mu is here defined for group elements. When applied to a curve point, we have zeta(P) = mu*P + N. Since we work on the prime order group jq255e and do not care about which representant of the output group element we obtain, we can ignore that extra +N.

Any given scalar s can be split into two half-width signed integers s0 and s1, such that s = s0 + mu*s1, and the absolute values of s0 and s1 are both at most equal to sqrt(r). In particular, they both fit on 128 bits each, as signed integers, including the sign bit. The splitting process can be done efficiently with operations on plain integers; it involves some rounded divisions, but these can be simplified into only multiplications and shifts thanks to the specific value of the group order r. For details, refer to the double-odd elliptic curves whitepaper (section 6.2). Once s has been split into s0 and s1, the double-and-add algorithm described in the previous section can be modified with two windows, the second window containing points zeta(j*P), and only half the number of point doublings.

High-Level Protocols

High-level protocols are signatures, key exchange and hash-to-group operations. We specify here how these operations are performed, and in particular all encoding rules and handling of invalid inputs.

The primitives below use the BLAKE2 hash function, specifically its BLAKE2s variant with a 32-byte (256-bit) output. The pseudocode expects the following API (which is compatible with Python's standard hashlib):

  • sh = hashlib.blake2s() creates an object sh that incarnates a new BLAKE2s computation.

  • The data to hash is provided in one or several chunks of arbitrary length. Each chunk is a sequence of bytes.

  • sh.digest() finalizes the computation and returns the 32-byte BLAKE2s output.

Encoding and Decoding (Scalars and Keys)

Encoding rules for group elements were detailed previously (functions group_encode() and group_decode()). High-level operations use specific encoding rules for scalars, private keys, and public keys.

Scalars

Scalars are integers modulo r. They follow the same rules as field elements, except that they work with the modulus r instead of q:

def scalar_encode(x):
    return scalar_to_int(x).to_bytes(32, byteorder='little')

def scalar_decode(bb):
    if len(bb) != 32:
        return False
    z = int.from_bytes(bb, byteorder='little')
    if z >= r:
        return False
    return int_to_scalar(z)

For jq255e, r is slightly below 2^254, so the two most significant bits of the last byte of the encoding are always zero. As with the decoding of field elements, the scalar decoder MUST NOT ignore either or both of these two bits. For jq255s, only the most significant bit is guaranteed to be zero, since r, for that group, is slightly above 2^254.

Private Keys

A private key is a secret non-zero scalar. Private keys are encoded as scalars, with the additional provision that, upon decoding, a key of value zero is rejected.

def private_key_encode(sk):
    return scalar_encode(sk)

def private_key_decode(bb):
    sk = scalar_decode(bb)
    if sk is False:
        return False
    if sk == int_to_scalar(0):
        return False
    return sk

To generate a private key, a non-zero scalar should be chosen from a cryptographically secure source. If the secure_random(m) function returns m random bytes with uniform probability, then the following function produces a new private key:

def private_key_generate():
    while True:
        kb = secure_random(32)     # 32 = UNIFORM_SIZE(r)
        sk = int_to_scalar(int.from_bytes(kb, byteorder='little'))
        if sk != int_to_scalar(0):
            return sk

Note: This function generates a random 256-bit integer, then reduces it modulo r (through the int_to_scalar() function). The parameter to the call to secure_random() is really UNIFORM_SIZE(r), which is equal to 32 for both jq255e and jq255s.

The private_key_generate() function is organized as a loop so that the returned private key is guaranteed not to be zero. However, the probability that a zero private key value is obtained is negligible; hence, only one iteration of the loop will be used in practice.

Public Keys

A public key is a non-identity group element. The public key Q corresponds to the private key sk such that Q = sk*G, with G being the conventional generator. A public key is decoded and encoded as a group element, with the additional provision that, upon decoding, the identity element is rejected.

def public_key_encode(Q):
    return group_encode(Q)

def public_key_decode(bb):
    Q = group_decode(bb)
    if Q is False:
        return False
    (E, Z, U, T) = Q
    if U == GFq(0):
        return False
    return Q

The public key corresponding to the private key sk can be computed by invoking point_mul(G, sk), with G being the conventional generator element.

Signatures

A signature is computed over a given input message, using a private key; the signature value has size exactly 48 bytes. The correctness of the signature value over a given message can be verified against the public key corresponding to the private key used for signing.

Internally, a Schnorr signature process is used; the challenge value is computed with the BLAKE2 hash function. Specifically, the BLAKE2s function is used, with a 32-byte output (as will be described below, that 32-byte output is then truncated to 16 bytes, but the hash output length 32 is still used in the BLAKE2 parameter block).

In this section, we use the following notations:

  • The signer's private key is sk.

  • The signer's public key is the group element Q = sk*G.

Input Message

The input message can be raw (an arbitrary sequence of bytes) or pre-hashed (a hash value). In all operations below, we use the prepared message M defined as follows:

  • If the input is raw, then M consists of a single byte of value 0x52, followed by the input bytes.

  • If the input is a hash value hv, then M is obtained as the concatenation of, in that order:

    • A single byte of value 0x48.

    • The symbolic name of the hash function that was used to produce hv, encoded in ASCII.

    • A single byte of value 0x00 (which terminates the encoded hash function name).

    • The hash value hv.

The following symbolic hash function names are defined:

Hash function symbolic name
SHA-256 sha256
SHA-384 sha384
SHA-512 sha512
SHA-512/256 sha512256
SHA3-256 sha3256
SHA3-384 sha3384
SHA3-512 sha3512
BLAKE2s blake2s
BLAKE2b blake2b
BLAKE3 blake3

In the table above, "BLAKE2s" and "BLAKE2b" are to be understood as "BLAKE2s with a 32-byte (256-bit) output" and "BLAKE2b with a 64-byte (512-bit) output", respectively. In general, the symbolic name of a hash function is obtained by converting the hash function human-readable name to lowercase and removing all characters which are neither letters or digits.

def prepare_message_raw(msg):
    return b'\x52' + msg

def prepare_message_prehashed(hv, hashname):
    return b'\x48' + hashname.encode() + b'\x00' + hv

The recommended usage is to apply jq255e and jq255s signatures on input messages pre-hashed with BLAKE2s with a 32-byte (256-bit) output. The signature scheme shall be designated, formally, as follows:

  • jq255e and jq255s: signatures are computed over the input message pre-hashed with BLAKE2s-256 (i.e. BLAKE2s with a 32-byte output).

  • jq255e-raw and jq255s-raw: signatures are computed over the raw input message, with no pre-hashing.

  • jq255e-hashname and jq255s-hashname: signatures are computed over the input message pre-hashed with the hash function whose symbolic name is "hashname". Thus, "jq255e-sha256" means "signature with the jq255e group, computed over an input message pre-hashed with SHA-256".

Thus, "jq255e", as a signature scheme name, is equivalent to "jq255e-blake2s".

The following remarks apply:

  • Regardless of the hash function used to pre-hash the message (if any), the internal hash function used to compute the challenge is always BLAKE2s. Within the scope of this specification, the use of BLAKE2s internally is not configurable.

  • Using a raw input conceptually provides additional protection against collision attacks on weak hash functions. On the other hand, raw inputs are not compatible with some kinds of streamed processing, forcing either the sender or the recipient of a signed message to buffer the entirety of the signed data. This is why pre-hashing with a properly collision-resistant hash function is recommended in general.

Signature Generation

A signature is computed with the following general process:

  • A per-signature nonce k is produced. This is a secret scalar that is chosen with uniform probability, and used only for a single signature.

  • A commitment is computed as the group element R = k*G.

  • A challenge is computed by hashing together the commitment R, the public key Q, and the prepared message M, yielding a value c.

  • The response is the scalar s = k + c*sk (interpreting c as an integer which is then converted to a scalar).

The signature value itself consists of c and an encoding of s.

The signature generation process is described as the sign() function below, using the private key (sk), public key (Q) and prepared message (M) as input. That function calls the generate_nonce() function to generate k in a safe and deterministic way; that process uses an extra optional parameter called seed. More on this function is detailed later on.

def generate_nonce(sk, Q, M, seed):
    sh = hashlib.blake2s()   # BLAKE2s output size is at least UNIFORM_SIZE(r)
    sh.update(private_key_encode(sk))
    sh.update(public_key_encode(Q))
    sh.update(len(seed).to_bytes(8, byteorder='little'))
    sh.update(seed)
    sh.update(M)
    bb = sh.digest()
    return int_to_scalar(int.from_bytes(bb, byteorder='little'))

def make_challenge(R, Q, M):
    sh = hashlib.blake2s()
    sh.update(group_encode(R))
    sh.update(public_key_encode(Q))
    sh.update(M)
    return sh.digest()[0:16]  # 32-byte output is truncated to 16 bytes

def sign(sk, Q, M, seed):
    k = generate_nonce(sk, Q, M, seed)
    R = point_mul(G, k)
    c = make_challenge(R, Q, M)
    ic = int.from_bytes(c, byteorder='little')
    s = k + sk*int_to_scalar(ic)
    return c + scalar_encode(s)    # Concatenation of c and encoded s

The make_challenge() function returns a 16-byte value. The signature value is the concatenation of c and the encoded scalar s, for a total length of 48 bytes.

The per-signature nonce k is a scalar. The nonce may conceptually have value zero (though this is very improbable); thus, the commitment R may be the identity element. This is why R must be encoded with group_encode() and not public_key_encode() (public keys cannot be the identity element).

The generate_nonce() function incarnates a process known as derandomization. Generally speaking, the nonce k should be generated randomly and uniformly among all possible scalar values. Any bias in this selection process may lead to private key reconstruction attacks. If a cryptographically secure random generator is available, then k may be obtained by simply getting UNIFORM_SIZE(r) bytes out of that generator, interpreting them as an integer, then reducing that integer modulo r (UNIFORM_SIZE(r) = 32 for both jq255e and jq255s). However, in practical implementations, a properly secure random generator is not necessarily available at signature generation time; the generate_nonce() function shown above instead uses the private key, public key, message and optional extra seed to produce k in a safe way without requiring a random generator. Any cryptographically secure hash function with an output size of at least UNIFORM_SIZE(r) bytes can be used; in this specification, we use BLAKE2s, whose 32-byte output matches the value of UNIFORM_SIZE(r) for both jq255e and jq255s. For reproducibility of test vectors, it is RECOMMENDED to use BLAKE2s.

The seed parameter is optional. If set to a fixed value (e.g. the empty byte sequence, of length 0), then generate_nonce() is fully deterministic (the same signature value is always obtained for a given message and a given private key); this is safe. However, additional protection against some active attacks that introduce glitches in the computation through physical means can be obtained with some non-determinism. Such non-determinism is achieved by adding some varying bytes as seed. The security does not depend on the randomness of the seed; a randomly generated seed with a non-secure generator, or even a fully predictable seed, such as a sequence number or the current time, still yields a fully safe per-signature nonce.

The ability to provide the seed explicitly allows for testing the nonce generation process through test vectors. Since generate_nonce() encodes the seed length (in bytes) over 8 bytes, the seed may be an arbitrary sequence of up to 2^64-1 bytes.

Signature Verification

A signature sig is verified against a public key Q and a prepared message M with the following process:

def verify(Q, sig, M):
    if len(sig) != 48:
        return False
    c = sig[0:16]
    s = scalar_decode(sig[16:48])
    if s is False:
        return False
    cc = int_to_scalar(int.from_bytes(c, byteorder='little'))
    R = point_sub(point_mul(G, s), point_mul(Q, cc))   # R = s*G - c*Q
    c2 = make_challenge(R, Q, M)
    return c == c2       # Comparison of two 16-byte sequences

The make_challenge() function defined in the signature generation process is here used again during the verification. In a nutshell, the process boils down to recomputing the commitment R from the challenge c and response s (found in the signature), and checking that the correct challenge can be recomputed from R, Q and M.

The scalar s can be split into a low and a high halves (s0 and s1, respectively), with s = s0 + s1*(2^128), which allows rewriting the computation of R as:

R = s0*G + s1*((2^128)*G) + c*(-Q)

which is a linear combination of three points (G, (2^128)*G and Q) with coefficients that all fit on 128 bits each. In a double-and-add algorithm, the point doublings for the three multiplications can be mutualized, using Straus's method (also colloquially known as "Shamir's trick"). The points G and (2^128)*G are fixed constants; appropriate tables of small multiples of these points can be generated in advance (and can furthermore be normalized to affine points). This method yields a fast signature verification implementation, even faster than a single generic multiplication of a point by a full-width scalar. Additional speed improvements can be achieved with some non-constant-time techniques such as w-NAF scalar encodings, under the assumption that signature values and public keys are public data and thus can be processed without maintaining a strict constant-time discipline.

Signature Security

The signatures described above formally provide the same security level as the group itself, i.e. "127 bits" (arguably equivalent to "128 bits" in the same sense as, for instance, Curve25519 is argued to offer "128-bit security" with a subgroup order of "only" about 2^252 elements). The use of a half-width challenge c was already suggested by Schnorr in his seminal paper; the security properties and requirements on the hash function (here, BLAKE2s) were thoroughly analyzed by Neven, Smart and Warinschi.

It has been noted that while the construction with a half-width challenge is safe as a signature scheme, some advanced protocols try to also use signatures as commitments on a unique message: these protocols assume that it is infeasible for the signer to produce a public key Q, a signature sig and two distinct messages M0 and M1 such that verify(Q, sig, M0) and verify(Q, sig, M1) both return True. With the half-width challenges used in the signature scheme specified here, it is possible to produce such Q, sig, M0 and M1 with effort about 2^64 operations (i.e. a substantial but not infeasible computation). Formally speaking, the scheme achieves normal unforgeability but not semi-strong unforgeability. This does not contradict non-repudiation: in such a scenario, the signer truly signed both messages M0 and M1, and can truthfully be held accountable to both. However, this lack of message unicity can complicate and even formally weaken some protocols such as distributed threshold signature schemes with some potentially actively dishonest signers. For "normal" usages of signatures (e.g. for authentication or certification purposes), this is a non-issue.

Key Exchange

A Diffie-Hellman key exchange is defined here. Two parties (party 1 and party 2) generate private/public key pairs (sk1 and Q1 for party 1, sk2 and Q2 for party 2). Each party receives a copy of the public key of the other party, and combines it with its own private key to produce a secret key (of size 32 bytes). If the public keys were computed correctly and not altered in transit, then both parties compute the same secret key, and that key is not known to outsiders. It can then be used for further cryptographic operations, typically symmetric encryption.

Warning: Two important caveats apply:

  • The key exchange described here does not, by itself, ensure authentication. An active attacker in position to intercept messages exchanged between the two parties may replace public keys on the wire with values generated by the attacker, and establish distinct secret keys, but both known to the attacker, with each party. The attacker may then decrypt and re-encrypt subsequent messages as they flow. A Diffie-Hellman key exchange may provide security against active attackers only if integrated into an outer protocol that ensures proper authentication (e.g. SSL/TLS uses signatures and certificates on top of a Diffie-Hellman key exchange).

  • While private and public keys used here have the same format as private and public keys used for signatures, it is not recommended to use the same private/public key pair for both key exchange and signatures. In general, the two activities call for key pairs with distinct, incompatible lifecycle properties (simply put, keys used for encryption should normally be backed up to prevent data loss in case of key loss, while keys used for signatures must not be backed up, since their value relies in their exclusive control by the purported signer). Moreover, possible interactions between the two schemes are not entirely explored.

The process below describes how a party uses its own key pair (sk and Q) along with the encoded public key received from the peer (peer_encoded_Q) to compute the shared secret. The function returns two values: the computed secret key (presumably shared with the peer), and a Boolean status. The status is set to False if the bytes sent by the peer cannot be decoded as a proper public key. When the peer bytes are invalid, a secret key is still computed, and it is not guessable by outsiders. The point of producing such a fake secret key is to allow the use of the key exchange in some (rather contrived) scenarios in which attackers cannot see, but can alter public keys in transit, and may obtain some extra information from knowledge of whether the altered key was valid or not. Computing a fake but indistinguishable secret key can provide this extra security feature. In most normal uses of Diffie-Hellman, that feature is not needed.

def ECDH(sk, Q, peer_encoded_Q):
    # Decode the peer public key from the received bytes.
    peer_pk_good = True
    peer_Q = public_key_decode(peer_encoded_Q)
    if peer_Q is False:
        peer_Q = G
        peer_pk_good = False

    # Compute the shared point.
    P = point_mul(peer_Q, sk)
    eP = group_encode(P)

    # Get the two encoded public keys, in lexicographic order.
    pk1 = public_key_encode(Q)
    pk2 = peer_encoded_Q
    ipk1 = int.from_bytes(pk1, byteorder='big')
    ipk2 = int.from_bytes(pk2, byteorder='big')
    if ipk1 > ipk2:
        (pk1, pk2) = (pk2, pk1)

    # Compute the shared secret key.
    sh = hashlib.blake2s()
    sh.update(pk1)
    sh.update(pk2)
    if peer_pk_good:
        sh.update(b'\x53')    # One byte of value 0x53
        sh.update(eP)
    else:
        sh.update(b'\x46')    # One byte of value 0x46
        sh.update(private_key_encode(sk))
    return (sh.digest(), peer_pk_good)

The derivation of the shared secret group element P into a secret key appropriate for further symmetric cryptographic operations uses an invocation of the BLAKE2s hash function. The two exchanged public keys are also used as input to the hashing process; since both parties should obtain the same key, the two public keys are inserted in lexicographic order (lowest key first). The pseudocode applies lexicographic ordering by interpreting both sequences of bytes as integers with the big-endian notation, and then comparing them as integers; when using languages without direct access to big integers, a simple byte-by-byte comparison may also be performed.

Hash-to-Group

A hash-to-group process maps an arbitrary input (sequence of bytes) to a group element, such that the output distribution is indistinguishable from uniform random selection. In particular, no information is obtained about the discrete logarithm of the output with regard to any given fixed generator element.

The input data is first turned into a prepared message using the same process as in the signature scheme. As such, the hash-to-group process is thus implicitly computed over the raw data, or a pre-hashed message. It is recommended to use pre-hashing with the BLAKE2s hash function (with a 32-byte output); unless specified explicitly otherwise, the "hash-to-jq255e" and "hash-to-jq255s" functionalities assume that pre-hashing with BLAKE2s is applied. As will be made apparent below, the prepared message M is itself hashed twice, so that using a raw message can entail some large overhead if the input is very large.

def hash_to_group(M):
    sh = hashlib.blake2s()
    sh.update(b'\x01')      # One byte of value 0x01
    sh.update(M)
    bb1 = sh.digest()
    f1 = GFq(int.from_bytes(bb1, byteorder='little'))

    sh = hashlib.blake2s()
    sh.update(b'\x02')      # One byte of value 0x02
    sh.update(M)
    bb2 = sh.digest()
    f2 = GFq(int.from_bytes(bb2, byteorder='little'))

    Q1 = map_to_jq255(f1)   # Using map_to_jq255e() or map_to_jq255s()
    Q2 = map_to_jq255(f2)   # Using map_to_jq255e() or map_to_jq255s()
    return point_add(Q1, Q2)

Note: Here, the two 32-byte values bb1 and bb2, which are obtained as hash outputs, are interpreted as 256-bit integers which are then implicitly reduced modulo q; this is unlike unlike field_decode(), the latter function rejecting out-of-range inputs instead.

Rationale: Contrary to the previously discussed cases of generating random scalars (in private_key_generate() and in generate_nonce()), it is not required, for proper security, that the f1 and f2 field elements be obtained with a distribution indistinguishable from uniform randomness. However, it is still needed that each of f1 and f2 be obtained through reducing an integer obtained from a pseudorandomly generated byte sequence at least as large as the group order r itself. For both jq255e and jq255s, this means obtaining two 32-byte sequences, for a total of 64 bytes. These 64 bytes could be obtained with a single invocation of a hash function with a 64-byte output (such as SHA-512 or BLAKE2b); however, we prefer to use BLAKE2s, which maps better to the abilities of small 32-bit microcontrollers. BLAKE2s has an output size of 32 bytes; this is why there are two invocations of BLAKE2s in the hash_to_group() function specified above.