Class Linear4SystemSolver

java.lang.Object
it.unimi.dsi.sux4j.mph.solve.Linear4SystemSolver

public class Linear4SystemSolver extends Object
A class implementing generation and solution of a random 4-regular linear system on F2 using the techniques described by Marco Genuzio, Giuseppe Ottaviano and Sebastiano Vigna in “Fast Scalable Construction of (Minimal Perfect Hash) Functions”, 15th International Symposium on Experimental Algorithms — SEA 2016, Lecture Notes in Computer Science, Springer, 2016. It can be seen as a generalization of the 3-hypergraph edge sorting procedure that is necessary for the Majewski-Wormald-Havas-Czech technique.

Instances of this class contain the data necessary to generate the random system and solve it. At construction time, you provide just the desired number of equations and variables; then, you call generateAndSolve(Iterable, long, LongBigList) providing a value list; it will generate a random linear system on F2 with four variables per equation; the constant term for the k-th equation will be the k-th element of the provided list. This kind of system is useful for computing a GOV4Function. The number of elements returned by the provide Iterable must be equal to the number of equation passed at construction time.

To guarantee consistent results when reading a GOV4Function the method signatureToEquation(long[], long, int, int[]) can be used to retrieve, starting from the signature generated by a bit vector, the corresponding equation. While having a function returning the edge starting from a key would be more object-oriented and avoid hidden dependencies, it would also require storing the transformation provided at construction time, which would make this class non-thread-safe. Just be careful to transform the keys into bit vectors using the same TransformationStrategy and the same hash function used to generate the random linear system.

Support for preprocessed keys

This class provides two special access points for classes that have pre-digested their keys. The methods generation methods and signatureToEquation(long[], long, int, int[]) use fixed-length 128-bit signatures under the form of pairs of longs. The intended usage is that of turning the keys into such a signature using SpookyHash and then operating directly on the hash codes. This is particularly useful in chunked constructions, where the keys are replaced by their 192-bit hashes in the first place. Note that the hashes are actually rehashed using Hashes.spooky4(long[], long, long[])—this is necessary to vary the linear system whenever it is unsolvable (or the associated hypergraph is not orientable).

Warning: you cannot mix the bitvector-based and the signature-based constructors and static methods. It is your responsibility to pair them correctly.

Implementation details

We use Jenkins's SpookyHash to compute three 64-bit hash values.

The XOR trick

Before proceeding with the actual solution of the linear system, we perform a peeling of the hypergraph associated with the system, which iteratively removes edges that contain a vertex of degree one. Since the list of edges incident to a vertex is accessed during the peeling process only when the vertex has degree one, we can actually store in a single integer the XOR of the indices of all edges incident to the vertex. This approach significantly simplifies the code and reduces memory usage. It is described in detail in “Cache-oblivious peeling of random hypergraphs”, by Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna, Proc. Data Compression Conference 2014, 2014.

Rounds and Logging

Building and sorting a large 4-regular linear system is difficult, as solving linear systems is superquadratic. This classes uses techniques introduced in the paper quoted in the introduction (and in particular broadword programming and lazy Gaussian elimination) to speed up the process by orders of magnitudes.

Note that we might generate non-solvable systems, in which case one has to try again with a different seed.

To help diagnosing problem with the generation process, this class will log at debug level what's happening.

Author:
Sebastiano Vigna
  • Field Details

    • solution

      public long[] solution
      The vector of solutions.
    • unsolvable

      public int unsolvable
      The number of generated unsolvable systems.
    • numPeeled

      public long numPeeled
      The number of peeled nodes.
  • Constructor Details

    • Linear4SystemSolver

      public Linear4SystemSolver(int numVariables, int numEquations)
      Creates a linear 4-regular system solver for a given number of variables and equations.
      Parameters:
      numVariables - the number of variables.
      numEquations - the number of equations.
  • Method Details

    • signatureToEquation

      public static void signatureToEquation(long[] signature, long seed, int numVariables, int[] e)
      Turns a signature into an equation.

      If there are no variables the vector e will be filled with -1.

      Parameters:
      signature - a signature (two longs). Note that if a longer vector is provided, only the first two elements will be used.
      seed - the seed for the hash function.
      numVariables - the nonzero number of variables in the system.
      e - an array to store the resulting equation.
    • generateAndSolve

      public boolean generateAndSolve(Iterable<long[]> iterable, long seed, LongBigList valueList)
      Generates a random 4-regular linear system on F2 and tries to solve it.

      The constant part is provided by valueList.

      Parameters:
      iterable - an iterable returning signatures (two longs). Note that if a longer vectors are returned, only the first two elements will be used.
      seed - a 64-bit random seed.
      valueList - a value list containing the constant part.
      Returns:
      true if a solution was found.
      See Also:
    • generateAndSolve

      public boolean generateAndSolve(Iterable<long[]> signatures, long seed, LongBigList valueList, Codec.Coder coder, int m, int w)