public class HypergraphSorter<T> extends Object
Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech have described in “A family of perfect hashing methods”, Comput. J., 39(6):547−554, 1996, a 3hypergraph based technique to store functions (actually, the paper uses the technique just to store a permutation of the key set, but it is clear it can be used to store any function). More generally, the procedure first generates a random 3partite 3hypergraph whose edges correspond to elements of the function domain. Then, it sorts the edges of the random 3hypergraph so that for each edge at least one vertex, the hinge, never appeared before in the sorted edge list (this happens with high probability if the number of vertices is at least γ times the number of edges).
Instances of this class contain the data necessary to generate the random hypergraph
and apply the sorting procedure. At construction time, you provide just the desired number
of edges; then, each call to generateAndSort()
will generate a new 3hypergraph using a 64bit seed, an iterator returning the key set,
and a corresponding TransformationStrategy
. If the method returns true, the sorting was
successful and in the public field stack
you can retrieve hinges in the opposite
of the desired order (so enumerating hinges starting from the last in stack
you
are guaranteed to find each time a vertex that never appeared in some 3hyperedge associated with previous hinges). For each hinge, the corresponding values
in vertex1
and vertex2
complete the 3hyperedge associated with the hinge, and the corresponding
value in edge
contains the index of the 3hyperedge (i.e., the index of the key that generated the hyperedge).
The computation of the last value can be disabled if you do not need it.
The public fields numEdges
and numVertices
expose information about the generated
3hypergraph. For m edges, the number of vertices will be ⌈ γm ⌉ + 1, rounded up
to the nearest multiple of 3, unless m is zero, in which case the number of vertices will be zero, too.
Note that index of the hash function that generated a particular vertex of a 3hyperedge can be recovered
dividing by partSize
, which is exactly numVertices
/3.
To guarantee consistent results when reading a MajewskiWormaldHavasCzechlike structure,
the method bitVectorToEdge()
can be used to retrieve, starting from
a bit vector, the corresponding edge. While having a function returning the edge starting
from a key would be more objectoriented and avoid hidden dependencies, it would also require
storing the transformation provided at construction time, which would make this class nonthreadsafe.
Just be careful to transform the keys into bit vectors using
the same TransformationStrategy
used to generate the random 3hypergraph.
This class provides two special access points for classes that have predigested their keys. The methods
generateAndSort(Iterator, long)
and tripleToEdge(long[], long, int, int, int[])
use
fixedlength 192bit keys under the form of triples of longs. The intended usage is that of
turning the keys into such a triple using SpookyHash and
then operating directly on the hash codes. This is particularly useful in chunked constructions, where
the keys are replaced by their 192bit hashes in the first place. Note that the hashes are actually
rehashed using Hashes.spooky4(long[], long, long[])
—this is necessary to vary the associated edges whenever
the generated 3hypergraph is not acyclic.
Warning: you cannot mix the bitvectorbased and the triplebased constructors and static methods. It is your responsibility to pair them correctly.
We use Jenkins's SpookyHash to compute three 64bit hash values.
Since the list of edges incident to a vertex is accessed during the peeling process only when the vertex has degree one, we can actually store in a single integer the XOR of the indices of all edges incident to the vertex. This approach significantly simplifies the code and reduces memory usage. It is described in detail in “Cacheoblivious peeling of random hypergraphs”, by Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna, Proc. Data Compression Conference 2014, 2014.
We push further this idea by observing that since one of the vertices of an edge incident to x is exactly x, we can even avoid storing the edges at all and just store for each vertex two additional values that contain a XOR of the other two vertices of each edge incident on the vertex. This approach further simplifies the code as every 3hyperedge is presented to us as a distinguished vertex (the hinge) plus two additional vertices.
Building and sorting a large 3hypergraph may take hours. As it happens with all probabilistic algorithms, one can just give estimates of the expected time.
There are two probabilistic sources of problems: duplicate edges and nonacyclic hypergraphs. However, the probability of duplicate edges is vanishing when n approaches infinity, and once the hypergraph has been generated, the stripping procedure succeeds in an expected number of trials that tends to 1 as n approaches infinity.
To help diagnosing problem with the generation process, this class will log at debug level what's happening.
Modifier and Type  Field and Description 

int[] 
edge
For each vertex, the XOR of the indices of incident 3hyperedges.

static double 
GAMMA
The mythical threshold (or better, a very closed upper bound of): random 3hypergraphs
are acyclic with high probability if the ratio vertices/edges exceeds this constant.

int 
numEdges
The number of edges in the hypergraph.

int 
numVertices

int 
partSize
numVertices / 3. 
int[] 
stack
The hinge stack.

int[] 
vertex1
For each vertex, the XOR of the values of the smallest other vertex in each incident 3hyperedge.

int[] 
vertex2
For each vertex, the XOR of the values of the largest other vertex in each incident 3hyperedge.

Constructor and Description 

HypergraphSorter(int numEdges)
Creates a hypergraph sorter for a given number of edges.

HypergraphSorter(int numEdges,
boolean computeEdges)
Creates a hypergraph sorter for a given number of edges.

Modifier and Type  Method and Description 

static void 
bitVectorToEdge(BitVector bv,
long seed,
int numVertices,
int[] e)
Turns a bit vector into a 3hyperedge.

static void 
bitVectorToEdge(BitVector bv,
long seed,
int numVertices,
int partSize,
int[] e)
Turns a bit vector into a 3hyperedge.

boolean 
generateAndSort(Iterator<? extends T> iterator,
TransformationStrategy<? super T> transform,
long seed)
Generates a random 3hypergraph and tries to sort its edges.

boolean 
generateAndSort(Iterator<long[]> iterator,
long seed)
Generates a random 3hypergraph and tries to sort its edges.

static void 
tripleToEdge(long[] triple,
long seed,
int numVertices,
int[] e)
Turns a triple of longs into a 3hyperedge.

static void 
tripleToEdge(long[] triple,
long seed,
int numVertices,
int partSize,
int[] e)
Turns a triple of longs into a 3hyperedge.

public static final double GAMMA
public final int numVertices
public final int partSize
numVertices
/ 3.public final int numEdges
public final int[] vertex1
public final int[] vertex2
public final int[] edge
public final int[] stack
public HypergraphSorter(int numEdges, boolean computeEdges)
numEdges
 the number of edges of this hypergraph sorter.computeEdges
 if false, the index of the edge associated with each hinge will not be computed.public HypergraphSorter(int numEdges)
numEdges
 the number of edges of this hypergraph sorter.public static void bitVectorToEdge(BitVector bv, long seed, int numVertices, int partSize, int[] e)
The returned edge satisfies the property that the ith vertex is in the interval
[i·partSize
..i+1·partSize
). However, if there are no edges
the vector e
will be filled with 1.
bv
 a bit vector.seed
 the seed for the hash function.numVertices
 the number of vertices in the underlying hypergraph.partSize
 numVertices
/3 (to avoid a division).e
 an array to store the resulting edge.public static void bitVectorToEdge(BitVector bv, long seed, int numVertices, int[] e)
bv
 a bit vector.seed
 the seed for the hash function.numVertices
 the number of vertices in the underlying hypergraph.e
 an array to store the resulting edge.bitVectorToEdge(BitVector, long, int, int, int[])
public static void tripleToEdge(long[] triple, long seed, int numVertices, int partSize, int[] e)
triple
 a triple of intermediate hashes.seed
 the seed for the hash function.numVertices
 the number of vertices in the underlying hypergraph.partSize
 numVertices
/3 (to avoid a division).e
 an array to store the resulting edge.bitVectorToEdge(BitVector, long, int, int, int[])
public static void tripleToEdge(long[] triple, long seed, int numVertices, int[] e)
triple
 a triple of intermediate hashes.seed
 the seed for the hash function.numVertices
 the number of vertices in the underlying hypergraph.e
 an array to store the resulting edge.bitVectorToEdge(BitVector, long, int, int, int[])
public boolean generateAndSort(Iterator<? extends T> iterator, TransformationStrategy<? super T> transform, long seed)
iterator
 an iterator returning numEdges
keys.transform
 a transformation from keys to bit vectors.seed
 a 64bit random seed.