- All Implemented Interfaces:
@Deprecated public class MWHCFunction<T> extends AbstractObject2LongFunction<T> implements Serializable, Size64An immutable function stored quasi-succinctly using the Majewski-Wormald-Havas-Czech 3-hypergraph technique.
Instances of this class store a function from keys to values. Keys are provided by an iterable object (whose iterators must return elements in a consistent order), whereas values are provided by a
LongIterable. If you do nost specify values, each key will be assigned its rank (e.g., its position in iteration order starting from zero).
For convenience, this class provides a main method that reads from standard input a (possibly
gzip'd) sequence of newline-separated strings, and writes a serialised function mapping each element of the list to its position, or to a given list of values.
Optionally, it is possible to sign an
MWHCFunction. Signing is possible if no list of values has been specified (otherwise, there is no way to associate a key with its signature). A w-bit signature will be associated with each key, so that
getLong(Object)will return a default return value (by default, -1) on strings that are not in the original key set. As usual, false positives are possible with probability 2-w.
If you're not interested in the rank of a key, but just to know whether the key was in the original set, you can turn the function into a dictionary. In this case, the value associated by the function with a key is exactly its signature, which means that the only space used by the function is that occupied by signatures: this is one of the fastest and most compact way of storing a static dictionary. In this case, the only returned value is one, and the default return value is set to zero.
Building a function
This class provides a great amount of flexibility when creating a new function; such flexibility is exposed through the builder. To exploit the various possibilities, you must understand some details of the construction.
In a first phase, we build a
ChunkedHashStorecontaining hashes of the keys. By default, the store will associate each hash with the rank of the key. If you specify values, the store will associate with each hash the corresponding value.
However, if you further require an indirect construction the store will associate again each hash with the rank of the corresponding key, and access randomly the values (which must be either a
LongBigList). Indirect construction is useful only in complex, multi-layer hashes (such as an
LcpMonotoneMinimalPerfectHashFunction) in which we want to reuse a checked
ChunkedHashStore. Storing values in the
ChunkedHashStoreis extremely scalable because the values must just be a
LongIterablethat will be scanned sequentially during the store construction. On the other hand, if you have already a store that associates ordinal positions, and you want to build a new function for which a
LongBigListof values needs little space (e.g., because it is described implicitly), you can opt for an indirect construction using the already built store.
Note that if you specify a store it will be used before building a new one (possibly because of a
ChunkedHashStore.DuplicateException), with obvious benefits in terms of performance. If the store is not checked, and a
ChunkedHashStore.DuplicateExceptionis thrown, the constructor will try to rebuild the store, but this requires, of course, that the keys, and possibly the values, are available. Note that it is your responsibility to pass a correct store.
Implementation DetailsAfter generating a random 3-hypergraph, we sort its 3-hyperedges to that a distinguished vertex in each 3-hyperedge, the hinge, never appeared before. We then assign to each vertex a value in such a way that for each 3-hyperedge the XOR of the three values associated to its vertices is the required value for the corresponding element of the function domain (this is the standard Majewski-Wormald-Havas-Czech construction).
Then, we measure whether it is favourable to compact the function, that is, to store nonzero values in a separate array, using a ranked marker array to record the positions of nonzero values.
A non-compacted, r-bit
MWHCFunctionon n keys requires γrn bits, whereas the compacted version takes just (γ + r)n bits (plus the bits that are necessary for the ranking structure; the current implementation uses
Rank16). This class will transparently choose the most space-efficient method.
- Sebastiano Vigna
- See Also:
- Serialized Form
Fields Modifier and Type Field Description
dataDeprecated.The final magick—the list of modulo-3 values that define the output of the minimal hash function.
globalSeedDeprecated.The seed used to generate the initial hash triple.
LOG2_CHUNK_SIZEDeprecated.The logarithm of the desired chunk size.
mDeprecated.The number of vertices of the intermediate hypergraph.
nDeprecated.The number of keys.
offsetDeprecated.The start offset of each block.
rankDeprecated.The ranking structure on
seedDeprecated.The seed of the underlying 3-hypergraphs.
signatureMaskDeprecated.The mask to compare signatures, or zero for no signatures.
protected TransformationStrategy<? super T>
transformDeprecated.The transformation strategy to turn objects of type
Tinto bit vectors.
widthDeprecated.The data width.
Constructors Modifier Constructor Description
MWHCFunction(Iterable<? extends T> keys, TransformationStrategy<? super T> transform, int signatureWidth, LongIterable values, int dataWidth, File tempDir, ChunkedHashStore<T> chunkedHashStore, boolean indirect)Deprecated.Creates a new function for the given keys and values.
Modifier and Type Method Description
getLongByTriple(long triple)Deprecated.Low-level access to the output of this function.
numBits()Deprecated.Returns the number of bits used by this structure.
size64()Deprecated.Returns the number of keys in the function domain.
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public static final int LOG2_CHUNK_SIZEDeprecated.The logarithm of the desired chunk size.
- See Also:
- Constant Field Values
protected final long nDeprecated.The number of keys.
protected final long mDeprecated.The number of vertices of the intermediate hypergraph.
protected final int widthDeprecated.The data width.
protected final long globalSeedDeprecated.The seed used to generate the initial hash triple.
protected final long seedDeprecated.The seed of the underlying 3-hypergraphs.
protected final long offsetDeprecated.The start offset of each block.
protected final LongBigList dataDeprecated.The final magick—the list of modulo-3 values that define the output of the minimal hash function.
protected final LongArrayBitVector markerDeprecated.
protected final TransformationStrategy<? super T> transformDeprecated.The transformation strategy to turn objects of type
Tinto bit vectors.
protected final long signatureMaskDeprecated.The mask to compare signatures, or zero for no signatures.
protected final LongBigList signaturesDeprecated.The signatures.
protected MWHCFunction(Iterable<? extends T> keys, TransformationStrategy<? super T> transform, int signatureWidth, LongIterable values, int dataWidth, File tempDir, ChunkedHashStore<T> chunkedHashStore, boolean indirect) throws IOExceptionDeprecated.Creates a new function for the given keys and values.
keys- the keys in the domain of the function, or
transform- a transformation strategy for the keys.
signatureWidth- a positive number for a signature width, 0 for no signature, a negative value for a self-signed function; if nonzero,
widthmust be -1.
values- values to be assigned to each element, in the same order of the iterator returned by
null, the assigned value will the ordinal number of each element.
dataWidth- the bit width of the
values, or -1 if
tempDir- a temporary directory for the store files, or
nullfor the standard temporary directory.
chunkedHashStore- a chunked hash store containing the keys associated with their ranks (if there are no values, or
indirectis true) or values, or
null; the store can be unchecked, but in this case
transformmust be non-
indirect- if true,
chunkedHashStorecontains ordinal positions, and
LongIterablethat must be accessed to retrieve the actual values.
public long getLong(Object o)Deprecated.
public long getLongByTriple(long triple)Deprecated.Low-level access to the output of this function.
This method makes it possible to build several kind of functions on the same
ChunkedHashStoreand then retrieve the resulting values by generating a single triple of hashes. The method
TwoStepsMWHCFunction.getLong(Object)is a good example of this technique.
triple- a triple generated as documented in
- the output of the function.
public long size64()Deprecated.Returns the number of keys in the function domain.
@Deprecated public int size()Deprecated.
public long numBits()Deprecated.Returns the number of bits used by this structure.
- the number of bits used by this structure.
public boolean containsKey(Object o)Deprecated.