Class MWHCFunction<T>
 java.lang.Object

 it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<T>

 it.unimi.dsi.sux4j.mph.MWHCFunction<T>

 All Implemented Interfaces:
Function<T,Long>
,Object2LongFunction<T>
,Size64
,Serializable
,Function<T,Long>
,ToLongFunction<T>
@Deprecated public class MWHCFunction<T> extends AbstractObject2LongFunction<T> implements Serializable, Size64
Deprecated.Please aGOV3Function
or aGOV4Function
.An immutable function stored quasisuccinctly using the MajewskiWormaldHavasCzech 3hypergraph technique.Instances of this class store a function from keys to values. Keys are provided by an iterable object (whose iterators must return elements in a consistent order), whereas values are provided by a
LongIterable
. If you do nost specify values, each key will be assigned its rank (e.g., its position in iteration order starting from zero).For convenience, this class provides a main method that reads from standard input a (possibly
gzip
'd) sequence of newlineseparated strings, and writes a serialised function mapping each element of the list to its position, or to a given list of values.Signing
Optionally, it is possible to sign an
MWHCFunction
. Signing is possible if no list of values has been specified (otherwise, there is no way to associate a key with its signature). A wbit signature will be associated with each key, so thatgetLong(Object)
will return a default return value (by default, 1) on strings that are not in the original key set. As usual, false positives are possible with probability 2^{w}.If you're not interested in the rank of a key, but just to know whether the key was in the original set, you can turn the function into a dictionary. In this case, the value associated by the function with a key is exactly its signature, which means that the only space used by the function is that occupied by signatures: this is one of the fastest and most compact way of storing a static dictionary. In this case, the only returned value is one, and the default return value is set to zero.
Building a function
This class provides a great amount of flexibility when creating a new function; such flexibility is exposed through the builder. To exploit the various possibilities, you must understand some details of the construction.
In a first phase, we build a
ChunkedHashStore
containing hashes of the keys. By default, the store will associate each hash with the rank of the key. If you specify values, the store will associate with each hash the corresponding value.However, if you further require an indirect construction the store will associate again each hash with the rank of the corresponding key, and access randomly the values (which must be either a
LongList
or aLongBigList
). Indirect construction is useful only in complex, multilayer hashes (such as anLcpMonotoneMinimalPerfectHashFunction
) in which we want to reuse a checkedChunkedHashStore
. Storing values in theChunkedHashStore
is extremely scalable because the values must just be aLongIterable
that will be scanned sequentially during the store construction. On the other hand, if you have already a store that associates ordinal positions, and you want to build a new function for which aLongList
orLongBigList
of values needs little space (e.g., because it is described implicitly), you can opt for an indirect construction using the already built store.Note that if you specify a store it will be used before building a new one (possibly because of a
ChunkedHashStore.DuplicateException
), with obvious benefits in terms of performance. If the store is not checked, and aChunkedHashStore.DuplicateException
is thrown, the constructor will try to rebuild the store, but this requires, of course, that the keys, and possibly the values, are available. Note that it is your responsibility to pass a correct store.Implementation Details
After generating a random 3hypergraph, we sort its 3hyperedges to that a distinguished vertex in each 3hyperedge, the hinge, never appeared before. We then assign to each vertex a value in such a way that for each 3hyperedge the XOR of the three values associated to its vertices is the required value for the corresponding element of the function domain (this is the standard MajewskiWormaldHavasCzech construction).Then, we measure whether it is favourable to compact the function, that is, to store nonzero values in a separate array, using a ranked marker array to record the positions of nonzero values.
A noncompacted, rbit
MWHCFunction
on n keys requires γrn bits, whereas the compacted version takes just (γ + r)n bits (plus the bits that are necessary for the ranking structure; the current implementation usesRank16
). This class will transparently choose the most spaceefficient method. Since:
 0.2
 Author:
 Sebastiano Vigna
 See Also:
 Serialized Form


Nested Class Summary
Nested Classes Modifier and Type Class Description static class
MWHCFunction.Builder<T>
Deprecated.A builder class forMWHCFunction
.

Field Summary
Fields Modifier and Type Field Description protected LongBigList
data
Deprecated.The final magick—the list of modulo3 values that define the output of the minimal hash function.protected long
globalSeed
Deprecated.The seed used to generate the initial hash triple.static int
LOG2_CHUNK_SIZE
Deprecated.The logarithm of the desired chunk size.protected long
m
Deprecated.The number of vertices of the intermediate hypergraph.protected LongArrayBitVector
marker
Deprecated.protected long
n
Deprecated.The number of keys.protected long[]
offset
Deprecated.The start offset of each block.protected Rank16
rank
Deprecated.The ranking structure onmarker
.protected long[]
seed
Deprecated.The seed of the underlying 3hypergraphs.protected long
signatureMask
Deprecated.The mask to compare signatures, or zero for no signatures.protected LongBigList
signatures
Deprecated.The signatures.protected TransformationStrategy<? super T>
transform
Deprecated.The transformation strategy to turn objects of typeT
into bit vectors.protected int
width
Deprecated.The data width.
Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
defRetValue


Constructor Summary
Constructors Modifier Constructor Description protected
MWHCFunction(Iterable<? extends T> keys, TransformationStrategy<? super T> transform, int signatureWidth, LongIterable values, int dataWidth, File tempDir, ChunkedHashStore<T> chunkedHashStore, boolean indirect)
Deprecated.Creates a new function for the given keys and values.

Method Summary
Modifier and Type Method Description boolean
containsKey(Object o)
Deprecated.long
getLong(Object o)
Deprecated.long
getLongByTriple(long[] triple)
Deprecated.Lowlevel access to the output of this function.static void
main(String[] arg)
Deprecated.long
numBits()
Deprecated.Returns the number of bits used by this structure.int
size()
Deprecated.long
size64()
Deprecated.Returns the number of keys in the function domain.
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
defaultReturnValue, defaultReturnValue

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface it.unimi.dsi.fastutil.objects.Object2LongFunction
applyAsLong, get, put, put, remove, removeLong




Field Detail

LOG2_CHUNK_SIZE
public static final int LOG2_CHUNK_SIZE
Deprecated.The logarithm of the desired chunk size. See Also:
 Constant Field Values

n
protected final long n
Deprecated.The number of keys.

m
protected final long m
Deprecated.The number of vertices of the intermediate hypergraph.

width
protected final int width
Deprecated.The data width.

globalSeed
protected final long globalSeed
Deprecated.The seed used to generate the initial hash triple.

seed
protected final long[] seed
Deprecated.The seed of the underlying 3hypergraphs.

offset
protected final long[] offset
Deprecated.The start offset of each block.

data
protected final LongBigList data
Deprecated.The final magick—the list of modulo3 values that define the output of the minimal hash function.

marker
protected final LongArrayBitVector marker
Deprecated.

transform
protected final TransformationStrategy<? super T> transform
Deprecated.The transformation strategy to turn objects of typeT
into bit vectors.

signatureMask
protected final long signatureMask
Deprecated.The mask to compare signatures, or zero for no signatures.

signatures
protected final LongBigList signatures
Deprecated.The signatures.


Constructor Detail

MWHCFunction
protected MWHCFunction(Iterable<? extends T> keys, TransformationStrategy<? super T> transform, int signatureWidth, LongIterable values, int dataWidth, File tempDir, ChunkedHashStore<T> chunkedHashStore, boolean indirect) throws IOException
Deprecated.Creates a new function for the given keys and values. Parameters:
keys
 the keys in the domain of the function, ornull
.transform
 a transformation strategy for the keys.signatureWidth
 a positive number for a signature width, 0 for no signature, a negative value for a selfsigned function; if nonzero,values
must benull
andwidth
must be 1.values
 values to be assigned to each element, in the same order of the iterator returned bykeys
; ifnull
, the assigned value will the ordinal number of each element.dataWidth
 the bit width of thevalues
, or 1 ifvalues
isnull
.tempDir
 a temporary directory for the store files, ornull
for the standard temporary directory.chunkedHashStore
 a chunked hash store containing the keys associated with their ranks (if there are no values, orindirect
is true) or values, ornull
; the store can be unchecked, but in this casekeys
andtransform
must be nonnull
.indirect
 if true,chunkedHashStore
contains ordinal positions, andvalues
is aLongIterable
that must be accessed to retrieve the actual values. Throws:
IOException


Method Detail

getLong
public long getLong(Object o)
Deprecated. Specified by:
getLong
in interfaceObject2LongFunction<T>

getLongByTriple
public long getLongByTriple(long[] triple)
Deprecated.Lowlevel access to the output of this function.This method makes it possible to build several kind of functions on the same
ChunkedHashStore
and then retrieve the resulting values by generating a single triple of hashes. The methodTwoStepsMWHCFunction.getLong(Object)
is a good example of this technique. Parameters:
triple
 a triple generated as documented inChunkedHashStore
. Returns:
 the output of the function.

size64
public long size64()
Deprecated.Returns the number of keys in the function domain.

size
@Deprecated public int size()
Deprecated.

numBits
public long numBits()
Deprecated.Returns the number of bits used by this structure. Returns:
 the number of bits used by this structure.

containsKey
public boolean containsKey(Object o)
Deprecated. Specified by:
containsKey
in interfaceFunction<T,Long>

main
public static void main(String[] arg) throws NoSuchMethodException, IOException, JSAPException
Deprecated.

