18-adaptive-quotient-filter

I studied adaptive telescoping filter.

*TAF stores the original elements in order to adapt for false positives, making it slower and bloatier (memory-wise) than a hashset.

I feel trolled.

A hash set uses the full hash divided by set size to get the memory location.

A quotient filter uses only a portion of the hash - called the quotient. A tiny portion (e.g. 3 bits) of the hash - called the remainder - is stored in memory. If the remainder size is 3 bits, this basically uses 1/8 of the memory of a hash set.

these bits

is_occupied is set when a slot is the canonical slot for some key stored (somewhere) in the filter (but not necessarily in this slot). is_continuation is set when a slot is occupied but not by the first remainder in a run. is_shifted is set when the remainder in a slot is not in its canonical slot.

are used to limit the range to scan.

the quotient filter can’t adapt because it does not know about false positives.

rank-select quotient filter is a quotient filter optimized for cache locatily. the math stays the same. the “rank-select” comes from the fact that it uses bitset to store occupied/runend bits. the x-th occupied bit corresponds to the x-th runend bit. every block also loops around. the example i used used a block of 64 elements/remainders.

if utaf can do adapt on insert, not on lookup, that might be better. or, if it can turn adapt on and off. the low FP rate of the standard quotient fliter doesn’t seem worth it though.

what i’ve learned

on high collision rate, shake it?