PAQ is a series of lossless data compression archivers that have gone through collaborative development to top rankings on several benchmarks measuring compression ratio (although at the expense of speed and memory usage). Specialized versions of PAQ have won the Hutter Prize and the Calgary Challenge. PAQ is free software distributed under the GNU General Public License.
PAQ uses a context mixing algorithm. Context mixing is related to prediction by partial matching (PPM) in that the compressor is divided into a predictor and an arithmetic coder, but differs in that the next-symbol prediction is computed using a weighed combination of probability estimates from a large number of models conditioned on different contexts. Unlike PPM, a context doesn't need to be contiguous. Most PAQ versions collect next-symbol statistics for the following contexts:
All PAQ versions predict and compress one bit at a time, but differ in the details of the models and how the predictions are combined and postprocessed. Once the next-bit probability is determined, it is encoded by arithmetic coding. There are three methods for combining predictions, depending on the version:
PAQ1SSE and later versions postprocess the prediction using secondary symbol estimation (SSE). The combined prediction and a small context are used to look up a new prediction in a table. After the bit is encoded, the table entry is adjusted to reduce the prediction error. SSE stages can be pipelined with different contexts or computed in parallel with the outputs averaged.
A string s is compressed to the shortest byte string representing a base 256 big-endian number x in the range [0, 1] such that P(r < s) ≤ x < P(r ≤ s), where P(r < s) is the probability that a random string r with the same length as s will be lexicographically less than s. It is always possible to find an x such that the length of x is at most one byte longer than the Shannon limit, −log2P(r = s) bits. The length of s is stored in the archive header.