2025-10

Size predictions based on preset, CRF and the size for another CRF and (faster) preset.

Attempts to predict file size for a given preset and CRF based on the file size for another preset (typically a much faster one) and another CRF. Basically, do 1 <= n <= 4 quick or very quick encodes and determine the CRF to use with a much slower encode in order to reach a given size. That n is to be determined more precisely.

The size = f(crf) function can be approximated well enough by an exponential. A better approximation is exp(a * x³ + b * x² + c * x + d), where x is the CRF.
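Fitting exp(a * x³ + b * x² + c * x + d) amounts to fitting a cubic to log(size), which is an ordinary least-squares problem. The following is a minimal sketch using NumPy. The function names and the sample points are made up for illustration and are not measurements from these notes.

```python
import numpy as np

def fit_log_size(crfs, sizes, degree=3):
    """Fit log(size) as a polynomial in CRF.

    Returns the coefficients, highest degree first (np.polyfit convention).
    """
    x = np.asarray(crfs, dtype=float)
    y = np.log(np.asarray(sizes, dtype=float))
    return np.polyfit(x, y, degree)

def predict_size(coeffs, crf):
    """Evaluate exp(polynomial(crf)) for the fitted coefficients."""
    return np.exp(np.polyval(coeffs, crf))

# Made-up sample points, generated from a known curve purely for illustration.
crfs = np.array([20, 28, 36, 44, 52, 60])
sizes = np.exp(7.0 - 0.08 * crfs)

coeffs = fit_log_size(crfs, sizes)
print(predict_size(coeffs, 40))
```

A cubic has four coefficients, so it needs at least four (crf, size) points. With fewer encodes, a lower `degree` would have to be used.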

In 3D, size = f(crf, preset) looks at least C¹, meaning it is continuous, it has a derivative everywhere and that derivative is itself continuous (it looks C² or more); this indicates the shape isn't random and there is an underlying structure which can probably be taken advantage of.

At a given preset and with varying CRFs, size is very predictable. Here, with x the CRF,
exp(x ** 0 * 7.367942 + x ** 1 * -0.083551 + x ** 2 * -0.000043 + x ** 3 * -0.000004)
is a very good approximation, deviating by a few megs at most.
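Since the goal is to find the CRF that reaches a given size, the fitted expression can be inverted numerically. The sketch below uses bisection, which works here because every non-constant coefficient is negative, so the size decreases as the CRF increases for positive x. The CRF bounds of 1 and 63 are an assumption, and the function names are hypothetical. Sizes are in whatever unit the fit was made with.

```python
import math

# Coefficients copied from the expression above, lowest degree first.
COEFFS = (7.367942, -0.083551, -0.000043, -0.000004)

def size_at(crf):
    """Predicted size at the given CRF according to the fitted expression."""
    return math.exp(sum(c * crf ** i for i, c in enumerate(COEFFS)))

def crf_for_size(target, lo=1.0, hi=63.0, tol=1e-6):
    """Find the CRF whose predicted size equals target, by bisection.

    Relies on size_at being strictly decreasing on [lo, hi].
    The default bounds are an assumed CRF range.
    """
    if not size_at(hi) <= target <= size_at(lo):
        raise ValueError("target size outside the modelled range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if size_at(mid) > target:
            lo = mid  # still too big: a higher CRF is needed
        else:
            hi = mid
    return (lo + hi) / 2

print(crf_for_size(size_at(40.0)))
```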

It's useful to scale file sizes based on the one obtained at e.g. preset=2 and crf=40 which are intermediate and sensible values.
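One way to use such scaling is sketched below. Sizes measured on a reference input are divided by the size at the reference point, and the resulting ratios are then applied to a quick encode of a new input. This assumes the relative shape carries over from one input to another, which is a hope rather than an established result. All numbers and function names are made up for illustration.

```python
REF = (2, 40)  # (preset, crf) reference point

def relative_sizes(sizes, ref=REF):
    """Divide every size by the size measured at the reference point."""
    ref_size = sizes[ref]
    return {key: size / ref_size for key, size in sizes.items()}

def predict(relative, measured_key, measured_size, wanted_key):
    """Scale a size measured at one (preset, crf) to another (preset, crf)."""
    return measured_size * relative[wanted_key] / relative[measured_key]

# Made-up sizes for a reference input, keyed by (preset, crf).
sizes = {(2, 40): 50.0, (2, 30): 100.0, (10, 40): 60.0, (10, 30): 125.0}
rel = relative_sizes(sizes)

# A quick encode of a new input at preset 10, crf 40 gave 30.0;
# estimate what preset 2, crf 30 would give for that input.
estimate = predict(rel, (10, 40), 30.0, (2, 30))
print(estimate)
```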

Working with a logarithmic scale gives a much finer picture.

I couldn't find an approximation for the 3D surface because I haven't found any way to fit non-trivial or symmetric 3D surfaces, at least not with a usable implementation. But maybe scaling everything based on one data point will work well enough.

At a given CRF and with a varying preset, size is rather predictable but there are changes in the trends, for instance around preset 5, which provides a global minimum. I expect the shape of these curves will not change much with different inputs, and if scaled based on the size at e.g. preset 0, changes are smaller. The three-dimensional view helps.

For every preset, polynomial extrapolation lets us compute sizes at any CRF in range and plot relative sizes compared to the one at preset 2. This gives a rather smooth curve. The hope is that it is similar for any input.

The plot above can be re-created in gnuplot using the extrapolated data points and the following command:

splot 'predicted-relative-size-vs-crf-and-preset.dat' using 2:1:3 with linespoints palette
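A data file for this command could be generated along the following lines. This is a sketch under assumptions: the column order (preset, CRF, relative size) is a guess made to match `using 2:1:3`, the sizes are taken relative to preset 2 at the same CRF, and the coefficients are made up. Blank lines between presets make gnuplot treat each preset as a separate data block.

```python
import math

def format_dat(log_size_coeffs, crfs, ref_preset=2):
    """Build the text of a gnuplot data file: preset, crf, relative size.

    log_size_coeffs maps each preset to the polynomial coefficients of
    log(size), lowest degree first.
    """
    def size(preset, crf):
        poly = sum(c * crf ** i for i, c in enumerate(log_size_coeffs[preset]))
        return math.exp(poly)

    lines = []
    for preset in sorted(log_size_coeffs):
        for crf in crfs:
            ratio = size(preset, crf) / size(ref_preset, crf)
            lines.append(f"{preset} {crf} {ratio:.6f}")
        lines.append("")  # blank line ends this preset's data block
    return "\n".join(lines) + "\n"

# Hypothetical coefficients for two presets, for illustration only.
coeffs = {2: (7.0, -0.08), 6: (7.1, -0.08)}
text = format_dat(coeffs, range(20, 61, 10))
print(text)
```

Writing the returned text to `predicted-relative-size-vs-crf-and-preset.dat` gives a file the `splot` command can read.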

#Updates

#Reports

#Notes